jam-cloud/agent-tasks/client-simulator/webrtc-web-client-plan/progress.md

# WebRTC Web Client Plan Progress
## Goal
Implement browser-native WebRTC media for "web client mode" while preserving the existing frontend/backend control-plane behavior (bridge call sequencing, REST patterns, websocket message flow, and session state transitions) as closely as possible to native.
## Baseline Artifacts
- Native 2-party references:
  - `web/ai/native-client-2p-recording/20260228-210957-seth-native-2p-mute-volume.log`
  - `web/ai/native-client-2p-recording/20260228-211006-david-native-2p-mute-volume.log`
- High-signal baseline from these logs:
  - `join-dedupe.create` once and `join-dedupe.release` once per join.
  - one `POST /api/sessions/:id/participants` per client join.
  - repeated `PUT /api/sessions/:id/tracks` during mute/volume operations.
  - heavy `jamClient` polling/queries (`getConnectionDetail`, `SessionGetAllControlState`, `SessionSetControlState`, `FTUEGetChannels`, `P2PMessageReceived`) that inform adapter contract priorities.
## Compatibility Contract (for web-client mode)
- Preserve:
  - action-level call sequence (`SessionActions` -> `SessionStore` -> rest/websocket/adapter)
  - REST endpoint/method cadence for session join + track control
  - websocket gateway usage and event routing pattern
  - participant/session state behavior in frontend stores
- Allow controlled divergence:
  - bridge methods that are native-only may return `null`/stub values in web mode if callers are guarded by `isWebClient`/`isNativeClient` aware logic.
  - media transport and signaling payload internals may differ for web-only peers.
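
The "controlled divergence" rule for native-only bridge methods could look roughly like the sketch below. It is only an illustration: `makeWebModeBridge`, `readDiagnostics`, `hypotheticalNativeDiagnostics`, and the `context` object shape are assumed names, not identifiers from the codebase, and the real `isWebClient` check may be exposed differently.

```javascript
// Sketch only: every name except SessionGetAllControlState is hypothetical.

// Build a web-mode bridge: keep implemented methods, and give each
// native-only method a stub that returns null so call sites do not throw.
function makeWebModeBridge(implemented, nativeOnlyNames) {
  const bridge = { ...implemented };
  for (const name of nativeOnlyNames) {
    if (typeof bridge[name] !== "function") {
      bridge[name] = () => null;
    }
  }
  return bridge;
}

// A guarded caller: it never trusts a native-only result in web mode.
// `context.isWebClient` is an assumed boolean flag.
function readDiagnostics(context) {
  if (context.isWebClient) {
    return {};
  }
  return context.bridge.hypotheticalNativeDiagnostics();
}

const webBridge = makeWebModeBridge(
  { SessionGetAllControlState: () => [] },
  ["hypotheticalNativeDiagnostics"]
);
```

The stub keeps the method present, so unchanged call sites keep working, while the guard keeps a `null` return from leaking into code that expects native data.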
## Execution Plan
### 1) Contract Matrix (bridge + REST + websocket) [done]
- Build a method matrix from recordings:
  - Tier A (must implement behavior): session join/leave, participant lifecycle, mute/volume/control-state methods, callback registration methods used in session runtime.
  - Tier B (must exist, can be stubbed): profile/device/vst/native diagnostics that do not affect core web-to-web media.
  - Tier C (deferred): legacy recording/VST branches being deprecated.
- Output: `web/ai/native-client-2p-recording/contract-matrix.md`.
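
One way to make the tier matrix checkable by code is to keep it as a plain lookup table and classify the method names seen in a recording. The sketch below is an assumption about shape only. The Tier A entries are the control-state methods named in the baseline logs; the Tier B and Tier C entries (`hypotheticalGetDeviceProfile`, `hypotheticalStartLegacyRecording`) are invented placeholders, and the actual assignments live in `contract-matrix.md`.

```javascript
// Sketch only: tier assignments here are illustrative, not the real matrix.
const CONTRACT_MATRIX = {
  SessionGetAllControlState: "A",
  SessionSetControlState: "A",
  hypotheticalGetDeviceProfile: "B",      // placeholder name
  hypotheticalStartLegacyRecording: "C",  // placeholder name
};

// Group the distinct method names observed in a recording by tier.
// Anything missing from the matrix is reported as unclassified.
function classifyCalls(observedMethods, matrix) {
  const result = { A: [], B: [], C: [], unclassified: [] };
  for (const method of new Set(observedMethods)) {
    const known = Object.prototype.hasOwnProperty.call(matrix, method);
    const bucket = known ? result[matrix[method]] : result.unclassified;
    bucket.push(method);
  }
  return result;
}

const classified = classifyCalls(
  ["SessionSetControlState", "SessionSetControlState", "hypotheticalUnknownCall"],
  CONTRACT_MATRIX
);
```

An `unclassified` bucket makes gaps visible: a method that appears in a recording but not in the matrix needs a tier decision before it can be stubbed or implemented.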
### 2) WebJamClient Core (control plane parity)
- Expand `WebJamClient` to provide Tier A method behavior with deterministic return shapes.
- Maintain adapter-level instrumentation in `jamClientAdapter` for parity diffing.
- Keep method names and call sites unchanged wherever possible.
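
Adapter-level instrumentation for parity diffing can be sketched with a standard `Proxy` that records each method call before forwarding it. This is not how `jamClientAdapter` is known to work; `instrument`, the log entry shape, and the fake client below are assumptions for illustration.

```javascript
// Sketch only: `instrument` and the log entry shape are hypothetical.

// Wrap a client so that every method call is appended to `log`
// and then forwarded unchanged to the underlying object.
function instrument(client, log) {
  return new Proxy(client, {
    get(target, prop, receiver) {
      const value = Reflect.get(target, prop, receiver);
      if (typeof value !== "function") {
        return value;
      }
      return function (...args) {
        log.push({ method: String(prop), argCount: args.length });
        return value.apply(target, args);
      };
    },
  });
}

// Stand-in client with a deterministic return shape.
const fakeClient = {
  SessionGetAllControlState(isMaster) {
    return isMaster ? [{ id: "master" }] : [{ id: "personal" }];
  },
};

const callLog = [];
const instrumented = instrument(fakeClient, callLog);
const state = instrumented.SessionGetAllControlState(true);
```

Because the wrapper leaves method names and return values untouched, call sites stay unchanged and the resulting log can be compared against the native recordings.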
### 3) WebRTC Session Media Engine (data plane)
- Add web-client media manager:
  - local mic capture lifecycle
  - peer connection lifecycle keyed by participant `client_id`
  - mute/volume mapping between UI track state and WebRTC tracks/gain nodes
  - VU meter signal path for local/remote tracks
- Use existing gateway/P2P channel pattern for signaling envelopes in web mode.
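
Two pieces of the media manager can be sketched without touching browser APIs: a registry keyed by `client_id` and a mapping from UI track state to a gain value. The sketch assumes a UI volume range of 0 to 100 and a `{ muted, volume }` track shape; both are assumptions, as are the names `toGain` and `PeerRegistry`. The peer factory is injected so the logic can run outside a browser.

```javascript
// Sketch only: the 0..100 volume range and the track shape are assumptions.

// Map UI track state to a linear gain in [0, 1]; mute always wins.
function toGain(track) {
  if (track.muted) {
    return 0;
  }
  const clamped = Math.min(100, Math.max(0, track.volume));
  return clamped / 100;
}

// Registry of peer objects keyed by participant client_id.
class PeerRegistry {
  constructor(createPeer) {
    this.createPeer = createPeer; // injected factory
    this.peers = new Map();
  }

  // Return the existing peer for clientId, creating it on first use.
  ensure(clientId) {
    if (!this.peers.has(clientId)) {
      this.peers.set(clientId, this.createPeer(clientId));
    }
    return this.peers.get(clientId);
  }

  // Close and forget a peer; returns true if one was registered.
  remove(clientId) {
    const peer = this.peers.get(clientId);
    if (peer && typeof peer.close === "function") {
      peer.close();
    }
    return this.peers.delete(clientId);
  }
}
```

In a browser, the factory would typically be something like `() => new RTCPeerConnection()`, and the result of `toGain` would be written to a Web Audio `GainNode` through `gainNode.gain.value`.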
### 4) Track-Control Parity
- Ensure UI interactions (self mute, other mute, self volume, other volume) continue to drive:
  - same store/action flow
  - same `PUT /api/sessions/:id/tracks` pattern
  - same session refresh behavior (`track_changes_counter` driven updates)
### 5) Automated Compatibility Assertions
- Add a test harness that compares captured run artifacts against compatibility rules:
  - join count and participant POST cardinality
  - REST endpoint/method sequence constraints (tolerant matching where needed)
  - required bridge method call presence/order windows for core flows
  - required websocket event classes
- Keep this as "pattern parity" (not strict payload byte-equality).
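
A minimal sketch of such a pattern-parity check, using the cardinalities from the native baseline (one `join-dedupe.create`, one `join-dedupe.release`, one participant `POST`, and at least one `PUT` to `/tracks`). The event record shape (`kind`, `name`, `method`, `path`) and the function names are assumptions; the real log format would need its own parser.

```javascript
// Sketch only: the event record shape is hypothetical.

// Collapse concrete session ids so paths compare as patterns.
function normalizePath(path) {
  return path.replace(/\/api\/sessions\/[^/]+/, "/api/sessions/:id");
}

// Count events by pattern key and return the rules that were violated.
function checkParity(events) {
  const counts = new Map();
  for (const e of events) {
    const key = e.kind === "rest"
      ? `${e.method} ${normalizePath(e.path)}`
      : e.name;
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  const rules = [
    { key: "join-dedupe.create", min: 1, max: 1 },
    { key: "join-dedupe.release", min: 1, max: 1 },
    { key: "POST /api/sessions/:id/participants", min: 1, max: 1 },
    { key: "PUT /api/sessions/:id/tracks", min: 1, max: Infinity },
  ];
  return rules
    .map((r) => ({ key: r.key, count: counts.get(r.key) || 0, rule: r }))
    .filter((v) => v.count < v.rule.min || v.count > v.rule.max)
    .map((v) => ({ key: v.key, count: v.count }));
}

const sampleRun = [
  { kind: "log", name: "join-dedupe.create" },
  { kind: "rest", method: "POST", path: "/api/sessions/abc123/participants" },
  { kind: "log", name: "join-dedupe.release" },
  { kind: "rest", method: "PUT", path: "/api/sessions/abc123/tracks" },
  { kind: "rest", method: "PUT", path: "/api/sessions/abc123/tracks" },
];
const violations = checkParity(sampleRun);
```

The check counts patterns rather than comparing payloads, which matches the "pattern parity" goal: a web-client run passes if the same classes of calls occur with the same cardinality, even when payload internals differ.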
### 6) End-to-End Validation Runs
- Run 2-party web-client scenario (same script as native baseline):
  - self mute/unmute
  - other mute/unmute
  - self volume min/max
  - other volume min/max
- Compare artifacts and close gaps before widening feature scope.
## Status
- [x] Baseline native 2-party logs captured and reviewed
- [x] Contract matrix drafted (Tier A/B/C)
- [~] WebJamClient Tier A behavior implemented (initial control-state + session/participant + p2p/webrtc signaling scaffolding in place)
- [ ] WebRTC peer/media manager integrated
- [ ] Track control parity verified against baseline patterns
- [ ] Compatibility assertions added
- [ ] 2-party web-client parity run completed
## 2026-03-01 Implementation Update
- Implemented first Tier A `WebJamClient` pass in `web/app/assets/javascripts/webJamClient.js`:
  - Added control-state cache for `SessionGetAllControlState(master|personal)` seeded from delegate then locally updated.
  - Added `SessionSetControlState` local mutation from `trackVolumeObject` and broadcast-aware `MixerActions.syncTracks()` scheduling to preserve REST `/tracks` flow in web mode.
  - Added session/participant lifecycle state updates in:
    - `UpdateSessionInfo`
    - `ParticipantJoined` / `ParticipantLeft`
    - `ClientJoinedSession` / `ClientLeftSession`
  - Added initial WebRTC signaling scaffolding (feature-flagged):
    - peer connection registry
    - offer/answer/candidate handling over existing websocket-gateway P2P path (`JK.JamServer.sendP2PMessage`)
    - inbound signal handling in `P2PMessageReceived`
    - local mic stream bootstrap for web-client WebRTC mode.
- This pass is intentionally incremental; the next step is to validate the 2-party web-client flow and close gaps in remote audio rendering/mix handling.
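
The offer/answer/candidate handling described above can be pictured as a small envelope codec plus a dispatcher. The sketch below is an assumption throughout: the envelope fields (`type`, `kind`, `from`, `payload`), the `"webrtc-signal"` tag, and the function names are hypothetical, and the actual payload passed through `JK.JamServer.sendP2PMessage` and `P2PMessageReceived` may be shaped differently.

```javascript
// Sketch only: envelope fields and function names are hypothetical.
const SIGNAL_KINDS = ["offer", "answer", "candidate"];

// Serialize a signaling message for the existing P2P path.
function encodeSignal(kind, fromClientId, payload) {
  return JSON.stringify({ type: "webrtc-signal", kind, from: fromClientId, payload });
}

// Parse an inbound P2P message; return null for anything that is not
// a well-formed signaling envelope so other P2P traffic passes through.
function decodeSignal(raw) {
  let msg;
  try {
    msg = JSON.parse(raw);
  } catch (err) {
    return null;
  }
  if (!msg || msg.type !== "webrtc-signal") {
    return null;
  }
  if (!SIGNAL_KINDS.includes(msg.kind)) {
    return null;
  }
  return msg;
}

// Dispatch a decoded signal to the matching handler.
// Returns true if the message was a signal and a handler ran.
function routeSignal(raw, handlers) {
  const msg = decodeSignal(raw);
  if (msg === null || typeof handlers[msg.kind] !== "function") {
    return false;
  }
  handlers[msg.kind](msg.from, msg.payload);
  return true;
}
```

Returning `false` for unrecognized messages lets the inbound handler fall through to whatever non-WebRTC P2P handling already exists, which keeps the feature flag's off state equivalent to the previous behavior.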