Tags: coder/mux
🤖 feat: add per-workspace heartbeat messages (#3105)

## Summary

Add a per-workspace heartbeat message override to the existing Configure heartbeat modal and use it when backend heartbeats run.

## Background

Workspace heartbeats were already persisted per workspace, but they always used a fixed default prompt body. This change keeps the existing Workspace Heartbeats experiment gate while letting each workspace optionally customize the heartbeat instruction body without affecting other workspaces.

## Implementation

- extended the workspace heartbeat schema/router payload with an optional `message`
- normalized whitespace-only values to clear the override and fall back to the default body
- updated `WorkspaceHeartbeatModal` and `useWorkspaceHeartbeat` to edit, restore, and clear the workspace-scoped message
- composed heartbeat prompts from a fixed idle-duration lead-in plus either the saved custom body or the default body
- preserved stored heartbeat messages when `/heartbeat` commands only change cadence or disable the feature
- added focused modal, slash-command, and backend heartbeat tests

## Validation

- `make static-check`
- `make test`
- `bun test src/node/services/heartbeatService.test.ts src/browser/utils/chatCommands.test.ts src/browser/components/WorkspaceHeartbeatModal/WorkspaceHeartbeatModal.test.tsx`
- Dogfooded the modal flow in `make dev-server-sandbox`, including persistence across reopen and no cross-workspace bleed; captured screenshots and a local WebM recording during the run

## Risks

The main regression risk is compatibility between the modal flow, backend heartbeat execution, and slash-command writes. The change stays within the existing workspace heartbeat model and adds tests for fallback/custom prompt composition plus message preservation across `/heartbeat` updates.
---

<details>
<summary>📋 Implementation Plan</summary>

# Plan: per-workspace configurable heartbeat message

## Goal

Add a **per-workspace** configurable heartbeat message through the existing **Configure heartbeat** modal, while keeping the existing **Workspace Heartbeats** experiment as the feature gate only.

## Recommended approach

**Option A (selected): extend the existing workspace-scoped heartbeat flow** (**net +70 to +110 product LoC**).

- Keep `Settings -> Experiments -> Workspace Heartbeats` as the on/off gate.
- Do **not** add a global textbox in `ExperimentsSection`; that screen is app-scoped and intentionally lacks selected-workspace context.
- Add an optional per-workspace heartbeat message field to the existing heartbeat settings model and modal.
- Treat the configured value as the **instruction body** for the heartbeat, not a raw full prompt template:
  - keep the fixed `[Heartbeat]` prefix
  - keep the computed idle-duration sentence
  - append either the custom message body or the existing default instruction body
- Blank/whitespace input clears the override and falls back to the built-in default message.

### Scope decisions

- **In scope:** per-workspace persistence, workspace modal UI, backend prompt selection, compatibility updates for existing `/heartbeat` command paths.
- **Out of scope:** global experiment-level textbox, placeholder/template syntax such as `{idleDuration}`, scheduling changes, new docs pages.

<details>
<summary>Why this matches current repo patterns</summary>

- `src/common/orpc/schemas/workspace.ts` already stores heartbeat settings as workspace metadata.
- `src/browser/components/WorkspaceHeartbeatModal/WorkspaceHeartbeatModal.tsx` is the existing per-workspace edit surface.
- `src/browser/features/Settings/Sections/ExperimentsSection.tsx` supports nested controls for app-scoped experiments (for example, configurable bind URL), but that pattern is a poor fit for workspace-specific data because the settings route does not carry selected-workspace context.
- `src/node/services/workspaceService.ts#executeHeartbeat()` currently builds the heartbeat prompt in one place, so prompt selection can stay centralized.

</details>

## Implementation plan

### Phase 1: Extend the heartbeat settings shape and persistence

**Files / symbols**

- `src/common/orpc/schemas/workspace.ts`
  - `WorkspaceHeartbeatSettingsSchema`
- `src/common/orpc/schemas/api.ts`
  - `workspace.heartbeat.get`
  - `workspace.heartbeat.set`
- `src/node/orpc/router.ts`
  - `workspace.heartbeat.set`
- `src/node/services/workspaceService.ts`
  - `setHeartbeatSettings()`
  - `getHeartbeatSettings()`

**Changes**

- Add an optional `message` field to `WorkspaceHeartbeatSettingsSchema` with a conservative max length (recommend **1000 chars** for v1).
- Thread that field through the heartbeat ORPC schema and router so `input.message` reaches `setHeartbeatSettings()`.
- In `setHeartbeatSettings()`:
  - assert that `message` is either absent or a string
  - trim it
  - normalize empty/whitespace-only values to `undefined`
  - include `message` in the persisted object and in change detection
- Preserve current backward compatibility: old workspaces without `message` continue to read as default/fallback behavior.

**Defensive-programming notes**

- Keep explicit assertions next to the existing `enabled` / `intervalMs` assertions.
- Prefer a normalized local variable before constructing `nextSettings` so changed detection and persistence compare the same shape.
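The Phase 1 normalization rules can be sketched roughly as follows. This is a minimal illustration, not the code in `workspaceService.ts`: `normalizeHeartbeatMessage`, `heartbeatSettingsChanged`, the `HeartbeatSettings` interface, and the constant name are hypothetical, and applying the 1000-character limit to the trimmed value is an assumption.

```typescript
// Hypothetical shape; the real one is defined by WorkspaceHeartbeatSettingsSchema.
interface HeartbeatSettings {
  enabled: boolean;
  intervalMs: number;
  message?: string;
}

// The plan recommends 1000 chars for v1; the constant name is illustrative.
const MAX_HEARTBEAT_MESSAGE_LENGTH = 1000;

// Assert the type, trim, and collapse empty/whitespace-only input to undefined
// so that "cleared" and "never set" are stored identically.
function normalizeHeartbeatMessage(message: unknown): string | undefined {
  if (message === undefined) return undefined;
  if (typeof message !== "string") {
    throw new Error("heartbeat message must be a string when provided");
  }
  const trimmed = message.trim();
  if (trimmed.length === 0) return undefined;
  if (trimmed.length > MAX_HEARTBEAT_MESSAGE_LENGTH) {
    throw new Error("heartbeat message exceeds the maximum length");
  }
  return trimmed;
}

// Change detection compares the normalized shape, including `message`.
function heartbeatSettingsChanged(
  prev: HeartbeatSettings | undefined,
  next: HeartbeatSettings
): boolean {
  return (
    prev === undefined ||
    prev.enabled !== next.enabled ||
    prev.intervalMs !== next.intervalMs ||
    prev.message !== next.message
  );
}
```

Normalizing before building the next settings object means a save that only adds trailing whitespace is not reported as a change.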

**Quality gate**

- `make typecheck`
- targeted heartbeat schema/API tests if needed after the shape change lands

### Phase 2: Add the per-workspace message field to the existing modal

**Files / symbols**

- `src/browser/components/WorkspaceHeartbeatModal/WorkspaceHeartbeatModal.tsx`
- `src/browser/hooks/useWorkspaceHeartbeat.ts`

**Changes**

- Extend the modal draft state with `draftMessage`.
- Re-sync the draft message on open/workspace-switch alongside `enabled` and `intervalMs`.
- Add a simple styled `<textarea>` to the modal instead of introducing a new shared component; keep the diff small.
- Show helper text such as: **“Leave empty to use the default heartbeat message.”**
- Use the current default instruction body as placeholder/help copy rather than pre-populating the field.
- Preserve the saved custom message even if the user temporarily disables heartbeats.
- Pass `message` through `save()`; send `undefined` when the trimmed draft is empty so clearing the field removes the override.

**UI behavior choice**

- Keep the existing heartbeat modal as the single edit surface for workspace-scoped settings.
- Show the new message field **only when heartbeat is enabled in the modal**, to mirror the user’s requested “toggle on -> reveal configurable value” interaction, but do **not** clear the persisted value when toggled off.

**Quality gate**

- local UI validation for required trim/max-length behavior
- component test for save + re-open behavior before moving on

### Phase 3: Centralize prompt composition with fallback behavior

**Files / symbols**

- `src/node/services/workspaceService.ts`
  - `executeHeartbeat()`

**Changes**

- Fetch the saved heartbeat settings inside `executeHeartbeat()` using the existing workspace-scoped settings path.
- Split the prompt into:
  1. a fixed preamble with `[Heartbeat]` and the computed idle duration
  2. an instruction body that comes from `settings.message` when present, otherwise the current built-in default body
- Keep `muxMetadata.displayStatus`, `synthetic: true`, and `requireIdle: true` unchanged.

**Target shape**

```ts
const heartbeatLead = `[Heartbeat] This workspace has been idle for approximately ${idleDuration}.`;
const heartbeatBody = customMessage ?? DEFAULT_HEARTBEAT_BODY;
const heartbeatPrompt = `${heartbeatLead} ${heartbeatBody}`;
```

**Why this shape**

- preserves useful runtime context without introducing templating syntax
- avoids requiring the user to remember/include `idleDuration`
- keeps the fallback message unchanged for all existing workspaces

**Quality gate**

- targeted backend test proving both the custom-message and fallback paths

### Phase 4: Preserve custom message across secondary write paths

**Files / symbols**

- `src/browser/utils/chatCommands.ts`
  - `processSlashCommand()` heartbeat-set branch
- `src/browser/utils/chatCommands.test.ts`

**Changes**

- Update the `/heartbeat` command path so changing cadence or toggling heartbeats off does **not** accidentally clear the saved custom message.
- Preserve the stored `message` the same way the current code already preserves the stored interval when disabling heartbeats.
- Keep the API write semantics explicit at the callsite rather than relying on hidden merge behavior.
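
A rough sketch of the Phase 4 write path, assuming the slash-command branch reads the stored settings before writing. The names `buildHeartbeatWrite`, `HeartbeatCommand`, and `DEFAULT_INTERVAL_MS` are hypothetical and do not come from `chatCommands.ts`; the default interval value is a placeholder.

```typescript
// Hypothetical types; the real payload is defined by the heartbeat ORPC schema.
interface HeartbeatSettings {
  enabled: boolean;
  intervalMs: number;
  message?: string;
}

type HeartbeatCommand = { kind: "off" } | { kind: "set"; minutes: number };

// Placeholder fallback used only when nothing has been stored yet.
const DEFAULT_INTERVAL_MS = 30 * 60_000;

// Build the full payload explicitly at the callsite: the stored message is
// always carried forward, and `off` also keeps the stored interval.
function buildHeartbeatWrite(
  stored: HeartbeatSettings | undefined,
  command: HeartbeatCommand
): HeartbeatSettings {
  const message = stored?.message;
  if (command.kind === "off") {
    return {
      enabled: false,
      intervalMs: stored?.intervalMs ?? DEFAULT_INTERVAL_MS,
      message,
    };
  }
  return { enabled: true, intervalMs: command.minutes * 60_000, message };
}
```

Passing `message` explicitly keeps the write semantics visible at the callsite, so nothing depends on the backend merging partial updates.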

**Why this matters**

- today `workspace.heartbeat.set()` is called from both the modal hook and the slash-command path
- without a compatibility update, `/heartbeat 30` or `/heartbeat off` would overwrite the saved heartbeat message

**Quality gate**

- targeted slash-command tests for both enable and disable flows

## Test plan

### Update existing tests

- `src/node/services/heartbeatService.test.ts`
  - extend the existing prompt assertions to cover:
    - custom message body is used when stored
    - fallback default body is used when no custom message exists
- `src/browser/utils/chatCommands.test.ts`
  - assert `/heartbeat off` preserves the stored `message`
  - assert `/heartbeat <minutes>` preserves the stored `message` when only cadence changes

### Add focused UI coverage

- Add `src/browser/components/WorkspaceHeartbeatModal/WorkspaceHeartbeatModal.test.tsx`
  - renders the new message field when enabled
  - saves a custom message
  - reopens with the saved message restored
  - clearing the field removes the override instead of persisting whitespace

### Validation commands

- `make typecheck`
- `make lint`
- targeted `bun test` for the touched suites above
- `make test` before claiming completion if the targeted runs stay green and runtime budget allows

## Acceptance criteria

- The heartbeat modal exposes a per-workspace message field behind the existing workspace heartbeat flow.
- Saving the modal persists `message` alongside `enabled` and `intervalMs` for that workspace only.
- Reopening the modal for the same workspace restores the saved message; a different workspace does not inherit it.
- Clearing the message field removes the override and reverts to the built-in default heartbeat body.
- Heartbeat execution uses the custom message body when present and the current default body otherwise.
- Existing `/heartbeat` commands do not wipe the saved custom message.
- No new global experiment textbox is added.

## Dogfooding and review artifacts

### Setup

- Start an isolated sandbox with `make dev-server-sandbox`.
- Use the sandbox's printed Vite URL rather than your normal local mux instance so the dogfood run stays isolated.
- Drive the sandboxed UI with `agent-browser` against that URL and capture screenshots/video from that session.
- If sandbox seeding would make the test noisy, use `DEV_SERVER_SANDBOX_ARGS="--clean-projects"` (and `--clean-providers` too if you want a fully blank environment).

### Manual dogfood flow

1. Enable the **Workspace Heartbeats** experiment if it is not already on.
2. Open a real workspace.
3. Open **Configure heartbeat** from the workspace UI.
4. Enable heartbeats and enter:
   - a valid interval
   - a custom heartbeat message
5. Save, reopen the modal, and confirm the message persists.
6. Switch to a second workspace and confirm it still shows the default/empty message state.
7. Clear the custom message, save again, reopen, and confirm fallback behavior.

### Artifact requirements

- Capture **at least 3 screenshots**:
  1. the modal showing the new message field
  2. the modal reopened with the persisted custom message
  3. a second workspace showing no cross-workspace bleed
- Capture **one short video** showing: open modal -> enter message -> save -> reopen -> clear -> save again.
- If practical during the same session, capture one extra screenshot of a heartbeat transcript/log path using the custom body; otherwise rely on the targeted backend test for prompt verification and note that the live scheduler minimum makes full end-to-end timing slower.

## Risks / watchpoints

- The router currently forwards only `enabled` and `intervalMs`; forgetting to thread `message` there would silently drop the new field.
- `chatCommands.ts` is a compatibility trap because it already writes heartbeat settings outside the modal.
- Avoid inventing placeholder/template syntax in v1; it adds UX and validation surface area without being necessary for this request.

## Handoff note

If implementation starts later in Exec mode, keep the diff focused on the files above and avoid adding a second settings surface unless product requirements explicitly change toward a global default.

</details>

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$18.31`_

<!-- mux-attribution: model=openai:gpt-5.4 thinking=xhigh costs=18.31 -->

🤖 fix: cap expanded queued message height and make it scrollable (#3057)

## Summary

Cap the expanded queued message content at `40vh` and make it scroll on overflow, so long queued messages no longer push the composer off-screen.

## Background

When a queued message is expanded and its content exceeds viewport height, the entire `ChatInputPane` grows unbounded: the Edit/Send now buttons and the composer itself get displaced below the fold. This is the same class of overflow that `AttachedReviewsPanel` already handles with a viewport-relative height cap.

## Implementation

Wrapped `<UserMessageContent>` inside the expanded `QueuedMessageCard` in an inner `<div className="max-h-[40vh] overflow-y-auto">` scroll container. The action row (`Edit` / `Send now`) stays outside that wrapper so controls remain always visible while the message body scrolls independently.

---

_Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking: `xhigh` • Cost: `$2.02`_

<!-- mux-attribution: model=anthropic:claude-opus-4-6 thinking=xhigh costs=2.02 -->

🤖 fix: remove executeBash host-workspace escape hatch for devcontainers (#3018)

## Summary

This follow-up to #2889 removes the temporary `executeBash` `executionTarget` / `host-workspace` escape hatch now that devcontainer lazy-start and passive runtime gating are in place, and aligns passive branch/git metadata refreshes with that runtime-aware contract without regressing the newer multi-project git status flow.

## Background

PR #2889 added lazy-start on demand and gated passive PR/fetch work so stopped devcontainers stay asleep. The remaining `executionTarget` path still let the browser ask the backend to run some git commands directly against the host worktree, which a security scanner flagged as a client-controlled isolation bypass. With passive runtime gating available, that fallback is no longer needed.

While rebasing this follow-up onto current `main`, the new multi-project git status path also required a small reconciliation so the single-workspace runtime gating change kept the existing cached-state behavior for offline runtimes.
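
As a rough illustration of runtime-aware passive gating, the sketch below skips passive refreshes while the runtime is not `running` and replays one deferred refresh when it becomes `running`. `PassiveRefreshGate`, its method names, and the status values other than `running` are hypothetical; the actual stores (`GitStatusStore`, `PRStatusStore`) are not shown here and may work differently.

```typescript
// Hypothetical status set; only `running` is named in this PR.
type RuntimeStatus = "running" | "starting" | "stopped";

class PassiveRefreshGate {
  private retryPending = false;
  private readonly refresh: () => void;

  constructor(refresh: () => void) {
    this.refresh = refresh;
  }

  // Passive work never wakes a stopped runtime: it is skipped and remembered.
  requestPassiveRefresh(status: RuntimeStatus): boolean {
    if (status !== "running") {
      this.retryPending = true;
      return false;
    }
    this.retryPending = false;
    this.refresh();
    return true;
  }

  // One-shot retry: run the deferred refresh once when the runtime is running.
  onRuntimeStatusChange(status: RuntimeStatus): void {
    if (status === "running" && this.retryPending) {
      this.retryPending = false;
      this.refresh();
    }
  }
}
```

Explicit user-triggered operations would bypass a gate like this entirely, since only passive refreshes are meant to leave stopped devcontainers asleep.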

## Implementation

- remove `executionTarget` from the `workspace.executeBash` schema, router forwarding, and backend `WorkspaceService.executeBash()` path
- delete the BranchSelector mount-time `git rev-parse --abbrev-ref HEAD` probe that would otherwise wake stopped devcontainers
- gate passive single-workspace git status refreshes on runtime eligibility and reuse the existing one-shot retry so refresh resumes when the runtime becomes `running`
- keep the current multi-project cached-state behavior while updating tests and stories to match the new passive-refresh contract

## Validation

- `make static-check`
- `bun test src/node/services/workspaceService.test.ts --timeout 30000`
- `bun test src/browser/stores/GitStatusStore.test.ts --timeout 30000`
- `bun test src/browser/stores/PRStatusStore.test.ts --timeout 30000`

## Risks

The main regression risk is around passive branch/status freshness for stopped devcontainers and the interaction between the single-project and multi-project git status paths. The change does not affect explicit user-triggered git operations, and the touched store/service paths have focused regression coverage.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$N/A`_

<!-- mux-attribution: model=openai:gpt-5.4 thinking=xhigh costs=N/A -->

🤖 refactor: reuse standard task report reminders (#2993)

## Summary

This follow-up to #2986 keeps the existing `awaiting_report` recovery behavior but trims one special case: waiter-triggered recovery now reuses the standard completion reminder instead of carrying a separate waiter-only prompt variant.

## Background

PR #2986 introduced a few paths that can re-prompt an `awaiting_report` task. The waiter-specific reminder text and enum branch were the odd ones out, because they duplicated the same completion-tool guidance with different copy. Reusing the normal reminder keeps the recovery path while simplifying the state machine.

## Implementation

- waiter-triggered recovery still nudges `awaiting_report` tasks, but now reuses the default completion reminder path
- dropped the now-unused `"waiter"` completion-recovery reason and prompt string
- added a regression test that proves `waitForAgentReport()` no longer emits distinct waiter-only reminder copy

## Validation

- `bun test src/node/services/taskService.test.ts`
- `make static-check`

## Risks

Low. This keeps the waiter-side recovery hook intact; it only removes the separate waiter-only reminder variant.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` • Cost: `$146.35`_

<!-- mux-attribution: model=openai:gpt-5.4 thinking=xhigh costs=146.35 -->

🤖 feat: vendor agent-browser for local execution (#2971)

## Summary

Vendor the `agent-browser` CLI as a runtime dependency so Mux can resolve and spawn it without requiring a global install. Wire the vendored binary through `BrowserSessionBackend`, shell wrappers, and bash tool PATH enrichment so all three execution paths (internal browser sessions, agent bash tools, child processes) use the same vendored native binary.

Additionally, bridge agent `agent-browser` CLI sessions with the Browser tab sidebar so they share a deterministic session name, and auto-start the Browser tab monitoring when the experiment is enabled.

## Background

Previously, `agent-browser` was a `devDependency` only and every execution path (`BrowserSessionBackend.spawn()`, agent bash tool calls) relied on a globally installed `agent-browser` binary being on PATH. This was fragile across:

- Packaged Electron builds (no global install available)
- Nix devshell PATH rewriting (`.mux/tool_env` replaces PATH entirely)
- Fresh user environments without a global install

## Implementation

### CLI Vendoring (core)

- Move `agent-browser` from `devDependencies` → `dependencies` so packaged builds include it
- Add `asarUnpack` glob for native binaries so Electron `spawn()` works outside ASAR
- New `agentBrowserLauncher.ts` resolves the platform-specific native binary, handles ASAR path rewriting, and generates shell wrappers
- `BrowserSessionBackend` now uses `resolveAgentBrowserBinary()` instead of bare `spawn("agent-browser")`
- `main.ts` materializes `~/.mux/bin/agent-browser` wrapper at startup and prepends it to PATH
- `.mux/tool_env` restores vendored bin dir after nix PATH replacement

### Session Bridging (Browser tab)

- Deterministic session ID: `mux-<workspaceId>` shared between `BrowserSessionBackend` and the wrapper
- New `MUX_BROWSER_SESSION` env var exported to bash tool processes; wrapper auto-injects `--session` when set
- `BrowserSessionBackend.start()` detects existing daemon sessions and attaches instead of navigating to about:blank
- `BrowserTab.tsx` auto-starts agent-owned monitoring when experiment is enabled (no manual "Start" click)

## Validation

- `make typecheck` ✅
- `make lint` ✅
- `bun test agentBrowserLauncher.test.ts`: 8/8 pass
- `bun test initHook.test.ts`: pass
- `bun test browserSessionBackend.test.ts`: pass
- Live dogfood in dev-server-sandbox:
  - Agent bash tool `which agent-browser` → vendored path ✅
  - Agent `agent-browser open` + `screenshot` via bash tool ✅
  - Browser tab auto-starts with "Live · Agent owned" status ✅
  - `MUX_BROWSER_SESSION` env var reaches bash tool environment ✅
  - Wrapper session injection verified ✅

## Risks

- **Browser tab live frame**: The polling-based session monitoring (`BrowserSessionBackend.refreshMetadata()`) doesn't yet detect the agent's navigation in real-time. The infrastructure (deterministic session ID, wrapper injection, auto-start) is in place, but the daemon-level session sharing between the backend's `--json --session` polling and the agent's CLI invocations needs further investigation. This is a pre-existing limitation of the Browser tab experiment, not a regression.
- **Remote runtime parity**: Vendored binaries only solve local execution. SSH/remote runtimes still need a separate story.
- **Windows**: Wrapper generates `.cmd` files but hasn't been tested on Windows.

---

<details>
<summary>📋 Implementation Plan</summary>

# Implementation Plan: Vendored `agent-browser` availability in Mux

## Goal

Make the repo-vendored `agent-browser` dependency usable across **local** Mux execution paths without requiring a **global** `agent-browser` install:

1. **Mux-internal browser sessions** (`BrowserSessionBackend`)
2. **Mux bash tool calls** where agents type `agent-browser ...`
3. **Other local child processes** that inherit Mux's PATH

### Non-goals for this change

- **SSH / remote runtime parity**: a vendored dependency inside the local Electron app cannot automatically appear on a remote host. Remote runtimes should remain explicitly out of scope unless we design a separate proxy/sync story.
- **A generic vendored-CLI framework for every package**: implement the pattern for `agent-browser` first; only generalize where it falls out naturally.

---

## Verified facts driving the design

- `package.json` currently lists `agent-browser` in **`devDependencies`** (`package.json:183`) and in `trustedDependencies` (`package.json:319`).
- `src/node/services/browserSessionBackend.ts:353` currently does a bare `spawn("agent-browser", ...)`, so it depends on the host PATH.
- `src/desktop/main.ts:11-18` currently fixes PATH on macOS via `fix-path`, but does **not** add vendored CLI locations.
- `.mux/tool_env` is sourced before trusted bash tool calls, and its nix path logic can **replace PATH entirely**, which means a parent-process PATH prepend alone is not enough.
- `node_modules/.bin/agent-browser` exists and points to `../agent-browser/bin/agent-browser.js`.
- `node_modules/agent-browser/bin/agent-browser.js` is only a thin Node wrapper: it maps `process.platform` / `process.arch` to a native filename like `agent-browser-darwin-arm64` or `agent-browser-win32-x64.exe`, then spawns that native binary.
- The installed package already contains native binaries under `node_modules/agent-browser/bin/` for the supported platforms in this release.
- Because `node_modules/.bin/agent-browser` points to the JS wrapper, **PATH-only exposure still depends on a host Node interpreter**.
- Electron's ASAR docs say `child_process.spawn` cannot execute binaries inside an ASAR archive; unpacked executable paths are required for packaged `spawn()` flows.
- electron-builder packaging currently bundles production dependencies, not devDependencies, so packaged runtime support requires moving `agent-browser` to `dependencies`.
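
The platform/arch mapping described in the verified facts can be sketched as a small pure function. This is an illustration only: `nativeBinaryName` is a hypothetical name, and the assumption that every platform/arch combination below ships in the package is not confirmed by the plan, which only gives `agent-browser-darwin-arm64` and `agent-browser-win32-x64.exe` as examples.

```typescript
// Assumed supported values, taken from the platform/arch lists in this plan.
const SUPPORTED_PLATFORMS = ["darwin", "linux", "win32"];
const SUPPORTED_ARCHES = ["x64", "arm64"];

// Map a platform/arch pair (as reported by process.platform / process.arch)
// to the native filename pattern the JS shim uses.
function nativeBinaryName(platform: string, arch: string): string {
  if (!SUPPORTED_PLATFORMS.includes(platform)) {
    throw new Error(`unsupported platform for agent-browser: ${platform}`);
  }
  if (!SUPPORTED_ARCHES.includes(arch)) {
    throw new Error(`unsupported architecture for agent-browser: ${arch}`);
  }
  const suffix = platform === "win32" ? ".exe" : "";
  return `agent-browser-${platform}-${arch}${suffix}`;
}
```

Taking the platform and arch as arguments, rather than reading `process` directly, keeps the mapping testable on any host.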

<details>
<summary>Why the earlier “all three changes” idea is useful but needs one important correction</summary>

The three earlier ideas all still make sense:

- **Direct internal launch** fixes Mux's own use of `agent-browser`.
- **`main.ts` PATH enrichment** helps most local subprocesses.
- **`.mux/tool_env` PATH enrichment** preserves CLI availability when nix/tool env rewriting would otherwise drop it.

The correction is that Mux should treat the **native binary** as the real runtime artifact, not the JS shim. That changes the design in two important ways:

- for **Mux-internal browser sessions**, the right move is to resolve and spawn the vendored **native executable** directly, bypassing both PATH and host `node`
- for **shell contexts**, Mux should expose a stable `agent-browser` command that points at that resolved native executable; raw `node_modules/.bin/agent-browser` can remain a development convenience, but it is not the robust packaged solution

So the comprehensive plan should center on **native-binary resolution + PATH exposure**, with raw `.bin` access treated as secondary.

</details>

---

## Recommended implementation approaches

### Approach A: **Direct native vendoring** (recommended)

**Net product LoC estimate:** **+80 to +150** (excludes tests, docs, `package.json` edits)

This approach fully removes the need for a **global `agent-browser` install** for local Mux usage, and also avoids a host `node` dependency in the main local execution paths.

**Core idea:**

- Ship `agent-browser` as a runtime dependency.
- Resolve the vendored **platform-specific native binary** directly from the installed package.
- Add a small Mux helper that rewrites packaged paths to the unpacked executable location when needed.
- Materialize a small wrapper in a Mux-managed bin directory (for bash/tool usage) that forwards to the resolved native binary.
- Prepend that wrapper dir to PATH in both the app process and `.mux/tool_env`.
- Keep project `node_modules/.bin` on PATH where useful for development ergonomics, but do not rely on it for packaged correctness.

### Approach B: **Minimal PATH-only vendoring** (fallback / dev-only shortcut)

**Net product LoC estimate:** **+25 to +50** (excludes tests, docs, `package.json` edits)

This approach is smaller, but only fully works when the host already has a usable `node` on PATH.

**Core idea:**

- Move `agent-browser` to `dependencies`.
- Stop using a global install for `BrowserSessionBackend`.
- Prepend the app's `node_modules/.bin` / project `node_modules/.bin` to PATH in `main.ts` and `.mux/tool_env`.

**Why this is not the main recommendation:**

- packaged users can still fail if they do not have host `node`
- bash/tool shell use still depends on the JS shim finding `node`
- packaged `spawn()` flows still need unpacked native executables anyway

**Decision:** implement **Approach A**, while preserving the useful PATH improvements from the earlier three-part idea.

---

## Detailed plan for Approach A

## 1) Make `agent-browser` a runtime-shipped dependency

### Files

- `package.json`
- lockfile (as generated by bun)

### Work

- Move `agent-browser` from `devDependencies` to `dependencies`.
- Leave `trustedDependencies` intact unless the install flow proves it can be simplified.

### Why

- electron-builder packages runtime dependencies, not devDependencies.
- Without this, packaged builds can never rely on the vendored copy.

### Acceptance notes

- Development still works unchanged.
- Packaged builds now have the package available for runtime resolution.

### Size

- **Product LoC:** **+0** (config-only)

---

## 2) Add an `agent-browser` native binary resolver

### Files

- **New:** `src/node/services/agentBrowserLauncher.ts` (or similarly named agent-browser-specific helper)
- `src/node/services/browserSessionBackend.ts`
- Potentially `src/common/constants/` for wrapper-dir naming if the string appears in multiple places

### Work

Introduce a small helper that resolves the absolute native executable path for the current platform.

Responsibilities:

- resolve the installed package root via `require.resolve("agent-browser/package.json")` or an equivalent package-root lookup
- mirror the upstream wrapper's platform/arch mapping to compute the native filename:
  - `darwin` / `linux` / `win32`
  - `x64` / `arm64`
  - `.exe` suffix on Windows
- build the absolute path to `node_modules/agent-browser/bin/<platform-binary>`
- when running from a packaged app, rewrite the resolved path to the unpacked location (for example from `app.asar/...` to `app.asar.unpacked/...`) so `spawn()` targets a real executable on disk
- expose a stable env var for shell consumers (for example `MUX_VENDORED_BIN_DIR`) so `.mux/tool_env` does not need to guess platform-specific config-root paths

### Why

- bypasses both PATH lookup and host `node` for Mux's own browser-session feature
- matches the actual package layout: the native binary is the real runtime artifact, while the JS file is only a selector shim
- gives one canonical place for platform mapping, packaged-path rewriting, and error reporting

### Defensive-programming requirements

- assert the resolved package root is non-empty and absolute
- assert the platform/arch mapping is supported before building the filename
- assert the final binary path exists; on POSIX, also ensure it is executable or can be chmodded similarly to the upstream wrapper
- provide distinct error messages for unsupported platform, missing vendored package, and missing/unpacked binary

### Size

- **Product LoC:** **+25 to +45**
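
The packaged-path rewrite from step 2 can be sketched as plain string handling. Both function names are hypothetical and this is not the contents of `agentBrowserLauncher.ts`; a real helper would likely use `path.join` and check the file on disk, which is left out here to keep the sketch free of imports.

```typescript
// Rewrite an `app.asar` path segment to `app.asar.unpacked` so spawn() targets
// a real file on disk. Paths without that segment, including paths that are
// already unpacked, are returned unchanged.
function toUnpackedAsarPath(resolvedPath: string): string {
  return resolvedPath.replace(/([\\/])app\.asar([\\/])/, "$1app.asar.unpacked$2");
}

// Build `<packageRoot>/bin/<binaryName>` and apply the packaged-path rewrite.
// `separator` is a parameter only so the sketch avoids importing `path`.
function vendoredBinaryPath(
  packageRoot: string,
  binaryName: string,
  separator: "/" | "\\" = "/"
): string {
  if (packageRoot.length === 0) {
    throw new Error("vendored agent-browser package root is empty");
  }
  if (binaryName.length === 0) {
    throw new Error("vendored agent-browser binary name is empty");
  }
  return toUnpackedAsarPath([packageRoot, "bin", binaryName].join(separator));
}
```

Because the pattern requires a path separator directly after `app.asar`, applying the rewrite twice is harmless, which suits the self-healing wrapper model described later in the plan.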

---

## 3) Add shell-wrapper generation helpers around the resolved native binary

### Files

- `src/node/services/agentBrowserLauncher.ts`
- `src/desktop/main.ts`
- Potentially `src/common/constants/` for wrapper-dir naming if the string appears in multiple places

### Work

Alongside the resolver, expose helper(s) that compute the Mux-managed wrapper dir and render wrapper contents from the same canonical binary path.

Suggested responsibilities:

- compute the Mux-managed bin/wrapper directory (prefer `config.rootDir/bin` or a similarly stable Mux-owned directory)
- expose a stable env var for shell consumers (for example `MUX_VENDORED_BIN_DIR`) so `.mux/tool_env` does not need to guess platform-specific config-root paths
- generate a POSIX `agent-browser` wrapper that `exec`s the absolute native binary path
- generate a Windows `agent-browser.cmd` (or equivalent) wrapper that forwards arguments to the resolved `.exe`
- keep wrapper contents derived from the same resolver used by `BrowserSessionBackend`, so there is one source of truth for path computation

### Why a helper is worthwhile

- keeps `browserSessionBackend.ts`, `main.ts`, and `.mux/tool_env` aligned
- avoids duplicating platform mapping and packaged-path logic across multiple call sites
- gives one place to centralize assertions, stale-wrapper rewrites, and path drift handling after upgrades

### Defensive-programming requirements

- assert wrapper dir paths are absolute and non-empty
- assert wrapper targets are absolute native binary paths, never bare command names
- rewrite wrappers whenever the computed target changes instead of trusting stale contents
- provide a clear local error string when wrapper generation cannot be prepared

### Size

- **Product LoC:** **+20 to +35**

---

## 4) Switch `BrowserSessionBackend` off PATH lookup

### Files

- `src/node/services/browserSessionBackend.ts`

### Work

Replace the bare:

```ts
spawn("agent-browser", ["--json", "--session", this.sessionId, ...args], ...)
```

with the resolved native-binary helper from steps 2–3, so browser sessions no longer depend on:

- a global install
- PATH resolution
- the host shell being configured correctly

Keep the existing timeout, stdout/stderr parsing, and JSON handling behavior unchanged.

### Error handling updates

- replace the current missing-binary guidance (`install it with: bun install -g ...`)
- new messaging should distinguish between:
  - vendored native binary missing, unsupported, or not unpacked correctly
  - unexpected spawn/runtime failure once the binary has been resolved
- in dev checkouts, the remediation can mention `bun install`
- in packaged apps, the remediation should prefer “reinstall/update Mux” rather than asking for a global install

### Why

This is the highest-value change: it makes the actual product feature self-contained first.

### Size

- **Product LoC:** **+10 to +20**

---

## 5) Materialize a Mux-managed shell wrapper for `agent-browser`

### Files

- `src/desktop/main.ts`
- likely `src/node/services/agentBrowserLauncher.ts` (shared wrapper-content builder)
- potentially a tiny file utility if an existing atomic-write helper is not already available

### Work

Write a small wrapper into a Mux-owned bin directory such as:

- `config.rootDir/bin/agent-browser` on POSIX
- `config.rootDir/bin/agent-browser.cmd` (and/or `.bat`) on Windows

Wrapper behavior:

- point at the resolved native binary from steps 2–3, not the JS shim in `node_modules/.bin`
- on POSIX, `exec` the absolute native binary path and forward `"$@"`
- on Windows, invoke the resolved `.exe` and forward `%*`
- keep the command name stable as `agent-browser` even though the underlying vendored binary is platform-suffixed

### Lifecycle strategy

Use a **self-healing** model:

- best-effort creation/update during app startup
- always safe to overwrite when paths drift after upgrades or dev rebuilds
- if startup creation fails, do **not** crash app startup; log and allow later use-sites to surface a targeted runtime error

### Why

This is the piece that makes `agent-browser` available to **shell contexts** without requiring a global install.

### Defensive-programming requirements

- atomically rewrite wrappers when contents differ
- ensure POSIX wrappers get executable bits
- on Windows, generate the wrapper format the platform actually resolves
- keep the wrapper generation idempotent

### Size

- **Product LoC:** **+25 to +45**

---

## 6) Prepend the Mux-managed wrapper dir to PATH for local app subprocesses

### Files

- `src/desktop/main.ts`

### Work

After the existing macOS `fix-path` block, prepend the Mux-managed wrapper dir to `process.env.PATH` using `path.delimiter`. Also publish the resolved directory into `process.env` (for example `MUX_VENDORED_BIN_DIR`) so `.mux/tool_env` and any other child-process setup can reuse the exact same location.

Recommended ordering:

1. **Mux-managed wrapper dir** (`~/.mux/bin` via config root)
2. existing/inherited PATH from `fix-path` / OS

Optional but useful extension:

- also prepend the app's own vendored `node_modules/.bin` in **development** so raw vendored CLIs remain easy to discover during local repo development
- do **not** rely on that raw bin as the primary packaged solution; the wrapper is the robust path

### Why

- ensures most local child processes inherit a working `agent-browser` command
- covers untrusted projects too, since `.mux/tool_env` is only sourced for trusted repos
- complements the direct `BrowserSessionBackend` launch instead of replacing it

### Startup constraint

Per repo guidance, startup-time init must not crash the app:

- wrapper creation / PATH enrichment should be wrapped in try/catch
- failures should degrade to debug logging + later runtime error surfaces

### Size

- **Product LoC:** **+10 to +20**

---

## 7) Preserve `agent-browser` inside `.mux/tool_env`, even when nix rewrites PATH

### Files

- `.mux/tool_env`

### Work

Update `.mux/tool_env` so that after it determines the **base PATH** (nix-derived
or inherited), it **prepends**: 1. the Mux-managed wrapper dir from a stable exported env var (for example `${MUX_VENDORED_BIN_DIR}`), with `~/.mux/bin` only as an explicit fallback if the repo still standardizes on that path 2. the project-local `node_modules/.bin` when `${MUX_PROJECT_PATH}/node_modules/.bin` exists ### Important ordering requirement Do **not** prepend these dirs only before the nix helper runs. Because `.mux/tool_env` can replace PATH with a cached nix PATH, the wrapper/project bin prepend must happen **after** the final base PATH is known, or be factored into the code path that exports the final PATH value. ### Why keep both wrapper dir and project `node_modules/.bin` - the **wrapper dir** is the robust local command for packaged/no-global-install cases - the **project `node_modules/.bin`** keeps local repo development ergonomics good and matches the user's original idea ### Scope note This only solves **local** bash-tool usage. Remote/SSH runtimes would still need a separate design. ### Size - **Product LoC:** **+10 to +20** --- ## 8) Explicitly unpack vendored `agent-browser` native binaries for packaged builds ### Files - `package.json` build config - packaged-path logic in `src/node/services/agentBrowserLauncher.ts` ### Work Add a narrow packaging rule so the vendored native executable is available outside the ASAR archive in packaged apps. 
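For the resolver half of this step, a minimal sketch of the packaged-path translation, assuming unpacked files land in a sibling `app.asar.unpacked` directory (the usual Electron layout for `asarUnpack` entries). The helper name `toUnpackedPath` and the example paths are hypothetical, not taken from the repo.

```typescript
// Hypothetical helper: rewrite a path inside app.asar to its unpacked twin.
// Assumes asarUnpack output lives in a sibling "app.asar.unpacked" directory.
function toUnpackedPath(binaryPath: string): string {
  if (binaryPath.length === 0) {
    throw new Error("agent-browser binary path must be non-empty");
  }
  // Match "app.asar" only as a whole path segment, with POSIX or Windows
  // separators. Paths that already point at "app.asar.unpacked", and dev
  // checkout paths with no archive segment, are returned unchanged.
  return binaryPath.replace(/([\\/])app\.asar(?=[\\/])/, "$1app.asar.unpacked");
}
```

Because the rewrite is a no-op on paths that are already unpacked, the same function can be called by both `BrowserSessionBackend` and the wrapper generator without tracking whether the path was translated earlier.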
Recommended shape: - extend `asarUnpack` with a focused pattern covering the vendored agent-browser native binaries, such as `**/node_modules/agent-browser/bin/agent-browser*` or the smallest equivalent glob that reliably includes the current-platform executable - make the resolver from step 2 translate packaged paths to their unpacked location when Mux is running from `app.asar` ### Why - Electron's ASAR docs state that `child_process.spawn` cannot execute binaries inside ASAR archives - `BrowserSessionBackend` currently uses `spawn`, so packaged correctness requires a real executable path on disk - this is a native-executable packaging concern, not a generic JS-module concern ### Explicit validation gate During packaged smoke testing, verify that: - the resolved path points at an unpacked executable, not a path inside `app.asar` - `BrowserSessionBackend` can spawn the vendored binary successfully in the packaged app - the Mux-managed shell wrapper still resolves to the same unpacked executable path ### Fallback if the first `asarUnpack` glob is too broad or too narrow Use the smallest rule that: - includes the current-platform executable - avoids unpacking unrelated dependency trees - keeps the resolver logic simple and deterministic ### Size - **Product LoC:** **+0 to +5** (config-only unless packaged-path helper needs a tiny extension) --- ## Tests and validation plan ## Automated coverage ### 1) New unit tests for launcher/wrapper logic **Likely new test file:** `src/node/services/agentBrowserLauncher.test.ts` Cover: - native binary path resolution for supported `platform` / `arch` combinations - packaged-path rewriting from `app.asar/...` to the unpacked executable location - wrapper content generation for POSIX and Windows around the resolved native binary - stale-wrapper rewrite detection / idempotent update behavior - clear failure when the vendored native binary is missing, unsupported, or not unpacked correctly Use 
`src/node/services/desktop/PortableDesktopSession.ts` and `PortableDesktopSession.test.ts` as the pattern reference for: - PATH-vs-fallback binary logic - wrapper script testing - cross-platform launcher expectations ### 2) Update / add browser session backend tests **Likely file:** `src/node/services/browserSessionBackend.test.ts` (create if absent) Cover: - `BrowserSessionBackend` no longer calls bare `agent-browser` - `spawn()` receives the resolved native binary path rather than a PATH-dependent command name - timeout / invalid JSON behavior stays intact - missing-binary errors become actionable and no longer mention global install as the primary fix ### 3) Test PATH composition in a pure helper where possible If `main.ts` startup logic is too awkward to test directly, extract the PATH-composition / wrapper-dir calculation into a small helper and unit test that helper instead of over-testing Electron startup code. ### 4) `.mux/tool_env` validation Prefer a lightweight smoke/integration check over brittle shell-script golden tests unless an existing harness makes this easy. At minimum, validate that a trusted local workspace with nix path rewriting still resolves `agent-browser` after the change. --- ## Dedicated dogfooding plan Dogfooding is required for this change because it affects both a **CLI/runtime path** and a **Mux UI-backed workflow**. The dogfooding pass should follow the spirit of the `dogfood` skill: create a small evidence bundle, exercise the change like a real user, and leave behind reviewer-friendly proof. 
### Dogfooding evidence bundle (required) Create a dedicated output directory such as `./dogfood-output/cli-vendoring/` with: - `report.md` — scenario-by-scenario notes, commands/prompts used, expected vs actual results, and links to artifacts - `screenshots/` — annotated screenshots for key setup and result states - `videos/` — short `.webm` or desktop screencast recordings for each interactive flow Reviewer handoff requirements: - attach representative screenshots with `attach_file` - attach at least one video per interactive flow with `attach_file` when the final verification write-up is produced - include the exact scenario steps in the report so a reviewer can replay them without guessing ### Dogfooding setup 1. **Prove vendoring, not host-global fallback.** - Run the dogfood flows in an environment where a global `agent-browser` is absent from the effective PATH, or otherwise prove that `command -v agent-browser` resolves to the Mux-managed wrapper rather than a host-global install. - Keep normal repo dependencies installed so the vendored package is present. 2. **Use the `dev-server-sandbox` skill for isolated Mux UI verification.** - Start an isolated backend/web instance with `make dev-server-sandbox`. - Record the emitted `BACKEND_PORT`, `VITE_PORT`, and sandbox `MUX_ROOT` in the dogfood report. - Use `KEEP_SANDBOX=1` if preserving the sandbox root helps post-failure debugging. 3. **Use `agent-browser` directly, never `npx agent-browser`.** - This matches both the `dogfood` and `agent-browser` skills and ensures the test path exercises the vendored fast/native CLI path. 4. **Capture proof continuously, not at the end.** - For each interactive scenario, start video recording before reproducing the flow. - Take an annotated screenshot at the initial state and the final result state. - Append notes to `report.md` immediately after each scenario rather than batching findings later. 
### Dogfooding scenario A — direct vendored CLI availability **Goal:** prove local shell usage works without a global install and without routing through `npx`. Suggested flow: 1. Start a short desktop/terminal recording before the first command. 2. Run `command -v agent-browser` and `agent-browser --help`; record the observed command path/output in `report.md`. 3. Exercise a real browser command with the direct CLI, for example: - `agent-browser --session mux-cli-vendoring open https://example.com` - `agent-browser --session mux-cli-vendoring wait --load networkidle` - `agent-browser --session mux-cli-vendoring snapshot -i` - `agent-browser --session mux-cli-vendoring screenshot --annotate ./dogfood-output/cli-vendoring/screenshots/cli-direct-success.png` 4. Stop the recording and save the artifact under `videos/`. 5. Close the session and log the exact steps/result in `report.md`. **Required proof:** - one terminal/desktop video showing command invocation - one annotated screenshot from the browser session - the resolved command path and command output copied into the dogfood report ### Dogfooding scenario B — Mux bash tool availability **Goal:** prove Mux bash tools inherit a working vendored `agent-browser` command. Suggested flow: 1. Start `make dev-server-sandbox` and note the UI URL in the report. 2. Use `agent-browser` to open the sandboxed Mux UI directly: - `agent-browser open http://127.0.0.1:<VITE_PORT>` - `agent-browser wait --load networkidle` - `agent-browser screenshot --annotate ./dogfood-output/cli-vendoring/screenshots/mux-initial.png` 3. Use the core `agent-browser` workflow while navigating Mux: - `snapshot -i` to identify controls - interact with refs to enter a prompt that triggers a bash tool call such as `command -v agent-browser && agent-browser --help` - re-snapshot after each UI change - use `diff snapshot` after the tool run if it helps confirm the expected state change in the UI 4. 
Start a video before submitting the prompt and stop it after the successful tool output is visible. 5. Capture an annotated screenshot of the successful bash-tool result and save the prompt/output summary into `report.md`. **Required proof:** - one UI video of the prompt-to-tool-result flow - annotated screenshots of the Mux UI before and after the tool result - the exact prompt used and the relevant tool output in `report.md` ### Dogfooding scenario C — BrowserSessionBackend / Browser tab workflow **Goal:** prove Mux's internal browser-session feature resolves and launches the vendored native binary rather than relying on a global PATH entry. Suggested flow: 1. In the sandboxed or local dev instance, navigate to the Browser-related UI/workflow that triggers `BrowserSessionBackend`. 2. Start a video before the action that launches the browser session. 3. Use `agent-browser` against the Mux UI to drive the workflow, re-snapshotting after each navigation or DOM change. 4. Capture: - an annotated screenshot before launch - an annotated screenshot showing the live browser session / Browser tab after launch succeeds - any visible status text or UI state that proves the session is live 5. Stop the recording and save the exact steps/result in `report.md`. **Required proof:** - one video of the browser-session launch flow - annotated screenshots before/after launch - a short note in the report confirming this was run without relying on a global `agent-browser` ### Dogfooding scenario D — packaged-build regression pass **Goal:** prove the packaged app can find and spawn the unpacked vendored native binary. Suggested flow: 1. Build a local packaged app. 2. Launch it in an environment where global `agent-browser` is unavailable or clearly not the resolved command. 3. Repeat at least scenario A (direct CLI wrapper exposure, if applicable) and scenario C (browser-session launch). 4. Capture a desktop video and annotated screenshots of the packaged flow. 5. 
Record the observed wrapper/binary resolution behavior in `report.md`, including whether the resolved path points at the unpacked packaged location. **Required proof:** - one packaged-app video - one packaged-app annotated screenshot - a report note confirming packaged resolution succeeded without a global install ### Dogfooding scenario E — PATH edge cases **Goal:** prove the PATH story holds across the environments this change explicitly targets. Run a focused check for each of these and capture at least one screenshot/video pair for the most failure-prone cases: - **trusted repo with `.mux/tool_env` / nix path rewriting** — verify `agent-browser` still resolves after tool-env PATH rewriting - **untrusted repo** — verify inherited app PATH still exposes the Mux-managed wrapper even when `.mux/tool_env` is not sourced - **Windows sanity pass** — verify wrapper filename/format resolution, `path.delimiter` handling, and no accidental POSIX-only assumptions ### Dogfooding method notes from the skills - From `dogfood`: document each scenario as you go, and collect screenshots/videos before moving on. - From `agent-browser`: use the core loop of `open` → `snapshot -i` → interact via `@eN` refs → re-snapshot after page changes. - From `agent-browser`: use `screenshot --annotate` for reviewer-friendly evidence, and `diff snapshot` when it helps prove a UI state change happened. - From `dev-server-sandbox`: prefer an isolated sandbox instance over reusing your default dev root so the verification environment is reproducible and easier to debug. --- ## Acceptance criteria - `BrowserSessionBackend` no longer depends on `spawn("agent-browser", ...)`. - A **local** Mux install can use browser sessions without a global `agent-browser` install. - Mux bash tools can resolve `agent-browser` in local workspaces through a Mux-managed PATH entry. - nix/tool_env PATH rewriting does not remove `agent-browser` availability. 
- Packaged builds ship the dependency because it is no longer in `devDependencies`. - Packaged builds can resolve and spawn an unpacked vendored native `agent-browser` executable. - User-facing errors stop instructing users to globally install `agent-browser` as the default remedy. - Startup remains resilient: wrapper/native-binary setup failures do not crash app launch. - The dedicated dogfooding pass is completed for the relevant scenarios (direct CLI, Mux bash tool flow, Browser session flow, packaged regression where available). - Dogfooding artifacts include a reviewer-readable `report.md`, annotated screenshots, and video recordings for the interactive flows. - The final verification handoff attaches representative screenshots/videos with `attach_file` so reviewers can audit what was tested. --- ## Risks / watchpoints - **Remote runtime ambiguity:** local vendoring does not solve SSH-host execution. - **Packaged-path drift:** wrappers that embed old app paths must be regenerated automatically. - **ASAR alignment risk:** the `asarUnpack` glob and the packaged-path rewrite must stay aligned so the resolver never points `spawn()` at a path still inside `app.asar`. - **Windows wrapper semantics:** plan for both shell and spawn realities; do not assume the POSIX wrapper alone is enough. - **Over-generalization risk:** keep the first version agent-browser-specific unless a generic abstraction is obviously justified by the implementation. --- ## Suggested implementation order 1. Move `agent-browser` to `dependencies`. 2. Add the native-binary resolver and packaged-path rewrite helper. 3. Add the shell-wrapper generation helper. 4. Switch `BrowserSessionBackend` to the resolved native binary. 5. Add `asarUnpack` coverage for the vendored native binary. 6. Add wrapper materialization and PATH enrichment in `main.ts`. 7. Update `.mux/tool_env` so nix PATH rewriting preserves the wrapper / local `.bin` entries. 8. 
Update error strings and any agent-browser usage docs/skill text that still imply global install. 9. Run automated validation, then execute the dedicated dogfooding plan above and attach the resulting evidence. --- ## Validation commands to run during implementation - `make lint` - `make typecheck` - targeted tests for the new launcher/wrapper and browser-session code - `make test` if touched areas do not have sufficiently narrow coverage - `make dev-server-sandbox` for the isolated UI-backed dogfood run - one local packaged smoke pass (`make dist` or equivalent local packaging workflow on the current platform) - the dedicated dogfooding evidence capture described above (`report.md` + screenshots + videos) --- ## Optional follow-up (only after this lands cleanly) - extract the resolver/wrapper machinery into a small reusable “vendored native CLI” helper if another dependency needs the same treatment - design a separate remote-runtime story if `agent-browser` must become available in SSH workspaces too </details> --- _Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking: `xhigh` • Cost: `$27.73`_ --------- Signed-off-by: Thomas Kosiewski <tk@coder.com>
🤖 feat: add browser sidebar tab for live agent-browser viewing (#2951) ## Summary Add a **Browser tab** to Mux's right sidebar that shows what agent-browser is doing in real time — live screenshots, URL/title tracking, action timeline, and session lifecycle controls. ## Background When an agent uses `agent-browser` to automate web pages, users currently have no visibility into what's happening. This feature adds a workspace-scoped Browser tab in the right sidebar so users can watch the agent navigate, see which page it's on, and understand the sequence of actions — without leaving the Mux interface. ## Implementation ### Architecture - **CLI-backed approach** — agent-browser has no library API, so we manage it as a subprocess - **ORPC for all transport** — screenshot polling (~2s) fits the existing event-stream model; no separate WebSocket needed - **One session per workspace** — enforced by the service layer ### Backend (4 new files + 6 modified) - `src/common/types/browserSession.ts` — shared types (`BrowserSession`, `BrowserAction`, `BrowserSessionEvent`) - `src/common/orpc/schemas/api.ts` — Zod schemas for `getActive`, `start`, `stop`, `subscribe` - `src/node/services/browserSessionBackend.ts` — CLI adapter managing the agent-browser subprocess, screenshot polling, metadata extraction, external-close detection - `src/node/services/browserSessionService.ts` — EventEmitter service with workspace-scoped events (mirrors DevToolsService pattern) - ORPC routes with async-generator subscription (snapshot-first, queue+resolveNext) - Service container wiring + disposal ### Frontend (4 new files + 5 modified) - `src/browser/features/RightSidebar/BrowserTab/BrowserTab.tsx` — main tab component with idle/starting/live/error/ended states, screenshot viewer, action timeline, Start/Stop/Restart controls - `src/browser/features/RightSidebar/BrowserTab/useBrowserSessionSubscription.ts` — ORPC subscription hook (mirrors `useDevToolsSubscription`) - Tab type registration, config, 
label in the right-sidebar infrastructure - Layout migration so existing workspaces get the Browser tab ### Key design decisions - **`sharp` lazy-loaded** — prevents Bun test env crashes from the native dependency - **External close detection** — if URL transitions from a real page to `about:blank`, the backend infers the browser was closed externally and surfaces an error - **Session controls** — Start/Restart (idle/ended/error states) and Stop (live/starting states) are mutually exclusive header buttons ## Validation - `make static-check` passes (typecheck + lint + shellcheck + prettier + docs) - `bun test src/browser/utils/rightSidebarLayout.test.ts` — 23/23 pass - `bun test src/cli/cli.test.ts src/cli/server.test.ts` — 28/28 pass - **3 full dogfood runs** with agent-browser driving the Mux frontend: 1. Initial dogfood: found 3 issues (button truncation, missing stop button, no close detection) 2. Fix verification dogfood: confirmed all 3 fixes 3. Final comprehensive dogfood: all 6 test scenarios passed (idle state, session start, navigation, tab switching, stop button, external close detection, restart cycle) ## Risks - **`sharp` in production Electron** — works in Node.js but may need packaging attention; fallback to raw PNG if unavailable - **Poll-based screenshots** — 2-second interval means the viewer lags slightly behind real-time; acceptable for MVP - **Single-session limit** — only one browser session per workspace; multi-session is a deliberate follow-up --- <details> <summary>📋 Implementation Plan</summary> # Integrate agent-browser into Mux right sidebar — implementation plan ## Objective Ship a **workspace-scoped Browser tab** in Mux’s right sidebar so users can watch an agent-driven browser session live, understand the agent’s current step, and eventually take over input when needed. 
The implementation should: - feel native to Mux’s existing right-sidebar/tab model, - preserve existing tool/chat UX, - avoid risky DOM embedding of arbitrary pages, - scale from a fast MVP to a first-class browser tool/session model. ## Verified repo and product constraints ### agent-browser capabilities confirmed from official research - agent-browser can expose a **live browser stream over WebSocket**. - The stream carries **base64 JPEG frames + viewport metadata** and accepts **mouse/keyboard/touch input** back over the socket. - agent-browser can also connect to an existing browser via **CDP**. - There is a documented **programmatic BrowserManager API** as well as a CLI path. - It supports recording sessions, which is useful for dogfooding/review artifacts. ### Relevant Mux integration points already in the repo - Right sidebar container: `src/browser/features/RightSidebar/RightSidebar.tsx` - Right-sidebar tab types: `src/browser/types/rightSidebar.ts` - Right-sidebar tab registry/config: `src/browser/features/RightSidebar/Tabs/registry.ts` - Existing live session/tab reference: `src/browser/features/RightSidebar/TerminalTab.tsx` - Existing live debug subscription pattern: `src/browser/features/RightSidebar/DevToolsTab/useDevToolsSubscription.ts` - Shared cross-boundary types convention: `src/common/types/` - ORPC schemas convention: `src/common/orpc/schemas/api.ts` - Backend router: `src/node/orpc/router.ts` - Service injection context: `src/node/orpc/context.ts` - Backend streaming/tooling adjacency: `src/node/services/streamManager.ts`, `src/node/services/mcpServerManager.ts` - Electron main process entry: `src/desktop/main.ts` ## Recommended delivery strategy ### Approach A — CLI-backed Browser tab behind a stable service interface **Summary:** Mux launches agent-browser as a managed subprocess, uses the documented stream port/WebSocket viewer, and renders the live viewport in a new Browser tab. 
- **Net LoC estimate (product code only):** **+650 to +900 LoC** - **Why choose it:** fastest path to a visible, testable MVP - **Primary risk:** process management and installation/runtime packaging - **Recommendation:** **fallback path** if the BrowserManager/API spike fails or packaging friction is high ### Approach B — BrowserManager-backed Mux-native browser session service **Summary:** Mux owns a `BrowserSessionService` and a native agent-browser backend adapter, with structured session state, action events, and a right-sidebar viewer. - **Net LoC estimate (product code only):** **+950 to +1400 LoC** - **Why choose it:** cleanest long-term architecture; strongest control over lifecycle, state, and UI sync - **Primary risk:** unknown effort to wire the agent-browser runtime/library cleanly into Mux’s desktop build/runtime - **Recommendation:** **preferred target architecture** if the initial spike proves viable within 1 day ### Approach C — Human takeover, recording, and replay-friendly session history **Summary:** Add explicit user takeover, input arbitration, recording hooks, and lightweight persisted session/action history. - **Net LoC estimate (product code only):** **+350 to +650 LoC incremental** - **Why choose it:** completes the “watch + intervene + review” story - **Primary risk:** agent/user race conditions and more UX/state complexity - **Recommendation:** **phase 3 follow-up**, not part of the first merge ## Recommended execution decision Execute a **1-day spike** first, then branch: 1. Try to stand up a minimal `BrowserSessionBackend` using the **BrowserManager/API path**. 2. If that spike proves stable in Mux’s Node/Electron runtime, proceed with **Approach B**. 3. If it does not, deliver **Approach A** behind the **same `BrowserSessionBackend` interface**, so the UI and ORPC contracts remain stable. This keeps the team moving while avoiding a throwaway UI. ## Non-negotiable architectural invariants 1. 
**Do not embed arbitrary web content as live DOM** in the renderer. - No `dangerouslySetInnerHTML`, `webview`, or loose iframe-based browsing surface for remote pages. - Render the viewer as **frames/images/canvas only**. 2. **Keep high-frequency frames off ORPC if possible.** - Use ORPC for session lifecycle, status, action timeline, and errors. - Prefer a dedicated viewer WebSocket endpoint (returned by the backend) for frame transport. 3. **Make the browser viewer workspace-scoped.** - MVP should support **one active browser session per workspace**. - Do not start with `browser:${sessionId}` multi-instance tabs. 4. **Lazy initialization only.** - Browser session services must not start on app boot. - Missing binaries/install issues must surface as recoverable UI errors, not startup crashes. 5. **Single-source shared types.** - Cross-boundary types go in `src/common/types/browserSession.ts`. - ORPC validation schemas stay in `src/common/orpc/schemas/api.ts`. 6. **Defensive programming at every boundary.** - Validate session IDs, workspace IDs, viewer URLs, stream payloads, and action events. - Assert impossible states in dev/test builds; degrade gracefully in user-facing UI. 7. **Never leak secrets into the timeline.** - Filled credentials or vault-backed values must be redacted in browser action summaries and never persisted. 
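Invariant 7 can be enforced at the point where raw backend events become timeline entries. A minimal sketch, assuming a hypothetical `RawBrowserAction` shape (the plan does not define the fields of `BrowserAction`) and a hypothetical `summarizeAction` helper:

```typescript
// Hypothetical raw event shape; the plan does not specify these fields.
interface RawBrowserAction {
  kind: "open" | "click" | "fill";
  target: string; // URL or element ref
  value?: string; // text typed into a field, if any
}

const REDACTED = "[redacted]";

// Build a timeline summary that never includes typed values.
function summarizeAction(action: RawBrowserAction): string {
  switch (action.kind) {
    case "open":
      return `open ${action.target}`;
    case "click":
      return `click ${action.target}`;
    case "fill":
      // Redact unconditionally: the backend cannot reliably tell a password
      // from ordinary text, so no filled value reaches the timeline.
      return `fill ${action.target} ${REDACTED}`;
  }
}
```

Redacting every filled value is deliberately blunt. It costs some timeline detail but avoids depending on per-field heuristics to decide what counts as a secret.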
## Proposed architecture

### Top-level model

```ts
export interface BrowserSession {
  id: string;
  workspaceId: string;
  status: "idle" | "starting" | "live" | "paused" | "ended" | "error";
  ownership: "agent" | "user";
  backend: "agent-browser-cli" | "agent-browser-manager";
  viewerUrl: string | null;
  title: string | null;
  url: string | null;
  lastAction: BrowserAction | null;
  lastError: string | null;
  startedAt: string;
  endedAt?: string;
}

export type BrowserSessionEvent =
  | { type: "snapshot"; session: BrowserSession | null; recentActions: BrowserAction[] }
  | { type: "session-updated"; session: BrowserSession }
  | { type: "action"; action: BrowserAction }
  | { type: "session-ended"; sessionId: string }
  | { type: "error"; sessionId: string; message: string };
```

### Core components/services

- **`BrowserSessionService`** (`src/node/services/browserSessionService.ts`)
  - Owns the active browser session per workspace.
  - Emits workspace-scoped events using the same broad pattern as `DevToolsService`.
  - Tracks session state, recent actions, errors, and viewer endpoint.
- **`BrowserSessionBackend`** (`src/node/services/browserSessionBackends/BrowserSessionBackend.ts`)
  - Stable internal interface so the runtime can swap between CLI and BrowserManager implementations.
- **`BrowserTab`** (`src/browser/features/RightSidebar/BrowserTab.tsx`)
  - Subscribes to low-frequency state via ORPC.
  - Connects directly to the viewer WebSocket for frames/input when available.
  - Keeps frame rendering local to the leaf component so the rest of the sidebar does not rerender.

### Transport split

- **ORPC stream:** status, title, URL, current/last action, errors, ownership, start/stop lifecycle.
- **Viewer socket:** image frames + viewport metadata + optional input injection.
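On the ORPC side of this split, the client only ever receives `BrowserSessionEvent` values, so the tab can fold them into a small piece of local state. A sketch under stated assumptions: the types below are trimmed copies of the plan's model, the fields of `BrowserAction` are invented for illustration, and `applyEvent`, `ViewState`, and `MAX_ACTIONS` are hypothetical names.

```typescript
// Trimmed copies of the plan's types so the sketch stands alone.
// BrowserAction's fields are an assumption; the plan does not define them.
interface BrowserAction {
  id: string;
  summary: string;
}

interface BrowserSession {
  id: string;
  workspaceId: string;
  status: "idle" | "starting" | "live" | "paused" | "ended" | "error";
  lastError: string | null;
}

type BrowserSessionEvent =
  | { type: "snapshot"; session: BrowserSession | null; recentActions: BrowserAction[] }
  | { type: "session-updated"; session: BrowserSession }
  | { type: "action"; action: BrowserAction }
  | { type: "session-ended"; sessionId: string }
  | { type: "error"; sessionId: string; message: string };

interface ViewState {
  session: BrowserSession | null;
  recentActions: BrowserAction[];
}

const MAX_ACTIONS = 50; // assumed cap on the in-memory timeline

// Pure reducer: no frames pass through here, only low-frequency state.
function applyEvent(state: ViewState, event: BrowserSessionEvent): ViewState {
  switch (event.type) {
    case "snapshot":
      // Snapshot-first: the initial event replaces local state wholesale.
      return { session: event.session, recentActions: event.recentActions };
    case "session-updated":
      return { ...state, session: event.session };
    case "action":
      return {
        ...state,
        recentActions: [...state.recentActions, event.action].slice(-MAX_ACTIONS),
      };
    case "session-ended": {
      const current = state.session;
      // Ignore events for a session this view is not tracking.
      if (current === null || current.id !== event.sessionId) return state;
      return { ...state, session: { ...current, status: "ended" } };
    }
    case "error": {
      const current = state.session;
      if (current === null || current.id !== event.sessionId) return state;
      return {
        ...state,
        session: { ...current, status: "error", lastError: event.message },
      };
    }
  }
}
```

Keeping this reducer free of frame data is what lets the viewer socket run at its own rate without touching sidebar state.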
<details> <summary>Why split transport instead of streaming frames through ORPC?</summary> Mux already has a clean ORPC event-stream pattern for subscription-style data, but browser frames are much higher-frequency than devtools/tool state updates. Sending base64 JPEG frames through ORPC would increase serialization pressure, trigger avoidable rerenders, and tie the viewer’s frame rate to the app’s control plane. A dedicated viewer socket keeps the control plane small and typed while letting the Browser tab own the rendering loop. </details> ## Execution phases and agent workstreams ## Phase 0 — 1-day architecture spike and runtime decision **Owner:** Backend/platform agent **Parallelizable:** no; everything else depends on this answer **Goal:** choose the backend implementation path without blocking the rest of the team for more than 1 day ### Tasks 1. Prove whether agent-browser’s **BrowserManager/API path** can run inside Mux’s runtime. 2. If yes, confirm how Mux obtains: - frame stream, - input injection, - page metadata, - shutdown hooks. 3. If no, prove the **CLI + stream port** path works reliably from Mux’s backend. 4. Decide whether the renderer can connect directly to a local viewer socket or whether Mux needs a relay. 5. Determine installation/runtime posture for dev builds: - external binary on PATH, - configured binary path, - or managed local install. ### Deliverables - Decision doc-in-code comment in the new backend interface/service explaining chosen backend. - Minimal spike proof (branch-local, not productionized) showing a session can start and produce viewer data. - Clear go/no-go verdict: **Approach B** or **Approach A fallback**. ### Exit criteria - Team knows which backend adapter to implement. - Team knows whether viewer transport is **direct renderer socket** or **backend relay**. - Team knows how missing binary/install will surface in the UI. 
### Quality gate - Capture **1 screenshot** and **1 short video** of the spike showing a live browser image stream and a start/stop cycle. --- ## Phase 1 — Shared contracts, service skeleton, and ORPC surface **Owner:** Shared-contracts agent **Parallelizable with:** frontend shell work once types stabilize **Primary files:** - `src/common/types/browserSession.ts` (new) - `src/common/orpc/schemas/api.ts` - `src/node/orpc/context.ts` - `src/node/services/browserSessionService.ts` (new) - `src/node/orpc/router.ts` ### Tasks 1. Add shared types in `src/common/types/browserSession.ts`: - `BrowserSession` - `BrowserAction` - `BrowserSessionEvent` - viewer metadata types if needed 2. Add ORPC schemas in `src/common/orpc/schemas/api.ts` for: - `browserSession.getActive` - `browserSession.start` - `browserSession.stop` - `browserSession.subscribe` - optionally `browserSession.clearRecentActions` if the UI needs it later 3. Add `browserSessionService` to `ORPCContext` in `src/node/orpc/context.ts`. 4. Implement `BrowserSessionService` as a workspace-scoped `EventEmitter` service. 5. Mirror the `DevToolsService`/`useDevToolsSubscription` model: - snapshot-first subscription - queue buffering - listener cleanup on abort/disconnect 6. Define a strict policy for session ownership and lifecycle: - **one active session per workspace** - starting a new session either reuses or explicitly replaces the old one - hiding the Browser tab does **not** stop the session ### Acceptance criteria - The backend can create a placeholder browser session and stream typed lifecycle updates over ORPC. - The service cleans up listeners on unsubscribe/abort. - The types are shared and not duplicated in browser/node code. ### Defensive programming requirements - Assert that a workspace-scoped session cannot belong to a different workspace. - Assert that `viewerUrl` is null unless the session is starting/live/paused. - Reject malformed session transitions early. 
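The three defensive checks above could be sketched as two small assertion helpers. The transition table is an assumption: the plan names the statuses but not the allowed edges between them, and `assertSessionInvariants`, `assertTransition`, and `SessionLike` are hypothetical names.

```typescript
type SessionStatus = "idle" | "starting" | "live" | "paused" | "ended" | "error";

// Assumed transition table; adjust once the real lifecycle is settled.
const ALLOWED_TRANSITIONS: Record<SessionStatus, readonly SessionStatus[]> = {
  idle: ["starting"],
  starting: ["live", "ended", "error"],
  live: ["paused", "ended", "error"],
  paused: ["live", "ended", "error"],
  ended: ["starting"],
  error: ["starting"],
};

interface SessionLike {
  workspaceId: string;
  status: SessionStatus;
  viewerUrl: string | null;
}

// Throws if the session violates the workspace or viewerUrl invariants.
function assertSessionInvariants(session: SessionLike, expectedWorkspaceId: string): void {
  if (session.workspaceId !== expectedWorkspaceId) {
    throw new Error(
      `session belongs to workspace ${session.workspaceId}, expected ${expectedWorkspaceId}`
    );
  }
  const mayHaveViewer =
    session.status === "starting" || session.status === "live" || session.status === "paused";
  if (session.viewerUrl !== null && !mayHaveViewer) {
    throw new Error(`viewerUrl must be null while status is ${session.status}`);
  }
}

// Throws on any status change that the table does not list.
function assertTransition(from: SessionStatus, to: SessionStatus): void {
  if (ALLOWED_TRANSITIONS[from].indexOf(to) === -1) {
    throw new Error(`invalid browser session transition: ${from} -> ${to}`);
  }
}
```

Calling both helpers inside the service before it emits `session-updated` would reject a malformed state at the boundary, so it never reaches subscribers.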
### Estimated product code - **+180 to +280 LoC** --- ## Phase 2 — Backend adapter implementation and lifecycle management **Owner:** Backend integration agent **Parallelizable with:** Phase 3 UI shell once contracts are stable **Primary files:** - `src/node/services/browserSessionBackends/BrowserSessionBackend.ts` (new) - `src/node/services/browserSessionBackends/AgentBrowserManagerBackend.ts` (new, preferred) - `src/node/services/browserSessionBackends/AgentBrowserCliBackend.ts` (new, fallback) - `src/node/services/browserSessionService.ts` - `src/node/services/streamManager.ts` (only if shared run lifecycle hooks are needed) - `src/node/services/mcpServerManager.ts` (only if a bridge is needed later; avoid coupling MVP to this) ### Tasks 1. Create a backend interface with methods like: - `startSession(...)` - `stopSession(sessionId)` - `getViewerEndpoint(sessionId)` - `onAction(...)` - `onSessionUpdate(...)` 2. Implement the chosen backend adapter. 3. Add lazy runtime preflight: - binary/library available? - browser install available? - helpful error message if not. 4. Allocate an ephemeral viewer port or equivalent viewer endpoint. 5. Convert raw backend events into redacted, user-readable `BrowserAction` entries. 6. Ensure hard cleanup on: - workspace close, - session replacement, - app shutdown, - backend crash/disconnect. 7. Keep raw frames ephemeral; do **not** persist them to disk. 8. Persist only lightweight session/action metadata **if** it materially improves recovery/debuggability; otherwise keep MVP in-memory. ### Explicit scope control - **Do not** build generalized support for arbitrary external MCP browser tools in this phase. - **Do not** parse ad-hoc shell logs in the renderer. - If the CLI adapter is used, parsing/translation belongs inside the backend adapter only. ### Acceptance criteria - Starting a browser session returns typed session state plus a viewer endpoint. - Stopping a session cleans up all child resources and emits final state. 
- Missing dependency/install errors are shown as session errors, not crashes. - The service never leaves orphan processes/sockets after stop or shutdown. ### Defensive programming requirements - Use explicit disposables/cleanup guards for child processes and sockets. - Assert one session per workspace for MVP. - Redact or omit sensitive input payloads in action events. ### Estimated product code - **Approach B path:** **+260 to +420 LoC** - **Approach A fallback path:** **+220 to +340 LoC** --- ## Phase 3 — Right-sidebar Browser tab shell and layout integration **Owner:** Frontend/right-sidebar agent **Parallelizable with:** Phase 2 once contracts are stable enough to mock **Primary files:** - `src/browser/types/rightSidebar.ts` - `src/browser/features/RightSidebar/Tabs/registry.ts` - `src/browser/features/RightSidebar/Tabs/TabLabels.tsx` - `src/browser/features/RightSidebar/RightSidebar.tsx` - `src/browser/features/RightSidebar/BrowserTab.tsx` (new) - `src/browser/utils/rightSidebarLayout.ts` - `src/browser/utils/uiLayouts.ts` (only if layout presets need updating) ### Tasks 1. Add `"browser"` to the right-sidebar base tab model. 2. Register a `BROWSER_TAB_CONFIG` in the right-sidebar registry. 3. Add a `BrowserTabLabel` showing: - browser icon, - live/error state, - subtle activity indicator if a session is active. 4. Implement `BrowserTab.tsx` with the following UI regions: - header: title, URL, session status, backend type - main viewer region: live browser image/canvas - status strip: last action, ownership, errors - empty/error/install state 5. Keep frame rendering local to `BrowserTab`: - do not push raw frames into right-sidebar layout state, - do not rerender the entire sidebar on each frame. 6. Decide Browser tab visibility UX: - **recommended:** do not force it into every default layout, - automatically insert/select it when a browser session starts, - persist user layout choices afterward. 7. 
Ensure the tab participates cleanly in existing right-sidebar layout operations. ### UX requirements - The Browser tab should feel like a sibling of Terminal/Output/Debug, not a separate product. - Starting a browser-backed task should auto-focus or at least visibly surface the Browser tab. - Hiding the tab should not kill the browser session. - If the session ends, the tab should show a stable ended/error state rather than disappearing abruptly. ### Acceptance criteria - A browser session can appear in the right sidebar and remain visible while the agent works. - The rest of the app remains responsive while frames are arriving. - Layout persistence and tab switching still work. ### Defensive programming requirements - Validate viewer connection state transitions. - Clamp or ignore nonsensical frame metadata. - Ensure null session state renders cleanly without throwing. ### Estimated product code - **+220 to +320 LoC** --- ## Phase 4 — Viewer transport, frame rendering, and performance hardening **Owner:** Frontend performance/interaction agent **Parallelizable with:** late Phase 2 / late Phase 3 **Primary files:** - `src/browser/features/RightSidebar/BrowserTab.tsx` - optionally `src/browser/features/RightSidebar/BrowserViewer.tsx` (new, only if extraction materially reduces complexity) - optionally `src/common/types/browserSession.ts` for frame metadata types ### Tasks 1. Implement the viewer transport in the tab: - connect to the backend-provided viewer endpoint, - receive frames and metadata, - render without React-wide churn. 2. Start with the simplest correct renderer: - imperative `<img>` or canvas update loop, - only introduce a separate extracted viewer component if the code becomes hard to follow. 3. Add frame management: - keep only the latest frame, - drop stale queued frames, - optionally decode on `requestAnimationFrame`. 4. Handle viewer disconnects and reconnect states. 5. Show visible empty/loading overlays while the first frame is pending. 6. 
Preserve aspect ratio and pointer coordinate mapping data for future takeover. ### Acceptance criteria - The Browser tab can display a steady live viewport without noticeably degrading the rest of the right sidebar. - Frame delivery failure yields a recoverable error/reconnect state. - The renderer is not flooded with state updates from every frame. ### Estimated product code - **+140 to +240 LoC** --- ## Phase 5 — Action timeline and tool synchronization **Owner:** Tooling/instrumentation agent **Parallelizable with:** Phase 4 once the action model is stable **Primary files:** - `src/common/types/browserSession.ts` - `src/node/services/browserSessionService.ts` - backend adapter file from Phase 2 - `src/browser/features/RightSidebar/BrowserTab.tsx` - optionally `src/browser/features/RightSidebar/DevToolsTab/*` if cross-linking is added later ### Tasks 1. Define a compact `BrowserAction` model for user-facing steps: - `navigate` - `click` - `fill` - `type` - `scroll` - `snapshot` - `wait` - `error` 2. Emit these actions from the backend adapter in a way that does **not** depend on renderer-side log parsing. 3. Add a recent-action list to the Browser tab. 4. Redact sensitive values: - passwords, - secrets, - auth tokens, - vault-backed values. 5. Keep raw detailed logs in existing Output/Debug surfaces where appropriate, but do not block MVP on deep fusion with those tabs. ### Acceptance criteria - The user can understand “what the agent is doing” from the Browser tab itself. - Sensitive inputs are not displayed. - Browser action state stays consistent with session lifecycle state. 
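A minimal sketch of the Phase 5 redaction step, assuming the backend adapter sees raw events before anything reaches the renderer. The action type names come from the list above; `RawBackendEvent`, its `sensitive` flag, the `SENSITIVE_TARGET` pattern, and `toBrowserAction` are hypothetical, and the sketch does not attempt to scrub tokens embedded in URLs.

```typescript
type BrowserActionType =
  | "navigate" | "click" | "fill" | "type"
  | "scroll" | "snapshot" | "wait" | "error";

// Hypothetical raw event shape emitted by a backend adapter.
interface RawBackendEvent {
  type: BrowserActionType;
  target?: string; // element label or URL
  value?: string; // text that was typed or filled
  sensitive?: boolean; // assumed flag for password or vault-backed fields
}

// User-facing action entry; only a redacted summary leaves the adapter.
interface BrowserAction {
  type: BrowserActionType;
  summary: string;
}

const REDACTED = "[redacted]";
// Assumed heuristic for fields whose label suggests a secret.
const SENSITIVE_TARGET = /pass(word)?|secret|token|api[-_ ]?key/i;

function toBrowserAction(event: RawBackendEvent): BrowserAction {
  switch (event.type) {
    case "fill":
    case "type": {
      const hide =
        event.sensitive === true || SENSITIVE_TARGET.test(event.target ?? "");
      const shown = hide ? REDACTED : JSON.stringify(event.value ?? "");
      return {
        type: event.type,
        summary: `${event.type} ${event.target ?? "field"}: ${shown}`,
      };
    }
    default:
      // Other action types carry no typed value, so only the target is shown.
      return {
        type: event.type,
        summary: event.target ? `${event.type} ${event.target}` : event.type,
      };
  }
}
```

Because the raw `value` never appears on `BrowserAction`, the recent-action list in the Browser tab cannot display it by accident.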
### Estimated product code - **+120 to +220 LoC** --- ## Phase 6 — Human takeover and collaboration controls (follow-up) **Owner:** Interaction/UX agent **Parallelizable with:** after Phases 2–5 stabilize **Primary files:** - `src/browser/features/RightSidebar/BrowserTab.tsx` - viewer transport/helper file if extracted - backend adapter file from Phase 2 - `src/common/types/browserSession.ts` - `src/common/orpc/schemas/api.ts` (only if extra control procedures are needed) ### Tasks 1. Add explicit **Take over** / **Return control** affordances. 2. When user takeover starts: - flip `ownership` from `agent` to `user`, - pause or gate agent input, - show a visible banner. 3. Translate pointer/keyboard events using frame metadata. 4. Prevent agent/user race conditions. 5. Add a timeout/release policy for abandoned user takeover. ### Acceptance criteria - The user can click/type into the live browser view when takeover is active. - Agent and user inputs never race silently. - Ownership state is always visible. ### Estimated product code - **+180 to +320 LoC** --- ## Phase 7 — Testing, stories, and rollout hardening **Owner:** QA/verification agent **Parallelizable with:** all later phases **Primary files:** - `tests/ipc/browserSession.test.ts` (new) - `tests/ui/browserTab.test.ts` (new) - colocated pure tests only if new pure helpers are extracted - `src/browser/stories/App.BrowserTab.stories.tsx` or the nearest existing full-app story file that should absorb the new states ### Test plan 1. **IPC/integration tests** (`tests/ipc`) - start/stop lifecycle - snapshot-first subscription behavior - workspace isolation - replacement/cleanup behavior - missing dependency error state 2. **UI integration tests** (`tests/ui`) - browser tab appears and renders idle/loading/error states - session start auto-surfaces the tab - recent actions/status text update correctly - app remains navigable while the tab is active 3. 
**Pure unit tests** (colocated) only for extracted pure helpers such as: - coordinate mapping, - frame metadata normalization, - action redaction. 4. **Storybook/full-app story** - idle state - live state with recent actions - ended/error/install-missing state 5. **Targeted e2e** (`tests/e2e`) only if happy-dom is insufficient for validating the viewer transport or takeover behavior. ### Validation commands - `make typecheck` - `make static-check` - targeted IPC/UI/e2e tests for touched areas ### Rollout posture - Ship behind an **experimental flag** or equivalent internal-only exposure first. - Keep the feature off by default until dogfooding is stable. - Log session start/stop/error paths with the repo’s `log` helper on the backend. ### Acceptance criteria - New code paths have targeted coverage. - Browser tab UI states are captured in stories. - Experimental rollout path is defined. ### Estimated product code - **+60 to +140 LoC** ## Cross-cutting design decisions the team should follow ### 1. Keep browser integration separate from generic MCP integration for the first delivery Mux already has `MCPServerManager`, but the first delivery should not try to unify every possible browser MCP server under one viewer abstraction. Build a Mux-owned browser session service first; if future MCP tools want to publish into it, add a bridge later. ### 2. Treat the Browser tab as a first-class right-sidebar resident The Browser tab should live beside Terminal, Output, and Debug; it should not open in a separate window for the MVP unless the spike proves the in-sidebar viewer is impossible. ### 3. Never persist raw frames Persist, at most, lightweight metadata and redacted action history. Raw image streams are too large and too risky to store casually. ### 4. Prefer local viewer transport over backend relaying If Electron/network policy allows it, the renderer should connect directly to the locally managed viewer socket. 
Only add a relay if direct connection is blocked or unsafe. ### 5. Avoid hook proliferation Colocate live viewer logic with `BrowserTab.tsx`. Extract only the pieces that are genuinely reusable or become too complex to read. ## Parallelization map for a team of agents | Workstream | Can start when | Suggested owner | | --- | --- | --- | | Phase 0 spike | immediately | backend/platform agent | | Phase 1 contracts/service shell | after Phase 0 decision is mostly clear | shared-contracts agent | | Phase 3 right-sidebar shell with mocked state | after Phase 1 type shape stabilizes | frontend/right-sidebar agent | | Phase 2 backend adapter | after Phase 0 backend choice | backend integration agent | | Phase 4 viewer transport | after Phase 2 returns a viewer endpoint contract | frontend performance agent | | Phase 5 action timeline | after Phase 2 emits structured actions | tooling/instrumentation agent | | Phase 7 tests/stories | begins with Phase 1 and expands as each phase lands | QA/verification agent | | Phase 6 takeover | after MVP is stable | interaction/UX agent | ## Dogfooding plan (required) ### Dogfooding principles to follow This plan should absorb the core discipline from the repo’s `dogfood` and `agent-browser` skills: - Treat dogfooding as **structured exploratory QA**, not a casual smoke test. - Use **repro-first evidence**: when something breaks, stop and document it immediately before moving on. - For **interactive/behavioral issues**, capture a **video plus step-by-step screenshots**. - For **static/visible-on-load issues**, capture a **single annotated screenshot** instead of wasting time on video. - Use **`agent-browser` directly, never `npx agent-browser`**. - Use **named sessions** so multiple agents can dogfood in parallel without stepping on each other. - Follow the core agent-browser loop: **open → wait → snapshot -i → interact → re-snapshot**. - After any navigation or major DOM change, **re-snapshot** before taking the next action. 
- Prefer **explicit waits** such as `wait --load networkidle` or element/url waits; only use sleeps to make repro videos human-watchable. - Check **console/errors** periodically; some regressions will not be visible in the viewport. - Append findings **incrementally** to a dogfood report so an interrupted run still leaves usable evidence. ### Dogfooding harness and setup Each dogfood run should create an isolated run ID, session name, and evidence directory. 1. Launch the normal local Mux development flow (`make dev` or the team’s standard desktop/Electron dev path). 2. Enable the experimental Browser tab feature flag. 3. Prepare a deterministic target site or sites: - at least one simple navigation target, - one form-interaction target, - one failure-path target if available. 4. Create an isolated output directory per run, for example: - `./dogfood-output/browser-tab/<run-id>/screenshots` - `./dogfood-output/browser-tab/<run-id>/videos` - `./dogfood-output/browser-tab/<run-id>/report.md` 5. Start a **named** agent-browser session for the target browsing workload. 6. If authentication is required, prefer one of these, in order: - saved session/profile/state, - auth vault, - one-time manual login with saved state. 7. Where feasible, constrain the run with: - a domain allowlist, - content boundaries, - a deterministic viewport/device preset. ### Recommended command pattern for browser-side dogfooding Use the `agent-browser` skill’s proven workflow for the site being driven inside the Browser tab. 
```bash
RUN_ID=browser-tab-<timestamp>
SESSION=mux-browser-tab-${RUN_ID}
OUT=./dogfood-output/browser-tab/${RUN_ID}
mkdir -p ${OUT}/screenshots ${OUT}/videos

agent-browser --session ${SESSION} open <target-url> && \
  agent-browser --session ${SESSION} wait --load networkidle && \
  agent-browser --session ${SESSION} screenshot --annotate ${OUT}/screenshots/initial.png && \
  agent-browser --session ${SESSION} snapshot -i
```

For authenticated or recurring scenarios, prefer `--session-name`, `--profile`, or saved state so reruns are fast and reproducible. ### Structured dogfooding workflow #### 1. Initialize - Create the run directory and report file. - Start the named session. - Capture an initial annotated screenshot and interactive snapshot. - Record the initial Mux state showing whether the Browser tab is hidden, visible, empty, or already active. #### 2. Authenticate (if needed) - Authenticate once using a repeatable approach. - Save state if the scenario will be rerun. - Never expose raw credentials in artifacts or the report. #### 3. Orient - Map the top-level Mux workflow for this feature: - how a browser session starts, - how the Browser tab appears, - how the user sees status/action text, - how the session ends or errors. - Map the target site’s main interactive elements using `snapshot -i`. - Capture a baseline annotated screenshot of the target page and a screenshot of the Mux Browser tab. #### 4. Explore systematically Test the feature like a real user, page by page and workflow by workflow. At a minimum, cover: 1. **Session start / tab surfacing** - starting a browser-backed task creates or reuses the Browser tab, - the tab becomes visible enough that the user notices it, - the initial loading state is sane. 2. **Watch-only navigation** - navigate across multiple pages, - click links/buttons, - confirm the Browser tab stays live and visually synchronized. 3.
**Form interaction + redaction** - fill inputs and submit a harmless form, - confirm the recent-action list matches what happened, - confirm sensitive values are redacted from visible action text and persisted artifacts. 4. **Layout and right-sidebar behavior** - resize the sidebar, - switch tabs away and back, - collapse/reopen if supported, - confirm the session survives UI movement. 5. **Interrupt / cleanup / replacement** - stop a run mid-session, - start another session in the same workspace, - switch workspaces if the product allows it, - confirm there are no orphaned or cross-wired sessions. 6. **Error and dependency handling** - test missing runtime / failed startup / disconnected viewer paths, - confirm the app shows a recoverable error state instead of crashing. 7. **Performance / backpressure** - keep the Browser tab open during a longer run, - confirm the rest of the right sidebar remains responsive. 8. **Takeover flow** (Phase 6 only) - take control, click/type, return control, - confirm ownership is explicit and agent/user inputs do not race. During exploration, use the agent-browser workflow rigorously: - `snapshot -i` before discovering refs, - interact via refs, - `wait --load networkidle` or element/url waits after major actions, - **re-snapshot** after navigation or DOM mutation, - check `errors` and `console` periodically, - optionally use `diff snapshot` when validating that an action changed the page as expected. ### Repro-first issue documentation rules When a bug is found, stop exploring and document it immediately. #### Interactive / behavioral issues Examples: wrong action log, frozen stream, mismatched viewport, takeover race, session cleanup bug, visible console error after an action. Required evidence: 1. Start a repro video **before** reproducing. 2. Reproduce at human pace. 3. Capture a screenshot for each significant step. 4. Pause on the broken state and capture an **annotated** screenshot. 5. Stop the video. 6. 
Append the issue to `report.md` immediately with: - issue ID (`ISSUE-001`, etc.), - severity, - exact repro steps, - expected result, - actual result, - screenshot/video filenames. When typing is part of the observable repro, prefer `type` over `fill` so the video is understandable. #### Static / visible-on-load issues Examples: clipped text, wrong icon/state, bad empty state copy, layout overlap, stale title/url, immediately visible console error. Required evidence: 1. Capture a single annotated screenshot. 2. Append a concise issue entry to `report.md` immediately. 3. Mark repro video as `N/A`. ### Evidence requirements per milestone For every milestone review, provide both broad milestone evidence and issue-specific evidence. #### Broad milestone evidence - **At least 2 screenshots**: - one of the Browser tab during a live session, - one of an ended/error/install-missing state. - **At least 1 short video** showing the agent actively browsing while the Browser tab is visible in Mux. #### Issue-specific evidence - Every reproducible interactive issue gets: - one repro video, - step-by-step screenshots, - one annotated result screenshot. - Every reproducible static issue gets: - one annotated screenshot. Where practical, capture both: - the **Mux-side evidence** (the Browser tab visible in the app), and - the **browser-side evidence** (agent-browser screenshots/video of the underlying session). ### Wrap-up procedure At the end of each dogfood run: 1. Re-read the report and make sure summary counts match the actual issue list. 2. Explicitly note whether the run found: - blocking issues, - moderate issues, - minor issues, - or no additional reproducible issues. 3. Close the named agent-browser session. 4. Preserve all artifacts; do not delete screenshots, videos, or reports mid-run. 5. Attach screenshots and the key video to the implementation handoff/review. 
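The issue-entry rules above could be enforced by a small formatter that a dogfooding agent appends to `report.md`. This is a sketch under assumptions: the `DogfoodIssue` shape, the `formatIssueEntry` name, and the exact markdown layout are hypothetical; only the required fields, the severity levels, and the `N/A` video marker for static issues come from the plan.

```typescript
type Severity = "blocking" | "moderate" | "minor";

// Hypothetical in-memory shape for one report entry.
interface DogfoodIssue {
  id: string; // e.g. "ISSUE-001"
  kind: "interactive" | "static";
  severity: Severity;
  steps: string[];
  expected: string;
  actual: string;
  screenshots: string[];
  video?: string; // required for interactive issues
}

function formatIssueEntry(issue: DogfoodIssue): string {
  if (issue.screenshots.length === 0) {
    throw new Error(`${issue.id}: at least one screenshot is required`);
  }
  if (issue.kind === "interactive" && !issue.video) {
    throw new Error(`${issue.id}: interactive issues need a repro video`);
  }
  const steps = issue.steps.map((step, i) => `${i + 1}. ${step}`).join("\n");
  // Static issues always record the video as N/A, per the rules above.
  const video = issue.kind === "interactive" ? issue.video : "N/A";
  return [
    `### ${issue.id} (${issue.severity})`,
    "",
    "Repro steps:",
    steps,
    "",
    `Expected: ${issue.expected}`,
    `Actual: ${issue.actual}`,
    `Screenshots: ${issue.screenshots.join(", ")}`,
    `Video: ${video}`,
    "",
  ].join("\n");
}
```

Rejecting an interactive issue without a video at format time keeps incomplete evidence out of the report instead of relying on a later review to catch it.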
### Phase quality gates tied to dogfooding - **After Phase 0 spike:** one live-session screenshot, one short start/stop video, one note on runtime/install friction. - **Before Milestone M1 sign-off:** complete one structured exploratory run with evidence across start, navigation, form interaction, resize/tab switching, interrupt, and error handling. - **Before Milestone M2 sign-off:** complete one structured run specifically validating action-log fidelity and redaction behavior. - **Before Milestone M3 sign-off:** complete one structured run specifically validating takeover ownership, input arbitration, and recovery. ### Parallel-team guidance If multiple agents dogfood simultaneously: - each agent must use a unique session name, - each agent must write to a separate run directory, - each agent must append findings to its own report first, then merge findings into the shared review summary. Aim for the depth of coverage that would normally yield **5–10 well-documented findings’ worth of exploration**. If fewer issues are found, state explicitly that no additional reproducible issues were observed rather than inventing weak findings. ## Final milestone definitions ### Milestone M1 — Visible viewer MVP Includes Phases 0–4 and the Phase 7 test/story minimums. **Success means:** - a right-sidebar Browser tab exists, - a live browser session can be viewed there, - the UI stays stable, - lifecycle errors are recoverable. ### Milestone M2 — “See what the agent is doing” product pass Adds Phase 5 and expands verification. **Success means:** - the Browser tab shows live viewport + current/recent actions, - the user can correlate the visible browser with agent intent, - sensitive actions are redacted correctly. ### Milestone M3 — Human collaboration pass Adds Phase 6. **Success means:** - the user can safely take over and hand control back, - ownership is explicit, - sessions remain stable under collaboration. ## Recommended first implementation order 1. Phase 0 spike 2. 
Phase 1 shared contracts and ORPC shell 3. Phase 3 Browser tab shell using mocked/stubbed session state 4. Phase 2 real backend adapter and lifecycle 5. Phase 4 live viewer transport/perf hardening 6. Phase 5 action timeline sync 7. Phase 7 full verification/story coverage 8. Phase 6 takeover only after M1/M2 are solid ## What not to do in the first pass - Do not start with multi-session browser tabs. - Do not make this a separate popout-only window. - Do not route raw frame streams through generic chat/tool message rendering. - Do not block the feature on deep Debug/Output/MCP unification. - Do not attempt full browser replay/history storage. - Do not ship without screenshots/video from dogfooding. </details> --- _Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking: `xhigh` • Cost: `$52.11`_ <!-- mux-attribution: model=anthropic:claude-opus-4-6 thinking=xhigh costs=52.11 -->
🤖 refactor: auto-cleanup (#2942) ## Summary Periodic auto-cleanup: removes the dead `setPRStatusStoreInstance` export from `PRStatusStore.ts`. ## Background The function was exported but never imported or called anywhere in the codebase. The getter (`getPRStatusStoreInstance`) creates the singleton on demand; no test or production code ever needed to inject a custom instance via the setter. It is the only `set*StoreInstance` pattern across all stores, so there is no convention to maintain. ## Validation - `make typecheck` — passes - `make lint` — passes - `make fmt-check` — passes - `bun test src/browser/stores/PRStatusStore` — 12/12 pass - Grep confirms zero references outside the definition site Auto-cleanup checkpoint: ff743d1 --- _Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking: `xhigh` • Cost: `$0.00`_ <!-- mux-attribution: model=anthropic:claude-opus-4-6 thinking=xhigh costs=0.00 --> Co-authored-by: mux-bot[bot] <264182336+mux-bot[bot]@users.noreply.github.com>