
Tags: coder/mux


v0.22.1-nightly.24


Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
🤖 feat: add per-workspace heartbeat messages (#3105)

Summary
Add a per-workspace heartbeat message override to the existing Configure
heartbeat modal and use it when backend heartbeats run.

Background
Workspace heartbeats were already persisted per workspace, but they
always used a fixed default prompt body. This change keeps the existing
Workspace Heartbeats experiment gate while letting each workspace
optionally customize the heartbeat instruction body without affecting
other workspaces.

Implementation
- extended the workspace heartbeat schema/router payload with an
optional `message`
- normalized whitespace-only values to clear the override and fall back
to the default body
- updated `WorkspaceHeartbeatModal` and `useWorkspaceHeartbeat` to edit,
restore, and clear the workspace-scoped message
- composed heartbeat prompts from a fixed idle-duration lead-in plus
either the saved custom body or the default body
- preserved stored heartbeat messages when `/heartbeat` commands only
change cadence or disable the feature
- added focused modal, slash-command, and backend heartbeat tests

Validation
- `make static-check`
- `make test`
- `bun test src/node/services/heartbeatService.test.ts
src/browser/utils/chatCommands.test.ts
src/browser/components/WorkspaceHeartbeatModal/WorkspaceHeartbeatModal.test.tsx`
- Dogfooded the modal flow in `make dev-server-sandbox`, including
persistence across reopen and no cross-workspace bleed; captured
screenshots and a local WebM recording during the run

Risks
The main regression risk is compatibility between the modal flow,
backend heartbeat execution, and slash-command writes. The change stays
within the existing workspace heartbeat model and adds tests for
fallback/custom prompt composition plus message preservation across
`/heartbeat` updates.

---

<details>
<summary>📋 Implementation Plan</summary>

# Plan: per-workspace configurable heartbeat message

## Goal

Add a **per-workspace** configurable heartbeat message through the
existing **Configure heartbeat** modal, while keeping the existing
**Workspace Heartbeats** experiment as the feature gate only.

## Recommended approach

**Option A (selected): extend the existing workspace-scoped heartbeat
flow** — **net +70 to +110 product LoC**.

- Keep `Settings -> Experiments -> Workspace Heartbeats` as the on/off
gate.
- Do **not** add a global textbox in `ExperimentsSection`; that screen
is app-scoped and intentionally lacks selected-workspace context.
- Add an optional per-workspace heartbeat message field to the existing
heartbeat settings model and modal.
- Treat the configured value as the **instruction body** for the
heartbeat, not a raw full prompt template:
  - keep the fixed `[Heartbeat]` prefix
  - keep the computed idle-duration sentence
  - append either the custom message body or the existing default
    instruction body
- Blank/whitespace input clears the override and falls back to the
built-in default message.

### Scope decisions

- **In scope:** per-workspace persistence, workspace modal UI, backend
prompt selection, compatibility updates for existing `/heartbeat`
command paths.
- **Out of scope:** global experiment-level textbox,
placeholder/template syntax such as `{idleDuration}`, scheduling
changes, new docs pages.

<details>
<summary>Why this matches current repo patterns</summary>

- `src/common/orpc/schemas/workspace.ts` already stores heartbeat
settings as workspace metadata.
- `src/browser/components/WorkspaceHeartbeatModal/WorkspaceHeartbeatModal.tsx`
is the existing per-workspace edit surface.
- `src/browser/features/Settings/Sections/ExperimentsSection.tsx`
supports nested controls for app-scoped experiments (for example,
configurable bind URL), but that pattern is a poor fit for
workspace-specific data because the settings route does not carry
selected-workspace context.
- `src/node/services/workspaceService.ts#executeHeartbeat()` currently
builds the heartbeat prompt in one place, so prompt selection can stay
centralized.
</details>

## Implementation plan

### Phase 1 — Extend the heartbeat settings shape and persistence

**Files / symbols**
- `src/common/orpc/schemas/workspace.ts`
  - `WorkspaceHeartbeatSettingsSchema`
- `src/common/orpc/schemas/api.ts`
  - `workspace.heartbeat.get`
  - `workspace.heartbeat.set`
- `src/node/orpc/router.ts`
  - `workspace.heartbeat.set`
- `src/node/services/workspaceService.ts`
  - `setHeartbeatSettings()`
  - `getHeartbeatSettings()`

**Changes**
- Add an optional `message` field to `WorkspaceHeartbeatSettingsSchema`
with a conservative max length (recommend **1000 chars** for v1).
- Thread that field through the heartbeat ORPC schema and router so
`input.message` reaches `setHeartbeatSettings()`.
- In `setHeartbeatSettings()`:
  - assert that `message` is either absent or a string
  - trim it
  - normalize empty/whitespace-only values to `undefined`
  - include `message` in the persisted object and in change detection
- Preserve current backward compatibility: old workspaces without
`message` continue to read as default/fallback behavior.

**Defensive-programming notes**
- Keep explicit assertions next to the existing `enabled` / `intervalMs`
assertions.
- Prefer a normalized local variable before constructing `nextSettings`
so change detection and persistence compare the same shape.
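
The normalization described above could look roughly like the following. This is a minimal sketch, not the code from the PR: the helper names (`normalizeHeartbeatMessage`, `buildNextSettings`, `settingsChanged`) and the `HeartbeatSettings` interface are hypothetical, and rejecting over-length input with an error (rather than relying on schema validation) is an assumption. The 1000-character cap is the plan's v1 recommendation.

```typescript
// Hypothetical names throughout; only the 1000-char cap comes from the plan.
const MAX_HEARTBEAT_MESSAGE_LENGTH = 1000;

interface HeartbeatSettings {
  enabled: boolean;
  intervalMs: number;
  message?: string;
}

// Returns the trimmed message, or undefined when there is no usable override.
function normalizeHeartbeatMessage(message: unknown): string | undefined {
  if (message === undefined) return undefined;
  if (typeof message !== "string") {
    throw new TypeError("heartbeat message must be a string when provided");
  }
  const trimmed = message.trim();
  if (trimmed.length > MAX_HEARTBEAT_MESSAGE_LENGTH) {
    throw new RangeError(
      `heartbeat message exceeds ${MAX_HEARTBEAT_MESSAGE_LENGTH} characters`,
    );
  }
  // Empty and whitespace-only values clear the override.
  return trimmed.length === 0 ? undefined : trimmed;
}

// Normalize once, then build the object that is both compared and persisted.
function buildNextSettings(
  enabled: boolean,
  intervalMs: number,
  message: unknown,
): HeartbeatSettings {
  const normalized = normalizeHeartbeatMessage(message);
  // Omit the key entirely when there is no override so old and new
  // workspaces share one shape.
  return normalized === undefined
    ? { enabled, intervalMs }
    : { enabled, intervalMs, message: normalized };
}

function settingsChanged(prev: HeartbeatSettings, next: HeartbeatSettings): boolean {
  return (
    prev.enabled !== next.enabled ||
    prev.intervalMs !== next.intervalMs ||
    prev.message !== next.message
  );
}
```

Normalizing before building `nextSettings` means a save of `"   "` against a workspace with no stored message is detected as a no-op instead of a change.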

**Quality gate**
- `make typecheck`
- targeted heartbeat schema/API tests if needed after the shape change
lands

### Phase 2 — Add the per-workspace message field to the existing modal

**Files / symbols**
- `src/browser/components/WorkspaceHeartbeatModal/WorkspaceHeartbeatModal.tsx`
- `src/browser/hooks/useWorkspaceHeartbeat.ts`

**Changes**
- Extend the modal draft state with `draftMessage`.
- Re-sync the draft message on open/workspace-switch alongside `enabled`
and `intervalMs`.
- Add a simple styled `<textarea>` to the modal instead of introducing a
new shared component; keep the diff small.
- Show helper text such as: **“Leave empty to use the default heartbeat
message.”**
- Use the current default instruction body as placeholder/help copy
rather than pre-populating the field.
- Preserve the saved custom message even if the user temporarily
disables heartbeats.
- Pass `message` through `save()`; send `undefined` when the trimmed
draft is empty so clearing the field removes the override.

**UI behavior choice**
- Keep the existing heartbeat modal as the single edit surface for
workspace-scoped settings.
- Show the new message field **only when heartbeat is enabled in the
modal**, to mirror the user’s requested “toggle on -> reveal
configurable value” interaction, but do **not** clear the persisted
value when toggled off.

**Quality gate**
- local UI validation for required trim/max-length behavior
- component test for save + re-open behavior before moving on

### Phase 3 — Centralize prompt composition with fallback behavior

**Files / symbols**
- `src/node/services/workspaceService.ts`
  - `executeHeartbeat()`

**Changes**
- Fetch the saved heartbeat settings inside `executeHeartbeat()` using
the existing workspace-scoped settings path.
- Split the prompt into:
  1. a fixed preamble with `[Heartbeat]` and the computed idle duration
  2. an instruction body that comes from `settings.message` when present,
     otherwise the current built-in default body
- Keep `muxMetadata.displayStatus`, `synthetic: true`, and `requireIdle:
true` unchanged.

**Target shape**

```ts
const heartbeatLead = `[Heartbeat] This workspace has been idle for approximately ${idleDuration}.`;
const heartbeatBody = customMessage ?? DEFAULT_HEARTBEAT_BODY;
const heartbeatPrompt = `${heartbeatLead} ${heartbeatBody}`;
```

**Why this shape**
- preserves useful runtime context without introducing templating syntax
- avoids requiring the user to remember/include `idleDuration`
- keeps the fallback message unchanged for all existing workspaces
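
As a self-contained illustration of the target shape, the composition can be wrapped in a small function. This is a sketch under assumptions: `composeHeartbeatPrompt` is a hypothetical name, and `DEFAULT_HEARTBEAT_BODY` is a placeholder string here because the real default body is not reproduced in this plan.

```typescript
// Placeholder only; the real default instruction body lives in the mux source.
const DEFAULT_HEARTBEAT_BODY = "<default heartbeat instruction body>";

// Hypothetical helper: fixed lead-in plus custom or default body.
function composeHeartbeatPrompt(idleDuration: string, customMessage?: string): string {
  const heartbeatLead = `[Heartbeat] This workspace has been idle for approximately ${idleDuration}.`;
  // `??` alone would keep an empty string, so re-trim defensively and
  // treat an empty result as "no override".
  const trimmed = customMessage?.trim();
  const heartbeatBody = trimmed ? trimmed : DEFAULT_HEARTBEAT_BODY;
  return `${heartbeatLead} ${heartbeatBody}`;
}
```

If `setHeartbeatSettings()` already normalizes blank values to `undefined`, the extra trim is redundant, but it keeps the fallback correct for data written by any other path.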

**Quality gate**
- targeted backend test proving both the custom-message and fallback
paths

### Phase 4 — Preserve custom message across secondary write paths

**Files / symbols**
- `src/browser/utils/chatCommands.ts`
  - `processSlashCommand()` heartbeat-set branch
- `src/browser/utils/chatCommands.test.ts`

**Changes**
- Update the `/heartbeat` command path so changing cadence or toggling
heartbeats off does **not** accidentally clear the saved custom message.
- Preserve the stored `message` the same way the current code already
preserves the stored interval when disabling heartbeats.
- Keep the API write semantics explicit at the callsite rather than
relying on hidden merge behavior.

**Why this matters**
- today `workspace.heartbeat.set()` is called from both the modal hook
and the slash-command path
- without a compatibility update, `/heartbeat 30` or `/heartbeat off`
would overwrite the saved heartbeat message
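
One way to keep the write semantics explicit at the callsite is to build the full payload from the currently stored settings. The sketch below is illustrative only: `HeartbeatCommand`, `buildHeartbeatWrite`, and the `HeartbeatSettings` shape are hypothetical, and the real `processSlashCommand()` branch may be structured differently.

```typescript
interface HeartbeatSettings {
  enabled: boolean;
  intervalMs: number;
  message?: string;
}

// Hypothetical parsed forms of `/heartbeat off` and `/heartbeat <minutes>`.
type HeartbeatCommand = { kind: "off" } | { kind: "interval"; minutes: number };

// Build the complete payload so nothing depends on hidden merge behavior.
function buildHeartbeatWrite(
  current: HeartbeatSettings,
  command: HeartbeatCommand,
): HeartbeatSettings {
  const base =
    command.kind === "off"
      ? { enabled: false, intervalMs: current.intervalMs } // keep stored cadence
      : { enabled: true, intervalMs: command.minutes * 60_000 };
  // Carry the stored message forward unchanged; omit the key when none exists.
  return current.message === undefined ? base : { ...base, message: current.message };
}
```

With this shape, `/heartbeat off` and `/heartbeat 30` both send the stored `message` back to `workspace.heartbeat.set()`, so the override survives either command.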

**Quality gate**
- targeted slash-command tests for both enable and disable flows

## Test plan

### Update existing tests
- `src/node/services/heartbeatService.test.ts`
  - extend the existing prompt assertions to cover:
    - custom message body is used when stored
    - fallback default body is used when no custom message exists
- `src/browser/utils/chatCommands.test.ts`
  - assert `/heartbeat off` preserves the stored `message`
  - assert `/heartbeat <minutes>` preserves the stored `message` when only
    cadence changes

### Add focused UI coverage
- Add `src/browser/components/WorkspaceHeartbeatModal/WorkspaceHeartbeatModal.test.tsx`
  - renders the new message field when enabled
  - saves a custom message
  - reopens with the saved message restored
  - clearing the field removes the override instead of persisting
    whitespace

### Validation commands
- `make typecheck`
- `make lint`
- targeted `bun test` for the touched suites above
- `make test` before claiming completion if the targeted runs stay green
and runtime budget allows

## Acceptance criteria

- The heartbeat modal exposes a per-workspace message field behind the
existing workspace heartbeat flow.
- Saving the modal persists `message` alongside `enabled` and
`intervalMs` for that workspace only.
- Reopening the modal for the same workspace restores the saved message;
a different workspace does not inherit it.
- Clearing the message field removes the override and reverts to the
built-in default heartbeat body.
- Heartbeat execution uses the custom message body when present and the
current default body otherwise.
- Existing `/heartbeat` commands do not wipe the saved custom message.
- No new global experiment textbox is added.

## Dogfooding and review artifacts

### Setup
- Start an isolated sandbox with `make dev-server-sandbox`.
- Use the sandbox's printed Vite URL rather than your normal local mux
instance so the dogfood run stays isolated.
- Drive the sandboxed UI with `agent-browser` against that URL and
capture screenshots/video from that session.
- If sandbox seeding would make the test noisy, use
`DEV_SERVER_SANDBOX_ARGS="--clean-projects"` (and `--clean-providers`
too if you want a fully blank environment).

### Manual dogfood flow
1. Enable the **Workspace Heartbeats** experiment if it is not already
on.
2. Open a real workspace.
3. Open **Configure heartbeat** from the workspace UI.
4. Enable heartbeats and enter:
   - a valid interval
   - a custom heartbeat message
5. Save, reopen the modal, and confirm the message persists.
6. Switch to a second workspace and confirm it still shows the
default/empty message state.
7. Clear the custom message, save again, reopen, and confirm fallback
behavior.

### Artifact requirements
- Capture **at least 3 screenshots**:
  1. the modal showing the new message field
  2. the modal reopened with the persisted custom message
  3. a second workspace showing no cross-workspace bleed
- Capture **one short video** showing: open modal -> enter message ->
save -> reopen -> clear -> save again.
- If practical during the same session, capture one extra screenshot of
a heartbeat transcript/log path using the custom body; otherwise rely on
the targeted backend test for prompt verification and note that the live
scheduler minimum makes full end-to-end timing slower.

## Risks / watchpoints

- The router currently forwards only `enabled` and `intervalMs`;
forgetting to thread `message` there would silently drop the new field.
- `chatCommands.ts` is a compatibility trap because it already writes
heartbeat settings outside the modal.
- Avoid inventing placeholder/template syntax in v1; it adds UX and
validation surface area without being necessary for this request.

## Handoff note

If implementation starts later in Exec mode, keep the diff focused on
the files above and avoid adding a second settings surface unless
product requirements explicitly change toward a global default.

</details>

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` •
Cost: `$18.31`_

<!-- mux-attribution: model=openai:gpt-5.4 thinking=xhigh costs=18.31
-->

v0.22.0

release: v0.22.0

v0.21.1-nightly.90


Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
🤖 fix: cap expanded queued message height and make it scrollable (#3057)

## Summary

Cap the expanded queued message content at `40vh` and make it scroll on
overflow, so long queued messages no longer push the composer
off-screen.

## Background

When a queued message is expanded and its content exceeds viewport
height, the entire `ChatInputPane` grows unbounded—the Edit/Send now
buttons and the composer itself get displaced below the fold. This is
the same class of overflow that `AttachedReviewsPanel` already handles
with a viewport-relative height cap.

## Implementation

Wrapped `<UserMessageContent>` inside the expanded `QueuedMessageCard`
in an inner `<div className="max-h-[40vh] overflow-y-auto">` scroll
container. The action row (`Edit` / `Send now`) stays outside that
wrapper so controls remain always visible while the message body scrolls
independently.

---

_Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking:
`xhigh` • Cost: `$2.02`_

<!-- mux-attribution: model=anthropic:claude-opus-4-6 thinking=xhigh
costs=2.02 -->

v0.21.1-nightly.77


Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
chore: update icon styles (#3041)

v0.21.1-nightly.68


Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
🤖 fix: remove executeBash host-workspace escape hatch for devcontainers (#3018)

## Summary
This follow-up to #2889 removes the temporary `executeBash`
`executionTarget` / `host-workspace` escape hatch now that devcontainer
lazy-start and passive runtime gating are in place, and aligns passive
branch/git metadata refreshes with that runtime-aware contract without
regressing the newer multi-project git status flow.

## Background
PR #2889 added lazy-start on demand and gated passive PR/fetch work so
stopped devcontainers stay asleep. The remaining `executionTarget` path
still let the browser ask the backend to run some git commands directly
against the host worktree, which a security scanner flagged as a
client-controlled isolation bypass. With passive runtime gating
available, that fallback is no longer needed.

While rebasing this follow-up onto current `main`, the new multi-project
git status path also required a small reconciliation so the
single-workspace runtime gating change kept the existing cached-state
behavior for offline runtimes.

## Implementation
- remove `executionTarget` from the `workspace.executeBash` schema,
router forwarding, and backend `WorkspaceService.executeBash()` path
- delete the BranchSelector mount-time `git rev-parse --abbrev-ref HEAD`
probe that would otherwise wake stopped devcontainers
- gate passive single-workspace git status refreshes on runtime
eligibility and reuse the existing one-shot retry so refresh resumes
when the runtime becomes `running`
- keep the current multi-project cached-state behavior while updating
tests and stories to match the new passive-refresh contract

## Validation
- `make static-check`
- `bun test src/node/services/workspaceService.test.ts --timeout 30000`
- `bun test src/browser/stores/GitStatusStore.test.ts --timeout 30000`
- `bun test src/browser/stores/PRStatusStore.test.ts --timeout 30000`

## Risks
The main regression risk is around passive branch/status freshness for
stopped devcontainers and the interaction between the single-project and
multi-project git status paths. The change does not affect explicit
user-triggered git operations, and the touched store/service paths have
focused regression coverage.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` •
Cost: `$N/A`_

<!-- mux-attribution: model=openai:gpt-5.4 thinking=xhigh costs=N/A -->

v0.21.1-nightly.38


Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
🤖 refactor: reuse standard task report reminders (#2993)

## Summary
This follow-up to #2986 keeps the existing `awaiting_report` recovery
behavior but trims one special case: waiter-triggered recovery now
reuses the standard completion reminder instead of carrying a separate
waiter-only prompt variant.

## Background
PR #2986 introduced a few paths that can re-prompt an `awaiting_report`
task. The waiter-specific reminder text and enum branch were the odd
ones out, because they duplicated the same completion-tool guidance with
different copy. Reusing the normal reminder keeps the recovery path
while simplifying the state machine.

## Implementation
- waiter-triggered recovery still nudges `awaiting_report` tasks, but
now reuses the default completion reminder path
- dropped the now-unused `"waiter"` completion-recovery reason and
prompt string
- added a regression test that proves `waitForAgentReport()` no longer
emits distinct waiter-only reminder copy

## Validation
- `bun test src/node/services/taskService.test.ts`
- `make static-check`

## Risks
Low. This keeps the waiter-side recovery hook intact; it only removes
the separate waiter-only reminder variant.

---

_Generated with `mux` • Model: `openai:gpt-5.4` • Thinking: `xhigh` •
Cost: `$146.35`_

<!-- mux-attribution: model=openai:gpt-5.4 thinking=xhigh costs=146.35
-->

v0.21.1-nightly.21


Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
🤖 feat: vendor agent-browser for local execution (#2971)

## Summary

Vendor the `agent-browser` CLI as a runtime dependency so Mux can
resolve and spawn it without requiring a global install. Wire the
vendored binary through `BrowserSessionBackend`, shell wrappers, and
bash tool PATH enrichment so all three execution paths (internal browser
sessions, agent bash tools, child processes) use the same vendored
native binary.

Additionally, bridge agent `agent-browser` CLI sessions with the Browser
tab sidebar so they share a deterministic session name, and auto-start
the Browser tab monitoring when the experiment is enabled.

## Background

Previously, `agent-browser` was a `devDependency` only and every
execution path (`BrowserSessionBackend.spawn()`, agent bash tool calls)
relied on a globally installed `agent-browser` binary being on PATH.
This was fragile across:
- Packaged Electron builds (no global install available)
- Nix devshell PATH rewriting (`.mux/tool_env` replaces PATH entirely)
- Fresh user environments without a global install

## Implementation

### CLI Vendoring (core)
- Move `agent-browser` from `devDependencies` → `dependencies` so
packaged builds include it
- Add `asarUnpack` glob for native binaries so Electron `spawn()` works
outside ASAR
- New `agentBrowserLauncher.ts` resolves the platform-specific native
binary, handles ASAR path rewriting, and generates shell wrappers
- `BrowserSessionBackend` now uses `resolveAgentBrowserBinary()` instead
of bare `spawn("agent-browser")`
- `main.ts` materializes `~/.mux/bin/agent-browser` wrapper at startup
and prepends it to PATH
- `.mux/tool_env` restores vendored bin dir after nix PATH replacement
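
The PATH prepend done at startup might be sketched as below. `prependToPath` is a hypothetical helper, not a function from `main.ts`, and de-duplicating an existing entry is an assumption about desirable behavior rather than something the PR states.

```typescript
import * as path from "node:path";

// Hypothetical helper: put `dir` first on PATH without duplicating it.
function prependToPath(
  dir: string,
  currentPath: string | undefined,
  delimiter: string = path.delimiter,
): string {
  if (!path.isAbsolute(dir)) {
    throw new Error(`PATH entry must be absolute, got: ${dir}`);
  }
  // Drop empty entries and any earlier copy of `dir` so the prepend is idempotent.
  const entries = (currentPath ?? "")
    .split(delimiter)
    .filter((entry) => entry.length > 0 && entry !== dir);
  return [dir, ...entries].join(delimiter);
}
```

A caller could then assign the result to `process.env.PATH`; because the function is idempotent, running it again after a PATH rewrite restores the vendored directory to the front.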

### Session Bridging (Browser tab)
- Deterministic session ID: `mux-<workspaceId>` shared between
`BrowserSessionBackend` and the wrapper
- New `MUX_BROWSER_SESSION` env var exported to bash tool processes;
wrapper auto-injects `--session` when set
- `BrowserSessionBackend.start()` detects existing daemon sessions and
attaches instead of navigating to about:blank
- `BrowserTab.tsx` auto-starts agent-owned monitoring when experiment is
enabled (no manual "Start" click)
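
The session-bridging logic can be modeled in TypeScript as follows. The real wrapper is a generated shell script, so this is only a sketch of the decision it makes; `sessionNameFor` and `withSessionFlag` are hypothetical names, and both the flag position and the choice to leave an explicit `--session` untouched are assumptions.

```typescript
// Deterministic name shared by the backend and the wrapper: `mux-<workspaceId>`.
function sessionNameFor(workspaceId: string): string {
  if (workspaceId.length === 0) {
    throw new Error("workspaceId must be non-empty");
  }
  return `mux-${workspaceId}`;
}

// Models the wrapper: inject `--session` when MUX_BROWSER_SESSION is set.
function withSessionFlag(
  args: readonly string[],
  env: Record<string, string | undefined>,
): string[] {
  const session = env["MUX_BROWSER_SESSION"];
  // Assumption: an explicit --session from the caller wins over the env var.
  if (!session || args.includes("--session")) return [...args];
  return ["--session", session, ...args];
}
```

Because both sides derive the name from the workspace ID, an agent's `agent-browser open ...` call and the Browser tab's polling target the same session without any extra coordination.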

## Validation

- `make typecheck` ✅
- `make lint` ✅
- `bun test agentBrowserLauncher.test.ts` — 8/8 pass
- `bun test initHook.test.ts` — pass
- `bun test browserSessionBackend.test.ts` — pass
- Live dogfood in dev-server-sandbox:
  - Agent bash tool `which agent-browser` → vendored path ✅
  - Agent `agent-browser open` + `screenshot` via bash tool ✅
  - Browser tab auto-starts with "Live · Agent owned" status ✅
  - `MUX_BROWSER_SESSION` env var reaches bash tool environment ✅
  - Wrapper session injection verified ✅

## Risks

- **Browser tab live frame**: The polling-based session monitoring
(`BrowserSessionBackend.refreshMetadata()`) doesn't yet detect the
agent's navigation in real-time. The infrastructure (deterministic
session ID, wrapper injection, auto-start) is in place, but the
daemon-level session sharing between the backend's `--json --session`
polling and the agent's CLI invocations needs further investigation.
This is a pre-existing limitation of the Browser tab experiment, not a
regression.
- **Remote runtime parity**: Vendored binaries only solve local
execution. SSH/remote runtimes still need a separate story.
- **Windows**: Wrapper generates `.cmd` files but hasn't been tested on
Windows.


---

<details>
<summary>📋 Implementation Plan</summary>

# Implementation Plan: Vendored `agent-browser` availability in Mux

## Goal

Make the repo-vendored `agent-browser` dependency usable across
**local** Mux execution paths without requiring a **global**
`agent-browser` install:

1. **Mux-internal browser sessions** (`BrowserSessionBackend`)
2. **Mux bash tool calls** where agents type `agent-browser ...`
3. **Other local child processes** that inherit Mux's PATH

### Non-goals for this change

- **SSH / remote runtime parity**: a vendored dependency inside the
local Electron app cannot automatically appear on a remote host. Remote
runtimes should remain explicitly out of scope unless we design a
separate proxy/sync story.
- **A generic vendored-CLI framework for every package**: implement the
pattern for `agent-browser` first; only generalize where it falls out
naturally.

---

## Verified facts driving the design

- `package.json` currently lists `agent-browser` in
**`devDependencies`** (`package.json:183`) and in `trustedDependencies`
(`package.json:319`).
- `src/node/services/browserSessionBackend.ts:353` currently does a bare
`spawn("agent-browser", ...)`, so it depends on the host PATH.
- `src/desktop/main.ts:11-18` currently fixes PATH on macOS via
`fix-path`, but does **not** add vendored CLI locations.
- `.mux/tool_env` is sourced before trusted bash tool calls, and its nix
path logic can **replace PATH entirely**, which means a parent-process
PATH prepend alone is not enough.
- `node_modules/.bin/agent-browser` exists and points to
`../agent-browser/bin/agent-browser.js`.
- `node_modules/agent-browser/bin/agent-browser.js` is only a thin Node
wrapper: it maps `process.platform` / `process.arch` to a native
filename like `agent-browser-darwin-arm64` or
`agent-browser-win32-x64.exe`, then spawns that native binary.
- The installed package already contains native binaries under
`node_modules/agent-browser/bin/` for the supported platforms in this
release.
- Because `node_modules/.bin/agent-browser` points to the JS wrapper,
**PATH-only exposure still depends on a host Node interpreter**.
- Electron's ASAR docs say `child_process.spawn` cannot execute binaries
inside an ASAR archive; unpacked executable paths are required for
packaged `spawn()` flows.
- electron-builder packaging currently bundles production dependencies,
not devDependencies, so packaged runtime support requires moving
`agent-browser` to `dependencies`.

<details>
<summary>Why the earlier “all three changes” idea is useful but needs
one important correction</summary>

The three earlier ideas all still make sense:

- **Direct internal launch** fixes Mux's own use of `agent-browser`.
- **`main.ts` PATH enrichment** helps most local subprocesses.
- **`.mux/tool_env` PATH enrichment** preserves CLI availability when
nix/tool env rewriting would otherwise drop it.

The correction is that Mux should treat the **native binary** as the
real runtime artifact, not the JS shim.

That changes the design in two important ways:

- for **Mux-internal browser sessions**, the right move is to resolve
and spawn the vendored **native executable** directly, bypassing both
PATH and host `node`
- for **shell contexts**, Mux should expose a stable `agent-browser`
command that points at that resolved native executable; raw
`node_modules/.bin/agent-browser` can remain a development convenience,
but it is not the robust packaged solution

So the comprehensive plan should center on **native-binary resolution +
PATH exposure**, with raw `.bin` access treated as secondary.
</details>

---

## Recommended implementation approaches

### Approach A — **Direct native vendoring** (recommended)
**Net product LoC estimate:** **+80 to +150** (excludes tests, docs,
`package.json` edits)

This approach fully removes the need for a **global `agent-browser`
install** for local Mux usage, and also avoids a host `node` dependency
in the main local execution paths.

**Core idea:**
- Ship `agent-browser` as a runtime dependency.
- Resolve the vendored **platform-specific native binary** directly from
the installed package.
- Add a small Mux helper that rewrites packaged paths to the unpacked
executable location when needed.
- Materialize a small wrapper in a Mux-managed bin directory (for
bash/tool usage) that forwards to the resolved native binary.
- Prepend that wrapper dir to PATH in both the app process and
`.mux/tool_env`.
- Keep project `node_modules/.bin` on PATH where useful for development
ergonomics, but do not rely on it for packaged correctness.

### Approach B — **Minimal PATH-only vendoring** (fallback / dev-only
shortcut)
**Net product LoC estimate:** **+25 to +50** (excludes tests, docs,
`package.json` edits)

This approach is smaller, but only fully works when the host already has
a usable `node` on PATH.

**Core idea:**
- Move `agent-browser` to `dependencies`.
- Stop using a global install for `BrowserSessionBackend`.
- Prepend the app's `node_modules/.bin` / project `node_modules/.bin` to
PATH in `main.ts` and `.mux/tool_env`.

**Why this is not the main recommendation:**
- packaged users can still fail if they do not have host `node`
- bash/tool shell use still depends on the JS shim finding `node`
- packaged `spawn()` flows still need unpacked native executables anyway

**Decision:** implement **Approach A**, while preserving the useful PATH
improvements from the earlier three-part idea.

---

## Detailed plan for Approach A

## 1) Make `agent-browser` a runtime-shipped dependency

### Files
- `package.json`
- lockfile (as generated by bun)

### Work
- Move `agent-browser` from `devDependencies` to `dependencies`.
- Leave `trustedDependencies` intact unless the install flow proves it
can be simplified.

### Why
- electron-builder packages runtime dependencies, not devDependencies.
- Without this, packaged builds can never rely on the vendored copy.

### Acceptance notes
- Development still works unchanged.
- Packaged builds now have the package available for runtime resolution.

### Size
- **Product LoC:** **+0** (config-only)

---

## 2) Add an `agent-browser` native binary resolver

### Files
- **New:** `src/node/services/agentBrowserLauncher.ts` (or similarly
named agent-browser-specific helper)
- `src/node/services/browserSessionBackend.ts`
- Potentially `src/common/constants/` for wrapper-dir naming if the
string appears in multiple places

### Work
Introduce a small helper that resolves the absolute native executable
path for the current platform.

Responsibilities:
- resolve the installed package root via
`require.resolve("agent-browser/package.json")` or an equivalent
package-root lookup
- mirror the upstream wrapper's platform/arch mapping to compute the
native filename:
  - `darwin` / `linux` / `win32`
  - `x64` / `arm64`
  - `.exe` suffix on Windows
- build the absolute path to
`node_modules/agent-browser/bin/<platform-binary>`
- when running from a packaged app, rewrite the resolved path to the
unpacked location (for example from `app.asar/...` to
`app.asar.unpacked/...`) so `spawn()` targets a real executable on disk
- expose a stable env var for shell consumers (for example
`MUX_VENDORED_BIN_DIR`) so `.mux/tool_env` does not need to guess
platform-specific config-root paths
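
The platform mapping and packaged-path rewrite listed above might be sketched like this. `nativeBinaryName` and `toUnpackedPath` are hypothetical names; the filename pattern follows the two examples quoted earlier (`agent-browser-darwin-arm64`, `agent-browser-win32-x64.exe`), and whether upstream ships every platform/arch combination is not verified here.

```typescript
const SUPPORTED_PLATFORMS: readonly string[] = ["darwin", "linux", "win32"];
const SUPPORTED_ARCHS: readonly string[] = ["x64", "arm64"];

// Mirror the upstream shim's mapping from platform/arch to a native filename.
function nativeBinaryName(platform: string, arch: string): string {
  if (!SUPPORTED_PLATFORMS.includes(platform)) {
    throw new Error(`unsupported platform for agent-browser: ${platform}`);
  }
  if (!SUPPORTED_ARCHS.includes(arch)) {
    throw new Error(`unsupported architecture for agent-browser: ${arch}`);
  }
  const suffix = platform === "win32" ? ".exe" : "";
  return `agent-browser-${platform}-${arch}${suffix}`;
}

// Rewrite a path inside `app.asar` to its `app.asar.unpacked` counterpart.
// Only a whole `app.asar` path segment is replaced, so paths that are
// already unpacked (or not packaged at all) pass through unchanged.
function toUnpackedPath(resolvedPath: string): string {
  return resolvedPath.replace(/([\\/])app\.asar([\\/])/, "$1app.asar.unpacked$2");
}
```

In practice the inputs would be `process.platform` and `process.arch`; taking them as parameters keeps the mapping testable on any host.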

### Why
- bypasses both PATH lookup and host `node` for Mux's own
browser-session feature
- matches the actual package layout: the native binary is the real
runtime artifact, while the JS file is only a selector shim
- gives one canonical place for platform mapping, packaged-path
rewriting, and error reporting

### Defensive-programming requirements
- assert the resolved package root is non-empty and absolute
- assert the platform/arch mapping is supported before building the
filename
- assert the final binary path exists; on POSIX, also ensure it is
executable or can be chmodded similarly to the upstream wrapper
- provide distinct error messages for unsupported platform, missing
vendored package, and missing/unpacked binary
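
A rough sketch of these checks, with a distinct error kind per failure, is shown below. `VendoredBinaryError` and `assertVendoredBinary` are hypothetical names; the unsupported-platform case is assumed to be raised earlier, when the filename is computed, so only the package and binary checks appear here.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

type VendoredBinaryErrorKind = "missing-package" | "missing-binary";

// Hypothetical error type so callers can branch on the failure kind.
class VendoredBinaryError extends Error {
  readonly kind: VendoredBinaryErrorKind;
  constructor(kind: VendoredBinaryErrorKind, message: string) {
    super(message);
    this.name = "VendoredBinaryError";
    this.kind = kind;
  }
}

// Validate the package root and binary, returning the absolute binary path.
function assertVendoredBinary(packageRoot: string, binaryName: string): string {
  if (packageRoot.length === 0 || !path.isAbsolute(packageRoot)) {
    throw new VendoredBinaryError("missing-package", "agent-browser package root must be an absolute path");
  }
  if (!fs.existsSync(packageRoot)) {
    throw new VendoredBinaryError("missing-package", `agent-browser package not found at ${packageRoot}`);
  }
  const binaryPath = path.join(packageRoot, "bin", binaryName);
  if (!fs.existsSync(binaryPath)) {
    throw new VendoredBinaryError("missing-binary", `agent-browser binary missing or not unpacked: ${binaryPath}`);
  }
  if (process.platform !== "win32") {
    try {
      fs.accessSync(binaryPath, fs.constants.X_OK);
    } catch {
      // Not executable yet: try to add the execute bit; chmod errors propagate.
      fs.chmodSync(binaryPath, 0o755);
    }
  }
  return binaryPath;
}
```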

### Size
- **Product LoC:** **+25 to +45**

---

## 3) Add shell-wrapper generation helpers around the resolved native
binary

### Files
- `src/node/services/agentBrowserLauncher.ts`
- `src/desktop/main.ts`
- Potentially `src/common/constants/` for wrapper-dir naming if the
string appears in multiple places

### Work
Alongside the resolver, expose helper(s) that compute the Mux-managed
wrapper dir and render wrapper contents from the same canonical binary
path.

Suggested responsibilities:
- compute the Mux-managed bin/wrapper directory (prefer
`config.rootDir/bin` or a similarly stable Mux-owned directory)
- expose a stable env var for shell consumers (for example
`MUX_VENDORED_BIN_DIR`) so `.mux/tool_env` does not need to guess
platform-specific config-root paths
- generate a POSIX `agent-browser` wrapper that `exec`s the absolute
native binary path
- generate a Windows `agent-browser.cmd` (or equivalent) wrapper that
forwards arguments to the resolved `.exe`
- keep wrapper contents derived from the same resolver used by
`BrowserSessionBackend`, so there is one source of truth for path
computation

### Why a helper is worthwhile
- keeps `browserSessionBackend.ts`, `main.ts`, and `.mux/tool_env`
aligned
- avoids duplicating platform mapping and packaged-path logic across
multiple call sites
- gives one place to centralize assertions, stale-wrapper rewrites, and
path drift handling after upgrades

### Defensive-programming requirements
- assert wrapper dir paths are absolute and non-empty
- assert wrapper targets are absolute native binary paths, never bare
command names
- rewrite wrappers whenever the computed target changes instead of
trusting stale contents
- provide a clear local error string when wrapper generation cannot be
prepared
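
The "rewrite whenever the target changes" requirement can be met by comparing the rendered contents with what is on disk. The following is a minimal sketch with a hypothetical `ensureWrapper` helper; the temp-file-plus-rename step is an assumption about how an atomic write might be done, not a description of the PR's code.

```typescript
import * as fs from "node:fs";
import * as path from "node:path";

// Write the wrapper only when its contents differ from what is on disk.
// Returns true when a (re)write happened.
function ensureWrapper(wrapperPath: string, contents: string): boolean {
  if (wrapperPath.length === 0 || !path.isAbsolute(wrapperPath)) {
    throw new Error(`wrapper path must be absolute, got: ${wrapperPath}`);
  }
  let existing: string | undefined;
  try {
    existing = fs.readFileSync(wrapperPath, "utf8");
  } catch {
    existing = undefined; // missing or unreadable: treat as stale
  }
  if (existing === contents) return false;

  fs.mkdirSync(path.dirname(wrapperPath), { recursive: true });
  // Write to a sibling temp file, then rename over the target so readers
  // never observe a half-written wrapper. The mode is subject to the umask.
  const tempPath = `${wrapperPath}.${process.pid}.tmp`;
  fs.writeFileSync(tempPath, contents, { mode: 0o755 });
  fs.renameSync(tempPath, wrapperPath);
  return true;
}
```

Calling this on every startup makes stale wrappers self-correcting after an upgrade moves the native binary, while leaving an up-to-date wrapper untouched.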

### Size
- **Product LoC:** **+20 to +35**

---

## 4) Switch `BrowserSessionBackend` off PATH lookup

### Files
- `src/node/services/browserSessionBackend.ts`

### Work
Replace the bare:

```ts
spawn("agent-browser", ["--json", "--session", this.sessionId, ...args], ...)
```

with the resolved native-binary helper from steps 2–3, so browser
sessions no longer depend on:
- a global install
- PATH resolution
- the host shell being configured correctly

Keep the existing timeout, stdout/stderr parsing, and JSON handling
behavior unchanged.

### Error handling updates
- replace the current missing-binary guidance (`install it with: bun
install -g ...`)
- new messaging should distinguish between:
  - vendored native binary missing, unsupported, or not unpacked correctly
  - unexpected spawn/runtime failure once the binary has been resolved
- in dev checkouts, the remediation can mention `bun install`
- in packaged apps, the remediation should prefer “reinstall/update Mux”
rather than asking for a global install
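
One way the two error classes and the dev/packaged remediation split could be expressed is sketched below. The `LaunchFailure` type, the function names, and the message wording are all illustrative assumptions:

```typescript
// Hypothetical failure classification; names are illustrative only.
type LaunchFailure =
  | { kind: "binary-unavailable"; detail: string }
  | { kind: "spawn-failed"; detail: string };

// Build the argv that used to be passed to the bare `agent-browser` command.
function buildAgentBrowserArgs(sessionId: string, args: readonly string[]): string[] {
  return ["--json", "--session", sessionId, ...args];
}

// Map a failure to a user-facing message without suggesting a global install.
function describeLaunchFailure(failure: LaunchFailure, isPackaged: boolean): string {
  if (failure.kind === "binary-unavailable") {
    const remedy = isPackaged
      ? "Reinstall or update Mux."
      : "Run `bun install` in the Mux checkout.";
    return `Vendored agent-browser binary is unavailable (${failure.detail}). ${remedy}`;
  }
  return `agent-browser failed after the binary was resolved (${failure.detail}).`;
}

// Usage (assuming a resolved absolute path from the launcher helper):
//   spawn(resolvedBinaryPath, buildAgentBrowserArgs(sessionId, args), options)
```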

### Why
This is the highest-value change: it makes the actual product feature
self-contained first.

### Size
- **Product LoC:** **+10 to +20**

---

## 5) Materialize a Mux-managed shell wrapper for `agent-browser`

### Files
- `src/desktop/main.ts`
- likely `src/node/services/agentBrowserLauncher.ts` (shared
wrapper-content builder)
- potentially a tiny file utility if an existing atomic-write helper is
not already available

### Work
Write a small wrapper into a Mux-owned bin directory such as:
- `config.rootDir/bin/agent-browser` on POSIX
- `config.rootDir/bin/agent-browser.cmd` (and/or `.bat`) on Windows

Wrapper behavior:
- point at the resolved native binary from steps 2–3, not the JS shim in
`node_modules/.bin`
- on POSIX, `exec` the absolute native binary path and forward `"$@"`
- on Windows, invoke the resolved `.exe` and forward `%*`
- keep the command name stable as `agent-browser` even though the
underlying vendored binary is platform-suffixed

### Lifecycle strategy
Use a **self-healing** model:
- best-effort creation/update during app startup
- always safe to overwrite when paths drift after upgrades or dev
rebuilds
- if startup creation fails, do **not** crash app startup; log and allow
later use-sites to surface a targeted runtime error

### Why
This is the piece that makes `agent-browser` available to **shell
contexts** without requiring a global install.

### Defensive-programming requirements
- atomically rewrite wrappers when contents differ
- ensure POSIX wrappers get executable bits
- on Windows, generate the wrapper format the platform actually resolves
- keep the wrapper generation idempotent
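
The idempotent, atomic rewrite could look roughly like the following sketch. `writeWrapperIfChanged` is a hypothetical name, and the write-to-temp-then-rename approach is one common way to get atomic replacement rather than a confirmed detail of the repo:

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Hypothetical helper: returns true when the wrapper was (re)written.
function writeWrapperIfChanged(wrapperPath: string, contents: string): boolean {
  if (!path.isAbsolute(wrapperPath)) {
    throw new Error(`wrapper path must be absolute, got: ${wrapperPath}`);
  }
  let existing: string | null = null;
  try {
    existing = fs.readFileSync(wrapperPath, "utf8");
  } catch {
    existing = null; // Missing or unreadable: treat as stale and rewrite.
  }
  if (existing === contents) {
    fs.chmodSync(wrapperPath, 0o755); // Re-assert executable bits.
    return false;
  }
  fs.mkdirSync(path.dirname(wrapperPath), { recursive: true });
  // Write to a sibling temp file, then rename over the target.
  const tmpPath = `${wrapperPath}.${process.pid}.tmp`;
  fs.writeFileSync(tmpPath, contents);
  fs.chmodSync(tmpPath, 0o755);
  fs.renameSync(tmpPath, wrapperPath);
  return true;
}

// Demo in a throwaway directory: the second call is a no-op.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "mux-wrapper-"));
const demoPath = path.join(demoDir, "bin", "agent-browser");
console.log(writeWrapperIfChanged(demoPath, "#!/bin/sh\n")); // true
console.log(writeWrapperIfChanged(demoPath, "#!/bin/sh\n")); // false
```

Comparing contents before writing keeps startup cheap in the common case and still rewrites automatically when the resolved binary path drifts.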

### Size
- **Product LoC:** **+25 to +45**

---

## 6) Prepend the Mux-managed wrapper dir to PATH for local app
subprocesses

### Files
- `src/desktop/main.ts`

### Work
After the existing macOS `fix-path` block, prepend the Mux-managed
wrapper dir to `process.env.PATH` using `path.delimiter`.
Also publish the resolved directory into `process.env` (for example
`MUX_VENDORED_BIN_DIR`) so `.mux/tool_env` and any other child-process
setup can reuse the exact same location.

Recommended ordering:
1. **Mux-managed wrapper dir** (`~/.mux/bin` via config root)
2. existing/inherited PATH from `fix-path` / OS

Optional but useful extension:
- also prepend the app's own vendored `node_modules/.bin` in
**development** so raw vendored CLIs remain easy to discover during
local repo development
- do **not** rely on that raw bin as the primary packaged solution; the
wrapper is the robust path

### Why
- ensures most local child processes inherit a working `agent-browser`
command
- covers untrusted projects too, since `.mux/tool_env` is only sourced
for trusted repos
- complements the direct `BrowserSessionBackend` launch instead of
replacing it

### Startup constraint
Per repo guidance, startup-time init must not crash the app:
- wrapper creation / PATH enrichment should be wrapped in try/catch
- failures should degrade to debug logging + later runtime error
surfaces
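
The PATH enrichment can be kept in a pure helper so it is easy to unit test. This is a sketch under assumptions: `prependPathEntry` and `enrichPathBestEffort` are hypothetical names, and `MUX_VENDORED_BIN_DIR` is the example variable name used above:

```typescript
import * as path from "node:path";

// Prepend `dir` to a PATH-style string, dropping empty and duplicate entries.
function prependPathEntry(
  currentPath: string | undefined,
  dir: string,
  delimiter: string = path.delimiter
): string {
  if (dir.length === 0) {
    throw new Error("wrapper dir must be non-empty");
  }
  const entries = (currentPath ?? "")
    .split(delimiter)
    .filter((entry) => entry.length > 0 && entry !== dir);
  return [dir, ...entries].join(delimiter);
}

// Startup-safe wrapper: failures are logged, never thrown.
function enrichPathBestEffort(
  env: Record<string, string | undefined>,
  wrapperDir: string
): void {
  try {
    env.PATH = prependPathEntry(env.PATH, wrapperDir);
    env.MUX_VENDORED_BIN_DIR = wrapperDir;
  } catch (error) {
    console.debug("agent-browser PATH enrichment skipped:", error);
  }
}
```

Calling `enrichPathBestEffort(process.env, wrapperDir)` after the `fix-path` block would apply the recommended ordering without risking a startup crash.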

### Size
- **Product LoC:** **+10 to +20**

---

## 7) Preserve `agent-browser` inside `.mux/tool_env`, even when nix
rewrites PATH

### Files
- `.mux/tool_env`

### Work
Update `.mux/tool_env` so that after it determines the **base PATH**
(nix-derived or inherited), it **prepends**:
1. the Mux-managed wrapper dir from a stable exported env var (for
example `${MUX_VENDORED_BIN_DIR}`), with `~/.mux/bin` only as an
explicit fallback if the repo still standardizes on that path
2. the project-local `node_modules/.bin` when
`${MUX_PROJECT_PATH}/node_modules/.bin` exists

### Important ordering requirement
Do **not** prepend these dirs only before the nix helper runs.

Because `.mux/tool_env` can replace PATH with a cached nix PATH, the
wrapper/project bin prepend must happen **after** the final base PATH is
known, or be factored into the code path that exports the final PATH
value.

### Why keep both wrapper dir and project `node_modules/.bin`
- the **wrapper dir** is the robust local command for
packaged/no-global-install cases
- the **project `node_modules/.bin`** keeps local repo development
ergonomics good and matches the user's original idea

### Scope note
This only solves **local** bash-tool usage. Remote/SSH runtimes would
still need a separate design.

### Size
- **Product LoC:** **+10 to +20**

---

## 8) Explicitly unpack vendored `agent-browser` native binaries for
packaged builds

### Files
- `package.json` build config
- packaged-path logic in `src/node/services/agentBrowserLauncher.ts`

### Work
Add a narrow packaging rule so the vendored native executable is
available outside the ASAR archive in packaged apps.

Recommended shape:
- extend `asarUnpack` with a focused pattern covering the vendored
agent-browser native binaries, such as
`**/node_modules/agent-browser/bin/agent-browser*` or the smallest
equivalent glob that reliably includes the current-platform executable
- make the resolver from step 2 translate packaged paths to their
unpacked location when Mux is running from `app.asar`
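
The packaged-path translation can be sketched as a small pure function. It relies on the Electron packaging convention that files matched by `asarUnpack` are placed in an `app.asar.unpacked` directory next to `app.asar`; the function names are hypothetical:

```typescript
// Rewrite a path inside app.asar to its app.asar.unpacked sibling.
// Paths that are not inside an archive are returned unchanged.
function toUnpackedPath(resolvedPath: string): string {
  // Match "app.asar" only as a whole path segment, with either separator.
  return resolvedPath.replace(/([\\/])app\.asar(?=[\\/])/, "$1app.asar.unpacked");
}

// Guard for spawn call sites: the final path must never point into the archive.
function assertOutsideAsar(finalPath: string): void {
  if (/[\\/]app\.asar[\\/]/.test(finalPath)) {
    throw new Error(`binary path is still inside app.asar: ${finalPath}`);
  }
}
```

The rewrite is idempotent, so running it on a dev-checkout path or an already-unpacked path is harmless.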

### Why
- Electron's ASAR docs state that `child_process.spawn` cannot execute
binaries inside ASAR archives
- `BrowserSessionBackend` currently uses `spawn`, so packaged
correctness requires a real executable path on disk
- this is a native-executable packaging concern, not a generic JS-module
concern

### Explicit validation gate
During packaged smoke testing, verify that:
- the resolved path points at an unpacked executable, not a path inside
`app.asar`
- `BrowserSessionBackend` can spawn the vendored binary successfully in
the packaged app
- the Mux-managed shell wrapper still resolves to the same unpacked
executable path

### Fallback if the first `asarUnpack` glob is too broad or too narrow
Use the smallest rule that:
- includes the current-platform executable
- avoids unpacking unrelated dependency trees
- keeps the resolver logic simple and deterministic

### Size
- **Product LoC:** **+0 to +5** (config-only unless packaged-path helper
needs a tiny extension)

---

## Tests and validation plan

## Automated coverage

### 1) New unit tests for launcher/wrapper logic
**Likely new test file:**
`src/node/services/agentBrowserLauncher.test.ts`

Cover:
- native binary path resolution for supported `platform` / `arch`
combinations
- packaged-path rewriting from `app.asar/...` to the unpacked executable
location
- wrapper content generation for POSIX and Windows around the resolved
native binary
- stale-wrapper rewrite detection / idempotent update behavior
- clear failure when the vendored native binary is missing, unsupported,
or not unpacked correctly

Use `src/node/services/desktop/PortableDesktopSession.ts` and
`PortableDesktopSession.test.ts` as the pattern reference for:
- PATH-vs-fallback binary logic
- wrapper script testing
- cross-platform launcher expectations

### 2) Update / add browser session backend tests
**Likely file:** `src/node/services/browserSessionBackend.test.ts`
(create if absent)

Cover:
- `BrowserSessionBackend` no longer calls bare `agent-browser`
- `spawn()` receives the resolved native binary path rather than a
PATH-dependent command name
- timeout / invalid JSON behavior stays intact
- missing-binary errors become actionable and no longer mention global
install as the primary fix

### 3) Test PATH composition in a pure helper where possible
If `main.ts` startup logic is too awkward to test directly, extract the
PATH-composition / wrapper-dir calculation into a small helper and unit
test that helper instead of over-testing Electron startup code.

### 4) `.mux/tool_env` validation
Prefer a lightweight smoke/integration check over brittle shell-script
golden tests unless an existing harness makes this easy.

At minimum, validate that a trusted local workspace with nix path
rewriting still resolves `agent-browser` after the change.

---

## Dedicated dogfooding plan

Dogfooding is required for this change because it affects both a
**CLI/runtime path** and a **Mux UI-backed workflow**. The dogfooding
pass should follow the spirit of the `dogfood` skill: create a small
evidence bundle, exercise the change like a real user, and leave behind
reviewer-friendly proof.

### Dogfooding evidence bundle (required)

Create a dedicated output directory such as
`./dogfood-output/cli-vendoring/` with:
- `report.md` — scenario-by-scenario notes, commands/prompts used,
expected vs actual results, and links to artifacts
- `screenshots/` — annotated screenshots for key setup and result states
- `videos/` — short `.webm` or desktop screencast recordings for each
interactive flow

Reviewer handoff requirements:
- attach representative screenshots with `attach_file`
- attach at least one video per interactive flow with `attach_file` when
the final verification write-up is produced
- include the exact scenario steps in the report so a reviewer can
replay them without guessing

### Dogfooding setup

1. **Prove vendoring, not host-global fallback.**
   - Run the dogfood flows in an environment where a global `agent-browser`
     is absent from the effective PATH, or otherwise prove that
     `command -v agent-browser` resolves to the Mux-managed wrapper rather
     than a host-global install.
   - Keep normal repo dependencies installed so the vendored package is
     present.

2. **Use the `dev-server-sandbox` skill for isolated Mux UI
   verification.**
   - Start an isolated backend/web instance with `make dev-server-sandbox`.
   - Record the emitted `BACKEND_PORT`, `VITE_PORT`, and sandbox `MUX_ROOT`
     in the dogfood report.
   - Use `KEEP_SANDBOX=1` if preserving the sandbox root helps post-failure
     debugging.

3. **Use `agent-browser` directly, never `npx agent-browser`.**
   - This matches both the `dogfood` and `agent-browser` skills and ensures
     the test path exercises the vendored fast/native CLI path.

4. **Capture proof continuously, not at the end.**
   - For each interactive scenario, start video recording before
     reproducing the flow.
   - Take an annotated screenshot at the initial state and the final result
     state.
   - Append notes to `report.md` immediately after each scenario rather
     than batching findings later.

### Dogfooding scenario A — direct vendored CLI availability

**Goal:** prove local shell usage works without a global install and
without routing through `npx`.

Suggested flow:
1. Start a short desktop/terminal recording before the first command.
2. Run `command -v agent-browser` and `agent-browser --help`; record the
observed command path/output in `report.md`.
3. Exercise a real browser command with the direct CLI, for example:
   - `agent-browser --session mux-cli-vendoring open https://example.com`
   - `agent-browser --session mux-cli-vendoring wait --load networkidle`
   - `agent-browser --session mux-cli-vendoring snapshot -i`
   - `agent-browser --session mux-cli-vendoring screenshot --annotate ./dogfood-output/cli-vendoring/screenshots/cli-direct-success.png`
4. Stop the recording and save the artifact under `videos/`.
5. Close the session and log the exact steps/result in `report.md`.

**Required proof:**
- one terminal/desktop video showing command invocation
- one annotated screenshot from the browser session
- the resolved command path and command output copied into the dogfood
report

### Dogfooding scenario B — Mux bash tool availability

**Goal:** prove Mux bash tools inherit a working vendored
`agent-browser` command.

Suggested flow:
1. Start `make dev-server-sandbox` and note the UI URL in the report.
2. Use `agent-browser` to open the sandboxed Mux UI directly:
   - `agent-browser open http://127.0.0.1:<VITE_PORT>`
   - `agent-browser wait --load networkidle`
   - `agent-browser screenshot --annotate ./dogfood-output/cli-vendoring/screenshots/mux-initial.png`
3. Use the core `agent-browser` workflow while navigating Mux:
   - `snapshot -i` to identify controls
   - interact with refs to enter a prompt that triggers a bash tool call
     such as `command -v agent-browser && agent-browser --help`
   - re-snapshot after each UI change
   - use `diff snapshot` after the tool run if it helps confirm the
     expected state change in the UI
4. Start a video before submitting the prompt and stop it after the
successful tool output is visible.
5. Capture an annotated screenshot of the successful bash-tool result
and save the prompt/output summary into `report.md`.

**Required proof:**
- one UI video of the prompt-to-tool-result flow
- annotated screenshots of the Mux UI before and after the tool result
- the exact prompt used and the relevant tool output in `report.md`

### Dogfooding scenario C — BrowserSessionBackend / Browser tab workflow

**Goal:** prove Mux's internal browser-session feature resolves and
launches the vendored native binary rather than relying on a global PATH
entry.

Suggested flow:
1. In the sandboxed or local dev instance, navigate to the
Browser-related UI/workflow that triggers `BrowserSessionBackend`.
2. Start a video before the action that launches the browser session.
3. Use `agent-browser` against the Mux UI to drive the workflow,
re-snapshotting after each navigation or DOM change.
4. Capture:
   - an annotated screenshot before launch
   - an annotated screenshot showing the live browser session / Browser tab
     after launch succeeds
   - any visible status text or UI state that proves the session is live
5. Stop the recording and save the exact steps/result in `report.md`.

**Required proof:**
- one video of the browser-session launch flow
- annotated screenshots before/after launch
- a short note in the report confirming this was run without relying on
a global `agent-browser`

### Dogfooding scenario D — packaged-build regression pass

**Goal:** prove the packaged app can find and spawn the unpacked
vendored native binary.

Suggested flow:
1. Build a local packaged app.
2. Launch it in an environment where global `agent-browser` is
unavailable or clearly not the resolved command.
3. Repeat at least scenario A (direct CLI wrapper exposure, if
applicable) and scenario C (browser-session launch).
4. Capture a desktop video and annotated screenshots of the packaged
flow.
5. Record the observed wrapper/binary resolution behavior in
`report.md`, including whether the resolved path points at the unpacked
packaged location.

**Required proof:**
- one packaged-app video
- one packaged-app annotated screenshot
- a report note confirming packaged resolution succeeded without a
global install

### Dogfooding scenario E — PATH edge cases

**Goal:** prove the PATH story holds across the environments this change
explicitly targets.

Run a focused check for each of these and capture at least one
screenshot/video pair for the most failure-prone cases:
- **trusted repo with `.mux/tool_env` / nix path rewriting** — verify
`agent-browser` still resolves after tool-env PATH rewriting
- **untrusted repo** — verify inherited app PATH still exposes the
Mux-managed wrapper even when `.mux/tool_env` is not sourced
- **Windows sanity pass** — verify wrapper filename/format resolution,
`path.delimiter` handling, and no accidental POSIX-only assumptions

### Dogfooding method notes from the skills

- From `dogfood`: document each scenario as you go, and collect
screenshots/videos before moving on.
- From `agent-browser`: use the core loop of `open` → `snapshot -i` →
interact via `@eN` refs → re-snapshot after page changes.
- From `agent-browser`: use `screenshot --annotate` for
reviewer-friendly evidence, and `diff snapshot` when it helps prove a UI
state change happened.
- From `dev-server-sandbox`: prefer an isolated sandbox instance over
reusing your default dev root so the verification environment is
reproducible and easier to debug.

---

## Acceptance criteria

- `BrowserSessionBackend` no longer depends on `spawn("agent-browser",
...)`.
- A **local** Mux install can use browser sessions without a global
`agent-browser` install.
- Mux bash tools can resolve `agent-browser` in local workspaces through
a Mux-managed PATH entry.
- nix/tool_env PATH rewriting does not remove `agent-browser`
availability.
- Packaged builds ship the dependency because it is no longer in
`devDependencies`.
- Packaged builds can resolve and spawn an unpacked vendored native
`agent-browser` executable.
- User-facing errors stop instructing users to globally install
`agent-browser` as the default remedy.
- Startup remains resilient: wrapper/native-binary setup failures do not
crash app launch.
- The dedicated dogfooding pass is completed for the relevant scenarios
(direct CLI, Mux bash tool flow, Browser session flow, packaged
regression where available).
- Dogfooding artifacts include a reviewer-readable `report.md`,
annotated screenshots, and video recordings for the interactive flows.
- The final verification handoff attaches representative
screenshots/videos with `attach_file` so reviewers can audit what was
tested.

---

## Risks / watchpoints

- **Remote runtime ambiguity:** local vendoring does not solve SSH-host
execution.
- **Packaged-path drift:** wrappers that embed old app paths must be
regenerated automatically.
- **ASAR alignment risk:** the `asarUnpack` glob and the packaged-path
rewrite must stay aligned so the resolver never points `spawn()` at a
path still inside `app.asar`.
- **Windows wrapper semantics:** plan for both shell and spawn
realities; do not assume the POSIX wrapper alone is enough.
- **Over-generalization risk:** keep the first version
agent-browser-specific unless a generic abstraction is obviously
justified by the implementation.

---

## Suggested implementation order

1. Move `agent-browser` to `dependencies`.
2. Add the native-binary resolver and packaged-path rewrite helper.
3. Add the shell-wrapper generation helper.
4. Switch `BrowserSessionBackend` to the resolved native binary.
5. Add `asarUnpack` coverage for the vendored native binary.
6. Add wrapper materialization and PATH enrichment in `main.ts`.
7. Update `.mux/tool_env` so nix PATH rewriting preserves the wrapper /
local `.bin` entries.
8. Update error strings and any agent-browser usage docs/skill text that
still imply global install.
9. Run automated validation, then execute the dedicated dogfooding plan
above and attach the resulting evidence.

---

## Validation commands to run during implementation

- `make lint`
- `make typecheck`
- targeted tests for the new launcher/wrapper and browser-session code
- `make test` if touched areas do not have sufficiently narrow coverage
- `make dev-server-sandbox` for the isolated UI-backed dogfood run
- one local packaged smoke pass (`make dist` or equivalent local
packaging workflow on the current platform)
- the dedicated dogfooding evidence capture described above (`report.md`
+ screenshots + videos)

---

## Optional follow-up (only after this lands cleanly)

- extract the resolver/wrapper machinery into a small reusable “vendored
native CLI” helper if another dependency needs the same treatment
- design a separate remote-runtime story if `agent-browser` must become
available in SSH workspaces too

</details>

---
_Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking:
`xhigh` • Cost: `$27.73`_

<!-- mux-attribution: model=anthropic:claude-opus-4-6 thinking=xhigh
costs=27.73 -->

---------

Signed-off-by: Thomas Kosiewski <tk@coder.com>

v0.21.1-nightly.7

🤖 feat: add browser sidebar tab for live agent-browser viewing (#2951)

## Summary

Add a **Browser tab** to Mux's right sidebar that shows what
agent-browser is doing in real time — live screenshots, URL/title
tracking, action timeline, and session lifecycle controls.

## Background

When an agent uses `agent-browser` to automate web pages, users
currently have no visibility into what's happening. This feature adds a
workspace-scoped Browser tab in the right sidebar so users can watch the
agent navigate, see which page it's on, and understand the sequence of
actions — without leaving the Mux interface.

## Implementation

### Architecture
- **CLI-backed approach** — agent-browser has no library API, so we
manage it as a subprocess
- **ORPC for all transport** — screenshot polling (~2s) fits the
existing event-stream model; no separate WebSocket needed
- **One session per workspace** — enforced by the service layer

### Backend (4 new files + 6 modified)
- `src/common/types/browserSession.ts` — shared types (`BrowserSession`,
`BrowserAction`, `BrowserSessionEvent`)
- `src/common/orpc/schemas/api.ts` — Zod schemas for `getActive`,
`start`, `stop`, `subscribe`
- `src/node/services/browserSessionBackend.ts` — CLI adapter managing
the agent-browser subprocess, screenshot polling, metadata extraction,
external-close detection
- `src/node/services/browserSessionService.ts` — EventEmitter service
with workspace-scoped events (mirrors DevToolsService pattern)
- ORPC routes with async-generator subscription (snapshot-first,
queue+resolveNext)
- Service container wiring + disposal
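
The "snapshot-first, queue+resolveNext" subscription shape can be illustrated with a generic sketch. The event names, the numeric payload, and the `subscribe` signature are simplified assumptions for illustration and do not reflect the actual ORPC route code:

```typescript
import { EventEmitter } from "node:events";

type DemoEvent =
  | { type: "snapshot"; value: number }
  | { type: "update"; value: number };

// Yield a snapshot first, then drain queued updates until aborted.
async function* subscribe(
  emitter: EventEmitter,
  getSnapshot: () => number,
  signal: AbortSignal
): AsyncGenerator<DemoEvent> {
  const queue: DemoEvent[] = [];
  let resolveNext: (() => void) | null = null;

  // Wake the consumer loop if it is currently parked on an empty queue.
  const wake = (): void => {
    resolveNext?.();
    resolveNext = null;
  };
  const onUpdate = (value: number): void => {
    queue.push({ type: "update", value });
    wake();
  };

  // Listen before yielding the snapshot so no update is lost in between.
  emitter.on("update", onUpdate);
  signal.addEventListener("abort", wake);
  try {
    yield { type: "snapshot", value: getSnapshot() };
    while (!signal.aborted) {
      if (queue.length === 0) {
        await new Promise<void>((resolve) => {
          resolveNext = resolve;
        });
        continue;
      }
      yield queue.shift()!;
    }
  } finally {
    emitter.off("update", onUpdate);
    signal.removeEventListener("abort", wake);
  }
}
```

Registering the listener before the snapshot is yielded is what makes the pattern safe: a late subscriber sees current state first and then every later update in order.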

### Frontend (4 new files + 5 modified)
- `src/browser/features/RightSidebar/BrowserTab/BrowserTab.tsx` — main
tab component with idle/starting/live/error/ended states, screenshot
viewer, action timeline, Start/Stop/Restart controls
- `src/browser/features/RightSidebar/BrowserTab/useBrowserSessionSubscription.ts`
— ORPC subscription hook (mirrors `useDevToolsSubscription`)
- Tab type registration, config, label in the right-sidebar
infrastructure
- Layout migration so existing workspaces get the Browser tab

### Key design decisions
- **`sharp` lazy-loaded** — prevents Bun test env crashes from the
native dependency
- **External close detection** — if URL transitions from a real page to
`about:blank`, the backend infers the browser was closed externally and
surfaces an error
- **Session controls** — Start/Restart (idle/ended/error states) and
Stop (live/starting states) are mutually exclusive header buttons
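
The external-close heuristic reduces to a comparison of consecutive polled URLs. A minimal sketch, assuming the backend keeps the previously observed URL; the function name is hypothetical:

```typescript
// Returns true when the page went from a real URL to about:blank,
// which the backend treats as "browser closed externally".
function isExternalClose(previousUrl: string | null, currentUrl: string | null): boolean {
  const wasRealPage = previousUrl !== null && previousUrl !== "about:blank";
  return wasRealPage && currentUrl === "about:blank";
}
```

A session that starts on `about:blank` does not trigger the check, because there was no real page to transition away from.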

## Validation

- `make static-check` passes (typecheck + lint + shellcheck + prettier +
docs)
- `bun test src/browser/utils/rightSidebarLayout.test.ts` — 23/23 pass
- `bun test src/cli/cli.test.ts src/cli/server.test.ts` — 28/28 pass
- **3 full dogfood runs** with agent-browser driving the Mux frontend:
  1. Initial dogfood: found 3 issues (button truncation, missing stop
     button, no close detection)
  2. Fix verification dogfood: confirmed all 3 fixes
  3. Final comprehensive dogfood: all 6 test scenarios passed (idle state,
     session start, navigation, tab switching, stop button, external close
     detection, restart cycle)

## Risks

- **`sharp` in production Electron** — works in Node.js but may need
packaging attention; fallback to raw PNG if unavailable
- **Poll-based screenshots** — 2-second interval means the viewer lags
slightly behind real-time; acceptable for MVP
- **Single-session limit** — only one browser session per workspace;
multi-session is a deliberate follow-up

---

<details>
<summary>📋 Implementation Plan</summary>

# Integrate agent-browser into Mux right sidebar — implementation plan

## Objective

Ship a **workspace-scoped Browser tab** in Mux’s right sidebar so users
can watch an agent-driven browser session live, understand the agent’s
current step, and eventually take over input when needed.

The implementation should:
- feel native to Mux’s existing right-sidebar/tab model,
- preserve existing tool/chat UX,
- avoid risky DOM embedding of arbitrary pages,
- scale from a fast MVP to a first-class browser tool/session model.

## Verified repo and product constraints

### agent-browser capabilities confirmed from official research
- agent-browser can expose a **live browser stream over WebSocket**.
- The stream carries **base64 JPEG frames + viewport metadata** and
accepts **mouse/keyboard/touch input** back over the socket.
- agent-browser can also connect to an existing browser via **CDP**.
- There is a documented **programmatic BrowserManager API** as well as a
CLI path.
- It supports recording sessions, which is useful for dogfooding/review
artifacts.

### Relevant Mux integration points already in the repo
- Right sidebar container:
`src/browser/features/RightSidebar/RightSidebar.tsx`
- Right-sidebar tab types: `src/browser/types/rightSidebar.ts`
- Right-sidebar tab registry/config:
`src/browser/features/RightSidebar/Tabs/registry.ts`
- Existing live session/tab reference:
`src/browser/features/RightSidebar/TerminalTab.tsx`
- Existing live debug subscription pattern:
`src/browser/features/RightSidebar/DevToolsTab/useDevToolsSubscription.ts`
- Shared cross-boundary types convention: `src/common/types/`
- ORPC schemas convention: `src/common/orpc/schemas/api.ts`
- Backend router: `src/node/orpc/router.ts`
- Service injection context: `src/node/orpc/context.ts`
- Backend streaming/tooling adjacency:
`src/node/services/streamManager.ts`,
`src/node/services/mcpServerManager.ts`
- Electron main process entry: `src/desktop/main.ts`

## Recommended delivery strategy

### Approach A — CLI-backed Browser tab behind a stable service
interface
**Summary:** Mux launches agent-browser as a managed subprocess, uses
the documented stream port/WebSocket viewer, and renders the live
viewport in a new Browser tab.

- **Net LoC estimate (product code only):** **+650 to +900 LoC**
- **Why choose it:** fastest path to a visible, testable MVP
- **Primary risk:** process management and installation/runtime
packaging
- **Recommendation:** **fallback path** if the BrowserManager/API spike
fails or packaging friction is high

### Approach B — BrowserManager-backed Mux-native browser session
service
**Summary:** Mux owns a `BrowserSessionService` and a native
agent-browser backend adapter, with structured session state, action
events, and a right-sidebar viewer.

- **Net LoC estimate (product code only):** **+950 to +1400 LoC**
- **Why choose it:** cleanest long-term architecture; strongest control
over lifecycle, state, and UI sync
- **Primary risk:** unknown effort to wire the agent-browser
runtime/library cleanly into Mux’s desktop build/runtime
- **Recommendation:** **preferred target architecture** if the initial
spike proves viable within 1 day

### Approach C — Human takeover, recording, and replay-friendly session
history
**Summary:** Add explicit user takeover, input arbitration, recording
hooks, and lightweight persisted session/action history.

- **Net LoC estimate (product code only):** **+350 to +650 LoC
incremental**
- **Why choose it:** completes the “watch + intervene + review” story
- **Primary risk:** agent/user race conditions and more UX/state
complexity
- **Recommendation:** **phase 3 follow-up**, not part of the first merge

## Recommended execution decision

Execute a **1-day spike** first, then branch:

1. Try to stand up a minimal `BrowserSessionBackend` using the
**BrowserManager/API path**.
2. If that spike proves stable in Mux’s Node/Electron runtime, proceed
with **Approach B**.
3. If it does not, deliver **Approach A** behind the **same
`BrowserSessionBackend` interface**, so the UI and ORPC contracts remain
stable.

This keeps the team moving while avoiding a throwaway UI.

## Non-negotiable architectural invariants

1. **Do not embed arbitrary web content as live DOM** in the renderer.
   - No `dangerouslySetInnerHTML`, `webview`, or loose iframe-based
     browsing surface for remote pages.
   - Render the viewer as **frames/images/canvas only**.
2. **Keep high-frequency frames off ORPC if possible.**
   - Use ORPC for session lifecycle, status, action timeline, and errors.
   - Prefer a dedicated viewer WebSocket endpoint (returned by the backend)
     for frame transport.
3. **Make the browser viewer workspace-scoped.**
   - MVP should support **one active browser session per workspace**.
   - Do not start with `browser:${sessionId}` multi-instance tabs.
4. **Lazy initialization only.**
   - Browser session services must not start on app boot.
   - Missing binaries/install issues must surface as recoverable UI errors,
     not startup crashes.
5. **Single-source shared types.**
   - Cross-boundary types go in `src/common/types/browserSession.ts`.
   - ORPC validation schemas stay in `src/common/orpc/schemas/api.ts`.
6. **Defensive programming at every boundary.**
   - Validate session IDs, workspace IDs, viewer URLs, stream payloads, and
     action events.
   - Assert impossible states in dev/test builds; degrade gracefully in
     user-facing UI.
7. **Never leak secrets into the timeline.**
   - Filled credentials or vault-backed values must be redacted in browser
     action summaries and never persisted.
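
As an illustration of invariant 6, viewer URL validation might look like the sketch below. It assumes the viewer socket is a plain `ws:` endpoint on the local machine, which the plan leaves open until the Phase 0 spike; `parseViewerUrl` is a hypothetical name:

```typescript
// Accept only loopback WebSocket URLs; return null for anything else.
function parseViewerUrl(raw: string): URL | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // Not a parseable URL at all.
  }
  const isLoopback = url.hostname === "127.0.0.1" || url.hostname === "localhost";
  if (url.protocol !== "ws:" || !isLoopback) {
    return null;
  }
  return url;
}
```

Returning `null` instead of throwing lets the UI degrade to a recoverable error state, in line with the "degrade gracefully" requirement.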

## Proposed architecture

### Top-level model

```ts
export interface BrowserSession {
  id: string;
  workspaceId: string;
  status: "idle" | "starting" | "live" | "paused" | "ended" | "error";
  ownership: "agent" | "user";
  backend: "agent-browser-cli" | "agent-browser-manager";
  viewerUrl: string | null;
  title: string | null;
  url: string | null;
  lastAction: BrowserAction | null;
  lastError: string | null;
  startedAt: string;
  endedAt?: string;
}

export type BrowserSessionEvent =
  | { type: "snapshot"; session: BrowserSession | null; recentActions: BrowserAction[] }
  | { type: "session-updated"; session: BrowserSession }
  | { type: "action"; action: BrowserAction }
  | { type: "session-ended"; sessionId: string }
  | { type: "error"; sessionId: string; message: string };
```
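
To show how a consumer might fold these events into view state, here is a hedged reducer sketch. The types are trimmed copies of the ones above, the `BrowserAction` shape and `BrowserTabState` are hypothetical, and the 50-action cap is an arbitrary illustrative default:

```typescript
// Trimmed, illustrative copies of the shared types.
interface BrowserAction { id: string; summary: string }
type Status = "idle" | "starting" | "live" | "paused" | "ended" | "error";
interface BrowserSession { id: string; status: Status; lastError: string | null }
type BrowserSessionEvent =
  | { type: "snapshot"; session: BrowserSession | null; recentActions: BrowserAction[] }
  | { type: "session-updated"; session: BrowserSession }
  | { type: "action"; action: BrowserAction }
  | { type: "session-ended"; sessionId: string }
  | { type: "error"; sessionId: string; message: string };

interface BrowserTabState { session: BrowserSession | null; recentActions: BrowserAction[] }

// Pure reducer: each event produces a new state object.
function applyEvent(
  state: BrowserTabState,
  event: BrowserSessionEvent,
  maxActions = 50
): BrowserTabState {
  switch (event.type) {
    case "snapshot":
      return { session: event.session, recentActions: event.recentActions.slice(-maxActions) };
    case "session-updated":
      return { ...state, session: event.session };
    case "action":
      return { ...state, recentActions: [...state.recentActions, event.action].slice(-maxActions) };
    case "session-ended":
      // Ignore events for sessions this tab is not showing.
      if (state.session === null || state.session.id !== event.sessionId) return state;
      return { ...state, session: { ...state.session, status: "ended" } };
    case "error":
      if (state.session === null || state.session.id !== event.sessionId) return state;
      return { ...state, session: { ...state.session, status: "error", lastError: event.message } };
  }
}
```

Because `snapshot` replaces the whole state, a late subscriber can start from an empty state and converge after the first event.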

### Core components/services
- **`BrowserSessionService`**
  (`src/node/services/browserSessionService.ts`)
  - Owns the active browser session per workspace.
  - Emits workspace-scoped events using the same broad pattern as
    `DevToolsService`.
  - Tracks session state, recent actions, errors, and viewer endpoint.
- **`BrowserSessionBackend`**
  (`src/node/services/browserSessionBackends/BrowserSessionBackend.ts`)
  - Stable internal interface so the runtime can swap between CLI and
    BrowserManager implementations.
- **`BrowserTab`** (`src/browser/features/RightSidebar/BrowserTab.tsx`)
  - Subscribes to low-frequency state via ORPC.
  - Connects directly to the viewer WebSocket for frames/input when
    available.
  - Keeps frame rendering local to the leaf component so the rest of the
    sidebar does not rerender.

### Transport split
- **ORPC stream:** status, title, URL, current/last action, errors,
ownership, start/stop lifecycle.
- **Viewer socket:** image frames + viewport metadata + optional input
injection.

<details>
<summary>Why split transport instead of streaming frames through
ORPC?</summary>

Mux already has a clean ORPC event-stream pattern for subscription-style
data, but browser frames are much higher-frequency than devtools/tool
state updates. Sending base64 JPEG frames through ORPC would increase
serialization pressure, trigger avoidable rerenders, and tie the
viewer’s frame rate to the app’s control plane.

A dedicated viewer socket keeps the control plane small and typed while
letting the Browser tab own the rendering loop.
</details>

## Execution phases and agent workstreams

## Phase 0 — 1-day architecture spike and runtime decision
**Owner:** Backend/platform agent
**Parallelizable:** no; everything else depends on this answer
**Goal:** choose the backend implementation path without blocking the
rest of the team for more than 1 day

### Tasks
1. Prove whether agent-browser’s **BrowserManager/API path** can run
inside Mux’s runtime.
2. If yes, confirm how Mux obtains:
   - frame stream,
   - input injection,
   - page metadata,
   - shutdown hooks.
3. If no, prove the **CLI + stream port** path works reliably from Mux’s
backend.
4. Decide whether the renderer can connect directly to a local viewer
socket or whether Mux needs a relay.
5. Determine installation/runtime posture for dev builds:
   - external binary on PATH,
   - configured binary path,
   - or managed local install.

### Deliverables
- An in-code decision comment in the new backend interface/service
explaining the chosen backend and why it was picked.
- Minimal spike proof (branch-local, not productionized) showing a
session can start and produce viewer data.
- Clear go/no-go verdict: **Approach B** or **Approach A fallback**.

### Exit criteria
- Team knows which backend adapter to implement.
- Team knows whether viewer transport is **direct renderer socket** or
**backend relay**.
- Team knows how missing binary/install will surface in the UI.

### Quality gate
- Capture **1 screenshot** and **1 short video** of the spike showing a
live browser image stream and a start/stop cycle.

---

## Phase 1 — Shared contracts, service skeleton, and ORPC surface
**Owner:** Shared-contracts agent
**Parallelizable with:** frontend shell work once types stabilize
**Primary files:**
- `src/common/types/browserSession.ts` (new)
- `src/common/orpc/schemas/api.ts`
- `src/node/orpc/context.ts`
- `src/node/services/browserSessionService.ts` (new)
- `src/node/orpc/router.ts`

### Tasks
1. Add shared types in `src/common/types/browserSession.ts`:
   - `BrowserSession`
   - `BrowserAction`
   - `BrowserSessionEvent`
   - viewer metadata types if needed
2. Add ORPC schemas in `src/common/orpc/schemas/api.ts` for:
   - `browserSession.getActive`
   - `browserSession.start`
   - `browserSession.stop`
   - `browserSession.subscribe`
   - optionally `browserSession.clearRecentActions` if the UI needs it
     later
3. Add `browserSessionService` to `ORPCContext` in
`src/node/orpc/context.ts`.
4. Implement `BrowserSessionService` as a workspace-scoped
`EventEmitter` service.
5. Mirror the `DevToolsService`/`useDevToolsSubscription` model:
   - snapshot-first subscription
   - queue buffering
   - listener cleanup on abort/disconnect
6. Define a strict policy for session ownership and lifecycle:
   - **one active session per workspace**
   - starting a new session either reuses or explicitly replaces the old
     one
   - hiding the Browser tab does **not** stop the session
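
The snapshot-first and one-session-per-workspace rules above can be sketched as a small registry. This is a minimal, callback-based illustration and not the planned `EventEmitter` service: it omits queue buffering, and `WorkspaceSessionRegistry`, `SessionInfo`, and the method names are hypothetical. It also picks "explicitly replace" as the policy for a second start.

```typescript
// Hypothetical minimal shapes; the shared types would carry more fields.
type SessionInfo = { id: string; workspaceId: string };
type SessionEvent =
  | { type: "snapshot"; session: SessionInfo | null }
  | { type: "session-updated"; session: SessionInfo }
  | { type: "session-ended"; sessionId: string };
type SessionListener = (event: SessionEvent) => void;

class WorkspaceSessionRegistry {
  private sessions = new Map<string, SessionInfo>();
  private listeners = new Map<string, Set<SessionListener>>();

  // Snapshot-first: the subscriber receives current state before any live event.
  subscribe(workspaceId: string, listener: SessionListener): () => void {
    const set = this.listeners.get(workspaceId) ?? new Set<SessionListener>();
    this.listeners.set(workspaceId, set);
    listener({ type: "snapshot", session: this.sessions.get(workspaceId) ?? null });
    set.add(listener);
    // The returned function is the cleanup hook to call on abort/disconnect.
    return () => {
      set.delete(listener);
      if (set.size === 0 && this.listeners.get(workspaceId) === set) {
        this.listeners.delete(workspaceId);
      }
    };
  }

  // One active session per workspace: a new start explicitly replaces the old one.
  start(workspaceId: string, sessionId: string): SessionInfo {
    const previous = this.sessions.get(workspaceId);
    if (previous) this.emit(workspaceId, { type: "session-ended", sessionId: previous.id });
    const session: SessionInfo = { id: sessionId, workspaceId };
    this.sessions.set(workspaceId, session);
    this.emit(workspaceId, { type: "session-updated", session });
    return session;
  }

  listenerCount(workspaceId: string): number {
    return this.listeners.get(workspaceId)?.size ?? 0;
  }

  private emit(workspaceId: string, event: SessionEvent): void {
    // Events are scoped to the workspace; other workspaces never see them.
    this.listeners.get(workspaceId)?.forEach((listener) => listener(event));
  }
}
```

Because hiding the Browser tab only calls the unsubscribe function, the session entry itself stays in the registry until it is replaced or stopped.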

### Acceptance criteria
- The backend can create a placeholder browser session and stream typed
lifecycle updates over ORPC.
- The service cleans up listeners on unsubscribe/abort.
- The types are shared and not duplicated in browser/node code.

### Defensive programming requirements
- Assert that a workspace-scoped session cannot belong to a different
workspace.
- Assert that `viewerUrl` is null unless the session is
starting/live/paused.
- Reject malformed session transitions early.
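
These assertions could take roughly the following shape. It is a sketch under stated assumptions: only `starting`, `live`, and `paused` are named in the plan, so the `idle`, `ended`, and `error` labels, the transition table, and the `assertValidTransition` helper are all hypothetical.

```typescript
// "starting", "live", and "paused" come from the plan; "idle", "ended",
// and "error" are assumed labels for the remaining states.
type SessionStatus = "idle" | "starting" | "live" | "paused" | "ended" | "error";

type SessionState = {
  workspaceId: string;
  status: SessionStatus;
  viewerUrl: string | null;
};

// Assumed transition table; the real lifecycle may allow more or fewer edges.
const ALLOWED_TRANSITIONS: Record<SessionStatus, readonly SessionStatus[]> = {
  idle: ["starting"],
  starting: ["live", "ended", "error"],
  live: ["paused", "ended", "error"],
  paused: ["live", "ended", "error"],
  ended: ["starting"],
  error: ["starting"],
};

const VIEWER_STATUSES: readonly SessionStatus[] = ["starting", "live", "paused"];

function assertValidTransition(
  expectedWorkspaceId: string,
  current: SessionState,
  next: SessionState
): void {
  // A workspace-scoped session must never belong to a different workspace.
  if (current.workspaceId !== expectedWorkspaceId || next.workspaceId !== expectedWorkspaceId) {
    throw new Error(`session does not belong to workspace ${expectedWorkspaceId}`);
  }
  // Reject malformed transitions before any state is written.
  if (!ALLOWED_TRANSITIONS[current.status].includes(next.status)) {
    throw new Error(`malformed transition: ${current.status} -> ${next.status}`);
  }
  // viewerUrl may only be set while the session is starting/live/paused.
  if (next.viewerUrl !== null && !VIEWER_STATUSES.includes(next.status)) {
    throw new Error(`viewerUrl must be null while status is ${next.status}`);
  }
}
```

Calling the check at the single place where the service writes session state keeps the invariants in one spot.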

### Estimated product code
- **+180 to +280 LoC**

---

## Phase 2 — Backend adapter implementation and lifecycle management
**Owner:** Backend integration agent
**Parallelizable with:** Phase 3 UI shell once contracts are stable
**Primary files:**
- `src/node/services/browserSessionBackends/BrowserSessionBackend.ts`
(new)
- `src/node/services/browserSessionBackends/AgentBrowserManagerBackend.ts`
  (new, preferred)
- `src/node/services/browserSessionBackends/AgentBrowserCliBackend.ts`
(new, fallback)
- `src/node/services/browserSessionService.ts`
- `src/node/services/streamManager.ts` (only if shared run lifecycle
hooks are needed)
- `src/node/services/mcpServerManager.ts` (only if a bridge is needed
later; avoid coupling MVP to this)

### Tasks
1. Create a backend interface with methods like:
   - `startSession(...)`
   - `stopSession(sessionId)`
   - `getViewerEndpoint(sessionId)`
   - `onAction(...)`
   - `onSessionUpdate(...)`
2. Implement the chosen backend adapter.
3. Add lazy runtime preflight:
   - binary/library available?
   - browser install available?
   - helpful error message if not.
4. Allocate an ephemeral viewer port or equivalent viewer endpoint.
5. Convert raw backend events into redacted, user-readable
`BrowserAction` entries.
6. Ensure hard cleanup on:
   - workspace close,
   - session replacement,
   - app shutdown,
   - backend crash/disconnect.
7. Keep raw frames ephemeral; do **not** persist them to disk.
8. Persist only lightweight session/action metadata **if** it materially
improves recovery/debuggability; otherwise keep MVP in-memory.
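
One possible shape for the interface in task 1, paired with an in-memory fake that the UI could be mocked against. Only the five method names come from the list above. The parameter and return types, the payload shapes, the placeholder endpoint string, and `FakeBrowserSessionBackend` are assumptions, and the methods are synchronous purely for brevity (a real adapter would likely be async).

```typescript
// Assumed payload shapes; only the method names come from the plan.
type BackendAction = { sessionId: string; kind: string; summary: string };
type BackendSessionUpdate = { sessionId: string; status: "starting" | "live" | "ended" };
type Unsubscribe = () => void;

interface BrowserSessionBackend {
  startSession(options: { workspaceId: string; url: string }): string;
  stopSession(sessionId: string): void;
  getViewerEndpoint(sessionId: string): string | null;
  onAction(listener: (action: BackendAction) => void): Unsubscribe;
  onSessionUpdate(listener: (update: BackendSessionUpdate) => void): Unsubscribe;
}

// In-memory stand-in with no real browser behind it.
class FakeBrowserSessionBackend implements BrowserSessionBackend {
  private nextId = 1;
  private endpoints = new Map<string, string>();
  private actionListeners = new Set<(action: BackendAction) => void>();
  private updateListeners = new Set<(update: BackendSessionUpdate) => void>();

  startSession(options: { workspaceId: string; url: string }): string {
    const sessionId = `${options.workspaceId}-session-${this.nextId++}`;
    // Placeholder endpoint; a real adapter would allocate an ephemeral port.
    this.endpoints.set(sessionId, `ws://127.0.0.1:0/${sessionId}`);
    this.updateListeners.forEach((l) => l({ sessionId, status: "live" }));
    this.actionListeners.forEach((l) =>
      l({ sessionId, kind: "navigate", summary: `Opened ${options.url}` })
    );
    return sessionId;
  }

  stopSession(sessionId: string): void {
    if (!this.endpoints.delete(sessionId)) return; // already stopped
    this.updateListeners.forEach((l) => l({ sessionId, status: "ended" }));
  }

  getViewerEndpoint(sessionId: string): string | null {
    return this.endpoints.get(sessionId) ?? null;
  }

  onAction(listener: (action: BackendAction) => void): Unsubscribe {
    this.actionListeners.add(listener);
    return () => {
      this.actionListeners.delete(listener);
    };
  }

  onSessionUpdate(listener: (update: BackendSessionUpdate) => void): Unsubscribe {
    this.updateListeners.add(listener);
    return () => {
      this.updateListeners.delete(listener);
    };
  }
}
```

Coding the service against the interface rather than a concrete adapter is what lets the CLI and BrowserManager paths be swapped after the Phase 0 decision.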

### Explicit scope control
- **Do not** build generalized support for arbitrary external MCP
browser tools in this phase.
- **Do not** parse ad-hoc shell logs in the renderer.
- If the CLI adapter is used, parsing/translation belongs inside the
backend adapter only.

### Acceptance criteria
- Starting a browser session returns typed session state plus a viewer
endpoint.
- Stopping a session cleans up all child resources and emits final
state.
- Missing dependency/install errors are shown as session errors, not
crashes.
- The service never leaves orphan processes/sockets after stop or
shutdown.

### Defensive programming requirements
- Use explicit disposables/cleanup guards for child processes and
sockets.
- Assert one session per workspace for MVP.
- Redact or omit sensitive input payloads in action events.
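
An explicit cleanup guard for child processes and sockets might look like the sketch below. `CleanupStack` is a hypothetical helper, not an existing Mux utility. It runs each registered callback at most once, in reverse order, and keeps going when one of them throws.

```typescript
type CleanupFn = () => void;

// Collects cleanup callbacks and guarantees each runs at most once,
// in reverse registration order, even if one of them throws.
class CleanupStack {
  private callbacks: CleanupFn[] = [];
  private disposed = false;

  add(cleanup: CleanupFn): void {
    if (this.disposed) {
      // Registered after disposal: run immediately so nothing is orphaned.
      cleanup();
      return;
    }
    this.callbacks.push(cleanup);
  }

  // Returns the errors thrown by individual callbacks, if any.
  dispose(): unknown[] {
    if (this.disposed) return [];
    this.disposed = true;
    const errors: unknown[] = [];
    for (;;) {
      const cleanup = this.callbacks.pop();
      if (cleanup === undefined) break;
      try {
        cleanup();
      } catch (error) {
        errors.push(error); // keep going so later resources still close
      }
    }
    return errors;
  }
}
```

A session would register one callback per resource (kill the child process, close the socket, release the port) and call `dispose()` from every exit path: stop, replacement, workspace close, and app shutdown.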

### Estimated product code
- **Approach B path:** **+260 to +420 LoC**
- **Approach A fallback path:** **+220 to +340 LoC**

---

## Phase 3 — Right-sidebar Browser tab shell and layout integration
**Owner:** Frontend/right-sidebar agent
**Parallelizable with:** Phase 2 once contracts are stable enough to
mock
**Primary files:**
- `src/browser/types/rightSidebar.ts`
- `src/browser/features/RightSidebar/Tabs/registry.ts`
- `src/browser/features/RightSidebar/Tabs/TabLabels.tsx`
- `src/browser/features/RightSidebar/RightSidebar.tsx`
- `src/browser/features/RightSidebar/BrowserTab.tsx` (new)
- `src/browser/utils/rightSidebarLayout.ts`
- `src/browser/utils/uiLayouts.ts` (only if layout presets need
updating)

### Tasks
1. Add `"browser"` to the right-sidebar base tab model.
2. Register a `BROWSER_TAB_CONFIG` in the right-sidebar registry.
3. Add a `BrowserTabLabel` showing:
   - browser icon,
   - live/error state,
   - subtle activity indicator if a session is active.
4. Implement `BrowserTab.tsx` with the following UI regions:
   - header: title, URL, session status, backend type
   - main viewer region: live browser image/canvas
   - status strip: last action, ownership, errors
   - empty/error/install state
5. Keep frame rendering local to `BrowserTab`:
   - do not push raw frames into right-sidebar layout state,
   - do not rerender the entire sidebar on each frame.
6. Decide Browser tab visibility UX:
   - **recommended:** do not force it into every default layout,
   - automatically insert/select it when a browser session starts,
   - persist user layout choices afterward.
7. Ensure the tab participates cleanly in existing right-sidebar layout
operations.

### UX requirements
- The Browser tab should feel like a sibling of Terminal/Output/Debug,
not a separate product.
- Starting a browser-backed task should auto-focus or at least visibly
surface the Browser tab.
- Hiding the tab should not kill the browser session.
- If the session ends, the tab should show a stable ended/error state
rather than disappearing abruptly.

### Acceptance criteria
- A browser session can appear in the right sidebar and remain visible
while the agent works.
- The rest of the app remains responsive while frames are arriving.
- Layout persistence and tab switching still work.

### Defensive programming requirements
- Validate viewer connection state transitions.
- Clamp or ignore nonsensical frame metadata.
- Ensure null session state renders cleanly without throwing.
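
As an illustration of the "clamp or ignore" rule, a normalizer could return a cleaned copy of the metadata or `null` when the frame should be skipped. The `FrameMetadata` fields, the `MAX_DIMENSION` and `MAX_SCALE` bounds, and `normalizeFrameMetadata` are assumptions, since the plan does not specify the viewer protocol.

```typescript
// Hypothetical metadata shape; the real viewer protocol may differ.
type FrameMetadata = { width: number; height: number; deviceScaleFactor: number };

const MAX_DIMENSION = 16384; // assumed upper bound for a sane viewport edge
const MAX_SCALE = 4; // assumed upper bound for the device scale factor

// Returns a cleaned copy, or null when the frame should be ignored.
function normalizeFrameMetadata(raw: Partial<FrameMetadata>): FrameMetadata | null {
  const { width, height, deviceScaleFactor } = raw;
  // Ignore: missing, non-finite, or non-positive dimensions cannot be rendered.
  if (typeof width !== "number" || typeof height !== "number") return null;
  if (!Number.isFinite(width) || !Number.isFinite(height)) return null;
  if (width < 1 || height < 1) return null;
  // Clamp: a bad scale factor falls back to 1 instead of discarding the frame.
  const scale =
    typeof deviceScaleFactor === "number" &&
    Number.isFinite(deviceScaleFactor) &&
    deviceScaleFactor > 0
      ? deviceScaleFactor
      : 1;
  return {
    width: Math.min(Math.round(width), MAX_DIMENSION),
    height: Math.min(Math.round(height), MAX_DIMENSION),
    deviceScaleFactor: Math.min(scale, MAX_SCALE),
  };
}
```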

### Estimated product code
- **+220 to +320 LoC**

---

## Phase 4 — Viewer transport, frame rendering, and performance hardening
**Owner:** Frontend performance/interaction agent
**Parallelizable with:** late Phase 2 / late Phase 3
**Primary files:**
- `src/browser/features/RightSidebar/BrowserTab.tsx`
- optionally `src/browser/features/RightSidebar/BrowserViewer.tsx` (new,
only if extraction materially reduces complexity)
- optionally `src/common/types/browserSession.ts` for frame metadata
types

### Tasks
1. Implement the viewer transport in the tab:
   - connect to the backend-provided viewer endpoint,
   - receive frames and metadata,
   - render without React-wide churn.
2. Start with the simplest correct renderer:
   - imperative `<img>` or canvas update loop,
   - only introduce a separate extracted viewer component if the code
     becomes hard to follow.
3. Add frame management:
   - keep only the latest frame,
   - drop stale queued frames,
   - optionally decode on `requestAnimationFrame`.
4. Handle viewer disconnects and reconnect states.
5. Show visible empty/loading overlays while the first frame is pending.
6. Preserve aspect ratio and pointer coordinate mapping data for future
takeover.
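
The frame-management task (keep only the latest frame, drop stale ones) can be sketched as a one-slot buffer. This assumes each frame carries a monotonically increasing `sequence` number, which the plan does not state. `ViewerFrame` and `LatestFrameBuffer` are hypothetical names.

```typescript
// Assumed frame shape: `sequence` increases with every frame the backend sends.
type ViewerFrame = { sequence: number; data: string };

// Holds at most one pending frame; newer frames overwrite older ones so the
// renderer never works through a backlog.
class LatestFrameBuffer {
  private pending: ViewerFrame | null = null;
  private lastRendered = -1;
  private dropped = 0;

  push(frame: ViewerFrame): void {
    // Ignore frames that are not newer than what is already queued or drawn.
    const newest = this.pending?.sequence ?? this.lastRendered;
    if (frame.sequence <= newest) {
      this.dropped++;
      return;
    }
    if (this.pending !== null) this.dropped++; // overwritten before it was drawn
    this.pending = frame;
  }

  // Call once per paint, for example from a requestAnimationFrame callback.
  take(): ViewerFrame | null {
    const frame = this.pending;
    if (frame !== null) {
      this.pending = null;
      this.lastRendered = frame.sequence;
    }
    return frame;
  }

  get droppedCount(): number {
    return this.dropped;
  }
}
```

The socket handler only calls `push`, and the paint loop only calls `take`, so frame arrival never triggers a React state update by itself.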

### Acceptance criteria
- The Browser tab can display a steady live viewport without noticeably
degrading the rest of the right sidebar.
- Frame delivery failure yields a recoverable error/reconnect state.
- The renderer is not flooded with state updates from every frame.

### Estimated product code
- **+140 to +240 LoC**

---

## Phase 5 — Action timeline and tool synchronization
**Owner:** Tooling/instrumentation agent
**Parallelizable with:** Phase 4 once the action model is stable
**Primary files:**
- `src/common/types/browserSession.ts`
- `src/node/services/browserSessionService.ts`
- backend adapter file from Phase 2
- `src/browser/features/RightSidebar/BrowserTab.tsx`
- optionally `src/browser/features/RightSidebar/DevToolsTab/*` if
cross-linking is added later

### Tasks
1. Define a compact `BrowserAction` model for user-facing steps:
   - `navigate`
   - `click`
   - `fill`
   - `type`
   - `scroll`
   - `snapshot`
   - `wait`
   - `error`
2. Emit these actions from the backend adapter in a way that does
**not** depend on renderer-side log parsing.
3. Add a recent-action list to the Browser tab.
4. Redact sensitive values:
   - passwords,
   - secrets,
   - auth tokens,
   - vault-backed values.
5. Keep raw detailed logs in existing Output/Debug surfaces where
appropriate, but do not block MVP on deep fusion with those tabs.
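
A backend-side translation from raw events to redacted entries might look like this. The action kinds match the list in task 1. Everything else is an assumption made for illustration: the `RawAction` shape, the field-name heuristic in `SENSITIVE_TARGET`, the `SENSITIVE_PARAMS` list, and the helper names. Vault-backed values would need an explicit flag from the backend rather than a name heuristic.

```typescript
type ActionKind =
  | "navigate" | "click" | "fill" | "type" | "scroll" | "snapshot" | "wait" | "error";

// Hypothetical raw event shape emitted by a backend adapter.
type RawAction = { kind: ActionKind; target?: string; value?: string; url?: string };
type BrowserActionEntry = { kind: ActionKind; summary: string };

// Assumed heuristic: field names that suggest a secret.
const SENSITIVE_TARGET = /pass(word)?|secret|token|api[-_]?key|otp|cvv/i;
// Assumed list of query parameters worth stripping from URLs.
const SENSITIVE_PARAMS = ["token", "access_token", "code", "key", "password"];

function redactUrl(rawUrl: string): string {
  try {
    const url = new URL(rawUrl);
    for (const param of SENSITIVE_PARAMS) {
      if (url.searchParams.has(param)) url.searchParams.set(param, "REDACTED");
    }
    return url.toString();
  } catch {
    // Never echo a string we could not parse; it may contain anything.
    return "[unparseable url]";
  }
}

function toBrowserAction(raw: RawAction): BrowserActionEntry {
  switch (raw.kind) {
    case "navigate":
      return { kind: raw.kind, summary: `Navigate to ${redactUrl(raw.url ?? "")}` };
    case "fill":
    case "type": {
      const target = raw.target ?? "field";
      const sensitive = SENSITIVE_TARGET.test(target);
      const shown = sensitive ? "[redacted]" : JSON.stringify(raw.value ?? "");
      return { kind: raw.kind, summary: `${raw.kind} ${shown} into ${target}` };
    }
    default:
      return { kind: raw.kind, summary: raw.target ? `${raw.kind} ${raw.target}` : raw.kind };
  }
}
```

A stricter variant would hide every typed value by default and show only values the backend marks as safe, which trades readability for a smaller leak surface.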

### Acceptance criteria
- The user can understand “what the agent is doing” from the Browser tab
itself.
- Sensitive inputs are not displayed.
- Browser action state stays consistent with session lifecycle state.

### Estimated product code
- **+120 to +220 LoC**

---

## Phase 6 — Human takeover and collaboration controls (follow-up)
**Owner:** Interaction/UX agent
**Parallelizable with:** after Phases 2–5 stabilize
**Primary files:**
- `src/browser/features/RightSidebar/BrowserTab.tsx`
- viewer transport/helper file if extracted
- backend adapter file from Phase 2
- `src/common/types/browserSession.ts`
- `src/common/orpc/schemas/api.ts` (only if extra control procedures are
needed)

### Tasks
1. Add explicit **Take over** / **Return control** affordances.
2. When user takeover starts:
   - flip `ownership` from `agent` to `user`,
   - pause or gate agent input,
   - show a visible banner.
3. Translate pointer/keyboard events using frame metadata.
4. Prevent agent/user race conditions.
5. Add a timeout/release policy for abandoned user takeover.
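
The ownership flip, input gating, and abandoned-takeover release could be combined in one small arbiter. This is a sketch under assumptions: `OwnershipArbiter`, its method names, and the idle-timeout policy (release after a period without user input) are hypothetical, and the clock is injected so the policy can be tested without timers.

```typescript
type Owner = "agent" | "user";

// Gatekeeper for input: exactly one side may drive the browser at a time.
class OwnershipArbiter {
  private owner: Owner = "agent";
  private lastUserInputAt = 0;
  private readonly idleTimeoutMs: number;
  private readonly now: () => number;

  // `now` is injected so the timeout policy is testable without timers.
  constructor(idleTimeoutMs: number, now: () => number) {
    this.idleTimeoutMs = idleTimeoutMs;
    this.now = now;
  }

  takeOver(): void {
    this.owner = "user";
    this.lastUserInputAt = this.now();
  }

  returnControl(): void {
    this.owner = "agent";
  }

  currentOwner(): Owner {
    // Release an abandoned takeover once the user has been idle too long.
    if (this.owner === "user" && this.now() - this.lastUserInputAt >= this.idleTimeoutMs) {
      this.owner = "agent";
    }
    return this.owner;
  }

  // Returns true when the input may be forwarded to the browser.
  acceptInput(from: Owner): boolean {
    if (this.currentOwner() !== from) return false;
    if (from === "user") this.lastUserInputAt = this.now();
    return true;
  }
}
```

Routing both agent and user input through `acceptInput` means a rejected input is an explicit `false` the caller can surface, instead of a silent race.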

### Acceptance criteria
- The user can click/type into the live browser view when takeover is
active.
- Agent and user inputs never race silently.
- Ownership state is always visible.
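
For task 3, pointer translation depends on how the frame is fitted into the viewer element. The sketch below assumes the frame is drawn centered with aspect-preserving "contain" scaling. `mapPointerToFrame`, `Size2D`, and `Point2D` are hypothetical names, and the pointer position is taken relative to the element's top-left corner.

```typescript
type Size2D = { width: number; height: number };
type Point2D = { x: number; y: number };

// Maps a pointer position inside the viewer element to frame pixel
// coordinates, assuming the frame is drawn centered with "contain" scaling.
function mapPointerToFrame(pointer: Point2D, element: Size2D, frame: Size2D): Point2D | null {
  if (element.width <= 0 || element.height <= 0 || frame.width <= 0 || frame.height <= 0) {
    return null;
  }
  // The smaller ratio is the one that makes the whole frame fit.
  const scale = Math.min(element.width / frame.width, element.height / frame.height);
  // Letterbox bars split the leftover space evenly on both sides.
  const offsetX = (element.width - frame.width * scale) / 2;
  const offsetY = (element.height - frame.height * scale) / 2;
  const x = (pointer.x - offsetX) / scale;
  const y = (pointer.y - offsetY) / scale;
  // Clicks on the letterbox bars are outside the page and are ignored.
  if (x < 0 || y < 0 || x >= frame.width || y >= frame.height) return null;
  return { x: Math.floor(x), y: Math.floor(y) };
}
```

For a 1280×720 frame shown in a 640×480 element, the scale is 0.5 and the vertical offset is 60 pixels, so the element's center maps to the frame's center.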

### Estimated product code
- **+180 to +320 LoC**

---

## Phase 7 — Testing, stories, and rollout hardening
**Owner:** QA/verification agent
**Parallelizable with:** all later phases
**Primary files:**
- `tests/ipc/browserSession.test.ts` (new)
- `tests/ui/browserTab.test.ts` (new)
- colocated pure tests only if new pure helpers are extracted
- `src/browser/stories/App.BrowserTab.stories.tsx` or the nearest
existing full-app story file that should absorb the new states

### Test plan
1. **IPC/integration tests** (`tests/ipc`)
   - start/stop lifecycle
   - snapshot-first subscription behavior
   - workspace isolation
   - replacement/cleanup behavior
   - missing dependency error state
2. **UI integration tests** (`tests/ui`)
   - browser tab appears and renders idle/loading/error states
   - session start auto-surfaces the tab
   - recent actions/status text update correctly
   - app remains navigable while the tab is active
3. **Pure unit tests** (colocated) only for extracted pure helpers such
as:
   - coordinate mapping,
   - frame metadata normalization,
   - action redaction.
4. **Storybook/full-app story**
   - idle state
   - live state with recent actions
   - ended/error/install-missing state
5. **Targeted e2e** (`tests/e2e`) only if happy-dom is insufficient for
validating the viewer transport or takeover behavior.

### Validation commands
- `make typecheck`
- `make static-check`
- targeted IPC/UI/e2e tests for touched areas

### Rollout posture
- Ship behind an **experimental flag** or equivalent internal-only
exposure first.
- Keep the feature off by default until dogfooding is stable.
- Log session start/stop/error paths with the repo’s `log` helper on the
backend.

### Acceptance criteria
- New code paths have targeted coverage.
- Browser tab UI states are captured in stories.
- Experimental rollout path is defined.

### Estimated product code
- **+60 to +140 LoC**

## Cross-cutting design decisions the team should follow

### 1. Keep browser integration separate from generic MCP integration for the first delivery
Mux already has `MCPServerManager`, but the first delivery should not
try to unify every possible browser MCP server under one viewer
abstraction. Build a Mux-owned browser session service first; if future
MCP tools want to publish into it, add a bridge later.

### 2. Treat the Browser tab as a first-class right-sidebar resident
The Browser tab should live beside Terminal, Output, and Debug; it
should not open in a separate window for the MVP unless the spike proves
the in-sidebar viewer is impossible.

### 3. Never persist raw frames
Persist, at most, lightweight metadata and redacted action history. Raw
image streams are too large and too risky to store casually.

### 4. Prefer local viewer transport over backend relaying
If Electron/network policy allows it, the renderer should connect
directly to the locally managed viewer socket. Only add a relay if
direct connection is blocked or unsafe.

### 5. Avoid hook proliferation
Colocate live viewer logic with `BrowserTab.tsx`. Extract only the
pieces that are genuinely reusable or become too complex to read.

## Parallelization map for a team of agents

| Workstream | Can start when | Suggested owner |
| --- | --- | --- |
| Phase 0 spike | immediately | backend/platform agent |
| Phase 1 contracts/service shell | after Phase 0 decision is mostly clear | shared-contracts agent |
| Phase 3 right-sidebar shell with mocked state | after Phase 1 type shape stabilizes | frontend/right-sidebar agent |
| Phase 2 backend adapter | after Phase 0 backend choice | backend integration agent |
| Phase 4 viewer transport | after Phase 2 returns a viewer endpoint contract | frontend performance agent |
| Phase 5 action timeline | after Phase 2 emits structured actions | tooling/instrumentation agent |
| Phase 7 tests/stories | begins with Phase 1 and expands as each phase lands | QA/verification agent |
| Phase 6 takeover | after MVP is stable | interaction/UX agent |

## Dogfooding plan (required)

### Dogfooding principles to follow
This plan should absorb the core discipline from the repo’s `dogfood`
and `agent-browser` skills:

- Treat dogfooding as **structured exploratory QA**, not a casual smoke
test.
- Use **repro-first evidence**: when something breaks, stop and document
it immediately before moving on.
- For **interactive/behavioral issues**, capture a **video plus
step-by-step screenshots**.
- For **static/visible-on-load issues**, capture a **single annotated
screenshot** instead of wasting time on video.
- Use **`agent-browser` directly, never `npx agent-browser`**.
- Use **named sessions** so multiple agents can dogfood in parallel
without stepping on each other.
- Follow the core agent-browser loop: **open → wait → snapshot -i →
interact → re-snapshot**.
- After any navigation or major DOM change, **re-snapshot** before
taking the next action.
- Prefer **explicit waits** such as `wait --load networkidle` or
element/url waits; only use sleeps to make repro videos human-watchable.
- Check **console/errors** periodically; some regressions will not be
visible in the viewport.
- Append findings **incrementally** to a dogfood report so an
interrupted run still leaves usable evidence.

### Dogfooding harness and setup
Each dogfood run should create an isolated run ID, session name, and
evidence directory.

1. Launch the normal local Mux development flow (`make dev` or the
team’s standard desktop/Electron dev path).
2. Enable the experimental Browser tab feature flag.
3. Prepare a deterministic target site or sites:
   - at least one simple navigation target,
   - one form-interaction target,
   - one failure-path target if available.
4. Create an isolated output directory per run, for example:
   - `./dogfood-output/browser-tab/<run-id>/screenshots`
   - `./dogfood-output/browser-tab/<run-id>/videos`
   - `./dogfood-output/browser-tab/<run-id>/report.md`
5. Start a **named** agent-browser session for the target browsing
workload.
6. If authentication is required, prefer one of these, in order:
   - saved session/profile/state,
   - auth vault,
   - one-time manual login with saved state.
7. Where feasible, constrain the run with:
   - a domain allowlist,
   - content boundaries,
   - a deterministic viewport/device preset.

### Recommended command pattern for browser-side dogfooding
Use the `agent-browser` skill’s proven workflow for the site being
driven inside the Browser tab.

```bash
# Set TARGET_URL to the page under test before running.
: "${TARGET_URL:?set TARGET_URL to the page under test}"
RUN_ID="browser-tab-$(date +%Y%m%d-%H%M%S)"
SESSION="mux-browser-tab-${RUN_ID}"
OUT="./dogfood-output/browser-tab/${RUN_ID}"

mkdir -p "${OUT}/screenshots" "${OUT}/videos"

agent-browser --session "${SESSION}" open "${TARGET_URL}" && \
agent-browser --session "${SESSION}" wait --load networkidle && \
agent-browser --session "${SESSION}" screenshot --annotate "${OUT}/screenshots/initial.png" && \
agent-browser --session "${SESSION}" snapshot -i
```

For authenticated or recurring scenarios, prefer `--session-name`,
`--profile`, or saved state so reruns are fast and reproducible.

### Structured dogfooding workflow

#### 1. Initialize
- Create the run directory and report file.
- Start the named session.
- Capture an initial annotated screenshot and interactive snapshot.
- Record the initial Mux state showing whether the Browser tab is
hidden, visible, empty, or already active.

#### 2. Authenticate (if needed)
- Authenticate once using a repeatable approach.
- Save state if the scenario will be rerun.
- Never expose raw credentials in artifacts or the report.

#### 3. Orient
- Map the top-level Mux workflow for this feature:
  - how a browser session starts,
  - how the Browser tab appears,
  - how the user sees status/action text,
  - how the session ends or errors.
- Map the target site’s main interactive elements using `snapshot -i`.
- Capture a baseline annotated screenshot of the target page and a
screenshot of the Mux Browser tab.

#### 4. Explore systematically
Test the feature like a real user, page by page and workflow by
workflow.

At a minimum, cover:
1. **Session start / tab surfacing**
   - starting a browser-backed task creates or reuses the Browser tab,
   - the tab becomes visible enough that the user notices it,
   - the initial loading state is sane.
2. **Watch-only navigation**
   - navigate across multiple pages,
   - click links/buttons,
   - confirm the Browser tab stays live and visually synchronized.
3. **Form interaction + redaction**
   - fill inputs and submit a harmless form,
   - confirm the recent-action list matches what happened,
   - confirm sensitive values are redacted from visible action text and
     persisted artifacts.
4. **Layout and right-sidebar behavior**
   - resize the sidebar,
   - switch tabs away and back,
   - collapse/reopen if supported,
   - confirm the session survives UI movement.
5. **Interrupt / cleanup / replacement**
   - stop a run mid-session,
   - start another session in the same workspace,
   - switch workspaces if the product allows it,
   - confirm there are no orphaned or cross-wired sessions.
6. **Error and dependency handling**
   - test missing runtime / failed startup / disconnected viewer paths,
   - confirm the app shows a recoverable error state instead of crashing.
7. **Performance / backpressure**
   - keep the Browser tab open during a longer run,
   - confirm the rest of the right sidebar remains responsive.
8. **Takeover flow** (Phase 6 only)
   - take control, click/type, return control,
   - confirm ownership is explicit and agent/user inputs do not race.

During exploration, use the agent-browser workflow rigorously:
- `snapshot -i` before discovering refs,
- interact via refs,
- `wait --load networkidle` or element/url waits after major actions,
- **re-snapshot** after navigation or DOM mutation,
- check `errors` and `console` periodically,
- optionally use `diff snapshot` when validating that an action changed
the page as expected.

### Repro-first issue documentation rules
When a bug is found, stop exploring and document it immediately.

#### Interactive / behavioral issues
Examples: wrong action log, frozen stream, mismatched viewport, takeover
race, session cleanup bug, visible console error after an action.

Required evidence:
1. Start a repro video **before** reproducing.
2. Reproduce at human pace.
3. Capture a screenshot for each significant step.
4. Pause on the broken state and capture an **annotated** screenshot.
5. Stop the video.
6. Append the issue to `report.md` immediately with:
   - issue ID (`ISSUE-001`, etc.),
   - severity,
   - exact repro steps,
   - expected result,
   - actual result,
   - screenshot/video filenames.
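
To keep entries uniform across parallel runs, the fields above could be rendered by a small helper before being appended to `report.md`. This is only a sketch: `DogfoodIssue`, `formatIssueEntry`, the `title` field, the Markdown layout, and the three severity labels are assumptions, not a format the plan prescribes.

```typescript
// Assumed severity scale for illustration.
type Severity = "blocking" | "moderate" | "minor";

// Hypothetical record for one finding; most field names mirror the list above.
type DogfoodIssue = {
  id: string;
  severity: Severity;
  title: string;
  steps: string[];
  expected: string;
  actual: string;
  evidence: string[]; // screenshot/video filenames relative to the run directory
};

function formatIssueEntry(issue: DogfoodIssue): string {
  const steps = issue.steps.map((step, index) => `${index + 1}. ${step}`);
  // Static issues have no video; an empty evidence list is reported as N/A.
  const evidence = issue.evidence.length > 0 ? issue.evidence.map((f) => `- ${f}`) : ["- N/A"];
  return [
    `## ${issue.id}: ${issue.title}`,
    `**Severity:** ${issue.severity}`,
    "",
    "**Repro steps:**",
    ...steps,
    "",
    `**Expected:** ${issue.expected}`,
    `**Actual:** ${issue.actual}`,
    "",
    "**Evidence:**",
    ...evidence,
    "",
  ].join("\n");
}
```

Appending the returned string right after each finding keeps the report usable even if the run is interrupted.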

When typing is part of the observable repro, prefer `type` over `fill`
so the video is understandable.

#### Static / visible-on-load issues
Examples: clipped text, wrong icon/state, bad empty state copy, layout
overlap, stale title/url, immediately visible console error.

Required evidence:
1. Capture a single annotated screenshot.
2. Append a concise issue entry to `report.md` immediately.
3. Mark repro video as `N/A`.

### Evidence requirements per milestone
For every milestone review, provide both broad milestone evidence and
issue-specific evidence.

#### Broad milestone evidence
- **At least 2 screenshots**:
  - one of the Browser tab during a live session,
  - one of an ended/error/install-missing state.
- **At least 1 short video** showing the agent actively browsing while
the Browser tab is visible in Mux.

#### Issue-specific evidence
- Every reproducible interactive issue gets:
  - one repro video,
  - step-by-step screenshots,
  - one annotated result screenshot.
- Every reproducible static issue gets:
  - one annotated screenshot.

Where practical, capture both:
- the **Mux-side evidence** (the Browser tab visible in the app), and
- the **browser-side evidence** (agent-browser screenshots/video of the
underlying session).

### Wrap-up procedure
At the end of each dogfood run:
1. Re-read the report and make sure summary counts match the actual
issue list.
2. Explicitly note whether the run found:
   - blocking issues,
   - moderate issues,
   - minor issues,
   - or no additional reproducible issues.
3. Close the named agent-browser session.
4. Preserve all artifacts; do not delete screenshots, videos, or reports
mid-run.
5. Attach screenshots and the key video to the implementation
handoff/review.

### Phase quality gates tied to dogfooding
- **After Phase 0 spike:** one live-session screenshot, one short
start/stop video, one note on runtime/install friction.
- **Before Milestone M1 sign-off:** complete one structured exploratory
run with evidence across start, navigation, form interaction, resize/tab
switching, interrupt, and error handling.
- **Before Milestone M2 sign-off:** complete one structured run
specifically validating action-log fidelity and redaction behavior.
- **Before Milestone M3 sign-off:** complete one structured run
specifically validating takeover ownership, input arbitration, and
recovery.

### Parallel-team guidance
If multiple agents dogfood simultaneously:
- each agent must use a unique session name,
- each agent must write to a separate run directory,
- each agent must append findings to its own report first, then merge
findings into the shared review summary.

Aim for the depth of coverage that would normally yield **5–10
well-documented findings’ worth of exploration**. If fewer issues are
found, state explicitly that no additional reproducible issues were
observed rather than inventing weak findings.

## Final milestone definitions

### Milestone M1 — Visible viewer MVP
Includes Phases 0–4 and the Phase 7 test/story minimums.

**Success means:**
- a right-sidebar Browser tab exists,
- a live browser session can be viewed there,
- the UI stays stable,
- lifecycle errors are recoverable.

### Milestone M2 — “See what the agent is doing” product pass
Adds Phase 5 and expands verification.

**Success means:**
- the Browser tab shows live viewport + current/recent actions,
- the user can correlate the visible browser with agent intent,
- sensitive actions are redacted correctly.

### Milestone M3 — Human collaboration pass
Adds Phase 6.

**Success means:**
- the user can safely take over and hand control back,
- ownership is explicit,
- sessions remain stable under collaboration.

## Recommended first implementation order

1. Phase 0 spike
2. Phase 1 shared contracts and ORPC shell
3. Phase 3 Browser tab shell using mocked/stubbed session state
4. Phase 2 real backend adapter and lifecycle
5. Phase 4 live viewer transport/perf hardening
6. Phase 5 action timeline sync
7. Phase 7 full verification/story coverage
8. Phase 6 takeover only after M1/M2 are solid

## What not to do in the first pass
- Do not start with multi-session browser tabs.
- Do not make this a separate popout-only window.
- Do not route raw frame streams through generic chat/tool message
rendering.
- Do not block the feature on deep Debug/Output/MCP unification.
- Do not attempt full browser replay/history storage.
- Do not ship without screenshots/video from dogfooding.

</details>


---
_Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking:
`xhigh` • Cost: `$52.11`_

<!-- mux-attribution: model=anthropic:claude-opus-4-6 thinking=xhigh
costs=52.11 -->

v0.21.0

release: v0.21.0

v0.20.2-nightly.105

🤖 refactor: auto-cleanup (#2942)

## Summary

Periodic auto-cleanup: removes the dead `setPRStatusStoreInstance`
export from `PRStatusStore.ts`.

## Background

The function was exported but never imported or called anywhere in the
codebase. The getter (`getPRStatusStoreInstance`) creates the singleton
on demand; no test or production code ever needed to inject a custom
instance via the setter. It is the only `set*StoreInstance` pattern
across all stores, so there is no convention to maintain.

## Validation

- `make typecheck` — passes
- `make lint` — passes
- `make fmt-check` — passes
- `bun test src/browser/stores/PRStatusStore` — 12/12 pass
- Grep confirms zero references outside the definition site

Auto-cleanup checkpoint: ff743d1

---

_Generated with `mux` • Model: `anthropic:claude-opus-4-6` • Thinking:
`xhigh` • Cost: `$0.00`_

<!-- mux-attribution: model=anthropic:claude-opus-4-6 thinking=xhigh
costs=0.00 -->

Co-authored-by: mux-bot[bot] <264182336+mux-bot[bot]@users.noreply.github.com>