
🤖 feat: add browser sidebar tab for live agent-browser viewing #2951

ThomasK33 merged 19 commits into main from agent-browser-r1as on Mar 14, 2026



@ThomasK33

Summary

Add a Browser tab to Mux's right sidebar that shows what agent-browser is doing in real time — live screenshots, URL/title tracking, action timeline, and session lifecycle controls.

Background

When an agent uses agent-browser to automate web pages, users currently have no visibility into what's happening. This feature adds a workspace-scoped Browser tab in the right sidebar so users can watch the agent navigate, see which page it's on, and understand the sequence of actions — without leaving the Mux interface.

Implementation

Architecture

  • CLI-backed approach — agent-browser has no library API, so we manage it as a subprocess
  • ORPC for all transport — screenshot polling (~2s) fits the existing event-stream model; no separate WebSocket needed
  • One session per workspace — enforced by the service layer
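To illustrate the ~2s polling model, here is a minimal sketch of a non-overlapping screenshot poller. It assumes the backend wraps one agent-browser capture in an async `capture()` function; that function, `PolledFrame`, and `startScreenshotPolling` are hypothetical names, and the real CLI invocation is deliberately not shown. Scheduling the next poll only after the previous one settles keeps a slow capture from stacking up subprocesses.

```typescript
// Hypothetical shape of one polled frame; the real backend may carry more fields.
export interface PolledFrame {
  imageBase64: string;
  url: string | null;
  title: string | null;
}

export interface PollerOptions {
  intervalMs: number; // ~2000 in the MVP
  capture: () => Promise<PolledFrame>; // hypothetical wrapper around one CLI screenshot call
  onFrame: (frame: PolledFrame) => void;
  onError: (error: unknown) => void;
}

// Polls until `signal` aborts. The next capture is scheduled only after the
// current one settles, so captures never overlap.
export function startScreenshotPolling(opts: PollerOptions, signal: AbortSignal): void {
  let timer: ReturnType<typeof setTimeout> | undefined;

  const tick = async (): Promise<void> => {
    if (signal.aborted) return;
    try {
      const frame = await opts.capture();
      // Re-check after the await: the session may have stopped mid-capture.
      if (!signal.aborted) opts.onFrame(frame);
    } catch (error) {
      if (!signal.aborted) opts.onError(error);
    }
    if (!signal.aborted) timer = setTimeout(() => void tick(), opts.intervalMs);
  };

  // Cancel any pending poll as soon as the session is stopped.
  signal.addEventListener("abort", () => clearTimeout(timer), { once: true });
  void tick();
}
```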

Backend (4 new files + 6 modified)

  • src/common/types/browserSession.ts — shared types (BrowserSession, BrowserAction, BrowserSessionEvent)
  • src/common/orpc/schemas/api.ts — Zod schemas for getActive, start, stop, subscribe
  • src/node/services/browserSessionBackend.ts — CLI adapter managing the agent-browser subprocess, screenshot polling, metadata extraction, external-close detection
  • src/node/services/browserSessionService.ts — EventEmitter service with workspace-scoped events (mirrors DevToolsService pattern)
  • ORPC routes with async-generator subscription (snapshot-first, queue+resolveNext)
  • Service container wiring + disposal

Frontend (4 new files + 5 modified)

  • src/browser/features/RightSidebar/BrowserTab/BrowserTab.tsx — main tab component with idle/starting/live/error/ended states, screenshot viewer, action timeline, Start/Stop/Restart controls
  • src/browser/features/RightSidebar/BrowserTab/useBrowserSessionSubscription.ts — ORPC subscription hook (mirrors useDevToolsSubscription)
  • Tab type registration, config, label in the right-sidebar infrastructure
  • Layout migration so existing workspaces get the Browser tab

Key design decisions

  • sharp lazy-loaded — prevents Bun test env crashes from the native dependency
  • External close detection — if URL transitions from a real page to about:blank, the backend infers the browser was closed externally and surfaces an error
  • Session controls — Start/Restart (idle/ended/error states) and Stop (live/starting states) are mutually exclusive header buttons
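The external close heuristic reduces to a pure check over two consecutive URL observations. A minimal sketch, assuming the poller reports `null` before the first navigation; `looksExternallyClosed` is a hypothetical name.

```typescript
// Hypothetical helper: true when the previous observation was a real page and
// the current one is about:blank, which the backend treats as an external close.
export function looksExternallyClosed(previousUrl: string | null, currentUrl: string | null): boolean {
  const wasOnRealPage = previousUrl !== null && previousUrl !== "about:blank";
  return wasOnRealPage && currentUrl === "about:blank";
}
```

A freshly started session sitting on about:blank never trips the check, because the previous URL was never a real page. The trade-off is that an agent deliberately navigating to about:blank would also be reported as an external close.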

Validation

  • make static-check passes (typecheck + lint + shellcheck + prettier + docs)
  • bun test src/browser/utils/rightSidebarLayout.test.ts — 23/23 pass
  • bun test src/cli/cli.test.ts src/cli/server.test.ts — 28/28 pass
  • 3 full dogfood runs with agent-browser driving the Mux frontend:
    1. Initial dogfood: found 3 issues (button truncation, missing stop button, no close detection)
    2. Fix verification dogfood: confirmed all 3 fixes
    3. Final comprehensive dogfood: all test scenarios passed (idle state, session start, navigation, tab switching, stop button, external close detection, restart cycle)

Risks

  • sharp in production Electron — works in Node.js but may need packaging attention; fallback to raw PNG if unavailable
  • Poll-based screenshots — 2-second interval means the viewer lags slightly behind real-time; acceptable for MVP
  • Single-session limit — only one browser session per workspace; multi-session is a deliberate follow-up

📋 Implementation Plan

Integrate agent-browser into Mux right sidebar — implementation plan

Objective

Ship a workspace-scoped Browser tab in Mux’s right sidebar so users can watch an agent-driven browser session live, understand the agent’s current step, and eventually take over input when needed.

The implementation should:

  • feel native to Mux’s existing right-sidebar/tab model,
  • preserve existing tool/chat UX,
  • avoid risky DOM embedding of arbitrary pages,
  • scale from a fast MVP to a first-class browser tool/session model.

Verified repo and product constraints

agent-browser capabilities confirmed from official research

  • agent-browser can expose a live browser stream over WebSocket.
  • The stream carries base64 JPEG frames + viewport metadata and accepts mouse/keyboard/touch input back over the socket.
  • agent-browser can also connect to an existing browser via CDP.
  • There is a documented programmatic BrowserManager API as well as a CLI path.
  • It supports recording sessions, which is useful for dogfooding/review artifacts.

Relevant Mux integration points already in the repo

  • Right sidebar container: src/browser/features/RightSidebar/RightSidebar.tsx
  • Right-sidebar tab types: src/browser/types/rightSidebar.ts
  • Right-sidebar tab registry/config: src/browser/features/RightSidebar/Tabs/registry.ts
  • Existing live session/tab reference: src/browser/features/RightSidebar/TerminalTab.tsx
  • Existing live debug subscription pattern: src/browser/features/RightSidebar/DevToolsTab/useDevToolsSubscription.ts
  • Shared cross-boundary types convention: src/common/types/
  • ORPC schemas convention: src/common/orpc/schemas/api.ts
  • Backend router: src/node/orpc/router.ts
  • Service injection context: src/node/orpc/context.ts
  • Backend streaming/tooling adjacency: src/node/services/streamManager.ts, src/node/services/mcpServerManager.ts
  • Electron main process entry: src/desktop/main.ts

Recommended delivery strategy

Approach A — CLI-backed Browser tab behind a stable service interface

Summary: Mux launches agent-browser as a managed subprocess, uses the documented stream port/WebSocket viewer, and renders the live viewport in a new Browser tab.

  • Net LoC estimate (product code only): +650 to +900 LoC
  • Why choose it: fastest path to a visible, testable MVP
  • Primary risk: process management and installation/runtime packaging
  • Recommendation: fallback path if the BrowserManager/API spike fails or packaging friction is high

Approach B — BrowserManager-backed Mux-native browser session service

Summary: Mux owns a BrowserSessionService and a native agent-browser backend adapter, with structured session state, action events, and a right-sidebar viewer.

  • Net LoC estimate (product code only): +950 to +1400 LoC
  • Why choose it: cleanest long-term architecture; strongest control over lifecycle, state, and UI sync
  • Primary risk: unknown effort to wire the agent-browser runtime/library cleanly into Mux’s desktop build/runtime
  • Recommendation: preferred target architecture if the initial spike proves viable within 1 day

Approach C — Human takeover, recording, and replay-friendly session history

Summary: Add explicit user takeover, input arbitration, recording hooks, and lightweight persisted session/action history.

  • Net LoC estimate (product code only): +350 to +650 LoC incremental
  • Why choose it: completes the “watch + intervene + review” story
  • Primary risk: agent/user race conditions and more UX/state complexity
  • Recommendation: phase 3 follow-up, not part of the first merge

Recommended execution decision

Execute a 1-day spike first, then branch:

  1. Try to stand up a minimal BrowserSessionBackend using the BrowserManager/API path.
  2. If that spike proves stable in Mux’s Node/Electron runtime, proceed with Approach B.
  3. If it does not, deliver Approach A behind the same BrowserSessionBackend interface, so the UI and ORPC contracts remain stable.

This keeps the team moving while avoiding a throwaway UI.

Non-negotiable architectural invariants

  1. Do not embed arbitrary web content as live DOM in the renderer.
    • No dangerouslySetInnerHTML, webview, or loose iframe-based browsing surface for remote pages.
    • Render the viewer as frames/images/canvas only.
  2. Keep high-frequency frames off ORPC if possible.
    • Use ORPC for session lifecycle, status, action timeline, and errors.
    • Prefer a dedicated viewer WebSocket endpoint (returned by the backend) for frame transport.
  3. Make the browser viewer workspace-scoped.
    • MVP should support one active browser session per workspace.
    • Do not start with browser:${sessionId} multi-instance tabs.
  4. Lazy initialization only.
    • Browser session services must not start on app boot.
    • Missing binaries/install issues must surface as recoverable UI errors, not startup crashes.
  5. Single-source shared types.
    • Cross-boundary types go in src/common/types/browserSession.ts.
    • ORPC validation schemas stay in src/common/orpc/schemas/api.ts.
  6. Defensive programming at every boundary.
    • Validate session IDs, workspace IDs, viewer URLs, stream payloads, and action events.
    • Assert impossible states in dev/test builds; degrade gracefully in user-facing UI.
  7. Never leak secrets into the timeline.
    • Filled credentials or vault-backed values must be redacted in browser action summaries and never persisted.

Proposed architecture

Top-level model

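```ts
// BrowserAction is declared alongside these types in src/common/types/browserSession.ts.

export interface BrowserSession {
  id: string;
  workspaceId: string;
  status: "idle" | "starting" | "live" | "paused" | "ended" | "error";
  // Who currently drives input; "user" only applies once takeover (Phase 6) lands.
  ownership: "agent" | "user";
  // Which adapter backs the session: CLI fallback or BrowserManager.
  backend: "agent-browser-cli" | "agent-browser-manager";
  // Must be null unless status is "starting", "live", or "paused".
  viewerUrl: string | null;
  title: string | null;
  url: string | null;
  // Redacted summary; secrets never reach the timeline.
  lastAction: BrowserAction | null;
  lastError: string | null;
  startedAt: string;
  endedAt?: string;
}

// Workspace-scoped events on the ORPC subscription; "snapshot" is sent first.
export type BrowserSessionEvent =
  | { type: "snapshot"; session: BrowserSession | null; recentActions: BrowserAction[] }
  | { type: "session-updated"; session: BrowserSession }
  | { type: "action"; action: BrowserAction }
  | { type: "session-ended"; sessionId: string }
  | { type: "error"; sessionId: string; message: string };
```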

Core components/services

  • BrowserSessionService (src/node/services/browserSessionService.ts)
    • Owns the active browser session per workspace.
    • Emits workspace-scoped events using the same broad pattern as DevToolsService.
    • Tracks session state, recent actions, errors, and viewer endpoint.
  • BrowserSessionBackend (src/node/services/browserSessionBackends/BrowserSessionBackend.ts)
    • Stable internal interface so the runtime can swap between CLI and BrowserManager implementations.
  • BrowserTab (src/browser/features/RightSidebar/BrowserTab.tsx)
    • Subscribes to low-frequency state via ORPC.
    • Connects directly to the viewer WebSocket for frames/input when available.
    • Keeps frame rendering local to the leaf component so the rest of the sidebar does not rerender.

Transport split

  • ORPC stream: status, title, URL, current/last action, errors, ownership, start/stop lifecycle.
  • Viewer socket: image frames + viewport metadata + optional input injection.
Why split transport instead of streaming frames through ORPC?

Mux already has a clean ORPC event-stream pattern for subscription-style data, but browser frames are much higher-frequency than devtools/tool state updates. Sending base64 JPEG frames through ORPC would increase serialization pressure, trigger avoidable rerenders, and tie the viewer’s frame rate to the app’s control plane.

A dedicated viewer socket keeps the control plane small and typed while letting the Browser tab own the rendering loop.

Execution phases and agent workstreams

Phase 0 — 1-day architecture spike and runtime decision

Owner: Backend/platform agent
Parallelizable: no; everything else depends on this answer
Goal: choose the backend implementation path without blocking the rest of the team for more than 1 day

Tasks

  1. Prove whether agent-browser’s BrowserManager/API path can run inside Mux’s runtime.
  2. If yes, confirm how Mux obtains:
    • frame stream,
    • input injection,
    • page metadata,
    • shutdown hooks.
  3. If no, prove the CLI + stream port path works reliably from Mux’s backend.
  4. Decide whether the renderer can connect directly to a local viewer socket or whether Mux needs a relay.
  5. Determine installation/runtime posture for dev builds:
    • external binary on PATH,
    • configured binary path,
    • or managed local install.

Deliverables

  • A doc comment in the new backend interface/service explaining the chosen backend.
  • Minimal spike proof (branch-local, not productionized) showing a session can start and produce viewer data.
  • Clear go/no-go verdict: Approach B or Approach A fallback.

Exit criteria

  • Team knows which backend adapter to implement.
  • Team knows whether viewer transport is direct renderer socket or backend relay.
  • Team knows how missing binary/install will surface in the UI.

Quality gate

  • Capture 1 screenshot and 1 short video of the spike showing a live browser image stream and a start/stop cycle.

Phase 1 — Shared contracts, service skeleton, and ORPC surface

Owner: Shared-contracts agent
Parallelizable with: frontend shell work once types stabilize
Primary files:

  • src/common/types/browserSession.ts (new)
  • src/common/orpc/schemas/api.ts
  • src/node/orpc/context.ts
  • src/node/services/browserSessionService.ts (new)
  • src/node/orpc/router.ts

Tasks

  1. Add shared types in src/common/types/browserSession.ts:
    • BrowserSession
    • BrowserAction
    • BrowserSessionEvent
    • viewer metadata types if needed
  2. Add ORPC schemas in src/common/orpc/schemas/api.ts for:
    • browserSession.getActive
    • browserSession.start
    • browserSession.stop
    • browserSession.subscribe
    • optionally browserSession.clearRecentActions if the UI needs it later
  3. Add browserSessionService to ORPCContext in src/node/orpc/context.ts.
  4. Implement BrowserSessionService as a workspace-scoped EventEmitter service.
  5. Mirror the DevToolsService/useDevToolsSubscription model:
    • snapshot-first subscription
    • queue buffering
    • listener cleanup on abort/disconnect
  6. Define a strict policy for session ownership and lifecycle:
    • one active session per workspace
    • starting a new session either reuses or explicitly replaces the old one
    • hiding the Browser tab does not stop the session
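The snapshot-first, queue-buffered subscription in task 5 can be sketched as an async generator over a Node `EventEmitter`. This is a hedged illustration: the channel name `browser:${workspaceId}`, the `getSnapshot` callback, and the simplified `SessionEvent` type are hypothetical stand-ins for whatever `BrowserSessionService` actually uses.

```typescript
import { EventEmitter } from "node:events";

// Hypothetical, simplified stand-in for BrowserSessionEvent.
type SessionEvent =
  | { type: "snapshot"; status: string | null }
  | { type: "session-updated"; status: string };

// Yields a snapshot first, then every event emitted for this workspace.
// Events that arrive while the consumer is busy are buffered in `queue`;
// `resolveNext` wakes a consumer that is waiting on an empty queue.
export async function* subscribeToWorkspace(
  emitter: EventEmitter,
  workspaceId: string,
  getSnapshot: () => SessionEvent,
  signal: AbortSignal,
): AsyncGenerator<SessionEvent> {
  const channel = `browser:${workspaceId}`; // hypothetical channel naming
  const queue: SessionEvent[] = [];
  let resolveNext: (() => void) | null = null;

  const wake = () => {
    resolveNext?.();
    resolveNext = null;
  };
  const onEvent = (event: SessionEvent) => {
    queue.push(event);
    wake();
  };

  // Register before taking the snapshot so no event falls into the gap.
  emitter.on(channel, onEvent);
  signal.addEventListener("abort", wake, { once: true });
  try {
    yield getSnapshot();
    while (!signal.aborted) {
      if (queue.length === 0) {
        await new Promise<void>((resolve) => {
          resolveNext = () => resolve();
        });
        continue;
      }
      yield queue.shift()!;
    }
  } finally {
    // Runs on abort, on the consumer calling return(), and on errors.
    emitter.off(channel, onEvent);
    signal.removeEventListener("abort", wake);
  }
}
```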

Acceptance criteria

  • The backend can create a placeholder browser session and stream typed lifecycle updates over ORPC.
  • The service cleans up listeners on unsubscribe/abort.
  • The types are shared and not duplicated in browser/node code.

Defensive programming requirements

  • Assert that a workspace-scoped session cannot belong to a different workspace.
  • Assert that viewerUrl is null unless the session is starting/live/paused.
  • Reject malformed session transitions early.
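The defensive requirements above can be expressed as an explicit transition table plus two assertions. The allowed edges below are an assumption inferred from the `BrowserSession` status union (idle to starting, starting to live, live and paused toggling, any active state ending or erroring, and restart from ended or error); the helper names are hypothetical.

```typescript
type SessionStatus = "idle" | "starting" | "live" | "paused" | "ended" | "error";

// Assumed lifecycle edges; anything not listed is rejected.
const ALLOWED_TRANSITIONS: Record<SessionStatus, readonly SessionStatus[]> = {
  idle: ["starting"],
  starting: ["live", "ended", "error"],
  live: ["paused", "ended", "error"],
  paused: ["live", "ended", "error"],
  ended: ["starting"], // Restart
  error: ["starting"], // Restart
};

// Statuses in which a viewer endpoint may exist (the viewerUrl invariant).
const VIEWER_STATUSES: ReadonlySet<SessionStatus> = new Set<SessionStatus>(["starting", "live", "paused"]);

export function assertValidTransition(from: SessionStatus, to: SessionStatus): void {
  if (!ALLOWED_TRANSITIONS[from].includes(to)) {
    throw new Error(`Invalid browser session transition: ${from} -> ${to}`);
  }
}

export function assertViewerUrlInvariant(status: SessionStatus, viewerUrl: string | null): void {
  if (viewerUrl !== null && !VIEWER_STATUSES.has(status)) {
    throw new Error(`viewerUrl must be null while status is "${status}"`);
  }
}

export function assertSameWorkspace(sessionWorkspaceId: string, requestedWorkspaceId: string): void {
  if (sessionWorkspaceId !== requestedWorkspaceId) {
    throw new Error(`Session belongs to ${sessionWorkspaceId}, not ${requestedWorkspaceId}`);
  }
}
```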

Estimated product code

  • +180 to +280 LoC

Phase 2 — Backend adapter implementation and lifecycle management

Owner: Backend integration agent
Parallelizable with: Phase 3 UI shell once contracts are stable
Primary files:

  • src/node/services/browserSessionBackends/BrowserSessionBackend.ts (new)
  • src/node/services/browserSessionBackends/AgentBrowserManagerBackend.ts (new, preferred)
  • src/node/services/browserSessionBackends/AgentBrowserCliBackend.ts (new, fallback)
  • src/node/services/browserSessionService.ts
  • src/node/services/streamManager.ts (only if shared run lifecycle hooks are needed)
  • src/node/services/mcpServerManager.ts (only if a bridge is needed later; avoid coupling MVP to this)

Tasks

  1. Create a backend interface with methods like:
    • startSession(...)
    • stopSession(sessionId)
    • getViewerEndpoint(sessionId)
    • onAction(...)
    • onSessionUpdate(...)
  2. Implement the chosen backend adapter.
  3. Add lazy runtime preflight:
    • binary/library available?
    • browser install available?
    • helpful error message if not.
  4. Allocate an ephemeral viewer port or equivalent viewer endpoint.
  5. Convert raw backend events into redacted, user-readable BrowserAction entries.
  6. Ensure hard cleanup on:
    • workspace close,
    • session replacement,
    • app shutdown,
    • backend crash/disconnect.
  7. Keep raw frames ephemeral; do not persist them to disk.
  8. Persist only lightweight session/action metadata if it materially improves recovery/debuggability; otherwise keep MVP in-memory.
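One possible shape for the interface in task 1, sketched so that both the BrowserManager and CLI adapters could sit behind it. The method names follow the list above, but every parameter and return type is an assumption, and `FakeBrowserSessionBackend` is a hypothetical in-memory test double, not a real adapter.

```typescript
// Hypothetical payload shapes; real ones would live in src/common/types/browserSession.ts.
export interface StartSessionOptions {
  workspaceId: string;
  initialUrl?: string;
}

export interface BackendSessionUpdate {
  sessionId: string;
  url: string | null;
  title: string | null;
}

export interface BackendAction {
  sessionId: string;
  summary: string; // Already redacted by the adapter
}

export type Unsubscribe = () => void;

// Stable seam: the service talks only to this, never to the CLI or library directly.
export interface BrowserSessionBackend {
  startSession(options: StartSessionOptions): Promise<{ sessionId: string }>;
  stopSession(sessionId: string): Promise<void>;
  getViewerEndpoint(sessionId: string): string | null;
  onAction(listener: (action: BackendAction) => void): Unsubscribe;
  onSessionUpdate(listener: (update: BackendSessionUpdate) => void): Unsubscribe;
}

// Hypothetical in-memory double, e.g. for mocking during Phase 3 UI work.
export class FakeBrowserSessionBackend implements BrowserSessionBackend {
  private nextId = 1;
  private readonly live = new Set<string>();
  private readonly actionListeners = new Set<(action: BackendAction) => void>();
  private readonly updateListeners = new Set<(update: BackendSessionUpdate) => void>();

  async startSession(options: StartSessionOptions): Promise<{ sessionId: string }> {
    const sessionId = `${options.workspaceId}-${this.nextId++}`;
    this.live.add(sessionId);
    const update: BackendSessionUpdate = { sessionId, url: options.initialUrl ?? "about:blank", title: null };
    this.updateListeners.forEach((listener) => listener(update));
    return { sessionId };
  }

  async stopSession(sessionId: string): Promise<void> {
    this.live.delete(sessionId);
  }

  getViewerEndpoint(sessionId: string): string | null {
    // Fake endpoint; a real adapter would return its allocated viewer socket URL.
    return this.live.has(sessionId) ? `fake-viewer://${sessionId}` : null;
  }

  onAction(listener: (action: BackendAction) => void): Unsubscribe {
    this.actionListeners.add(listener);
    return () => void this.actionListeners.delete(listener);
  }

  onSessionUpdate(listener: (update: BackendSessionUpdate) => void): Unsubscribe {
    this.updateListeners.add(listener);
    return () => void this.updateListeners.delete(listener);
  }

  // Test hook: simulate the adapter reporting an action.
  emitAction(action: BackendAction): void {
    this.actionListeners.forEach((listener) => listener(action));
  }
}
```

Because the service depends only on `BrowserSessionBackend`, swapping the BrowserManager adapter for the CLI fallback after the spike would not change the ORPC contract or the UI.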

Explicit scope control

  • Do not build generalized support for arbitrary external MCP browser tools in this phase.
  • Do not parse ad-hoc shell logs in the renderer.
  • If the CLI adapter is used, parsing/translation belongs inside the backend adapter only.

Acceptance criteria

  • Starting a browser session returns typed session state plus a viewer endpoint.
  • Stopping a session cleans up all child resources and emits final state.
  • Missing dependency/install errors are shown as session errors, not crashes.
  • The service never leaves orphan processes/sockets after stop or shutdown.

Defensive programming requirements

  • Use explicit disposables/cleanup guards for child processes and sockets.
  • Assert one session per workspace for MVP.
  • Redact or omit sensitive input payloads in action events.
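A minimal sketch of the "explicit disposables/cleanup guards" requirement: a stack of cleanup callbacks that is idempotent, runs in reverse registration order, and keeps going when one callback throws. `CleanupStack` is a hypothetical helper name, and refusing registrations after disposal is one assumed policy among several.

```typescript
type Cleanup = () => void | Promise<void>;

export class CleanupStack {
  private readonly cleanups: Cleanup[] = [];
  private disposed = false;

  // Register a cleanup (e.g. kill a child process, close a viewer socket).
  add(cleanup: Cleanup): void {
    if (this.disposed) {
      // Refuse new resources after teardown so the caller notices a potential leak.
      throw new Error("CleanupStack already disposed");
    }
    this.cleanups.push(cleanup);
  }

  // Idempotent: safe to call from stop, workspace close, and app shutdown paths.
  // Returns the errors collected instead of aborting on the first failure.
  async dispose(): Promise<Error[]> {
    if (this.disposed) return [];
    this.disposed = true;
    const errors: Error[] = [];
    // Reverse order: release the most recently acquired resource first.
    while (this.cleanups.length > 0) {
      const cleanup = this.cleanups.pop()!;
      try {
        await cleanup();
      } catch (error) {
        errors.push(error instanceof Error ? error : new Error(String(error)));
      }
    }
    return errors;
  }
}
```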

Estimated product code

  • Approach B path: +260 to +420 LoC
  • Approach A fallback path: +220 to +340 LoC

Phase 3 — Right-sidebar Browser tab shell and layout integration

Owner: Frontend/right-sidebar agent
Parallelizable with: Phase 2 once contracts are stable enough to mock
Primary files:

  • src/browser/types/rightSidebar.ts
  • src/browser/features/RightSidebar/Tabs/registry.ts
  • src/browser/features/RightSidebar/Tabs/TabLabels.tsx
  • src/browser/features/RightSidebar/RightSidebar.tsx
  • src/browser/features/RightSidebar/BrowserTab.tsx (new)
  • src/browser/utils/rightSidebarLayout.ts
  • src/browser/utils/uiLayouts.ts (only if layout presets need updating)

Tasks

  1. Add "browser" to the right-sidebar base tab model.
  2. Register a BROWSER_TAB_CONFIG in the right-sidebar registry.
  3. Add a BrowserTabLabel showing:
    • browser icon,
    • live/error state,
    • subtle activity indicator if a session is active.
  4. Implement BrowserTab.tsx with the following UI regions:
    • header: title, URL, session status, backend type
    • main viewer region: live browser image/canvas
    • status strip: last action, ownership, errors
    • empty/error/install state
  5. Keep frame rendering local to BrowserTab:
    • do not push raw frames into right-sidebar layout state,
    • do not rerender the entire sidebar on each frame.
  6. Decide Browser tab visibility UX:
    • recommended: do not force it into every default layout,
    • automatically insert/select it when a browser session starts,
    • persist user layout choices afterward.
  7. Ensure the tab participates cleanly in existing right-sidebar layout operations.

UX requirements

  • The Browser tab should feel like a sibling of Terminal/Output/Debug, not a separate product.
  • Starting a browser-backed task should auto-focus or at least visibly surface the Browser tab.
  • Hiding the tab should not kill the browser session.
  • If the session ends, the tab should show a stable ended/error state rather than disappearing abruptly.

Acceptance criteria

  • A browser session can appear in the right sidebar and remain visible while the agent works.
  • The rest of the app remains responsive while frames are arriving.
  • Layout persistence and tab switching still work.

Defensive programming requirements

  • Validate viewer connection state transitions.
  • Clamp or ignore nonsensical frame metadata.
  • Ensure null session state renders cleanly without throwing.

Estimated product code

  • +220 to +320 LoC

Phase 4 — Viewer transport, frame rendering, and performance hardening

Owner: Frontend performance/interaction agent
Parallelizable with: late Phase 2 / late Phase 3
Primary files:

  • src/browser/features/RightSidebar/BrowserTab.tsx
  • optionally src/browser/features/RightSidebar/BrowserViewer.tsx (new, only if extraction materially reduces complexity)
  • optionally src/common/types/browserSession.ts for frame metadata types

Tasks

  1. Implement the viewer transport in the tab:
    • connect to the backend-provided viewer endpoint,
    • receive frames and metadata,
    • render without React-wide churn.
  2. Start with the simplest correct renderer:
    • imperative <img> or canvas update loop,
    • only introduce a separate extracted viewer component if the code becomes hard to follow.
  3. Add frame management:
    • keep only the latest frame,
    • drop stale queued frames,
    • optionally decode on requestAnimationFrame.
  4. Handle viewer disconnects and reconnect states.
  5. Show visible empty/loading overlays while the first frame is pending.
  6. Preserve aspect ratio and pointer coordinate mapping data for future takeover.
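The frame management rules (keep only the latest frame, drop stale ones, draw once per animation frame) fit in a single-slot buffer. In this sketch the scheduler and the `draw` callback are injected; `LatestFrameSlot` and `ViewerFrame` are hypothetical names, and the frame fields are assumptions based on the "base64 JPEG frames + viewport metadata" description.

```typescript
// Hypothetical frame shape from the viewer socket.
export interface ViewerFrame {
  data: string; // base64 JPEG payload
  width: number;
  height: number;
}

export class LatestFrameSlot {
  private pending: ViewerFrame | null = null;
  private scheduled = false;
  droppedFrames = 0;

  constructor(
    private readonly schedule: (callback: () => void) => void,
    private readonly draw: (frame: ViewerFrame) => void,
  ) {}

  // Called for every incoming frame; never touches React state.
  push(frame: ViewerFrame): void {
    // Ignore nonsensical metadata rather than drawing a broken frame.
    const valid =
      Number.isFinite(frame.width) && Number.isFinite(frame.height) && frame.width > 0 && frame.height > 0;
    if (!valid) return;
    if (this.pending !== null) this.droppedFrames++; // Stale frame overwritten
    this.pending = frame;
    if (!this.scheduled) {
      this.scheduled = true;
      this.schedule(() => this.flush());
    }
  }

  private flush(): void {
    this.scheduled = false;
    const frame = this.pending;
    this.pending = null;
    if (frame !== null) this.draw(frame);
  }
}
```

In the Browser tab, `requestAnimationFrame` could serve as `schedule`, and `draw` could set an `<img>` element's `src` to a `data:image/jpeg;base64,...` URL, so frames never flow through component state and the rest of the sidebar does not rerender.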

Acceptance criteria

  • The Browser tab can display a steady live viewport without noticeably degrading the rest of the right sidebar.
  • Frame delivery failure yields a recoverable error/reconnect state.
  • The renderer is not flooded with state updates from every frame.

Estimated product code

  • +140 to +240 LoC

Phase 5 — Action timeline and tool synchronization

Owner: Tooling/instrumentation agent
Parallelizable with: Phase 4 once the action model is stable
Primary files:

  • src/common/types/browserSession.ts
  • src/node/services/browserSessionService.ts
  • backend adapter file from Phase 2
  • src/browser/features/RightSidebar/BrowserTab.tsx
  • optionally src/browser/features/RightSidebar/DevToolsTab/* if cross-linking is added later

Tasks

  1. Define a compact BrowserAction model for user-facing steps:
    • navigate
    • click
    • fill
    • type
    • scroll
    • snapshot
    • wait
    • error
  2. Emit these actions from the backend adapter in a way that does not depend on renderer-side log parsing.
  3. Add a recent-action list to the Browser tab.
  4. Redact sensitive values:
    • passwords,
    • secrets,
    • auth tokens,
    • vault-backed values.
  5. Keep raw detailed logs in existing Output/Debug surfaces where appropriate, but do not block MVP on deep fusion with those tabs.
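A hedged sketch of the compact action model from task 1 combined with the redaction rule from task 4. The field names, the `sensitive` flag, and the keyword heuristic are assumptions, and this `BrowserAction` is illustrative rather than the type that shipped; a real adapter should prefer explicit signals (password input type, vault-backed fills) over matching field names.

```typescript
// Hypothetical action model.
export type BrowserActionKind =
  | "navigate"
  | "click"
  | "fill"
  | "type"
  | "scroll"
  | "snapshot"
  | "wait"
  | "error";

export interface BrowserAction {
  id: string;
  kind: BrowserActionKind;
  summary: string; // User-facing text, already redacted
  timestamp: string; // ISO-8601
}

// Hypothetical raw input event as a backend adapter might observe it.
export interface RawInputEvent {
  kind: "fill" | "type";
  target: string; // e.g. a selector or accessible name
  value: string;
  sensitive?: boolean; // Set by the adapter for password fields or vault-backed values
}

// Assumed fallback heuristic for targets that look like credential fields.
const SENSITIVE_TARGET = /pass(word)?|secret|token|api[-_ ]?key|otp|credential/i;

function isSensitive(event: RawInputEvent): boolean {
  return event.sensitive === true || SENSITIVE_TARGET.test(event.target);
}

// Builds the timeline entry; a sensitive value is never copied into the summary.
export function toRedactedAction(event: RawInputEvent, id: string, now: Date): BrowserAction {
  const shown = isSensitive(event) ? "[redacted]" : JSON.stringify(event.value);
  return {
    id,
    kind: event.kind,
    summary: `${event.kind} ${event.target} = ${shown}`,
    timestamp: now.toISOString(),
  };
}
```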

Acceptance criteria

  • The user can understand “what the agent is doing” from the Browser tab itself.
  • Sensitive inputs are not displayed.
  • Browser action state stays consistent with session lifecycle state.

Estimated product code

  • +120 to +220 LoC

Phase 6 — Human takeover and collaboration controls (follow-up)

Owner: Interaction/UX agent
Parallelizable with: after Phases 2–5 stabilize
Primary files:

  • src/browser/features/RightSidebar/BrowserTab.tsx
  • viewer transport/helper file if extracted
  • backend adapter file from Phase 2
  • src/common/types/browserSession.ts
  • src/common/orpc/schemas/api.ts (only if extra control procedures are needed)

Tasks

  1. Add explicit Take over / Return control affordances.
  2. When user takeover starts:
    • flip ownership from agent to user,
    • pause or gate agent input,
    • show a visible banner.
  3. Translate pointer/keyboard events using frame metadata.
  4. Prevent agent/user race conditions.
  5. Add a timeout/release policy for abandoned user takeover.
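Translating pointer events with frame metadata (task 3) mostly means undoing the letterboxing of an aspect-ratio-preserving viewer. A minimal sketch, assuming the frame is drawn centered with `object-fit: contain` semantics; `mapPointerToViewport` and `Size` are hypothetical names.

```typescript
export interface Size {
  width: number;
  height: number;
}

// Maps a pointer position (relative to the viewer element's top-left corner)
// to browser viewport coordinates. Returns null for clicks in the letterbox
// bars or when the sizes are not positive.
export function mapPointerToViewport(
  pointerX: number,
  pointerY: number,
  element: Size,
  viewport: Size,
): { x: number; y: number } | null {
  if (element.width <= 0 || element.height <= 0 || viewport.width <= 0 || viewport.height <= 0) {
    return null;
  }
  // "contain" scaling: the whole frame fits while preserving aspect ratio.
  const scale = Math.min(element.width / viewport.width, element.height / viewport.height);
  const offsetX = (element.width - viewport.width * scale) / 2;
  const offsetY = (element.height - viewport.height * scale) / 2;

  const x = (pointerX - offsetX) / scale;
  const y = (pointerY - offsetY) / scale;
  if (x < 0 || y < 0 || x > viewport.width || y > viewport.height) return null;
  return { x: Math.round(x), y: Math.round(y) };
}
```

For example, an 800x600 viewport shown in a 400x400 element is scaled by 0.5 and offset 50px vertically, so a pointer at (200, 200) maps to (400, 300), while (200, 20) lands in the top bar and is ignored.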

Acceptance criteria

  • The user can click/type into the live browser view when takeover is active.
  • Agent and user inputs never race silently.
  • Ownership state is always visible.

Estimated product code

  • +180 to +320 LoC

Phase 7 — Testing, stories, and rollout hardening

Owner: QA/verification agent
Parallelizable with: all later phases
Primary files:

  • tests/ipc/browserSession.test.ts (new)
  • tests/ui/browserTab.test.ts (new)
  • colocated pure tests only if new pure helpers are extracted
  • src/browser/stories/App.BrowserTab.stories.tsx or the nearest existing full-app story file that should absorb the new states

Test plan

  1. IPC/integration tests (tests/ipc)
    • start/stop lifecycle
    • snapshot-first subscription behavior
    • workspace isolation
    • replacement/cleanup behavior
    • missing dependency error state
  2. UI integration tests (tests/ui)
    • browser tab appears and renders idle/loading/error states
    • session start auto-surfaces the tab
    • recent actions/status text update correctly
    • app remains navigable while the tab is active
  3. Pure unit tests (colocated) only for extracted pure helpers such as:
    • coordinate mapping,
    • frame metadata normalization,
    • action redaction.
  4. Storybook/full-app story
    • idle state
    • live state with recent actions
    • ended/error/install-missing state
  5. Targeted e2e (tests/e2e) only if happy-dom is insufficient for validating the viewer transport or takeover behavior.

Validation commands

  • make typecheck
  • make static-check
  • targeted IPC/UI/e2e tests for touched areas

Rollout posture

  • Ship behind an experimental flag or equivalent internal-only exposure first.
  • Keep the feature off by default until dogfooding is stable.
  • Log session start/stop/error paths with the repo’s log helper on the backend.

Acceptance criteria

  • New code paths have targeted coverage.
  • Browser tab UI states are captured in stories.
  • Experimental rollout path is defined.

Estimated product code

  • +60 to +140 LoC

Cross-cutting design decisions the team should follow

1. Keep browser integration separate from generic MCP integration for the first delivery

Mux already has MCPServerManager, but the first delivery should not try to unify every possible browser MCP server under one viewer abstraction. Build a Mux-owned browser session service first; if future MCP tools want to publish into it, add a bridge later.

2. Treat the Browser tab as a first-class right-sidebar resident

The Browser tab should live beside Terminal, Output, and Debug; it should not open in a separate window for the MVP unless the spike proves the in-sidebar viewer is impossible.

3. Never persist raw frames

Persist, at most, lightweight metadata and redacted action history. Raw image streams are too large and too risky to store casually.

4. Prefer local viewer transport over backend relaying

If Electron/network policy allows it, the renderer should connect directly to the locally managed viewer socket. Only add a relay if direct connection is blocked or unsafe.

5. Avoid hook proliferation

Colocate live viewer logic with BrowserTab.tsx. Extract only the pieces that are genuinely reusable or become too complex to read.

Parallelization map for a team of agents

| Workstream | Can start when | Suggested owner |
| --- | --- | --- |
| Phase 0 spike | immediately | backend/platform agent |
| Phase 1 contracts/service shell | after Phase 0 decision is mostly clear | shared-contracts agent |
| Phase 3 right-sidebar shell with mocked state | after Phase 1 type shape stabilizes | frontend/right-sidebar agent |
| Phase 2 backend adapter | after Phase 0 backend choice | backend integration agent |
| Phase 4 viewer transport | after Phase 2 returns a viewer endpoint contract | frontend performance agent |
| Phase 5 action timeline | after Phase 2 emits structured actions | tooling/instrumentation agent |
| Phase 7 tests/stories | begins with Phase 1 and expands as each phase lands | QA/verification agent |
| Phase 6 takeover | after MVP is stable | interaction/UX agent |

Dogfooding plan (required)

Dogfooding principles to follow

This plan should absorb the core discipline from the repo’s dogfood and agent-browser skills:

  • Treat dogfooding as structured exploratory QA, not a casual smoke test.
  • Use repro-first evidence: when something breaks, stop and document it immediately before moving on.
  • For interactive/behavioral issues, capture a video plus step-by-step screenshots.
  • For static/visible-on-load issues, capture a single annotated screenshot instead of wasting time on video.
  • Use agent-browser directly, never npx agent-browser.
  • Use named sessions so multiple agents can dogfood in parallel without stepping on each other.
  • Follow the core agent-browser loop: open → wait → snapshot -i → interact → re-snapshot.
  • After any navigation or major DOM change, re-snapshot before taking the next action.
  • Prefer explicit waits such as wait --load networkidle or element/url waits; only use sleeps to make repro videos human-watchable.
  • Check console/errors periodically; some regressions will not be visible in the viewport.
  • Append findings incrementally to a dogfood report so an interrupted run still leaves usable evidence.

Dogfooding harness and setup

Each dogfood run should create an isolated run ID, session name, and evidence directory.

  1. Launch the normal local Mux development flow (make dev or the team’s standard desktop/Electron dev path).
  2. Enable the experimental Browser tab feature flag.
  3. Prepare a deterministic target site or sites:
    • at least one simple navigation target,
    • one form-interaction target,
    • one failure-path target if available.
  4. Create an isolated output directory per run, for example:
    • ./dogfood-output/browser-tab/<run-id>/screenshots
    • ./dogfood-output/browser-tab/<run-id>/videos
    • ./dogfood-output/browser-tab/<run-id>/report.md
  5. Start a named agent-browser session for the target browsing workload.
  6. If authentication is required, prefer one of these, in order:
    • saved session/profile/state,
    • auth vault,
    • one-time manual login with saved state.
  7. Where feasible, constrain the run with:
    • a domain allowlist,
    • content boundaries,
    • a deterministic viewport/device preset.

Recommended command pattern for browser-side dogfooding

Use the agent-browser skill’s proven workflow for the site being driven inside the Browser tab.

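```sh
RUN_ID="browser-tab-$(date +%Y%m%d-%H%M%S)"
SESSION="mux-browser-tab-${RUN_ID}"
OUT="./dogfood-output/browser-tab/${RUN_ID}"
TARGET_URL="<target-url>"   # replace with the site under test

# Per-run evidence layout: screenshots, videos, and an incrementally appended report.
mkdir -p "${OUT}/screenshots" "${OUT}/videos"
touch "${OUT}/report.md"

agent-browser --session "${SESSION}" open "${TARGET_URL}" && \
agent-browser --session "${SESSION}" wait --load networkidle && \
agent-browser --session "${SESSION}" screenshot --annotate "${OUT}/screenshots/initial.png" && \
agent-browser --session "${SESSION}" snapshot -i
```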

For authenticated or recurring scenarios, prefer --session-name, --profile, or saved state so reruns are fast and reproducible.

Structured dogfooding workflow

1. Initialize

  • Create the run directory and report file.
  • Start the named session.
  • Capture an initial annotated screenshot and interactive snapshot.
  • Record the initial Mux state showing whether the Browser tab is hidden, visible, empty, or already active.

2. Authenticate (if needed)

  • Authenticate once using a repeatable approach.
  • Save state if the scenario will be rerun.
  • Never expose raw credentials in artifacts or the report.

3. Orient

  • Map the top-level Mux workflow for this feature:
    • how a browser session starts,
    • how the Browser tab appears,
    • how the user sees status/action text,
    • how the session ends or errors.
  • Map the target site’s main interactive elements using snapshot -i.
  • Capture a baseline annotated screenshot of the target page and a screenshot of the Mux Browser tab.

4. Explore systematically

Test the feature like a real user, page by page and workflow by workflow.

At a minimum, cover:

  1. Session start / tab surfacing
    • starting a browser-backed task creates or reuses the Browser tab,
    • the tab becomes visible enough that the user notices it,
    • the initial loading state is sane.
  2. Watch-only navigation
    • navigate across multiple pages,
    • click links/buttons,
    • confirm the Browser tab stays live and visually synchronized.
  3. Form interaction + redaction
    • fill inputs and submit a harmless form,
    • confirm the recent-action list matches what happened,
    • confirm sensitive values are redacted from visible action text and persisted artifacts.
  4. Layout and right-sidebar behavior
    • resize the sidebar,
    • switch tabs away and back,
    • collapse/reopen if supported,
    • confirm the session survives UI movement.
  5. Interrupt / cleanup / replacement
    • stop a run mid-session,
    • start another session in the same workspace,
    • switch workspaces if the product allows it,
    • confirm there are no orphaned or cross-wired sessions.
  6. Error and dependency handling
    • test missing runtime / failed startup / disconnected viewer paths,
    • confirm the app shows a recoverable error state instead of crashing.
  7. Performance / backpressure
    • keep the Browser tab open during a longer run,
    • confirm the rest of the right sidebar remains responsive.
  8. Takeover flow (Phase 6 only)
    • take control, click/type, return control,
    • confirm ownership is explicit and agent/user inputs do not race.

During exploration, use the agent-browser workflow rigorously:

  • snapshot -i before discovering refs,
  • interact via refs,
  • wait --load networkidle (or an element/URL wait) after major actions,
  • re-snapshot after navigation or DOM mutation,
  • check errors and console periodically,
  • optionally use diff snapshot when validating that an action changed the page as expected.
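
The refs-first loop above can be sketched as a small helper that builds the agent-browser argument lists for one exploration step. This is a minimal sketch under a few assumptions. It uses only the flags that appear in this plan (`--session` before the subcommand, `snapshot -i`, `wait --load networkidle`, `screenshot --annotate`). The interaction itself is passed in by the caller rather than guessed. `explorationStep`, `runStep` and `Exec` are hypothetical names, and the process runner is injected so it can be swapped for a test double.

```typescript
// Hypothetical runner: receives the argv for one agent-browser invocation.
type Exec = (args: string[]) => Promise<string>;

// Prefix every command with the named session, as in the Initialize commands.
function withSession(session: string, ...command: string[]): string[] {
  return ["--session", session, ...command];
}

// One exploration step: fresh refs, act, settle, re-snapshot (refs may be stale).
// `action` is the interaction the tester chose; it is supplied by the caller.
function explorationStep(
  session: string,
  action: string[],
  screenshotPath?: string,
): string[][] {
  const steps: string[][] = [
    withSession(session, "snapshot", "-i"), // discover refs before interacting
    withSession(session, ...action), // interact via the refs just discovered
    withSession(session, "wait", "--load", "networkidle"), // settle after the action
    withSession(session, "snapshot", "-i"), // re-snapshot after navigation/DOM mutation
  ];
  if (screenshotPath !== undefined) {
    steps.push(withSession(session, "screenshot", "--annotate", screenshotPath));
  }
  return steps;
}

// Run the commands strictly in order; the first rejection aborts the step.
async function runStep(exec: Exec, steps: string[][]): Promise<string[]> {
  const outputs: string[] = [];
  for (const args of steps) {
    outputs.push(await exec(args));
  }
  return outputs;
}
```

In a real run, `exec` could wrap a promisified `execFile("agent-browser", args)` from `node:child_process`. Keeping it injectable lets you check the step order without launching a browser.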

Repro-first issue documentation rules

When a bug is found, stop exploring and document it immediately.

Interactive / behavioral issues

Examples: wrong action log, frozen stream, mismatched viewport, takeover race, session cleanup bug, visible console error after an action.

Required evidence:

  1. Start a repro video before reproducing.
  2. Reproduce at human pace.
  3. Capture a screenshot for each significant step.
  4. Pause on the broken state and capture an annotated screenshot.
  5. Stop the video.
  6. Append the issue to report.md immediately with:
    • issue ID (ISSUE-001, etc.),
    • severity,
    • exact repro steps,
    • expected result,
    • actual result,
    • screenshot/video filenames.
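
The issue entry in step 6 has a fixed shape, so it can be generated instead of typed by hand. This is a minimal sketch that assumes `report.md` uses Markdown headings per issue. `Issue`, `issueId` and `formatIssue` are hypothetical names. The fields mirror the list above, and the severities come from the wrap-up categories. A missing video renders as N/A, which matches the static-issue rule.

```typescript
type Severity = "blocking" | "moderate" | "minor";

interface Issue {
  seq: number; // 1-based; rendered as ISSUE-001, ISSUE-002, ...
  title: string;
  severity: Severity;
  steps: string[];
  expected: string;
  actual: string;
  screenshots: string[];
  video?: string; // omitted for static issues, rendered as "N/A"
}

function issueId(seq: number): string {
  return `ISSUE-${String(seq).padStart(3, "0")}`;
}

function formatIssue(issue: Issue): string {
  const lines = [
    `### ${issueId(issue.seq)}: ${issue.title}`,
    "",
    `- Severity: ${issue.severity}`,
    "- Repro steps:",
    ...issue.steps.map((step, i) => `  ${i + 1}. ${step}`),
    `- Expected: ${issue.expected}`,
    `- Actual: ${issue.actual}`,
    `- Screenshots: ${issue.screenshots.join(", ")}`,
    `- Repro video: ${issue.video ?? "N/A"}`,
    "",
  ];
  return lines.join("\n");
}
```

Append the returned string to `report.md` right away, for example with `appendFileSync` from `node:fs`, so findings survive a crashed run.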

When typing is part of the observable repro, prefer type over fill so the video is understandable.

Static / visible-on-load issues

Examples: clipped text, wrong icon/state, bad empty-state copy, layout overlap, stale title/URL, immediately visible console error.

Required evidence:

  1. Capture a single annotated screenshot.
  2. Append a concise issue entry to report.md immediately.
  3. Mark repro video as N/A.

Evidence requirements per milestone

For every milestone review, provide both broad milestone evidence and issue-specific evidence.

Broad milestone evidence

  • At least 2 screenshots:
    • one of the Browser tab during a live session,
    • one of an ended/error/install-missing state.
  • At least 1 short video showing the agent actively browsing while the Browser tab is visible in Mux.

Issue-specific evidence

  • Every reproducible interactive issue gets:
    • one repro video,
    • step-by-step screenshots,
    • one annotated result screenshot.
  • Every reproducible static issue gets:
    • one annotated screenshot.

Where practical, capture both:

  • the Mux-side evidence (the Browser tab visible in the app), and
  • the browser-side evidence (agent-browser screenshots/video of the underlying session).

Wrap-up procedure

At the end of each dogfood run:

  1. Re-read the report and make sure summary counts match the actual issue list.
  2. Explicitly note whether the run found:
    • blocking issues,
    • moderate issues,
    • minor issues,
    • or no additional reproducible issues.
  3. Close the named agent-browser session.
  4. Preserve all artifacts; do not delete screenshots, videos, or reports mid-run.
  5. Attach screenshots and the key video to the implementation handoff/review.
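
Wrap-up steps 1 and 2 are easy to get wrong by hand. This is a minimal sketch that assumes the run keeps an in-memory list of `{ id, severity }` records next to the report (a hypothetical shape). It recomputes the per-severity counts, checks them against the counts claimed in the summary, and emits the explicit "no additional reproducible issues" statement when the list is empty.

```typescript
type Severity = "blocking" | "moderate" | "minor";

interface IssueRecord {
  id: string; // e.g. "ISSUE-001"
  severity: Severity;
}

type Counts = Record<Severity, number>;

function tally(issues: IssueRecord[]): Counts {
  const counts: Counts = { blocking: 0, moderate: 0, minor: 0 };
  for (const issue of issues) counts[issue.severity] += 1;
  return counts;
}

// Step 1: do the summary counts written in the report match the issue list?
function summaryMatches(claimed: Counts, issues: IssueRecord[]): boolean {
  const actual = tally(issues);
  return (Object.keys(actual) as Severity[]).every((k) => actual[k] === claimed[k]);
}

// Step 2: state the outcome explicitly, including the "nothing found" case.
function summaryLine(issues: IssueRecord[]): string {
  if (issues.length === 0) return "No additional reproducible issues were observed.";
  const c = tally(issues);
  return `Found ${c.blocking} blocking, ${c.moderate} moderate, and ${c.minor} minor issue(s).`;
}
```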

Phase quality gates tied to dogfooding

  • After Phase 0 spike: one live-session screenshot, one short start/stop video, one note on runtime/install friction.
  • Before Milestone M1 sign-off: complete one structured exploratory run with evidence across start, navigation, form interaction, resize/tab switching, interrupt, and error handling.
  • Before Milestone M2 sign-off: complete one structured run specifically validating action-log fidelity and redaction behavior.
  • Before Milestone M3 sign-off: complete one structured run specifically validating takeover ownership, input arbitration, and recovery.

Parallel-team guidance

If multiple agents dogfood simultaneously:

  • each agent must use a unique session name,
  • each agent must write to a separate run directory,
  • each agent must append findings to its own report first, then merge findings into the shared review summary.
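
One way to make these uniqueness rules mechanical is to derive both the session name and the run directory from the agent's identity. This is a minimal sketch. `agentId`, the `dogfood-` prefix and the `runs/` root are hypothetical conventions. The `screenshots/` subdirectory mirrors the `${OUT}/screenshots/...` path used in the Initialize commands.

```typescript
interface RunLayout {
  session: string; // value for agent-browser --session
  runDir: string;
  screenshotsDir: string;
  reportPath: string; // per-agent report, merged into the shared summary later
}

// Restrict to a conservative character set so the name is safe in paths and CLI args.
function slug(value: string): string {
  return value.toLowerCase().replace(/[^a-z0-9-]+/g, "-").replace(/^-+|-+$/g, "");
}

function runLayout(agentId: string, startedAt: Date, root = "runs"): RunLayout {
  // ISO timestamp without ':' or '.' so the directory name is portable.
  const stamp = startedAt.toISOString().replace(/[:.]/g, "-");
  const session = `dogfood-${slug(agentId)}`;
  const runDir = `${root}/${session}-${stamp}`;
  return {
    session,
    runDir,
    screenshotsDir: `${runDir}/screenshots`,
    reportPath: `${runDir}/report.md`,
  };
}
```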

Aim for exploration deep enough that it would normally yield 5–10 well-documented findings. If fewer issues are found, state explicitly that no additional reproducible issues were observed rather than inventing weak findings.

Final milestone definitions

Milestone M1 — Visible viewer MVP

Includes Phases 0–4 and the Phase 7 test/story minimums.

Success means:

  • a right-sidebar Browser tab exists,
  • a live browser session can be viewed there,
  • the UI stays stable,
  • lifecycle errors are recoverable.

Milestone M2 — “See what the agent is doing” product pass

Adds Phase 5 and expands verification.

Success means:

  • the Browser tab shows live viewport + current/recent actions,
  • the user can correlate the visible browser with agent intent,
  • sensitive actions are redacted correctly.

Milestone M3 — Human collaboration pass

Adds Phase 6.

Success means:

  • the user can safely take over and hand control back,
  • ownership is explicit,
  • sessions remain stable under collaboration.

Recommended first implementation order

  1. Phase 0 spike
  2. Phase 1 shared contracts and ORPC shell
  3. Phase 3 Browser tab shell using mocked/stubbed session state
  4. Phase 2 real backend adapter and lifecycle
  5. Phase 4 live viewer transport/perf hardening
  6. Phase 5 action timeline sync
  7. Phase 7 full verification/story coverage
  8. Phase 6 takeover only after M1/M2 are solid

What not to do in the first pass

  • Do not start with multi-session browser tabs.
  • Do not make this a separate popout-only window.
  • Do not route raw frame streams through generic chat/tool message rendering.
  • Do not block the feature on deep Debug/Output/MCP unification.
  • Do not attempt full browser replay/history storage.
  • Do not ship without screenshots/video from dogfooding.

Generated with mux • Model: anthropic:claude-opus-4-6 • Thinking: xhigh • Cost: $52.11

@ThomasK33
Member Author

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 62add94c74

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

@ThomasK33
Member Author

@codex review

Addressed the P1 review comment: added a this.disposed guard in start() so that if stop() cancels the open command, the clean "ended" state from stop() is preserved instead of being overwritten by transitionToError().
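
The race described here follows a common async-lifecycle pattern. `stop()` can run while `start()` is still awaiting the CLI, and the rejection that `start()` then observes must not clobber the state `stop()` already set. Below is a minimal sketch of the guard. It assumes a simplified backend whose CLI call is injected as a hypothetical `openCommand` that honors an `AbortSignal`. Only `disposed`, `start()`, `stop()` and `transitionToError()` come from the comment above. The rest is illustrative and is not the actual `browserSessionBackend.ts` code.

```typescript
type Status = "idle" | "starting" | "live" | "ended" | "error";

class SketchBackend {
  status: Status = "idle";
  error: string | null = null;
  private disposed = false;
  private readonly abort = new AbortController();
  // Hypothetical stand-in for the agent-browser open command.
  private readonly openCommand: (signal: AbortSignal) => Promise<void>;

  constructor(openCommand: (signal: AbortSignal) => Promise<void>) {
    this.openCommand = openCommand;
  }

  async start(): Promise<void> {
    this.status = "starting";
    try {
      await this.openCommand(this.abort.signal);
    } catch (err) {
      // stop() cancelled the open command: keep its clean "ended" state.
      if (this.disposed) return;
      this.transitionToError(err instanceof Error ? err.message : String(err));
      return;
    }
    if (this.disposed) return; // stopped while open was finishing
    this.status = "live";
  }

  stop(): void {
    this.disposed = true;
    this.abort.abort(); // cancels the in-flight open command
    this.status = "ended";
  }

  private transitionToError(message: string): void {
    this.status = "error";
    this.error = message;
  }
}
```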

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc063e0722

@ThomasK33
Member Author

@codex review

Addressed the P2 review comment: BrowserSessionService.dispose() now calls backend.stop() instead of backend.dispose(), ensuring the agent-browser close command is sent during shutdown to prevent orphaned browser processes.
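
The difference between the two calls is whether the external process is told to exit. This is a minimal sketch of the distinction. It assumes, as the comment implies, that `dispose()` only tears down in-process state while `stop()` also sends the agent-browser `close` command. The `Backend` interface, the `register` method and the backing `Map` are hypothetical and are not the service's actual fields.

```typescript
interface Backend {
  stop(): Promise<void>; // sends `close` to agent-browser, then releases local state
  dispose(): void; // local teardown only: the browser process keeps running
}

class SketchBrowserSessionService {
  private readonly backends = new Map<string, Backend>();

  register(workspaceId: string, backend: Backend): void {
    this.backends.set(workspaceId, backend);
  }

  // Shutdown path: stop() every backend so no browser process is orphaned.
  async dispose(): Promise<void> {
    const pending = [...this.backends.values()].map((b) =>
      // One failing close must not prevent the others from being stopped.
      b.stop().catch(() => undefined),
    );
    this.backends.clear();
    await Promise.all(pending);
  }
}
```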

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 60f8420a8c

@ThomasK33 force-pushed the agent-browser-r1as branch from 60f8420 to f873fdc (March 14, 2026 20:37)
@ThomasK33
Member Author

@codex review

Rebased on main, resolved conflicts with upstream portabledesktop (#2950) changes, and fixed merge resolution syntax issues.

@ThomasK33
Member Author

@codex review

Fixed cleanupWorkspace() to call backend.stop() instead of backend.dispose(), ensuring agent-browser close is sent during error recovery/session replacement to prevent orphaned processes.

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e81cb06fa

@ThomasK33
Member Author

@codex review

Changes since last review:

  1. Fixed backend replacement race condition — callbacks now guard by backend identity, and cleanupWorkspace awaits stop() before replacement
  2. Gated Browser tab behind AGENT_BROWSER experiment flag — off by default, togglable in Settings → Experiments, follows the same pattern as PORTABLE_DESKTOP (no platform restriction)
  3. Browser tab no longer appears in default layouts; it's dynamically added/removed based on experiment state
  4. Updated E2E + unit tests for default tab counts (without browser)
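
Item 1 combines two guards that are easy to get subtly wrong. The sketch below uses hypothetical names. Each backend's callbacks close over the backend instance and are ignored unless that instance is still the one registered for the workspace. A `cleanupWorkspace`-style `replace` awaits the old backend's `stop()` before registering the new one, so a late event from a replaced backend cannot overwrite the new session's state. It illustrates the pattern only and is not the service's actual code.

```typescript
interface Backend {
  stop(): Promise<void>;
}

type SessionEvent = { kind: "status"; status: string };

class SketchService {
  private readonly current = new Map<string, Backend>();
  readonly delivered: Array<{ workspaceId: string; event: SessionEvent }> = [];

  // Callbacks bound to one backend instance (identity guard).
  callbacksFor(workspaceId: string, backend: Backend) {
    return {
      onEvent: (event: SessionEvent): void => {
        // Stale callback from a backend that has since been replaced: drop it.
        if (this.current.get(workspaceId) !== backend) return;
        this.delivered.push({ workspaceId, event });
      },
    };
  }

  // Replacement: fully stop the old backend before installing the new one.
  async replace(workspaceId: string, next: Backend): Promise<void> {
    const previous = this.current.get(workspaceId);
    if (previous) {
      this.current.delete(workspaceId); // stale callbacks are rejected from here on
      await previous.stop();
    }
    this.current.set(workspaceId, next);
  }
}
```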

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ce580631e5

@ThomasK33
Member Author

@codex review

Added a startingWorkspaces guard set to prevent concurrent startSession calls from creating orphaned backends. If a start is already in-flight for a workspace, the concurrent call returns the existing session instead of creating a duplicate.

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a1bfde2640

@ThomasK33
Member Author

@codex review

Fixed both P2s: (1) concurrent start guard now only returns nonterminal sessions, (2) action timeline clears when a new session ID arrives on restart.
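
The timeline reset can be expressed as a reducer keyed on the session ID. This is a minimal sketch that assumes a hypothetical `TimelineState` shape rather than the actual `BrowserTab.tsx` state. Any change in the observed session ID starts a fresh action list, and that includes the first transition from `null` to an ID. Actions for the current session are appended.

```typescript
interface TimelineAction {
  id: string;
  description: string; // display text (assumed already redacted upstream)
}

interface TimelineState {
  sessionId: string | null;
  actions: TimelineAction[];
}

type TimelineEvent =
  | { type: "session"; sessionId: string | null }
  | { type: "action"; sessionId: string; action: TimelineAction };

function timelineReducer(state: TimelineState, event: TimelineEvent): TimelineState {
  let current = state;
  if (event.sessionId !== current.sessionId) {
    // Restart or first session: the old actions belong to another session.
    current = { sessionId: event.sessionId, actions: [] };
  }
  if (event.type === "action") {
    return { ...current, actions: [...current.actions, event.action] };
  }
  return current; // unchanged session: same object, so React can skip a re-render
}
```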

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b033cc8933

@ThomasK33
Member Author

@codex review

Fixed final P1+P2: (1) concurrent startSession calls now return the same in-flight promise via startPromises map, (2) action timeline clears on any session ID change including null → newId.
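
Sharing the in-flight promise is a common way to make concurrent async starts idempotent. The first caller creates the promise and later callers receive the same one. The entry is removed once the promise settles, so a later restart begins fresh. This is a minimal sketch with a hypothetical `createSession` callback and `Session` shape. Only the name `startPromises` comes from the comment above. It also reuses an existing session only when that session is nonterminal, as the previous round's fix describes.

```typescript
type SessionStatus = "starting" | "live" | "ended" | "error";

interface Session {
  id: string;
  status: SessionStatus;
}

const isTerminal = (s: Session): boolean => s.status === "ended" || s.status === "error";

class SketchStarter {
  private readonly sessions = new Map<string, Session>();
  private readonly startPromises = new Map<string, Promise<Session>>();
  private readonly createSession: (workspaceId: string) => Promise<Session>;

  constructor(createSession: (workspaceId: string) => Promise<Session>) {
    this.createSession = createSession;
  }

  startSession(workspaceId: string): Promise<Session> {
    // Concurrent callers share the same in-flight start.
    const inFlight = this.startPromises.get(workspaceId);
    if (inFlight) return inFlight;

    // Reuse a running session; ended/errored sessions get replaced.
    const existing = this.sessions.get(workspaceId);
    if (existing && !isTerminal(existing)) return Promise.resolve(existing);

    const promise = this.createSession(workspaceId)
      .then((session) => {
        this.sessions.set(workspaceId, session);
        return session;
      })
      .finally(() => {
        // Clear the entry whether the start succeeded or failed.
        this.startPromises.delete(workspaceId);
      });
    this.startPromises.set(workspaceId, promise);
    return promise;
  }
}
```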

@chatgpt-codex-connector

Codex Review: Didn't find any major issues. Delightful!

@ThomasK33 added this pull request to the merge queue Mar 14, 2026
Merged via the queue into main with commit 580c212 Mar 14, 2026
24 checks passed
@ThomasK33 deleted the agent-browser-r1as branch March 14, 2026 22:37