
bug: stale SAGEOX_AGENT_ID leaks across Claude Code sessions, breaking session recording #258

@galexy

Summary

Session recording silently produces zero entries when a user runs /clear in a Claude Code session that follows a previous session in the same repo. The /clear command appears to corrupt the env var state — reverting SAGEOX_AGENT_ID to the previous (dead) session's value — which causes recording to start under the wrong agent. PostToolUse hooks then silently noop because they resolve a different agent ID from the session marker. The session shows as ⊘ ghost with entry_count: 0.

Likely trigger: /clear between or within Claude Code sessions.

Observed in ~/src/github.com/galexy/edgar-diff on 2026-03-16.

Why /clear is the likely trigger

/clear causes two simultaneous breaks:

1. Env state corruption

Before /clear, the SessionStart hook had correctly written SAGEOX_AGENT_ID=OxU6Rh to session-env/4f2d5455.../sessionstart-hook-1.sh. The Bash tool env should have had OxU6Rh. But after /clear, the Bash tool env reverted to SAGEOX_AGENT_ID=Ox4f8n (the previous dead session's value).

This suggests /clear either:

  • Resets Claude Code's env sourcing, causing it to stop applying the current session's hook env file
  • Causes a re-source that picks up stale values from the previous session
  • Clears the session-env state without the /clear-triggered SessionStart hook properly repopulating it

The sessionstart-hook-1.sh file on disk still has the correct OxU6Rh values — but the Bash tool runtime sees the wrong Ox4f8n values. The file is right; the sourcing is broken.
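
The disk-versus-runtime mismatch described above can be checked mechanically. The following is a minimal sketch, not ox's actual code: it assumes the hook env file consists of simple `export KEY=VALUE` lines (the real file format is not shown in this issue), and `parseEnvFile` and `staleVars` are hypothetical helper names.

```go
package main

import (
	"bufio"
	"fmt"
	"sort"
	"strings"
)

// parseEnvFile extracts KEY=VALUE pairs from shell text made of lines such as
// `export KEY=VALUE` or `KEY=VALUE`. Assumption: the hook env file uses this
// simple form; only one layer of surrounding quotes is stripped.
func parseEnvFile(content string) map[string]string {
	vars := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blanks and comments
		}
		line = strings.TrimPrefix(line, "export ")
		key, val, ok := strings.Cut(line, "=")
		if !ok {
			continue // not an assignment
		}
		vars[strings.TrimSpace(key)] = strings.Trim(strings.TrimSpace(val), `"'`)
	}
	return vars
}

// staleVars returns, in sorted order, the keys that are set in both maps but
// whose runtime value differs from the value recorded in the file.
func staleVars(file, runtime map[string]string) []string {
	var stale []string
	for k, want := range file {
		if got, ok := runtime[k]; ok && got != want {
			stale = append(stale, k)
		}
	}
	sort.Strings(stale)
	return stale
}

func main() {
	// Values taken from this issue: the file is right, the runtime is wrong.
	onDisk := parseEnvFile("export SAGEOX_AGENT_ID=OxU6Rh\n")
	runtime := map[string]string{"SAGEOX_AGENT_ID": "Ox4f8n"}
	for _, k := range staleVars(onDisk, runtime) {
		fmt.Printf("%s: file=%s runtime=%s\n", k, onDisk[k], runtime[k])
	}
}
```

A check like this could run at the start of any ox command invoked from the Bash tool, provided the command can locate the current session's env file.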

2. Context loss forces redundant re-prime

/clear wipes the agent's conversation context, erasing the SessionStart hook's prime output (<session-context agent_id="OxU6Rh">). The agent then follows CLAUDE.md's instruction to run ox agent prime, which:

  • Can't find the session marker (CLAUDE_CODE_SESSION_ID not in Bash env)
  • Generates a third agent ID (OxnIET)
  • Can't update the env file (write gated on agentSessionID != "")
  • Leaves the stale Ox4f8n values permanently uncorrected

Without /clear, the agent would have retained OxU6Rh in its context from the hook output and the env would have stayed correct. /clear broke both simultaneously.

Timeline (reverse-engineered from transcript analysis)

All times PDT. Two Claude Code sessions in the same repo, same terminal.

Session 1 (dead) — Claude Code session 4f6f42b3...

| Time | Event | Agent ID | PID |
|---|---|---|---|
| 16:27 | SessionStart hook fires → runPrimeForHook → ox agent prime | Creates Ox4f8n | 160975 |
| 16:27 | Hook writes to session-env/4f6f42b3.../sessionstart-hook-1.sh | SAGEOX_AGENT_ID=Ox4f8n | |
| 16:27 | Session marker written for 4f6f42b3... → Ox4f8n | | |
| ~16:30 | User exits Claude Code | | PID 160975 dies |

Session 2 (current) — Claude Code session 4f2d5455...

| Time | Event | Agent ID | PID |
|---|---|---|---|
| 16:31 | SessionStart hook fires → runPrimeForHook → ox agent prime | Creates OxU6Rh | 163173 |
| 16:31 | Hook writes to session-env/4f2d5455.../sessionstart-hook-1.sh | SAGEOX_AGENT_ID=OxU6Rh | |
| 16:31 | Session marker written for 4f2d5455... → OxU6Rh | | |
| ~16:31 | User runs /clear ← THE TRIGGER | | |
| | /clear wipes agent context (OxU6Rh identity lost) | | |
| | /clear triggers SessionStart:clear hooks (re-prime) | Re-primes OxU6Rh | |
| | Env state corrupted: SAGEOX_AGENT_ID reverts to Ox4f8n (stale) | | |
| 16:31 | Agent reads CLAUDE.md, runs ox agent prime via Bash tool | | |
| | CLAUDE_CODE_SESSION_ID NOT in Bash env → can't find session marker | | |
| | agentSessionID="" → generates NEW agent ID | Creates OxnIET | 163315 |
| | SAGEOX_AGENT_ID=Ox4f8n in env (stale!) → sets parent_agent_id=Ox4f8n | | |
| | agentSessionID="" → env file write SKIPPED (gated on non-empty session ID) | | |
| 16:35 | Agent runs /ox-session-start → ox agent session start (no explicit agent ID) | | |
| | ↳ Dispatcher uses SAGEOX_AGENT_ID from env → resolves to Ox4f8n (stale!) | | |
| | ↳ Recording started under Ox4f8n (dead agent, PID 160975) | | |
| 16:35+ | PostToolUse hooks fire | | |
| | ↳ Hook reads session marker for 4f2d5455... → gets OxU6Rh | | |
| | ↳ Looks for recording state for OxU6Rh → NOT FOUND (recording is under Ox4f8n) | | |
| | Silent noop: entries never captured | | |
| 16:40 | ox session status shows recording under Ox4f8n, agent_alive: false | | |
| | entry_count: 0, process_status: dead, ⊘ ghost | | |

Result: 3 agent IDs, none correlated

| Source | Agent ID | Role |
|---|---|---|
| SessionStart hook (automatic) | OxU6Rh | Correct for this session, but only hooks know about it |
| Manual ox agent prime (Bash tool) | OxnIET | Prime output injected into context, but orphaned |
| ox agent session start (env var) | Ox4f8n | Dead agent from previous session; recording started here |
| PostToolUse hooks (marker lookup) | OxU6Rh | Doesn't match recording (Ox4f8n) → noop |

Root Causes

1. /clear corrupts env var state (likely trigger)

The session-env/4f2d5455.../sessionstart-hook-1.sh file correctly has SAGEOX_AGENT_ID=OxU6Rh. But after /clear, the Bash tool environment sees SAGEOX_AGENT_ID=Ox4f8n (from the previous dead session 4f6f42b3...).

Evidence: the env output (transcript line 132) contains SAGEOX_SESSION_ID=oxsid_01KKWFMBAYYSF0CKKPF191YJ5W, which is unambiguously Ox4f8n's server session ID. In addition, ox agent session start (no explicit agent ID) resolved to Ox4f8n.

The hook env file on disk is correct (OxU6Rh). The runtime env is wrong (Ox4f8n). This disconnect points to /clear disrupting Claude Code's env file sourcing — either the /clear-triggered SessionStart:clear hook's env writes aren't picked up, or /clear resets env state to a point before the current session's hook ran.

2. CLAUDE_CODE_SESSION_ID not in Bash tool environment

CLAUDE_CODE_SESSION_ID is confirmed absent from the Bash tool environment (transcript line 132, and verified in a parallel ox session). As a result, any ox agent prime call made from the Bash tool cannot find the session marker, which is keyed by session_id. Without the marker:

  • A new agent ID is generated every time
  • The env file write is SKIPPED (gated on agentSessionID != "" at agent_prime.go:764)
  • The stale env vars are never corrected

3. Env file write gated on agentSessionID

In agent_prime.go:764-787, the session marker write AND the env file write are both inside if agentSessionID != "". When the manual prime can't determine the session ID (root cause 2), it can't update the env file, so stale values from the previous session persist indefinitely.
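
To make the gating concrete, here is a rough sketch of the current behaviour next to one possible alternative. It is not the code in agent_prime.go: `primeWrites`, `planWritesCurrent`, and `planWritesProposed` are hypothetical names, and the alternative assumes the env file path can be resolved without the session ID, which this issue does not establish.

```go
package main

import "fmt"

// primeWrites lists which side effects a prime run would perform.
type primeWrites struct {
	Marker  bool // write the session marker (keyed by session ID)
	EnvFile bool // write SAGEOX_* values to the hook env file
}

// planWritesCurrent mirrors the gating described in this issue:
// both writes happen only when the session ID is known.
func planWritesCurrent(agentSessionID string) primeWrites {
	known := agentSessionID != ""
	return primeWrites{Marker: known, EnvFile: known}
}

// planWritesProposed decouples the two writes. The marker still needs the
// session ID because it is keyed by it; the env file write only needs a
// path. Assumption: envFilePath can be discovered some other way.
func planWritesProposed(agentSessionID, envFilePath string) primeWrites {
	return primeWrites{
		Marker:  agentSessionID != "",
		EnvFile: envFilePath != "",
	}
}

func main() {
	// Manual prime from the Bash tool: no session ID available.
	fmt.Printf("current:  %+v\n", planWritesCurrent(""))
	// The path below is a placeholder, not a real ox or Claude Code location.
	fmt.Printf("proposed: %+v\n", planWritesProposed("", "/tmp/hypothetical-hook-env.sh"))
}
```

With the decoupled version, a manual prime that lacks the session ID would still overwrite the stale SAGEOX_AGENT_ID, at the cost of needing a second way to find the env file.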

4. No agent ID validation at session start

ox agent session start (without explicit agent ID) blindly trusts SAGEOX_AGENT_ID from the environment. It doesn't validate that the agent is alive, belongs to the current session, or has a matching session marker.

Cascading Failure Chain

/clear wipes agent context + corrupts env state
    ↓
Agent re-primes manually (CLAUDE.md instruction)
    ↓
CLAUDE_CODE_SESSION_ID missing from Bash env
    → manual prime can't find session marker
    → generates new agent ID (OxnIET)
    → can't write to env file (agentSessionID="")
    → stale SAGEOX_AGENT_ID=Ox4f8n persists
    ↓
ox agent session start uses Ox4f8n (dead)
    → recording created under dead agent
    ↓
PostToolUse hooks use marker's OxU6Rh
    → no recording found for OxU6Rh
    → silent noop on every hook
    → entry_count stays at 0
    → session is ghost

Evidence

Key transcript lines

| Line | Event | Evidence |
|---|---|---|
| 2-3 | SessionStart:clear hooks fire (triggered by /clear) | bd prime + ox agent hook SessionStart |
| 5 | /clear command | Conversation context wiped |
| 11 | Agent manually runs ox agent prime | Creates OxnIET (env has stale Ox4f8n) |
| 13 | Prime result | `<session-context agent_id="OxnIET">` |
| 63 | ox agent session start | No explicit agent ID; uses env |
| 65 | Session start result | "agent_id": "Ox4f8n" (stale from dead session!) |
| 99 | ox session status --json --current | "agent_alive": false, "process_status": "dead" |
| 132 | `env \| grep -i session` | SAGEOX_SESSION_ID=oxsid_01KKWFMBAYYSF0CKKPF191YJ5W (Ox4f8n's) |

Disk vs runtime contradiction

| Location | SAGEOX_AGENT_ID | Source |
|---|---|---|
| session-env/4f2d5455.../sessionstart-hook-1.sh (on disk) | OxU6Rh ✓ correct | Hook wrote correctly |
| Bash tool runtime (after /clear) | Ox4f8n ✗ stale | Previous dead session leaked |

Suggested Fix Areas

  1. Investigate /clear env sourcing — understand why /clear causes Claude Code to stop applying the current session's hook env file and revert to stale values. May require upstream Claude Code fix or workaround.
  2. Validate agent liveness at session start — refuse to start recording under a dead agent; check IsProcessAlive() before using env-provided agent ID
  3. Remove env file write gate on agentSessionID — always write env vars during prime so stale values get corrected even when session ID is unavailable
  4. Fall back to session marker in PostToolUse hooks — if SAGEOX_AGENT_ID doesn't match the marker, prefer the marker
  5. Add CLAUDE_CODE_SESSION_ID to Bash tool env — may require upstream Claude Code change, but would fix marker lookup from manual prime
  6. Detect ghost recordings at session start — clean up stale recordings from dead agents before starting new ones
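
As an illustration of item 4, the sketch below shows one way a shared resolver could prefer the session marker over the env var whenever the two disagree. `resolveAgentID` is a hypothetical helper, and the sketch assumes the caller is able to read the marker, which is not currently the case for commands run from the Bash tool (see root cause 2).

```go
package main

import "fmt"

// resolveAgentID picks the agent ID to act on. The session marker wins when
// it is present; envWasStale reports whether the env var disagreed with it.
// With no marker, the env value is returned unchanged.
func resolveAgentID(envAgentID, markerAgentID string) (agentID string, envWasStale bool) {
	if markerAgentID == "" {
		return envAgentID, false
	}
	return markerAgentID, envAgentID != "" && envAgentID != markerAgentID
}

func main() {
	// Values from this issue: env holds the dead agent, marker holds the live one.
	recordings := map[string]int{} // agent ID -> entry count

	// Session start and the PostToolUse hook both go through the resolver,
	// so they agree on which agent owns the recording.
	startID, stale := resolveAgentID("Ox4f8n", "OxU6Rh")
	recordings[startID] = 0
	fmt.Printf("recording under %s (env was stale: %t)\n", startID, stale)

	hookID, _ := resolveAgentID("Ox4f8n", "OxU6Rh")
	if _, ok := recordings[hookID]; ok {
		recordings[hookID]++ // the hook finds the recording and appends an entry
	}
	fmt.Println(recordings)
}
```

The design point is that session start and the hooks share a single resolution rule; the failure in this issue came from the two paths resolving the agent ID from different sources.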

Reproduction

  1. Start Claude Code in a repo with ox initialized and session recording enabled
  2. Let SessionStart hook fire (creates agent ID A)
  3. Exit Claude Code (agent A dies)
  4. Start Claude Code again in the same repo (creates agent ID B)
  5. Run /clear
  6. Let the agent run ox agent prime and /ox-session-start
  7. Observe: session recording starts under dead agent A, not B
  8. All PostToolUse hooks silently noop; ox session status shows ghost

Environment

  • ox v0.5.1
  • Claude Code 2.1.76
  • Linux 6.17.0-19-generic
  • Repo: galexy/edgar-diff
