bug: stale SAGEOX_AGENT_ID leaks across Claude Code sessions, breaking session recording #258
Description
Summary
Session recording silently produces zero entries when a user runs /clear in a Claude Code session that follows a previous session in the same repo. The /clear command appears to corrupt the env var state — reverting SAGEOX_AGENT_ID to the previous (dead) session's value — which causes recording to start under the wrong agent. PostToolUse hooks then silently noop because they resolve a different agent ID from the session marker. The session shows as ⊘ ghost with entry_count: 0.
Likely trigger: /clear between or within Claude Code sessions.
Observed in ~/src/github.com/galexy/edgar-diff on 2026-03-16.
Why /clear is the likely trigger
/clear causes two simultaneous breaks:
1. Env state corruption
Before /clear, the SessionStart hook had correctly written SAGEOX_AGENT_ID=OxU6Rh to session-env/4f2d5455.../sessionstart-hook-1.sh. The Bash tool env should have had OxU6Rh. But after /clear, the Bash tool env reverted to SAGEOX_AGENT_ID=Ox4f8n (the previous dead session's value).
This suggests /clear either:
- Resets Claude Code's env sourcing, causing it to stop applying the current session's hook env file
- Causes a re-source that picks up stale values from the previous session
- Clears the session-env state without the /clear-triggered SessionStart hook properly repopulating it
The sessionstart-hook-1.sh file on disk still has the correct OxU6Rh values — but the Bash tool runtime sees the wrong Ox4f8n values. The file is right; the sourcing is broken.
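The disk-versus-runtime mismatch described above can be checked mechanically. The following is a minimal sketch, not ox code: `parseEnvFile` and `staleKeys` are hypothetical helpers, and the sketch assumes the hook env file contains lines of the form `export KEY=VALUE` or `KEY=VALUE`, which this issue does not confirm.

```go
package main

import (
	"bufio"
	"sort"
	"strings"
)

// parseEnvFile extracts KEY=VALUE pairs from shell-style text.
// Assumption: lines look like `export KEY=VALUE` or `KEY=VALUE`; the real
// format of sessionstart-hook-1.sh is not shown in this issue.
func parseEnvFile(content string) map[string]string {
	vars := make(map[string]string)
	sc := bufio.NewScanner(strings.NewReader(content))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		line = strings.TrimPrefix(line, "export ")
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blanks and comments
		}
		key, val, ok := strings.Cut(line, "=")
		if !ok {
			continue // not an assignment
		}
		// Strip surrounding whitespace and simple quoting from the value.
		vars[strings.TrimSpace(key)] = strings.Trim(strings.TrimSpace(val), `"'`)
	}
	return vars
}

// staleKeys returns, in sorted order, the keys whose runtime value differs
// from the value recorded on disk. getenv is injected so the check can be
// run against os.Getenv or against a captured environment.
func staleKeys(onDisk map[string]string, getenv func(string) string) []string {
	var stale []string
	for key, want := range onDisk {
		if getenv(key) != want {
			stale = append(stale, key)
		}
	}
	sort.Strings(stale)
	return stale
}
```

Passing `os.Getenv` as `getenv` from inside the Bash tool would list every variable where the runtime disagrees with the hook file; in the state reported here that list would include SAGEOX_AGENT_ID.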
2. Context loss forces redundant re-prime
/clear wipes the agent's conversation context, erasing the SessionStart hook's prime output (<session-context agent_id="OxU6Rh">). The agent then follows CLAUDE.md's instruction to run ox agent prime, which:
- Can't find the session marker (`CLAUDE_CODE_SESSION_ID` not in Bash env)
- Generates a third agent ID (`OxnIET`)
- Can't update the env file (write gated on `agentSessionID != ""`)
- Leaves the stale `Ox4f8n` values permanently uncorrected
Without /clear, the agent would have retained OxU6Rh in its context from the hook output and the env would have stayed correct. /clear broke both simultaneously.
Timeline (reverse-engineered from transcript analysis)
All times PDT. Two Claude Code sessions in the same repo, same terminal.
Session 1 (dead) — Claude Code session 4f6f42b3...
| Time | Event | Agent ID | PID |
|---|---|---|---|
| 16:27 | SessionStart hook fires → runPrimeForHook → ox agent prime | Creates Ox4f8n | 160975 |
| 16:27 | Hook writes to session-env/4f6f42b3.../sessionstart-hook-1.sh | SAGEOX_AGENT_ID=Ox4f8n | |
| 16:27 | Session marker written for 4f6f42b3... → Ox4f8n | | |
| ~16:30 | User exits Claude Code | | PID 160975 dies |
Session 2 (current) — Claude Code session 4f2d5455...
| Time | Event | Agent ID | PID |
|---|---|---|---|
| 16:31 | SessionStart hook fires → runPrimeForHook → ox agent prime | Creates OxU6Rh | 163173 |
| 16:31 | Hook writes to session-env/4f2d5455.../sessionstart-hook-1.sh | SAGEOX_AGENT_ID=OxU6Rh | |
| 16:31 | Session marker written for 4f2d5455... → OxU6Rh | | |
| ~16:31 | User runs /clear ← THE TRIGGER | | |
| | ↳ /clear wipes agent context (OxU6Rh identity lost) | | |
| | ↳ /clear triggers SessionStart:clear hooks (re-prime) | Re-primes OxU6Rh | |
| | ↳ Env state corrupted: SAGEOX_AGENT_ID reverts to Ox4f8n (stale) | | |
| 16:31 | Agent reads CLAUDE.md, runs ox agent prime via Bash tool | | |
| | ↳ CLAUDE_CODE_SESSION_ID NOT in Bash env → can't find session marker | | |
| | ↳ agentSessionID="" → generates NEW agent ID | Creates OxnIET | 163315 |
| | ↳ SAGEOX_AGENT_ID=Ox4f8n in env (stale!) → sets parent_agent_id=Ox4f8n | | |
| | ↳ agentSessionID="" → env file write SKIPPED (gated on non-empty session ID) | | |
| 16:35 | Agent runs /ox-session-start → ox agent session start (no explicit agent ID) | | |
| | ↳ Dispatcher uses SAGEOX_AGENT_ID from env → resolves to Ox4f8n (stale!) | | |
| | ↳ Recording started under Ox4f8n (dead agent, PID 160975) | | |
| 16:35+ | PostToolUse hooks fire | | |
| | ↳ Hook reads session marker for 4f2d5455... → gets OxU6Rh | | |
| | ↳ Looks for recording state for OxU6Rh → NOT FOUND (recording is under Ox4f8n) | | |
| | ↳ Silent noop: entries never captured | | |
| 16:40 | ox session status shows recording under Ox4f8n, agent_alive: false | | |
| | ↳ entry_count: 0, process_status: dead, ⊘ ghost | | |
Result: 3 agent IDs, none correlated
| Source | Agent ID | Role |
|---|---|---|
| SessionStart hook (automatic) | OxU6Rh | Correct for this session, but only hooks know about it |
| Manual ox agent prime (Bash tool) | OxnIET | Prime output injected into context, but orphaned |
| ox agent session start (env var) | Ox4f8n | Dead agent from previous session; recording started here |
| PostToolUse hooks (marker lookup) | OxU6Rh | Doesn't match recording (Ox4f8n) → noop |
Root Causes
1. /clear corrupts env var state (likely trigger)
The session-env/4f2d5455.../sessionstart-hook-1.sh file correctly has SAGEOX_AGENT_ID=OxU6Rh. But after /clear, the Bash tool environment sees SAGEOX_AGENT_ID=Ox4f8n (from the previous dead session 4f6f42b3...).
Evidence: SAGEOX_SESSION_ID=oxsid_01KKWFMBAYYSF0CKKPF191YJ5W in the env output (transcript line 132) — this is unambiguously Ox4f8n's server session ID. And ox agent session start (no explicit agent ID) resolved to Ox4f8n.
The hook env file on disk is correct (OxU6Rh). The runtime env is wrong (Ox4f8n). This disconnect points to /clear disrupting Claude Code's env file sourcing — either the /clear-triggered SessionStart:clear hook's env writes aren't picked up, or /clear resets env state to a point before the current session's hook ran.
2. CLAUDE_CODE_SESSION_ID not in Bash tool environment
Confirmed absent from the Bash tool environment (transcript line 132 and verified in a parallel ox session). This means any ox agent prime call from a Bash tool cannot find the session marker (which is keyed by session_id). Without the marker:
- A new agent ID is generated every time
- The env file write is SKIPPED (gated on `agentSessionID != ""` at `agent_prime.go:764`)
- The stale env vars are never corrected
3. Env file write gated on agentSessionID
In agent_prime.go:764-787, the session marker write AND env file write are both inside if agentSessionID != "". When the manual prime can't determine the session ID (root cause 2), it can't update the env file, so stale values from the previous session persist indefinitely.
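To make the effect of the gate concrete, here is a small sketch that contrasts the described behavior with an ungated variant. It is not the code at agent_prime.go:764-787, which this issue does not quote; `primeWrites`, `gatedWrites`, and `ungatedWrites` are hypothetical names used only for illustration.

```go
package main

// primeWrites lists the side effects a prime run performs.
type primeWrites struct {
	Marker  bool // session marker, keyed by the Claude Code session ID
	EnvFile bool // env file carrying SAGEOX_AGENT_ID and related vars
}

// gatedWrites mirrors the behavior described in this issue: both writes
// sit behind the same check, so an empty session ID skips both.
func gatedWrites(agentSessionID string) primeWrites {
	if agentSessionID != "" {
		return primeWrites{Marker: true, EnvFile: true}
	}
	return primeWrites{}
}

// ungatedWrites sketches the proposed change: the marker still needs the
// session ID as its key, but the env refresh no longer depends on it.
func ungatedWrites(agentSessionID string) primeWrites {
	return primeWrites{
		Marker:  agentSessionID != "",
		EnvFile: true,
	}
}
```

One open question for the ungated variant: the env file paths shown in this issue include the Claude Code session ID, so a prime that lacks the session ID would still need some other way to locate the file it should rewrite.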
4. No agent ID validation at session start
ox agent session start (without explicit agent ID) blindly trusts SAGEOX_AGENT_ID from the environment. It doesn't validate that the agent is alive, belongs to the current session, or has a matching session marker.
Cascading Failure Chain
/clear wipes agent context + corrupts env state
↓
Agent re-primes manually (CLAUDE.md instruction)
↓
CLAUDE_CODE_SESSION_ID missing from Bash env
→ manual prime can't find session marker
→ generates new agent ID (OxnIET)
→ can't write to env file (agentSessionID="")
→ stale SAGEOX_AGENT_ID=Ox4f8n persists
↓
ox agent session start uses Ox4f8n (dead)
→ recording created under dead agent
↓
PostToolUse hooks use marker's OxU6Rh
→ no recording found for OxU6Rh
→ silent noop on every hook
→ entry_count stays at 0
→ session is ghost
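The last two steps of the chain can be reproduced with a toy in-memory model. This is a sketch under stated assumptions: `state` and `recordEntry` are hypothetical stand-ins for the on-disk markers and recording state, and the sketch returns an error where the real hook is reported to noop silently.

```go
package main

import "fmt"

// state is a toy stand-in for the on-disk session markers and recordings.
type state struct {
	markers    map[string]string   // Claude Code session ID -> agent ID
	recordings map[string][]string // agent ID -> recorded entries
}

// recordEntry mimics a PostToolUse hook: resolve the agent from the session
// marker, then append to that agent's recording. A missing recording is
// surfaced as an error rather than ignored.
func (s *state) recordEntry(sessionID, entry string) error {
	agentID, ok := s.markers[sessionID]
	if !ok {
		return fmt.Errorf("no session marker for session %s", sessionID)
	}
	if _, ok := s.recordings[agentID]; !ok {
		return fmt.Errorf("no recording for agent %s (from marker of session %s)",
			agentID, sessionID)
	}
	s.recordings[agentID] = append(s.recordings[agentID], entry)
	return nil
}
```

Seeding the model with a marker pointing at OxU6Rh and a recording under Ox4f8n makes every `recordEntry` call fail while the Ox4f8n recording stays empty, which corresponds to the entry_count of 0 reported above.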
Evidence
- Claude Code transcript: https://gist.github.com/galexy/70ac4de0e42986a4f4790b564b5c9303
- Disk state evidence (markers, env files, recording state): https://gist.github.com/galexy/55cd0da7706a8b40562c3366533339d9
Key transcript lines
| Line | Event | Evidence |
|---|---|---|
| 2-3 | SessionStart:clear hooks fire (triggered by /clear) | bd prime + ox agent hook SessionStart |
| 5 | /clear command | Conversation context wiped |
| 11 | Agent manually runs ox agent prime | Creates OxnIET (env has stale Ox4f8n) |
| 13 | Prime result | `<session-context agent_id="OxnIET">` |
| 63 | ox agent session start | No explicit agent ID, so it uses env |
| 65 | Session start result | "agent_id": "Ox4f8n" (stale from dead session!) |
| 99 | ox session status --json --current | "agent_alive": false, "process_status": "dead" |
| 132 | `env \| grep -i session` | SAGEOX_SESSION_ID=oxsid_01KKWFMBAYYSF0CKKPF191YJ5W (Ox4f8n's) |
Disk vs runtime contradiction
| Location | SAGEOX_AGENT_ID | Source |
|---|---|---|
| session-env/4f2d5455.../sessionstart-hook-1.sh (on disk) | OxU6Rh ✓ correct | Hook wrote correctly |
| Bash tool runtime (after /clear) | Ox4f8n ✗ stale | Previous dead session leaked |
Suggested Fix Areas
- Investigate `/clear` env sourcing: understand why `/clear` causes Claude Code to stop applying the current session's hook env file and revert to stale values. May require an upstream Claude Code fix or workaround.
- Validate agent liveness at session start: refuse to start recording under a dead agent; check `IsProcessAlive()` before using the env-provided agent ID
- Remove env file write gate on `agentSessionID`: always write env vars during prime so stale values get corrected even when the session ID is unavailable
- Fall back to session marker in PostToolUse hooks: if `SAGEOX_AGENT_ID` doesn't match the marker, prefer the marker
- Add `CLAUDE_CODE_SESSION_ID` to Bash tool env: may require an upstream Claude Code change, but would fix marker lookup from manual prime
- Detect ghost recordings at session start: clean up stale recordings from dead agents before starting new ones
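The marker-preference idea from the list above could be expressed as a single resolution function. This is only a sketch: `resolveAgentID` is a hypothetical name, and how ox actually combines the env var and the marker is not shown in this issue.

```go
package main

// resolveAgentID picks the agent ID to act on when both an env-provided ID
// (SAGEOX_AGENT_ID) and a session-marker ID may be available. The marker
// wins whenever it is present; mismatch reports that the env value
// disagreed with it so the caller can log or repair the stale env.
func resolveAgentID(envAgentID, markerAgentID string) (agentID string, mismatch bool) {
	if markerAgentID != "" {
		return markerAgentID, envAgentID != "" && envAgentID != markerAgentID
	}
	// No marker available: fall back to whatever the env provides,
	// which may be empty.
	return envAgentID, false
}
```

With the values from this report, `resolveAgentID("Ox4f8n", "OxU6Rh")` yields OxU6Rh with the mismatch flag set, which gives the caller a natural place to warn about the leaked value instead of proceeding silently.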
Reproduction
- Start Claude Code in a repo with ox initialized and session recording enabled
- Let SessionStart hook fire (creates agent ID A)
- Exit Claude Code (agent A dies)
- Start Claude Code again in the same repo (creates agent ID B)
- Run `/clear`
- Let the agent run `ox agent prime` and `/ox-session-start`
- Observe: session recording starts under dead agent A, not B
- All PostToolUse hooks silently noop; `ox session status` shows ghost
Environment
- ox v0.5.1
- Claude Code 2.1.76
- Linux 6.17.0-19-generic
- Repo: galexy/edgar-diff