security: redact secrets from execute_code output and all tool results #4364
Open
0xbyt4 wants to merge 3 commits into NousResearch:main from
Conversation
execute_code (PTC) returned script stdout/stderr to LLM context without redaction. An agent could exfiltrate .env secrets via `import os; print(os.environ["ANTHROPIC_API_KEY"])`, bypassing terminal_tool's redaction entirely.

Fix:
- Apply redact_sensitive_text to execute_code stdout/stderr
- Add defense-in-depth redaction in the sequential tool result path (run_agent.py) so ALL tool outputs are redacted before entering LLM context, regardless of whether individual tools redact

Added 4 tests verifying secret redaction in script output.
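A minimal sketch of the fix, assuming a pattern-based redactor; `finalize_output` and the exact entries in `SECRET_PATTERNS` are illustrative assumptions, not the repo's actual `agent.redact` implementation:

```python
import re

# Hypothetical stand-in for agent.redact.redact_sensitive_text.
def redact_sensitive_text(text: str) -> str:
    # Known key shapes: redact the whole token.
    text = re.sub(r"sk-ant-[A-Za-z0-9_-]+", "[REDACTED]", text)
    # KEY=value assignments: keep the name, redact the value.
    text = re.sub(r"(?m)\b([A-Z_]*(?:KEY|TOKEN|SECRET)[A-Z_]*)=\S+",
                  r"\1=[REDACTED]", text)
    return text

def finalize_output(stdout_text: str, stderr_text: str) -> tuple[str, str]:
    """Redact script output before it enters LLM context (hypothetical hook)."""
    return redact_sensitive_text(stdout_text), redact_sensitive_text(stderr_text)
```

With this in place, a script that prints `os.environ["ANTHROPIC_API_KEY"]` yields `[REDACTED]` in context rather than the key itself.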
The concurrent (parallel) tool execution path was missing the defense-in-depth redaction added to the sequential path. Tool results from parallel execution could enter LLM context unredacted.
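A sketch of what redacting the parallel path could look like, assuming an `asyncio.gather`-style executor; `run_tool`, `execute_concurrent`, and the stand-in redactor are hypothetical names, not the actual `run_agent.py` code:

```python
import asyncio

def redact_sensitive_text(text: str) -> str:
    # Hypothetical stand-in for agent.redact.redact_sensitive_text.
    return text.replace("sk-ant-demo-secret", "[REDACTED]")

async def run_tool(name: str) -> str:
    # Simulated tool that leaks a secret into its result.
    return f"{name}: token=sk-ant-demo-secret"

async def execute_concurrent(tool_names: list[str]) -> list[str]:
    """Defense-in-depth: redact every result from the parallel path,
    mirroring the redaction already applied on the sequential path."""
    raw_results = await asyncio.gather(*(run_tool(n) for n in tool_names))
    return [redact_sensitive_text(r) for r in raw_results]
```

The key property is that redaction happens at the single point where results re-enter the context, so it holds even for tools that do not redact internally.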
Memory and skill files are injected into the system prompt on every session. A prompt injection that saves a secret to memory or a skill would exfiltrate it across all future sessions.

- Add secret pattern scanning (API keys, KEY=value assignments) to memory_tool's _scan_memory_content
- Add _scan_skill_for_secrets to skill_manage — blocks create, edit, patch, and write_file actions containing secrets
- 6 new tests for memory/skill secret blocking
jeremyjh reviewed Mar 31, 2026
# Without this, execute_code can exfiltrate .env secrets via
# `import os; print(os.environ)` bypassing terminal redaction.
from agent.redact import redact_sensitive_text
stdout_text = redact_sensitive_text(stdout_text)
Pattern matching is not enough. It is not possible to know the pattern of every secret value, and the KEY=VALUE pattern is already defeated in the sample I shared.
Summary
Multiple secret exfiltration vectors where API keys and tokens from .env could leak into LLM context or persistent storage, bypassing existing redaction.

Vulnerabilities Fixed
1. execute_code (PTC) — raw stdout returned without redaction

execute_code runs Python scripts and returns stdout to LLM context. No redaction was applied. This bypasses terminal_tool's redaction entirely.

Fix: Apply redact_sensitive_text to stdout and stderr in code_execution_tool.py.

2. All tool results — no defense-in-depth redaction
Only terminal_tool and file_tools had redaction. Browser, MCP, vision, delegate, and all other tools returned raw output to LLM context.

Fix: Add redact_sensitive_text to both sequential and concurrent tool result paths in run_agent.py — catches any tool that doesn't redact internally.

3. Memory persistence — secrets writable to memory files
Memory entries are injected into the system prompt on every session. A prompt injection that saves ANTHROPIC_API_KEY=sk-ant-... to memory would exfiltrate it across all future sessions.

Fix: Add API key and KEY=value pattern scanning to _scan_memory_content in memory_tool.py.

4. Skill persistence — secrets writable to skill files
Skill files are loaded into context when referenced. Same exfiltration vector as memory.

Fix: Add _scan_skill_for_secrets to skill_manage — blocks create, edit, patch, and write_file actions containing secrets.

What was vulnerable
Test plan