
security: redact secrets from execute_code output and all tool results#4364

Open
0xbyt4 wants to merge 3 commits into NousResearch:main from 0xbyt4:fix/secret-exfil-redaction

Conversation

@0xbyt4 (Contributor) commented Mar 31, 2026

Summary

This PR closes multiple secret exfiltration vectors through which API keys and tokens from .env could leak into LLM context or persistent storage, bypassing existing redaction.

Vulnerabilities Fixed

1. execute_code (PTC) — raw stdout returned without redaction

execute_code runs Python scripts and returns stdout to LLM context. No redaction was applied:

import os; print(os.environ["ANTHROPIC_API_KEY"])

This bypasses terminal_tool's redaction entirely.

Fix: Apply redact_sensitive_text to stdout and stderr in code_execution_tool.py.
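The fix amounts to passing both streams through the redaction helper before the result is returned. A minimal sketch, assuming hypothetical pattern shapes and a hypothetical result structure (the real `redact_sensitive_text` lives in `agent.redact` and its pattern set may differ):

```python
import re

# Stand-in for agent.redact.redact_sensitive_text -- the real pattern
# set lives upstream; these shapes are assumptions for illustration.
SECRET_PATTERNS = [
    re.compile(r"sk-(?:ant|or)-[A-Za-z0-9_-]{10,}"),                 # Anthropic / OpenRouter style keys
    re.compile(r"\b[A-Z][A-Z0-9_]*(?:KEY|TOKEN|SECRET)\s*=\s*\S+"),  # KEY=value env assignments
]

def redact_sensitive_text(text: str) -> str:
    for pattern in SECRET_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def build_execution_result(stdout_text: str, stderr_text: str) -> dict:
    # Hypothetical shape of the execute_code result: redact both
    # streams before they ever reach LLM context.
    return {
        "stdout": redact_sensitive_text(stdout_text),
        "stderr": redact_sensitive_text(stderr_text),
    }
```

With this in place, the `print(os.environ["ANTHROPIC_API_KEY"])` payload above yields a `[REDACTED]` marker instead of the key.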

2. All tool results — no defense-in-depth redaction

Only terminal_tool and file_tools had redaction. Browser, MCP, vision, delegate, and all other tools returned raw output to LLM context.

Fix: Add redact_sensitive_text to both sequential and concurrent tool result paths in run_agent.py — catches any tool that doesn't redact internally.
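Because tool results can be strings or nested structures, the defense-in-depth pass has to walk the whole result. A sketch of the idea, with a stand-in redaction helper (the actual `run_agent.py` wiring and result shapes are assumptions):

```python
import re

API_KEY_RE = re.compile(r"sk-[A-Za-z0-9_-]{12,}")

def redact_sensitive_text(text: str) -> str:
    # Stand-in for the shared redaction helper (an assumption here).
    return API_KEY_RE.sub("[REDACTED]", text)

def redact_tool_result(result):
    """Walk an arbitrary tool result (string, dict, or list) and
    redact every string leaf before it enters LLM context."""
    if isinstance(result, str):
        return redact_sensitive_text(result)
    if isinstance(result, dict):
        return {key: redact_tool_result(value) for key, value in result.items()}
    if isinstance(result, list):
        return [redact_tool_result(item) for item in result]
    return result
```

Applied in both the sequential and concurrent result paths, this catches browser, MCP, vision, delegate, and any future tool without per-tool changes.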

3. Memory persistence — secrets writable to memory files

Memory entries are injected into the system prompt on every session. A prompt injection that saves ANTHROPIC_API_KEY=sk-ant-... to memory would exfiltrate it across all future sessions.

Fix: Add API key and KEY=value pattern scanning to _scan_memory_content in memory_tool.py.
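The memory check is a write-time scan rather than a read-time redaction: content matching a secret pattern is rejected before it can be persisted. A sketch under assumed pattern shapes (the upstream `_scan_memory_content` list may differ):

```python
import re

# Shapes assumed for illustration; the upstream pattern list may differ.
SECRET_SCAN_PATTERNS = [
    re.compile(r"sk-(?:ant|or)-[A-Za-z0-9_-]{10,}"),                    # known API key shapes
    re.compile(r"\b[A-Z][A-Z0-9_]{2,}(?:KEY|TOKEN|SECRET)\s*=\s*\S+"),  # KEY=value assignments
]

def scan_memory_content(content: str) -> None:
    """Refuse to persist a memory entry that appears to contain a secret."""
    for pattern in SECRET_SCAN_PATTERNS:
        if pattern.search(content):
            raise ValueError("memory entry appears to contain a secret")
```

Blocking at write time means a prompt-injected "save this key to memory" fails once, rather than leaking on every future session.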

4. Skill persistence — secrets writable to skill files

Skill files are loaded into context when referenced. Same exfiltration vector as memory.

Fix: Add _scan_skill_for_secrets to skill_manage — blocks create, edit, patch, and write_file actions containing secrets.
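The skill-side guard wires the same scan into every content-bearing action. A hypothetical dispatcher sketch (action names come from the PR description; the dispatch structure itself is an assumption):

```python
import re

SECRET_RE = re.compile(
    r"sk-(?:ant|or)-[A-Za-z0-9_-]{10,}"                       # known API key shapes
    r"|\b[A-Z][A-Z0-9_]{2,}(?:KEY|TOKEN|SECRET)\s*=\s*\S+"    # env assignments
)

# Actions that write skill content, per the PR description.
GUARDED_ACTIONS = {"create", "edit", "patch", "write_file"}

def skill_manage(action: str, content: str = "") -> str:
    # Hypothetical dispatcher: scan every content-bearing action
    # before a skill file is written or modified.
    if action in GUARDED_ACTIONS and SECRET_RE.search(content):
        return "BLOCKED: skill content appears to contain a secret"
    return f"{action}: ok"
```

Read-only actions pass through unscanned, since they cannot persist new content.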

What was vulnerable

| Vector | Had protection? | Fix |
|---|---|---|
| terminal_tool output | Yes (existing) | |
| file_tools read | Yes (existing) | |
| execute_code stdout | No | redact_sensitive_text on stdout/stderr |
| All tool results → LLM | No | Defense-in-depth in run_agent.py |
| Memory persistence | No | Secret pattern scanning |
| Skill persistence | No | Secret pattern scanning |

Test plan

  • 48 redaction tests passing (38 existing + 10 new)
  • execute_code: env var, OpenRouter key, multi-key dump, non-secret passthrough
  • memory: blocks API key, blocks env assignment, allows normal content
  • skill: blocks API key, blocks env assignment, allows normal content
0xbyt4 added 3 commits April 1, 2026 01:10
execute_code (PTC) returned script stdout/stderr to LLM context without
redaction. An agent could exfiltrate .env secrets via:
  import os; print(os.environ["ANTHROPIC_API_KEY"])
bypassing terminal_tool's redaction entirely.

Fix:
- Apply redact_sensitive_text to execute_code stdout/stderr
- Add defense-in-depth redaction in the sequential tool result path
  (run_agent.py) so ALL tool outputs are redacted before entering
  LLM context, regardless of whether individual tools redact

Added 4 tests verifying secret redaction in script output.

The concurrent (parallel) tool execution path was missing the
defense-in-depth redaction added to the sequential path. Tool
results from parallel execution could enter LLM context unredacted.

Memory and skill files are injected into the system prompt on every
session. A prompt injection that saves a secret to memory/skill would
exfiltrate it across all future sessions.

- Add secret pattern scanning (API keys, KEY=value assignments) to
  memory_tool's _scan_memory_content
- Add _scan_skill_for_secrets to skill_manage — blocks create, edit,
  patch, and write_file actions containing secrets
- 6 new tests for memory/skill secret blocking
# Without this, execute_code can exfiltrate .env secrets via
# `import os; print(os.environ)` bypassing terminal redaction.
from agent.redact import redact_sensitive_text
stdout_text = redact_sensitive_text(stdout_text)

Pattern matching is not enough. It is not possible to know the pattern of every secret value, and the KEY=VALUE pattern is already defeated by the sample I shared.
