feat: replace inline nudges with background memory/skill review by teknium1 · Pull Request #2235 · NousResearch/hermes-agent

teknium1 · 2026-03-20T22:31:32Z

Summary

Replaces the inline memory/skill nudges — which polluted 43% of user messages with backward-looking system instructions — with a background agent that reviews and saves independently after the main response is delivered.

The problem

Memory and skill nudges were appended directly to the user's message content:

User's actual message: "fix this bug"
What the model saw:    "fix this bug

[System: The previous task involved many tool calls. Save the approach as a skill...]
[System: You've had several exchanges. Consider: has the user shared preferences...]"

The model had to choose between the user's forward-looking task and the backward-looking system directives. In 2 confirmed cases, the agent spent 1-3 tool calls on memory/skill work before starting the user's task. The nudges were also permanently stored in conversation history, polluting session transcripts.

The solution

When nudge conditions are met, a background review agent spawns after the main response completes:

# After the agent finishes responding to the user:
threading.Thread(target=_run_review, daemon=True).start()

The review agent:

Uses the main model (same quality, not auxiliary — skills/memory are high-precision)
Gets a read-only snapshot of the conversation
Has only memory + skill_manage tools (5 iteration budget)
Shares the memory store so writes persist immediately
Runs with quiet_mode=True, skip_context_files=True
Never modifies the main conversation or produces user-visible output
All exceptions caught — can never affect the main session

What doesn't change

Trigger conditions: still every 10 user turns (memory) and after 10+ tool iterations (skills)
Token cost: same context processed either way, just on a separate track
Memory/skill quality: actually better — dedicated prompt for review vs a hint appended to an unrelated message

Changes

run_agent.py:

Remove nudge injection from run_conversation() (lines 5225-5249 → tracking only, no user_message +=)
Add _spawn_background_review() method with _BACKGROUND_REVIEW_PROMPT
Add background fork trigger after response delivery (before return)
Net: +77 lines, -15 lines

Test plan

5670 passed, 200 skipped, 23 deselected

Closes #2227.

Remove the memory and skill nudges that were appended directly to user messages, causing backward-looking system instructions to compete with forward-looking user tasks. Found in 43% of user messages across 15 sessions, with confirmed cases of the agent spending tool calls on nudge responses before starting the user's actual request. Replace with a background review agent that runs AFTER the main agent finishes responding: - Spawns a background thread with a snapshot of the conversation - Uses the main model (not auxiliary) for high-precision memory/skill work - Only has memory + skill_manage tools (5 iteration budget) - Shares the memory store for direct writes - Never modifies the main conversation history - Never competes with the user's task for model attention - Zero latency impact (runs after response is delivered) - Same token cost (processes the same context, just on a separate track) The trigger conditions are unchanged (every 10 user turns for memory, after 10+ tool iterations for skills). Only the execution path changes: from inline injection to background fork. Closes #2227.

…Research#2235) Remove the memory and skill nudges that were appended directly to user messages, causing backward-looking system instructions to compete with forward-looking user tasks. Found in 43% of user messages across 15 sessions, with confirmed cases of the agent spending tool calls on nudge responses before starting the user's actual request. Replace with a background review agent that runs AFTER the main agent finishes responding: - Spawns a background thread with a snapshot of the conversation - Uses the main model (not auxiliary) for high-precision memory/skill work - Only has memory + skill_manage tools (5 iteration budget) - Shares the memory store for direct writes - Never modifies the main conversation history - Never competes with the user's task for model attention - Zero latency impact (runs after response is delivered) - Same token cost (processes the same context, just on a separate track) The trigger conditions are unchanged (every 10 user turns for memory, after 10+ tool iterations for skills). Only the execution path changes: from inline injection to background fork. Closes NousResearch#2227. Co-authored-by: Test <test@test.com>

teknium1 force-pushed the hermes/hermes-3369cdb1 branch 3 times, most recently from 42ae0be to eaf3eec Compare March 20, 2026 23:59

teknium1 force-pushed the hermes/hermes-3369cdb1 branch from eaf3eec to 470d89c Compare March 21, 2026 01:28

teknium1 merged commit 45058b4 into main Mar 21, 2026
1 check passed

teknium1 mentioned this pull request Mar 22, 2026

fix: move memory/skill nudges to API-call-time injection only #2233

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: replace inline nudges with background memory/skill review#2235

feat: replace inline nudges with background memory/skill review#2235
teknium1 merged 1 commit intomainfrom
hermes/hermes-3369cdb1

teknium1 commented Mar 20, 2026

Uh oh!

Labels

1 participant

Conversation