fix: stabilize system prompt across gateway turns for cache hits#754

Merged
teknium1 merged 1 commit into main from hermes/hermes-c53b7cba
Mar 9, 2026
Conversation

@teknium1
Contributor

teknium1 commented Mar 9, 2026

Summary

Prevents unnecessary Anthropic prompt cache misses in the gateway by stabilizing the system prompt across all turns in a session.

Problem

The gateway creates a fresh AIAgent per user message. Each new agent rebuilt the system prompt from disk, picking up:

  1. Memory changes — the model wrote to memory during previous turns, so the disk content changed. But the model already has the updated memory in its conversation history, making the system prompt rebuild redundant AND cache-breaking.
  2. Honcho context changes — re-fetching Honcho on each turn could return different context, changing the system message.

Both caused the system prompt content to differ between turns, breaking the Anthropic prefix cache and forcing full re-reads (~75% cost increase on input tokens).
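The failure mode can be illustrated with a minimal sketch. Anthropic's prefix cache keys on the exact request prefix, so any byte-level change to the system prompt forces a full re-read; the hashing here is a stand-in for that behavior, not the real cache implementation:

```python
import hashlib

def cache_key(system_prompt: str) -> str:
    # Stand-in for Anthropic's prefix cache: any byte-level change to the
    # system prompt produces a different prefix and therefore a cache miss.
    return hashlib.sha256(system_prompt.encode()).hexdigest()

base = "You are Hermes.\n## Memory\nuser likes tea"

turn1 = cache_key(base)
# Turn 2, pre-fix: memory on disk changed, prompt rebuilt -> different key.
turn2_rebuilt = cache_key(base + "\nuser likes jazz")
# Turn 2, post-fix: stored prompt reused -> same key, cache hit.
turn2_reused = cache_key(base)

assert turn1 != turn2_rebuilt  # rebuild breaks the prefix cache
assert turn1 == turn2_reused   # reuse preserves it
```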

Fix

System prompt reuse for continuing sessions:

  • When conversation_history is non-empty and the session DB has a stored system prompt, reuse it instead of rebuilding from disk
  • First turn of a session builds the prompt fresh (as before)
  • Compression still invalidates and rebuilds (as before)
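The reuse rule above can be sketched roughly as follows; `session` as a dict-backed DB row and the function names are illustrative, not the actual gateway API:

```python
def resolve_system_prompt(session, conversation_history, build_from_disk):
    # Hypothetical sketch of the reuse rule; names are illustrative.
    stored = session.get("system_prompt")
    if conversation_history and stored is not None:
        # Continuing session: reuse the stored prompt so the Anthropic
        # prefix cache sees an identical system message every turn.
        return stored
    # First turn, or no stored prompt in the DB: build fresh and persist.
    prompt = build_from_disk()
    session["system_prompt"] = prompt
    return prompt
```

Compression would simply clear `session["system_prompt"]`, so the next turn falls through to the fresh-build branch.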

Honcho context stabilization:

  • Only prefetch Honcho context on the first turn (empty history)
  • Bake Honcho context into the cached system prompt before storing to DB
  • Remove the per-turn Honcho injection from the API call loop
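A rough sketch of the gating, again with hypothetical names: Honcho is fetched once on the first turn, baked into the prompt, and the stored result is what every later turn sends, with no per-turn injection:

```python
def system_prompt_for_turn(session, conversation_history,
                           build_base_prompt, fetch_honcho_context):
    # Hypothetical sketch; the real gateway code differs in detail.
    if not conversation_history:
        # First turn: the only place Honcho is prefetched. Bake it into
        # the prompt and store, so the system message never changes.
        prompt = (build_base_prompt()
                  + "\n\n## Honcho context\n"
                  + fetch_honcho_context())
        session["system_prompt"] = prompt
        return prompt
    # Later turns: stable, already-baked prompt; no Honcho re-fetch.
    return session["system_prompt"]
```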

What stays the same

  • CLI behavior (same AIAgent persists, prompt was already stable)
  • Compression (invalidates prompt and rebuilds from scratch)
  • First-turn behavior (builds fresh with current memory + Honcho)

Tests

6 new tests in TestSystemPromptStability covering:

  • Stored prompt reuse for continuing sessions
  • Fresh build on first turn (no history)
  • Fallback to fresh build when DB has no stored prompt
  • Honcho context baked into prompt on first turn
  • Honcho prefetch gating

Full suite: 2556 passed, 0 failures
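The first three cases above reduce to assertions like the following; this is a hypothetical shape with an inline stand-in for the reuse rule, not the actual TestSystemPromptStability fixtures:

```python
def resolve_system_prompt(session, history, build_fresh):
    # Minimal stand-in for the reuse rule under test (hypothetical names).
    if history and session.get("system_prompt") is not None:
        return session["system_prompt"]
    session["system_prompt"] = build_fresh()
    return session["system_prompt"]

def test_reuse_stored_prompt_for_continuing_session():
    session = {"system_prompt": "stored"}
    assert resolve_system_prompt(session, [{"role": "user"}],
                                 lambda: "fresh") == "stored"

def test_fresh_build_on_first_turn():
    assert resolve_system_prompt({}, [], lambda: "fresh") == "fresh"

def test_fallback_when_no_stored_prompt():
    assert resolve_system_prompt({}, [{"role": "user"}],
                                 lambda: "fresh") == "fresh"

test_reuse_stored_prompt_for_continuing_session()
test_fresh_build_on_first_turn()
test_fallback_when_no_stored_prompt()
```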

Two changes to prevent unnecessary Anthropic prompt cache misses in the
gateway, where a fresh AIAgent is created per user message:

1. Reuse stored system prompt for continuing sessions:
   When conversation_history is non-empty, load the system prompt from
   the session DB instead of rebuilding from disk. The model already has
   updated memory in its conversation history (it wrote it!), so
   re-reading memory from disk produces a different system prompt that
   breaks the cache prefix.

2. Stabilize Honcho context per session:
   - Only prefetch Honcho context on the first turn (empty history)
   - Bake Honcho context into the cached system prompt and store to DB
   - Remove the per-turn Honcho injection from the API call loop

   This ensures the system message is identical across all turns in a
   session. Previously, re-fetching Honcho could return different context
   on each turn, changing the system message and invalidating the cache.

Both changes preserve the existing behavior for compression (which
invalidates the prompt and rebuilds from scratch) and for the CLI
(where the same AIAgent persists and the cached prompt is already
stable across turns).

Tests: 2556 passed (6 new)
@teknium1 teknium1 merged commit a2d0d07 into main Mar 9, 2026
1 check passed