fix: stabilize system prompt across gateway turns for cache hits#754

Merged
teknium1 merged 1 commit into main from hermes/hermes-c53b7cba
Mar 9, 2026
Conversation

@teknium1
Contributor

teknium1 commented Mar 9, 2026

Summary

Prevents unnecessary Anthropic prompt cache misses in the gateway by stabilizing the system prompt across all turns in a session.

Problem

The gateway creates a fresh AIAgent per user message. Each new agent rebuilt the system prompt from disk, picking up:

  1. Memory changes — the model wrote to memory during previous turns, so the disk content changed. But the model already has the updated memory in its conversation history, making the system prompt rebuild redundant AND cache-breaking.
  2. Honcho context changes — re-fetching Honcho on each turn could return different context, changing the system message.

Both caused the system prompt content to differ between turns, breaking the Anthropic prefix cache and forcing full re-reads (~75% cost increase on input tokens).
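The failure mode can be illustrated with a minimal sketch. Anthropic's prefix cache keys on the exact request prefix, so any byte-level change to the system prompt forces a full re-read; the hashing here is a stand-in for that behavior, not the real cache implementation:

```python
import hashlib

def cache_key(system_prompt: str) -> str:
    # Stand-in for Anthropic's prefix cache: any byte-level change to the
    # system prompt produces a different prefix and therefore a cache miss.
    return hashlib.sha256(system_prompt.encode()).hexdigest()

base = "You are Hermes.\n## Memory\nuser likes tea"

turn1 = cache_key(base)
# Turn 2, pre-fix: memory on disk changed, prompt rebuilt -> different key.
turn2_rebuilt = cache_key(base + "\nuser likes jazz")
# Turn 2, post-fix: stored prompt reused -> same key, cache hit.
turn2_reused = cache_key(base)

assert turn1 != turn2_rebuilt  # rebuild breaks the prefix cache
assert turn1 == turn2_reused   # reuse preserves it
```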

Fix

System prompt reuse for continuing sessions:

  • When conversation_history is non-empty and the session DB has a stored system prompt, reuse it instead of rebuilding from disk
  • First turn of a session builds the prompt fresh (as before)
  • Compression still invalidates and rebuilds (as before)
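The reuse rule above can be sketched roughly as follows; `session` as a dict-backed DB row and the function names are illustrative, not the actual gateway API:

```python
def resolve_system_prompt(session, conversation_history, build_from_disk):
    # Hypothetical sketch of the reuse rule; names are illustrative.
    stored = session.get("system_prompt")
    if conversation_history and stored is not None:
        # Continuing session: reuse the stored prompt so the Anthropic
        # prefix cache sees an identical system message every turn.
        return stored
    # First turn, or no stored prompt in the DB: build fresh and persist.
    prompt = build_from_disk()
    session["system_prompt"] = prompt
    return prompt
```

Compression would simply clear `session["system_prompt"]`, so the next turn falls through to the fresh-build branch.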

Honcho context stabilization:

  • Only prefetch Honcho context on the first turn (empty history)
  • Bake Honcho context into the cached system prompt before storing to DB
  • Remove the per-turn Honcho injection from the API call loop
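A rough sketch of the gating, again with hypothetical names: Honcho is fetched once on the first turn, baked into the prompt, and the stored result is what every later turn sends, with no per-turn injection:

```python
def system_prompt_for_turn(session, conversation_history,
                           build_base_prompt, fetch_honcho_context):
    # Hypothetical sketch; the real gateway code differs in detail.
    if not conversation_history:
        # First turn: the only place Honcho is prefetched. Bake it into
        # the prompt and store, so the system message never changes.
        prompt = (build_base_prompt()
                  + "\n\n## Honcho context\n"
                  + fetch_honcho_context())
        session["system_prompt"] = prompt
        return prompt
    # Later turns: stable, already-baked prompt; no Honcho re-fetch.
    return session["system_prompt"]
```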

What stays the same

  • CLI behavior (same AIAgent persists, prompt was already stable)
  • Compression (invalidates prompt and rebuilds from scratch)
  • First-turn behavior (builds fresh with current memory + Honcho)

Tests

6 new tests in TestSystemPromptStability covering:

  • Stored prompt reuse for continuing sessions
  • Fresh build on first turn (no history)
  • Fallback to fresh build when DB has no stored prompt
  • Honcho context baked into prompt on first turn
  • Honcho prefetch gating

Full suite: 2556 passed, 0 failures
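The first three cases above reduce to assertions like the following; this is a hypothetical shape with an inline stand-in for the reuse rule, not the actual TestSystemPromptStability fixtures:

```python
def resolve_system_prompt(session, history, build_fresh):
    # Minimal stand-in for the reuse rule under test (hypothetical names).
    if history and session.get("system_prompt") is not None:
        return session["system_prompt"]
    session["system_prompt"] = build_fresh()
    return session["system_prompt"]

def test_reuse_stored_prompt_for_continuing_session():
    session = {"system_prompt": "stored"}
    assert resolve_system_prompt(session, [{"role": "user"}],
                                 lambda: "fresh") == "stored"

def test_fresh_build_on_first_turn():
    assert resolve_system_prompt({}, [], lambda: "fresh") == "fresh"

def test_fallback_when_no_stored_prompt():
    assert resolve_system_prompt({}, [{"role": "user"}],
                                 lambda: "fresh") == "fresh"

test_reuse_stored_prompt_for_continuing_session()
test_fresh_build_on_first_turn()
test_fallback_when_no_stored_prompt()
```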

Two changes to prevent unnecessary Anthropic prompt cache misses in the
gateway, where a fresh AIAgent is created per user message:

1. Reuse stored system prompt for continuing sessions:
   When conversation_history is non-empty, load the system prompt from
   the session DB instead of rebuilding from disk. The model already has
   updated memory in its conversation history (it wrote it!), so
   re-reading memory from disk produces a different system prompt that
   breaks the cache prefix.

2. Stabilize Honcho context per session:
   - Only prefetch Honcho context on the first turn (empty history)
   - Bake Honcho context into the cached system prompt and store to DB
   - Remove the per-turn Honcho injection from the API call loop

   This ensures the system message is identical across all turns in a
   session. Previously, re-fetching Honcho could return different context
   on each turn, changing the system message and invalidating the cache.

Both changes preserve the existing behavior for compression (which
invalidates the prompt and rebuilds from scratch) and for the CLI
(where the same AIAgent persists and the cached prompt is already
stable across turns).

Tests: 2556 passed (6 new)
@teknium1 teknium1 merged commit a2d0d07 into main Mar 9, 2026
1 check passed