fix: gateway token double-counting — use absolute set instead of increment by teknium1 · Pull Request #3317 · NousResearch/hermes-agent

teknium1 · 2026-03-27T02:10:04Z

Summary

The gateway's token usage stats inflate with every message because update_session() uses += for cumulative values the agent already tracks.

Root cause

The cached agent's session_prompt_tokens / session_completion_tokens are running totals that accumulate across run_conversation() calls. The gateway reads these cumulative values and passes them to update_session(), which does entry.input_tokens += cumulative. Each message re-adds the full running total.

Live test result (before fix):

After msg 1: input=1000 → stored 1000 ✓
After msg 2: input=2500 → stored 3500 (should be 2500) ✗
After msg 3: input=5000 → stored 8500 (should be 5000) ✗
Token inflation: 1.7x after 3 messages
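
The numbers above can be reproduced with a minimal sketch. `SessionEntry` and `replay` are hypothetical stand-ins, not the actual gateway code; only the `+=` versus `=` behaviour and the three running totals are taken from this description.

```python
from dataclasses import dataclass


@dataclass
class SessionEntry:
    """Hypothetical stand-in for the gateway's in-memory session entry."""
    input_tokens: int = 0


def replay(cumulative_totals, absolute):
    """Feed the agent's running totals to the entry, one message at a time.

    absolute=False mimics the old `+=` behaviour; absolute=True mimics the fix.
    Returns the stored value after each message.
    """
    entry = SessionEntry()
    stored = []
    for total in cumulative_totals:
        if absolute:
            entry.input_tokens = total    # fixed: overwrite with the running total
        else:
            entry.input_tokens += total   # buggy: re-adds the running total
        stored.append(entry.input_tokens)
    return stored


# Running totals reported by the agent in the live test above.
totals = [1000, 2500, 5000]

buggy = replay(totals, absolute=False)
fixed = replay(totals, absolute=True)

print(buggy)                    # [1000, 3500, 8500]
print(fixed)                    # [1000, 2500, 5000]
print(buggy[-1] / totals[-1])   # 1.7
```

Because each stored value is the sum of all running totals so far, the inflation factor keeps growing with conversation length rather than settling at 1.7x.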

Fix

gateway/session.py: Change += to = for input_tokens, output_tokens, cache_read/write_tokens, estimated_cost_usd
hermes_state.py: Add set_token_counts() — uses SQL direct assignment (input_tokens = ?) instead of increment (input_tokens = input_tokens + ?)
gateway/session.py: Switch DB call from update_token_counts to set_token_counts

CLI mode continues using update_token_counts() (increment) since it tracks per-API-call deltas.
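
The difference between the two write paths can be sketched with an in-memory SQLite database. The `sessions` schema, the function signatures, and the output-token figures below are illustrative assumptions; the real `SessionDB` methods in hermes_state.py may take different arguments and touch more columns.

```python
import sqlite3


def make_db():
    """In-memory database with a hypothetical, simplified sessions table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions ("
        "id TEXT PRIMARY KEY, "
        "input_tokens INTEGER NOT NULL DEFAULT 0, "
        "output_tokens INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO sessions (id) VALUES ('s1')")
    return conn


def update_token_counts(conn, session_id, input_tokens, output_tokens):
    """Increment style: correct when the caller passes per-call deltas."""
    conn.execute(
        "UPDATE sessions SET input_tokens = input_tokens + ?, "
        "output_tokens = output_tokens + ? WHERE id = ?",
        (input_tokens, output_tokens, session_id),
    )


def set_token_counts(conn, session_id, input_tokens, output_tokens):
    """Assignment style: correct when the caller passes cumulative totals."""
    conn.execute(
        "UPDATE sessions SET input_tokens = ?, output_tokens = ? WHERE id = ?",
        (input_tokens, output_tokens, session_id),
    )


def read_counts(conn, session_id):
    """Return (input_tokens, output_tokens) for one session."""
    return conn.execute(
        "SELECT input_tokens, output_tokens FROM sessions WHERE id = ?",
        (session_id,),
    ).fetchone()


# Gateway-style caller: passes running totals, so it needs assignment.
gateway_db = make_db()
for total_in, total_out in [(1000, 100), (2500, 250), (5000, 500)]:
    set_token_counts(gateway_db, "s1", total_in, total_out)
gateway_counts = read_counts(gateway_db, "s1")

# CLI-style caller: passes per-call deltas, so it needs increment.
cli_db = make_db()
for delta_in, delta_out in [(1000, 100), (1500, 150), (2500, 250)]:
    update_token_counts(cli_db, "s1", delta_in, delta_out)
cli_counts = read_counts(cli_db, "s1")

print(gateway_counts)  # (5000, 500)
print(cli_counts)      # (5000, 500)
```

Both callers end at the same stored totals, which is why keeping two methods, rather than changing the existing increment in place, leaves the CLI path correct.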

Live test result (after fix):

After msg 1: input=1000 → stored 1000 ✓
After msg 2: input=2500 → stored 2500 ✓
After msg 3: input=5000 → stored 5000 ✓
Token inflation: 1.0x

Validation

python -m pytest tests/test_hermes_state.py tests/gateway/test_session.py -n0 -q → 186 passed

Based on analysis from PR #3222 by @zaycruz.
Co-authored-by: zaycruz <zay@users.noreply.github.com>

@zaycruz

…ement

The gateway's update_session() used += for token counts, but the cached agent's session_prompt_tokens / session_completion_tokens are cumulative totals that grow across messages. Each update_session call re-added the running total, inflating usage stats with every message (1.7x after 3 messages, worse over longer conversations).

Fix: change += to = for in-memory entry fields, add set_token_counts() to SessionDB that uses direct assignment instead of SQL increment, and switch the gateway to call it.

CLI mode continues using update_token_counts() (increment) since it tracks per-API-call deltas; that path is unchanged.

Based on analysis from PR #3222 by @zaycruz (closed).

Co-authored-by: zaycruz <zay@users.noreply.github.com>

# Conflicts: # gateway/session.py

teknium1 and others added 2 commits March 26, 2026 19:09

Merge remote-tracking branch 'origin/main' into hermes/hermes-5ef8201d

5626d45

# Conflicts: # gateway/session.py

teknium1 merged commit 22cfad1 into main Mar 27, 2026
1 of 2 checks passed

teknium1 merged 2 commits into main from hermes/hermes-5ef8201d
