
fix: gateway token double-counting — use absolute set instead of increment#3317

Merged
teknium1 merged 2 commits into main from hermes/hermes-5ef8201d on Mar 27, 2026


@teknium1 (Contributor)

Summary

The gateway's token usage stats inflate with every message because update_session() applies += to values that the agent already tracks as cumulative running totals.

Root cause

The cached agent's session_prompt_tokens / session_completion_tokens are running totals that accumulate across run_conversation() calls. The gateway reads these cumulative values and passes them to update_session(), which does entry.input_tokens += cumulative. Each message re-adds the full running total.

Live test result (before fix):

After msg 1: input=1000 → stored 1000 ✓
After msg 2: input=2500 → stored 3500 (should be 2500) ✗
After msg 3: input=5000 → stored 8500 (should be 5000) ✗
Token inflation: 1.7x after 3 messages
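A minimal sketch of the pre-fix arithmetic, assuming a hypothetical SessionEntry dataclass in place of the gateway's real session entry and tracking only input tokens. Feeding the agent's cumulative totals from the live test into a += update reproduces the stored values and the 1.7x inflation:

```python
from dataclasses import dataclass


@dataclass
class SessionEntry:
    # Hypothetical stand-in for the gateway's in-memory session entry.
    input_tokens: int = 0


def update_session_buggy(entry: SessionEntry, cumulative_input: int) -> None:
    # Pre-fix behaviour: the agent's value is already a running total,
    # yet it is added on top of the previously stored running total.
    entry.input_tokens += cumulative_input


# Cumulative prompt-token totals reported by the cached agent after each message.
agent_totals = [1000, 2500, 5000]

entry = SessionEntry()
stored = []
for total in agent_totals:
    update_session_buggy(entry, total)
    stored.append(entry.input_tokens)

print(stored)                                    # [1000, 3500, 8500]
print(round(stored[-1] / agent_totals[-1], 2))   # 1.7
```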

Fix

  • gateway/session.py: Change += to = for input_tokens, output_tokens, cache_read/write_tokens, estimated_cost_usd
  • hermes_state.py: Add set_token_counts() — uses SQL direct assignment (input_tokens = ?) instead of increment (input_tokens = input_tokens + ?)
  • gateway/session.py: Switch DB call from update_token_counts to set_token_counts
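The difference between the two SessionDB methods comes down to the UPDATE statement. The sketch below uses the standard-library sqlite3 module with a hypothetical single-column sessions table; the table layout and function signatures are assumptions, not the actual hermes_state.py code, and the other counters (output, cache, cost) would follow the same pattern.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical minimal schema; the real table tracks more counters.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sessions ("
        " id TEXT PRIMARY KEY,"
        " input_tokens INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute("INSERT INTO sessions (id) VALUES (?)", ("s1",))
    return conn


def update_token_counts(conn: sqlite3.Connection, session_id: str, input_tokens: int) -> None:
    # Increment: only correct when input_tokens is a per-call delta.
    conn.execute(
        "UPDATE sessions SET input_tokens = input_tokens + ? WHERE id = ?",
        (input_tokens, session_id),
    )


def set_token_counts(conn: sqlite3.Connection, session_id: str, input_tokens: int) -> None:
    # Absolute set: overwrite with the agent's cumulative running total.
    conn.execute(
        "UPDATE sessions SET input_tokens = ? WHERE id = ?",
        (input_tokens, session_id),
    )


def read_input_tokens(conn: sqlite3.Connection, session_id: str) -> int:
    row = conn.execute(
        "SELECT input_tokens FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return row[0]


inc_conn, set_conn = make_db(), make_db()
for total in [1000, 2500, 5000]:  # cumulative totals from the live test
    update_token_counts(inc_conn, "s1", total)
    set_token_counts(set_conn, "s1", total)

print(read_input_tokens(inc_conn, "s1"))  # 8500
print(read_input_tokens(set_conn, "s1"))  # 5000
```

Both connections receive the same cumulative totals; only the absolute assignment leaves the stored value equal to the agent's latest total.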

CLI mode continues using update_token_counts() (increment) since it tracks per-API-call deltas.
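Keeping update_token_counts() for the CLI is consistent because the two methods expect different inputs: increment is correct for per-call deltas, while absolute set is correct for running totals. A small illustration, with hypothetical per-call deltas chosen so that their running totals match the live-test numbers:

```python
from itertools import accumulate

# Hypothetical per-API-call prompt-token deltas.
deltas = [1000, 1500, 2500]
cumulative = list(accumulate(deltas))  # [1000, 2500, 5000]


def apply_increment(values: list[int]) -> int:
    stored = 0
    for v in values:
        stored += v  # update_token_counts-style: add each value
    return stored


def apply_set(values: list[int]) -> int:
    stored = 0
    for v in values:
        stored = v  # set_token_counts-style: overwrite with each value
    return stored


# Each path is correct when fed the kind of value it expects.
assert apply_increment(deltas) == apply_set(cumulative) == 5000

# Pairing increment with cumulative totals reproduces the gateway bug.
print(apply_increment(cumulative))  # 8500
```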

Live test result (after fix):

After msg 1: input=1000 → stored 1000 ✓
After msg 2: input=2500 → stored 2500 ✓
After msg 3: input=5000 → stored 5000 ✓
Token inflation: 1.0x

Validation

  • python -m pytest tests/test_hermes_state.py tests/gateway/test_session.py -n0 -q → 186 passed

Based on analysis from PR #3222 by @zaycruz.
Co-authored-by: zaycruz <zay@users.noreply.github.com>

teknium1 and others added 2 commits March 26, 2026 19:09
…ement

The gateway's update_session() used += for token counts, but the cached
agent's session_prompt_tokens / session_completion_tokens are cumulative
totals that grow across messages. Each update_session call re-added the
running total, inflating usage stats with every message (1.7x after 3
messages, worse over longer conversations).

Fix: change += to = for in-memory entry fields, add set_token_counts()
to SessionDB that uses direct assignment instead of SQL increment, and
switch the gateway to call it.

CLI mode continues using update_token_counts() (increment) since it
tracks per-API-call deltas — that path is unchanged.

Based on analysis from PR #3222 by @zaycruz (closed).
Co-authored-by: zaycruz <zay@users.noreply.github.com>
teknium1 merged commit 22cfad1 into main on Mar 27, 2026
1 of 2 checks passed
StreamOfRon pushed a commit to StreamOfRon/hermes-agent that referenced this pull request Mar 29, 2026
