fix: gateway token double-counting with cached agents #3306

Merged
teknium1 merged 1 commit into main from hermes/hermes-140430f8
Mar 27, 2026

@teknium1
Contributor

Summary

Fixes #3222 (reported by @zaycruz).

Gateway was double/triple-counting token usage because the cached agent accumulates session_input_tokens across messages (cumulative totals), but update_session() used += (increment) in both the in-memory entry and the SQLite DB.

Example of the bug

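| Message | Agent returns | Entry had | Entry becomes (bug) | Should be |
|---------|---------------|-----------|---------------------|-----------|
| 1       | 100           | 0         | 0 + 100 = 100 ✓     | 100       |
| 2       | 250           | 100       | 100 + 250 = 350 ✗   | 250       |
| 3       | 300           | 350       | 350 + 300 = 650 ✗   | 300       |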

This caused inflated /usage reports and could trigger premature context compression.
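The three-message example above can be replayed in a few lines. This is a minimal sketch, not the project's code: `simulate` and its `absolute` parameter are hypothetical names used only for illustration, and the entry is reduced to a single integer.

```python
def simulate(cumulative_totals, absolute):
    """Replay the totals a cached agent reports; return the entry value after each message.

    Hypothetical helper for illustration only, not part of session.py.
    """
    entry = 0
    history = []
    for total in cumulative_totals:
        if absolute:
            entry = total    # fixed behaviour: the cumulative total replaces the entry
        else:
            entry += total   # buggy behaviour: a cumulative total is added as if it were a delta
        history.append(entry)
    return history


agent_returns = [100, 250, 300]                   # cumulative totals from the example
buggy = simulate(agent_returns, absolute=False)   # [100, 350, 650]
fixed = simulate(agent_returns, absolute=True)    # [100, 250, 300]
```

The first message is correct in both modes because the entry starts at 0, which is why the bug only shows up from the second message onward.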

Fix

  • session.py: change in-memory += to = (direct assignment for cumulative values)
  • hermes_state.py: add absolute=True flag to update_token_counts() — uses SET col = ? instead of SET col = col + ?
  • session.py: pass absolute=True when calling the DB

The CLI path is unchanged — it passes per-API-call deltas directly with the default absolute=False (increment).
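A rough sketch of how an `absolute` flag could switch between the two SQL forms is shown here. Only the function name `update_token_counts`, the `absolute` parameter, and the two `SET` shapes come from this PR; the `sessions` table, its columns, the `conn` argument, and the `read_tokens` helper are assumptions made up for the example, and the real signature in hermes_state.py may differ.

```python
import sqlite3


def update_token_counts(conn, session_id, input_tokens, absolute=False):
    """Write a token count; the schema and signature here are hypothetical."""
    if absolute:
        # Gateway path: the value is already a cumulative total, so overwrite.
        sql = "UPDATE sessions SET input_tokens = ? WHERE id = ?"
    else:
        # CLI path: the value is a per-API-call delta, so increment.
        sql = "UPDATE sessions SET input_tokens = input_tokens + ? WHERE id = ?"
    conn.execute(sql, (input_tokens, session_id))
    conn.commit()


def read_tokens(conn, session_id):
    """Return the stored input token count for one session."""
    row = conn.execute(
        "SELECT input_tokens FROM sessions WHERE id = ?", (session_id,)
    ).fetchone()
    return row[0]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (id TEXT PRIMARY KEY, input_tokens INTEGER NOT NULL DEFAULT 0)"
)
conn.execute("INSERT INTO sessions (id) VALUES ('gateway'), ('cli')")

# Gateway: the cached agent reports cumulative totals after each message.
for total in (100, 250, 300):
    update_token_counts(conn, "gateway", total, absolute=True)

# CLI: each call passes only the tokens used by that API call.
for delta in (100, 150, 50):
    update_token_counts(conn, "cli", delta)

gateway_total = read_tokens(conn, "gateway")
cli_total = read_tokens(conn, "cli")
```

With the same underlying usage, both paths end at 300 tokens. Keeping `absolute=False` as the default means existing delta-based callers need no changes.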

Why not cherry-pick #3222

The original PR is stale (+225/-123 with heavy formatting noise) and bundles an unrelated platform toolset refactor that no longer applies. The actual fix is the change from += to = plus the DB flag.

…lative totals

The cached agent accumulates session_input_tokens across messages, so
run_conversation() returns cumulative totals. But update_session() used
+= (increment), double-counting on every message after the first.

- session.py: change in-memory entry updates from += to = (direct
  assignment for cumulative values)
- hermes_state.py: add absolute=True flag to update_token_counts()
  that uses SET column = ? instead of SET column = column + ?
- session.py: pass absolute=True to the DB call

CLI path is unchanged — it passes per-API-call deltas directly to
update_token_counts() with the default absolute=False (increment).

Reported by @zaycruz in #3222. Closes #3222.
@teknium1 merged commit a8df7f9 into main on Mar 27, 2026
1 of 2 checks passed
StreamOfRon pushed a commit to StreamOfRon/hermes-agent that referenced this pull request Mar 29, 2026