fix: gateway token double-counting with cached agents #3306
Merged
…lative totals

The cached agent accumulates session_input_tokens across messages, so run_conversation() returns cumulative totals. But update_session() used += (increment), double-counting on every message after the first.

- session.py: change in-memory entry updates from += to = (direct assignment for cumulative values)
- hermes_state.py: add absolute=True flag to update_token_counts() that uses SET column = ? instead of SET column = column + ?
- session.py: pass absolute=True to the DB call

CLI path is unchanged: it passes per-API-call deltas directly to update_token_counts() with the default absolute=False (increment).

Reported by @zaycruz in #3222. Closes #3222.
StreamOfRon pushed a commit to StreamOfRon/hermes-agent that referenced this pull request on Mar 29, 2026
Summary

Fixes #3222 (reported by @zaycruz).

The gateway was double/triple-counting token usage because the cached agent accumulates session_input_tokens across messages (cumulative totals), but update_session() used += (increment) in both the in-memory entry and the SQLite DB.

Example of the bug
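A minimal sketch of the arithmetic, using made-up token counts. The variable names and numbers are illustrative assumptions and are not taken from the gateway code:

```python
from itertools import accumulate

# Hypothetical input-token cost of three consecutive messages.
per_message = [100, 150, 200]

# The cached agent reports a running (cumulative) total after each message.
cumulative_reports = list(accumulate(per_message))  # [100, 250, 450]

buggy_total = 0
fixed_total = 0
for reported in cumulative_reports:
    buggy_total += reported  # old behaviour: adds a value that is already cumulative
    fixed_total = reported   # new behaviour: direct assignment of the cumulative value

print(buggy_total)  # 800
print(fixed_total)  # 450
```

With these numbers the true usage is 450 tokens, while the incrementing path records 800, and the gap widens with every further message.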
This caused inflated /usage reports and could trigger premature context compression.

Fix

- session.py: change in-memory += to = (direct assignment for cumulative values)
- hermes_state.py: add absolute=True flag to update_token_counts(), which uses SET col = ? instead of SET col = col + ?
- session.py: pass absolute=True when calling the DB

The CLI path is unchanged: it passes per-API-call deltas directly with the default absolute=False (increment).

Why not cherry-pick #3222
The original PR is stale (+225/-123 with heavy formatting noise) and bundles an unrelated platform toolset refactor that no longer applies. The actual fix is the += → = change plus the DB flag.
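The DB flag can be sketched roughly as follows. This is a self-contained illustration, not the actual hermes_state.py code: the sessions table, its column names, and the function signature are all assumptions made for the example; only the absolute switch between SET col = ? and SET col = col + ? comes from the description above.

```python
import sqlite3


def update_token_counts(conn, session_id, input_tokens, output_tokens, absolute=False):
    """Hypothetical stand-in for the real helper; schema and signature are assumed."""
    if absolute:
        # Caller passes cumulative totals: overwrite the stored values.
        sql = "UPDATE sessions SET input_tokens = ?, output_tokens = ? WHERE id = ?"
    else:
        # Caller passes per-call deltas: add them to the stored values.
        sql = (
            "UPDATE sessions SET input_tokens = input_tokens + ?, "
            "output_tokens = output_tokens + ? WHERE id = ?"
        )
    conn.execute(sql, (input_tokens, output_tokens, session_id))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions ("
    "id TEXT PRIMARY KEY, "
    "input_tokens INTEGER DEFAULT 0, "
    "output_tokens INTEGER DEFAULT 0)"
)
conn.execute("INSERT INTO sessions (id) VALUES ('gw'), ('cli')")

# Gateway-style caller: the values are running totals, so use absolute=True.
update_token_counts(conn, "gw", 100, 20, absolute=True)
update_token_counts(conn, "gw", 250, 50, absolute=True)

# CLI-style caller: the values are per-call deltas, so keep the default.
update_token_counts(conn, "cli", 100, 20)
update_token_counts(conn, "cli", 150, 30)

rows = {
    row[0]: (row[1], row[2])
    for row in conn.execute("SELECT id, input_tokens, output_tokens FROM sessions")
}
print(rows)
```

Both sessions end at the same stored totals, which is the point of the flag: each caller keeps reporting in its own units and the DB helper picks the matching update.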