fix: eliminate 3x SQLite message duplication in gateway sessions#873
Merged
fix: eliminate 3x SQLite message duplication in gateway sessions#873
Conversation
Three separate code paths all wrote to the same SQLite state.db with no deduplication, inflating session transcripts by 3-4x: 1. _log_msg_to_db() — wrote each message individually after append 2. _flush_messages_to_session_db() — re-wrote ALL new messages at every _persist_session() call (~18 exit points), with no tracking of what was already written 3. gateway append_to_transcript() — wrote everything a third time after the agent returned Since load_transcript() prefers SQLite over JSONL, the inflated data was loaded on every session resume, causing proportional token waste. Fix: - Remove _log_msg_to_db() and all 16 call sites (redundant with flush) - Add _last_flushed_db_idx tracking in _flush_messages_to_session_db() so repeated _persist_session() calls only write truly new messages - Reset flush cursor on compression (new session ID) - Add skip_db parameter to SessionStore.append_to_transcript() so the gateway skips SQLite writes when the agent already persisted them - Gateway now passes skip_db=True for agent-managed messages, still writes to JSONL as backup Verified: a 12-message CLI session with tool calls produces exactly 12 SQLite rows with zero duplicates (previously would be 36-48). Tests: 9 new tests covering flush deduplication, skip_db behavior, compression reset, and initialization. Full suite passes (2869 tests).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Fixes #860 — SQLite session transcripts accumulated duplicate messages (3-4x token inflation).
Root Cause
Three separate code paths all wrote to the same
state.dbwith no deduplication:_log_msg_to_db()— wrote each message individually right aftermessages.append()_flush_messages_to_session_db()— re-wrote ALL new messages at every_persist_session()call (~18 exit points), with no tracking of what was already writtenappend_to_transcript()— wrote everything a third time after the agent returnedSince
load_transcript()prefers SQLite over JSONL, the inflated data was loaded on every session resume, causing proportional token waste.Fix
run_agent.py:_log_msg_to_db()method and all 16 call sites (redundant with the flush mechanism)_last_flushed_db_idxtracking in_flush_messages_to_session_db()so repeated_persist_session()calls only write truly new messagesgateway/session.py:skip_dbparameter toSessionStore.append_to_transcript()— when True, writes JSONL onlygateway/run.py:skip_db=Truewhen the agent already persisted messages to SQLiteVerification
Live-tested with
hermes chat: a 12-message session with 10 tool calls produces exactly 12 SQLite rows with zero duplicates (previously would have been 36-48).Tests
tests/test_860_dedup.pycovering:_persist_sessioncalls (no duplication)skip_db=Trueprevents SQLite writesskip_db=False(default) writes to both stores_last_flushed_db_idxinitializationtest_interrupt.pyto remove reference to deleted_log_msg_to_db