bug: SQLite session transcript accumulates duplicate messages (3-4x token inflation) #860
Description
Summary
Every message in a gateway session gets written to the SQLite database (state.db) multiple times, causing the loaded conversation history to be 3–4x larger than it should be. This results in proportionally inflated token usage on every API call.
The JSONL transcript file is unaffected and contains the correct message count.
Root Cause
Three separate code paths all write to the same SQLite session, with no deduplication:
1. run_agent._log_msg_to_db() (in run_agent.py)
Called immediately when the user message is appended to the messages list. Writes the user message once.
2. run_agent._persist_session() → _flush_messages_to_session_db() (in run_agent.py)
Called at every exit point in the tool-calling loop — normal finish, errors, retries, early returns (~15+ call sites). Each call re-flushes all new messages from start_idx = len(conversation_history) with no check for what was already written. For a normal conversation turn with tool calls, this fires multiple times.
3. gateway.session_store.append_to_transcript() (in gateway/run.py, _handle_message)
Called after the agent returns, writing the new messages again via the gateway's own SessionStore._db.
GatewayRunner._session_db and SessionStore._db are separate SessionDB() instances but both connect to the same state.db file.
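The mechanism behind the duplication can be demonstrated in isolation. The sketch below (hypothetical schema, not the real Hermes one) shows that two independent SQLite connections to the same file happily insert the same logical message twice, since nothing in the schema enforces uniqueness:

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "state.db")

writer_a = sqlite3.connect(db_path)  # stands in for GatewayRunner._session_db
writer_b = sqlite3.connect(db_path)  # stands in for SessionStore._db
writer_a.execute("CREATE TABLE messages (session_id TEXT, role TEXT, content TEXT)")
writer_a.commit()

msg = ("sess-1", "user", "hello")
writer_a.execute("INSERT INTO messages VALUES (?, ?, ?)", msg)
writer_a.commit()
writer_b.execute("INSERT INTO messages VALUES (?, ?, ?)", msg)  # same message, second writer
writer_b.commit()

count = writer_a.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(count)  # 2 — one logical message stored twice
```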
Result
- User messages: written ~3x (paths 1 + 2 + 3)
- Assistant/tool messages: written ~2–3x (paths 2 + 3, potentially multiple times via path 2)
- load_transcript() prefers SQLite over JSONL, so it loads the inflated history on every message
- Observed ratio in practice: ~3.4x (194 messages in JSONL, 654 in SQLite for the same session)
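The observed ratio follows directly from the reported counts:

```python
# Numbers reported for this session
jsonl_msgs = 194   # correct count, from the JSONL transcript
sqlite_msgs = 654  # inflated count, from state.db

print(round(sqlite_msgs / jsonl_msgs, 2))  # 3.37
```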
Reproduction
Run a gateway session (Telegram/Discord), send ~10 messages with tool calls, then:
```python
import os
import sqlite3

# sqlite3 does not expand '~', so expand it explicitly
conn = sqlite3.connect(os.path.expanduser('~/.hermes/state.db'))
c = conn.cursor()
c.execute(
    "SELECT role, COUNT(*) FROM messages WHERE session_id=? GROUP BY role",
    (session_id,),
)
print(c.fetchall())
```

Compare against the JSONL line count for the same session; the SQLite count will be ~3x higher.
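For the JSONL side of the comparison, counting non-empty lines is enough (one message per line). A tiny self-contained sketch; the real transcript path depends on your install, so a sample file is written here for illustration:

```python
import json
import os
import tempfile

# Write a small stand-in transcript (in practice, point at your session's .jsonl file)
path = os.path.join(tempfile.mkdtemp(), "transcript.jsonl")
with open(path, "w") as f:
    for msg in [{"role": "user"}, {"role": "assistant"}, {"role": "tool"}]:
        f.write(json.dumps(msg) + "\n")

with open(path) as f:
    jsonl_count = sum(1 for line in f if line.strip())
print(jsonl_count)  # 3
```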
Impact
Token usage is inflated by 3–4x per message. A session with a 5-hour usage budget was exhausted in ~40 minutes of actual conversation.
Suggested Fix
Option A (minimal): Track a _last_flushed_db_idx in _flush_messages_to_session_db so repeat calls only write truly new messages:
```python
def _flush_messages_to_session_db(self, messages, conversation_history=None):
    if not self._session_db:
        return
    # Resume from the last flushed index; fall back to the pre-turn history length.
    start_idx = getattr(
        self,
        '_last_flushed_db_idx',
        len(conversation_history) if conversation_history else 0,
    )
    for msg in messages[start_idx:]:
        self._session_db.append_message(...)
    self._last_flushed_db_idx = len(messages)
```

Option B: Remove _log_msg_to_db from run_conversation (it is redundant with _flush_messages_to_session_db) and ensure the gateway's append_to_transcript doesn't double-write what _flush_messages_to_session_db already persisted.
Option C (quickest workaround): Swap load_transcript priority to prefer JSONL over SQLite, since the JSONL is always correct.
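A complementary defense worth considering alongside Option B (not from the issue, just a sketch with a hypothetical schema): make the write path idempotent at the database level with a UNIQUE constraint plus INSERT OR IGNORE, so overlapping writers cannot duplicate a message even if the call-site fixes regress:

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        session_id TEXT,
        msg_hash   TEXT,
        role       TEXT,
        content    TEXT,
        UNIQUE (session_id, msg_hash)
    )
""")

def append_message(session_id, msg):
    # Hash the canonical JSON so repeated flushes of the same message no-op.
    h = hashlib.sha256(json.dumps(msg, sort_keys=True).encode()).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?)",
        (session_id, h, msg["role"], msg["content"]),
    )
    conn.commit()

msg = {"role": "user", "content": "hello"}
for _ in range(3):  # simulate the three overlapping write paths
    append_message("sess-1", msg)

stored = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(stored)  # 1
```

Caveat: a content hash alone would also collapse a user legitimately sending the identical message twice, so a real implementation should mix a per-session message index into the hash.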
Environment
- Commit: f6bc620
- Platform: Arch Linux, gateway via systemd user service
- Platforms affected: Telegram (confirmed), likely all gateway platforms