bug: SQLite session transcript accumulates duplicate messages (3-4x token inflation) #860

@DiamondEyesFox

Description

Summary

Every message in a gateway session gets written to the SQLite database (state.db) multiple times, causing the loaded conversation history to be 3–4x larger than it should be. This results in proportionally inflated token usage on every API call.

The JSONL transcript file is unaffected and contains the correct message count.

Root Cause

Three separate code paths all write to the same SQLite session, with no deduplication:

1. run_agent._log_msg_to_db() (in run_agent.py)
Called immediately when the user message is appended to the messages list. Writes the user message once.

2. run_agent._flush_messages_to_session_db() (in run_agent.py)
Called at every exit point in the tool-calling loop — normal finish, errors, retries, and early returns (~15+ call sites). Each call re-flushes every message from start_idx = len(conversation_history) onward, with no check for what was already written. A normal conversation turn with tool calls fires this multiple times.

3. gateway.session_store.append_to_transcript() (in gateway/run.py, _handle_message)
Called after the agent returns, writing the new messages again via the gateway's own SessionStore._db.

GatewayRunner._session_db and SessionStore._db are separate SessionDB() instances but both connect to the same state.db file.
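The two-instances problem can be reduced to a minimal sketch: two independent sqlite3 connections (standing in for GatewayRunner._session_db and SessionStore._db) appending the same logical message to one file. The schema here is hypothetical and only illustrates the mechanism:

```python
import os
import sqlite3
import tempfile

# Hypothetical minimal model of the bug: two independent connections to the
# same database file, each believing it is responsible for persisting a message.
path = os.path.join(tempfile.mkdtemp(), "state.db")

conn_a = sqlite3.connect(path)  # stands in for the runner's SessionDB
conn_a.execute("CREATE TABLE messages (session_id TEXT, role TEXT, content TEXT)")
conn_a.commit()

conn_b = sqlite3.connect(path)  # stands in for the gateway's SessionDB

msg = ("sess-1", "user", "hello")
conn_a.execute("INSERT INTO messages VALUES (?, ?, ?)", msg)  # flush path
conn_a.commit()
conn_b.execute("INSERT INTO messages VALUES (?, ?, ?)", msg)  # gateway path
conn_b.commit()

count = conn_a.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(count)  # one logical message, stored twice
```

Neither connection can see the other's intent, so dedup has to happen either at a single choke point or in the schema itself.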

Result

  • User messages: written ~3x (paths 1 + 2 + 3)
  • Assistant/tool messages: written ~2–3x (paths 2 + 3, potentially multiple times via path 2)
  • load_transcript() prefers SQLite over JSONL, so it loads the inflated history on every message
  • Observed ratio in practice: ~3.4x (194 messages in JSONL, 654 in SQLite for the same session)

Reproduction

Run a gateway session (Telegram/Discord), send ~10 messages with tool calls, then:

import os
import sqlite3

# sqlite3 does not expand '~' itself, so expand it explicitly
conn = sqlite3.connect(os.path.expanduser('~/.hermes/state.db'))
c = conn.cursor()
session_id = '<your session id>'  # the session under test
c.execute("SELECT role, COUNT(*) FROM messages WHERE session_id=? GROUP BY role", (session_id,))
print(c.fetchall())

Compare against the JSONL line count for the same session; the SQLite count will be ~3x higher.
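The comparison can be reproduced end to end with synthetic data. This sketch assumes a one-message-per-line JSONL transcript and a `messages` table like the one queried above; it fabricates the 3x write pattern to show the ratio check:

```python
import json
import os
import sqlite3
import tempfile

# Synthetic stand-ins for the real transcript files (paths are throwaway).
tmp = tempfile.mkdtemp()
jsonl_path = os.path.join(tmp, "session.jsonl")
db_path = os.path.join(tmp, "state.db")

# JSONL transcript: one line per logical message (the correct count).
msgs = [{"role": "user", "content": "hi"}, {"role": "assistant", "content": "hello"}]
with open(jsonl_path, "w") as f:
    for m in msgs:
        f.write(json.dumps(m) + "\n")

# SQLite: each message written three times, as the three code paths do.
conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE messages (session_id TEXT, role TEXT, content TEXT)")
for m in msgs:
    for _ in range(3):
        conn.execute(
            "INSERT INTO messages VALUES (?, ?, ?)", ("s1", m["role"], m["content"])
        )
conn.commit()

with open(jsonl_path) as f:
    jsonl_count = sum(1 for line in f if line.strip())
(db_count,) = conn.execute(
    "SELECT COUNT(*) FROM messages WHERE session_id='s1'"
).fetchone()
print(jsonl_count, db_count, db_count / jsonl_count)  # 2 6 3.0
```

Running the same two counts against a real session should reproduce the ~3.4x ratio reported above.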

Impact

Token usage is inflated by 3–4x per message. A session with a 5-hour usage budget was exhausted in ~40 minutes of actual conversation.

Suggested Fix

Option A (minimal): Track a _last_flushed_db_idx in _flush_messages_to_session_db so repeat calls only write truly new messages:

def _flush_messages_to_session_db(self, messages, conversation_history=None):
    if not self._session_db:
        return
    # First call falls back to the old start index; repeat calls resume
    # where the previous flush stopped, so nothing is written twice.
    start_idx = getattr(self, '_last_flushed_db_idx',
                        len(conversation_history) if conversation_history else 0)
    for msg in messages[start_idx:]:
        self._session_db.append_message(...)
    self._last_flushed_db_idx = len(messages)

Option B: Remove _log_msg_to_db from run_conversation (it is redundant with _flush_messages_to_session_db) and ensure the gateway's append_to_transcript doesn't re-write messages that _flush_messages_to_session_db has already persisted.
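A complementary safety net for either option (not part of the issue's proposals): make the DB append idempotent at the schema level, so that even if two code paths race, a duplicate write becomes a no-op. This sketch assumes each message carries a stable per-session index (`msg_idx`), which the real schema may or may not have:

```python
import sqlite3

# Sketch: a UNIQUE constraint plus INSERT OR IGNORE makes appends idempotent.
# The (session_id, msg_idx) key is a hypothetical addition to the schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE messages (
           session_id TEXT,
           msg_idx    INTEGER,
           role       TEXT,
           content    TEXT,
           UNIQUE (session_id, msg_idx)
       )"""
)

def append_message(session_id, msg_idx, role, content):
    # Repeat writes of the same (session_id, msg_idx) are silently dropped.
    conn.execute(
        "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?)",
        (session_id, msg_idx, role, content),
    )

append_message("s1", 0, "user", "hi")
append_message("s1", 0, "user", "hi")  # duplicate write from a second path
count = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(count)  # stored once despite two writes
```

This does not remove the redundant code paths, but it caps the damage any of them can do.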

Option C (quickest workaround): Swap load_transcript priority to prefer JSONL over SQLite, since the JSONL is always correct.
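Option C could look roughly like the following. The real load_transcript signature in the codebase is unknown, so the function shape here is an assumption; the point is only the ordering of the two sources:

```python
import json
import os
import tempfile

# Hypothetical sketch of Option C: prefer the JSONL transcript (known correct)
# and fall back to the SQLite loader only when the JSONL file is missing.
def load_transcript(jsonl_path, load_from_db):
    if os.path.exists(jsonl_path):
        with open(jsonl_path) as f:
            return [json.loads(line) for line in f if line.strip()]
    return load_from_db()  # SQLite path, used only as a fallback

# Demo with a throwaway transcript file.
path = os.path.join(tempfile.mkdtemp(), "session.jsonl")
with open(path, "w") as f:
    f.write(json.dumps({"role": "user", "content": "hi"}) + "\n")
    f.write(json.dumps({"role": "assistant", "content": "hello"}) + "\n")

history = load_transcript(path, lambda: [])
print(len(history))  # 2 — loaded from JSONL; SQLite never consulted
```

This sidesteps the inflation immediately but leaves the duplicate rows in state.db, so one of the other options is still needed as the real fix.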

Environment

  • Commit: f6bc620
  • Platform: Arch Linux, gateway via systemd user service
  • Platforms affected: Telegram (confirmed), likely all gateway platforms
