bug: SQLite session transcript accumulates duplicate messages (3-4x token inflation) #860
Description
Summary
Every message in a gateway session gets written to the SQLite database (state.db) multiple times, causing the loaded conversation history to be 3–4x larger than it should be. This results in proportionally inflated token usage on every API call.
The JSONL transcript file is unaffected and contains the correct message count.
Root Cause
Three separate code paths all write to the same SQLite session, with no deduplication:
1. run_agent._log_msg_to_db() (in run_agent.py)
Called immediately when the user message is appended to the messages list. Writes the user message once.
2. run_agent._persist_session() → _flush_messages_to_session_db() (in run_agent.py)
Called at every exit point in the tool-calling loop — normal finish, errors, retries, early returns (~15+ call sites). Each call re-flushes all new messages from start_idx = len(conversation_history) with no check for what was already written. For a normal conversation turn with tool calls, this fires multiple times.
3. gateway.session_store.append_to_transcript() (in gateway/run.py, _handle_message)
Called after the agent returns, writing the new messages again via the gateway's own SessionStore._db.
GatewayRunner._session_db and SessionStore._db are separate SessionDB() instances but both connect to the same state.db file.
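The mechanism behind the duplication can be demonstrated in isolation. The sketch below (hypothetical schema, not the real Hermes one) shows that two independent SQLite connections to the same file happily insert the same logical message twice, since nothing in the schema enforces uniqueness:

```python
import os
import sqlite3
import tempfile

db_path = os.path.join(tempfile.mkdtemp(), "state.db")

writer_a = sqlite3.connect(db_path)  # stands in for GatewayRunner._session_db
writer_b = sqlite3.connect(db_path)  # stands in for SessionStore._db
writer_a.execute("CREATE TABLE messages (session_id TEXT, role TEXT, content TEXT)")
writer_a.commit()

msg = ("sess-1", "user", "hello")
writer_a.execute("INSERT INTO messages VALUES (?, ?, ?)", msg)
writer_a.commit()
writer_b.execute("INSERT INTO messages VALUES (?, ?, ?)", msg)  # same message, second writer
writer_b.commit()

count = writer_a.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(count)  # 2 — one logical message stored twice
```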
Result
- User messages: written ~3x (paths 1 + 2 + 3)
- Assistant/tool messages: written ~2–3x (paths 2 + 3, potentially multiple times via path 2)
- load_transcript() prefers SQLite over JSONL, so it loads the inflated history on every message
- Observed ratio in practice: ~3.4x (194 messages in JSONL, 654 in SQLite for the same session)
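The observed ratio follows directly from the reported counts:

```python
# Numbers reported for this session
jsonl_msgs = 194   # correct count, from the JSONL transcript
sqlite_msgs = 654  # inflated count, from state.db

print(round(sqlite_msgs / jsonl_msgs, 2))  # 3.37
```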
Reproduction
Run a gateway session (Telegram/Discord), send ~10 messages with tool calls, then:
```python
import os
import sqlite3

# sqlite3 does not expand '~', so expand it explicitly
conn = sqlite3.connect(os.path.expanduser('~/.hermes/state.db'))
c = conn.cursor()
c.execute(
    "SELECT role, COUNT(*) FROM messages WHERE session_id=? GROUP BY role",
    (session_id,),
)
print(c.fetchall())
```

Compare against the JSONL line count for the same session; the SQLite count will be ~3x higher.
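For the JSONL side of the comparison, counting non-empty lines is enough (one message per line). A tiny self-contained sketch; the real transcript path depends on your install, so a sample file is written here for illustration:

```python
import json
import os
import tempfile

# Write a small stand-in transcript (in practice, point at your session's .jsonl file)
path = os.path.join(tempfile.mkdtemp(), "transcript.jsonl")
with open(path, "w") as f:
    for msg in [{"role": "user"}, {"role": "assistant"}, {"role": "tool"}]:
        f.write(json.dumps(msg) + "\n")

with open(path) as f:
    jsonl_count = sum(1 for line in f if line.strip())
print(jsonl_count)  # 3
```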
Impact
Token usage is inflated by 3–4x per message. A session with a 5-hour usage budget was exhausted in ~40 minutes of actual conversation.
Suggested Fix
Option A (minimal): Track a _last_flushed_db_idx in _flush_messages_to_session_db so repeat calls only write truly new messages:
```python
def _flush_messages_to_session_db(self, messages, conversation_history=None):
    if not self._session_db:
        return
    # Resume from the last flushed index; fall back to the pre-turn history length.
    start_idx = getattr(
        self,
        '_last_flushed_db_idx',
        len(conversation_history) if conversation_history else 0,
    )
    for msg in messages[start_idx:]:
        self._session_db.append_message(...)
    self._last_flushed_db_idx = len(messages)
```

Option B: Remove _log_msg_to_db from run_conversation (it is redundant with _flush_messages_to_session_db) and ensure the gateway's append_to_transcript doesn't double-write what _flush_messages_to_session_db already persisted.
Option C (quickest workaround): Swap load_transcript priority to prefer JSONL over SQLite, since the JSONL is always correct.
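A complementary defense worth considering alongside Option B (not from the issue, just a sketch with a hypothetical schema): make the write path idempotent at the database level with a UNIQUE constraint plus INSERT OR IGNORE, so overlapping writers cannot duplicate a message even if the call-site fixes regress:

```python
import hashlib
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE messages (
        session_id TEXT,
        msg_hash   TEXT,
        role       TEXT,
        content    TEXT,
        UNIQUE (session_id, msg_hash)
    )
""")

def append_message(session_id, msg):
    # Hash the canonical JSON so repeated flushes of the same message no-op.
    h = hashlib.sha256(json.dumps(msg, sort_keys=True).encode()).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?)",
        (session_id, h, msg["role"], msg["content"]),
    )
    conn.commit()

msg = {"role": "user", "content": "hello"}
for _ in range(3):  # simulate the three overlapping write paths
    append_message("sess-1", msg)

stored = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(stored)  # 1
```

Caveat: a content hash alone would also collapse a user legitimately sending the identical message twice, so a real implementation should mix a per-session message index into the hash.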
Environment
- Commit: f6bc620
- Platform: Arch Linux, gateway via systemd user service
- Platforms affected: Telegram (confirmed), likely all gateway platforms