
Summarizer improvements#173

Merged
VVoruganti merged 6 commits into main from ben/summarizer-improvements
Jul 31, 2025

Conversation


@dr-frmr dr-frmr commented Jul 30, 2025

Summary by CodeRabbit

  • New Features

    • Added comprehensive documentation explaining conversation summarization strategies combining recent messages with LLM-generated summaries.
    • Introduced new benchmark test data files for long, short, trivial, and multi-session conversation summaries.
  • Improvements

    • Increased default maximum token limit for context retrieval to 100,000 and renamed the configuration key for clarity.
    • Enhanced summary prompts for more thorough, fact-based outputs with improved token limit handling and detailed logging.
    • Updated default summary generation model to a more advanced version.
    • Improved modularity and reuse in message retrieval with token limit support.
    • Refined session context retrieval to better handle token limits and summary inclusion, with clearer logging.
  • Bug Fixes

    • Improved token limit handling and summary logic in session context retrieval to ensure accurate message selection and logging.
  • Documentation

    • Updated configuration and environment variable documentation to reflect new token limit settings and summarization details.
  • Tests

    • Expanded and refined benchmark tests to better evaluate summarization and context retrieval across various scenarios.
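
The documentation added in this PR describes how context retrieval splits a token budget between recent messages and the LLM-generated summary (roughly 60% messages, 40% summary). As a minimal sketch of that allocation — the function name and exact ratio here are illustrative, not Honcho's actual code:

```python
def split_context_budget(max_tokens: int, summary: bool = True) -> tuple[int, int]:
    """Split a context token budget between recent messages and the summary.

    Mirrors the 60/40 split described in the new summarizer docs; the
    ratio and this helper's name are assumptions for illustration.
    """
    if not summary:
        # No summary requested: the whole budget goes to recent messages.
        return max_tokens, 0
    summary_budget = int(max_tokens * 0.4)  # reserve ~40% for the summary
    return max_tokens - summary_budget, summary_budget
```

With the new 100,000-token default, this would leave 60,000 tokens for recent messages and 40,000 for the summary.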

coderabbitai bot commented Jul 30, 2025

Walkthrough

This update renames the context token limit configuration key and environment variable from GET_CONTEXT_DEFAULT_MAX_TOKENS to GET_CONTEXT_MAX_TOKENS, increasing its default value from 2048 to 100,000. It also enhances summarization logic, refactors token limit application in message retrieval, updates documentation, and adds new and revised benchmark test data for summarization.

Changes

  • Configuration Key and Environment Variable Update (.env.template, config.toml.example, src/config.py, docs/v2/contributing/configuration.mdx): Renamed GET_CONTEXT_DEFAULT_MAX_TOKENS to GET_CONTEXT_MAX_TOKENS and raised the default from 2048 to 100,000 across environment, config, code, and documentation. Numeric literals reformatted for readability.
  • Summarizer Logic and Prompt Improvements (src/utils/summarizer.py): Replaced the hardcoded summary metadata key with a constant; improved logging, prompt instructions, and token accounting; enhanced summary creation and storage logic for clarity and monitoring.
  • Message Retrieval Token Limit Refactor (src/crud/message.py): Introduced a helper to apply token limits in message queries, refactored message retrieval functions to use it, and updated function signatures for flexibility.
  • Session Context Endpoint and Token Handling (src/routers/sessions.py): Updated the endpoint to use the new token limit parameter, removed force_new, improved summary inclusion logic, and adjusted token limit enforcement and logging.
  • Summarizer Documentation (docs/v2/documentation/core-concepts/summarizer.mdx): Added new documentation detailing the summarization strategy, implementation, and usage scenarios.
  • Benchmark Test Runner and Data (tests/bench/run_tests.py): Improved type annotation and logging for token limits in the test runner.
  • Benchmark Test Data Additions (tests/bench/tests/long_summary.json, tests/bench/tests/summary_long_messages.json, tests/bench/tests/summary_multi_session.json, tests/bench/tests/summary_short_messages.json): Added new JSON files with diverse conversation scenarios to test summarization and context retrieval.
  • Benchmark Test Data Updates (tests/bench/tests/summary_and_query.json, tests/bench/tests/summary_trivial.json): Removed or adjusted summary requests with specific token limits and added new summary calls to existing test data.
  • Chat Function Call Update (src/dialectic/chat.py): Removed the deprecated force_new argument from the get_session_context call in the chat function.
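
The configuration rename can be pictured as follows. Honcho's actual config uses Pydantic Settings; this standalone sketch reads the same environment variable with a plain dataclass, and the class name is hypothetical:

```python
import os
from dataclasses import dataclass, field


@dataclass
class HistoryConfig:
    # Renamed from GET_CONTEXT_DEFAULT_MAX_TOKENS; default raised from
    # 2048 to 100_000 (underscore formatting for readability).
    get_context_max_tokens: int = field(
        default_factory=lambda: int(
            os.environ.get("GET_CONTEXT_MAX_TOKENS", 100_000)
        )
    )
```

Setting GET_CONTEXT_MAX_TOKENS in the environment (or the equivalent key in config.toml) overrides the default, matching the .env.template and config.toml.example changes.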

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant API as API (get_session_context)
    participant DB
    participant Summarizer

    Client->>API: Request context (tokens, summary)
    API->>DB: Fetch session, config, messages
    alt summary requested
        API->>Summarizer: get_both_summaries
        Summarizer->>DB: Retrieve summaries
        Summarizer-->>API: Return summaries
        API->>DB: Fetch messages after summary (with token limit)
        DB-->>API: Return messages
        API-->>Client: Return summary + recent messages
    else no summary
        API->>DB: Fetch messages (with token limit)
        DB-->>API: Return messages
        API-->>Client: Return messages
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

  • plastic-labs/honcho#115: Introduces centralized configuration via Pydantic Settings, directly related to the environment variable and config key renaming and value changes in this PR.
  • plastic-labs/honcho#167: Adds the GET_CONTEXT_DEFAULT_MAX_TOKENS environment variable and config key with default 2048, which this PR renames and updates.
  • plastic-labs/honcho#160: Reorganizes and expands configuration variables, including renaming and restructuring of LLM-related settings and token limits, overlapping with this PR’s changes.

Suggested reviewers

  • VVoruganti

Poem

In fields of code where tokens grow,
A rabbit hops with numbers in tow.
Limits rise, summaries bloom,
Benchmarks test in every room.
Configs renamed, docs renewed—
With every hop, the context improved!
🐇✨



📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b9a8fb6 and f482e93.

📒 Files selected for processing (1)
  • src/utils/summarizer.py (16 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.py

📄 CodeRabbit Inference Engine (CLAUDE.md)

src/**/*.py: Follow isort conventions with absolute imports preferred
snake_case for variables/functions; PascalCase for classes
Line length: 88 chars (Black compatible)
Explicit error handling with appropriate exception types
Docstrings: Use Google style docstrings
Use environment variables via python-dotenv (.env)
Use specific exception types (ResourceNotFoundException, ValidationException, etc.)
Proper logging with context instead of print statements

Files:

  • src/utils/summarizer.py
🧠 Learnings (2)
📚 Learning: applies to src/utils/history.py : session history summarization: short every 20 messages, long every...
Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/utils/history.py : Session history summarization: short every 20 messages, long every 60

Applied to files:

  • src/utils/summarizer.py
📚 Learning: the embeddingclient.batch_embed method in src/embeddings.py expects pre-encoded tokens as input. it ...
Learnt from: Rajat-Ahuja1997
PR: plastic-labs/honcho#144
File: scripts/generate_message_embeddings.py:96-106
Timestamp: 2025-06-26T18:32:43.206Z
Learning: The EmbeddingClient.batch_embed method in src/embeddings.py expects pre-encoded tokens as input. It takes a dictionary mapping text IDs to tuples of (text, encoded_tokens) where encoded_tokens is a list of integers. The method uses these pre-encoded tokens for chunking logic and does not perform encoding internally.

Applied to files:

  • src/utils/summarizer.py
🪛 Ruff (0.12.2)
src/utils/summarizer.py

339-339: Trailing comma missing

Add trailing comma

(COM812)


354-354: Trailing comma missing

Add trailing comma

(COM812)

🔇 Additional comments (11)
src/utils/summarizer.py (11)

13-13: LGTM! Good refactoring to move import to module level.

Moving tracked_db import to the module level improves code organization and removes the need for local imports.


57-57: Good practice to use a constant for the metadata key.

Using SUMMARIES_KEY constant instead of hardcoded strings improves maintainability and reduces the risk of typos.


74-74: Good addition of input_tokens parameter for dynamic summary sizing.

This allows the summary length to be proportional to the input content, ensuring summaries are appropriately sized.


84-88: Good fallback handling for missing previous summary.

The explicit message when no previous summary exists provides clear context to the LLM.


129-133: Consistent fallback handling across summary types.

Good to maintain the same fallback message pattern as in create_short_summary.


199-204: Excellent logging improvements for better observability.

The info-level logs provide clear visibility into summary creation events with all relevant context (session, message ID, sequence number).

Also applies to: 215-220, 237-242, 251-256


290-298: Correct implementation of input token calculation.

The token counting accurately combines message tokens with previous summary tokens, providing the necessary input for dynamic summary sizing.
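
The calculation described above can be sketched in isolation. The helper name is hypothetical and `encode` stands in for a real tokenizer (e.g. tiktoken); the point is simply that the input size for dynamic summary sizing is message tokens plus the previous summary's tokens:

```python
def count_input_tokens(messages, previous_summary_text=None, encode=len):
    """Sum the token counts of the messages being summarized plus the
    previous summary, which together form the LLM's input.

    `encode` is a stand-in tokenizer returning a token count per string.
    """
    message_tokens = sum(encode(m) for m in messages)
    summary_tokens = encode(previous_summary_text) if previous_summary_text else 0
    return message_tokens + summary_tokens
```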


351-359: Excellent error detection and logging enhancements.

The empty summary detection helps identify token limit issues, and the detailed logging provides valuable debugging information.


369-384: Good metrics collection for monitoring summary generation.

The input and output token metrics will help track performance and costs of summary generation.


426-428: Consistent usage of SUMMARIES_KEY constant throughout.

All hardcoded "summaries" strings have been properly replaced with the constant, improving maintainability.

Also applies to: 525-525, 553-553


335-340: Consider adding a trailing comma for better maintainability.

         response = await create_short_summary(
-            messages, input_tokens, previous_summary_text
+            messages, input_tokens, previous_summary_text,
         )
⛔ Skipped due to learnings
Learnt from: VVoruganti
PR: plastic-labs/honcho#164
File: sdks/python/src/honcho/async_client/peer.py:136-136
Timestamp: 2025-07-21T15:53:20.965Z
Learning: The honcho repository should have COM812 (trailing comma missing) rule disabled in the root pyproject.toml Ruff configuration by adding it to the existing ignore list: ignore = ["E501", "B008", "COM812"]. The maintainers do not want trailing comma suggestions in code reviews.
Learnt from: VVoruganti
PR: plastic-labs/honcho#164
File: sdks/python/src/honcho/async_client/peer.py:136-136
Timestamp: 2025-07-21T15:53:20.965Z
Learning: The honcho repository maintainers do not want trailing comma suggestions or complaints from static analysis tools like Ruff. The COM812 rule should be disabled in the Ruff configuration to avoid these suggestions in future reviews.
Learnt from: Rajat-Ahuja1997
PR: plastic-labs/honcho#144
File: tests/integration/test_message_embeddings.py:128-187
Timestamp: 2025-06-26T18:35:46.478Z
Learning: The honcho repository does not use trailing commas in function signatures, function calls, or object instantiation. This is consistently applied across the entire codebase including test files, and reviews should not suggest adding trailing commas.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2871ab8 and 56e1dbb.

📒 Files selected for processing (15)
  • .env.template (1 hunks)
  • config.toml.example (1 hunks)
  • docs/v2/contributing/configuration.mdx (1 hunks)
  • docs/v2/documentation/core-concepts/summarizer.mdx (1 hunks)
  • src/config.py (5 hunks)
  • src/crud/message.py (4 hunks)
  • src/routers/sessions.py (2 hunks)
  • src/utils/summarizer.py (16 hunks)
  • tests/bench/run_tests.py (2 hunks)
  • tests/bench/tests/long_summary.json (1 hunks)
  • tests/bench/tests/summary_and_query.json (1 hunks)
  • tests/bench/tests/summary_long_messages.json (1 hunks)
  • tests/bench/tests/summary_multi_session.json (1 hunks)
  • tests/bench/tests/summary_short_messages.json (1 hunks)
  • tests/bench/tests/summary_trivial.json (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
tests/**/*.py

📄 CodeRabbit Inference Engine (CLAUDE.md)

Tests in pytest with fixtures in tests/conftest.py

Files:

  • tests/bench/run_tests.py
src/**/*.py

📄 CodeRabbit Inference Engine (CLAUDE.md)

src/**/*.py: Follow isort conventions with absolute imports preferred
snake_case for variables/functions; PascalCase for classes
Line length: 88 chars (Black compatible)
Explicit error handling with appropriate exception types
Docstrings: Use Google style docstrings
Use environment variables via python-dotenv (.env)
Use specific exception types (ResourceNotFoundException, ValidationException, etc.)
Proper logging with context instead of print statements

Files:

  • src/crud/message.py
  • src/routers/sessions.py
  • src/utils/summarizer.py
  • src/config.py
src/routers/**/*.py

📄 CodeRabbit Inference Engine (CLAUDE.md)

API routes must follow the pattern: /v1/{resource}/{id}/{action}

Files:

  • src/routers/sessions.py
🧠 Learnings (7)
tests/bench/tests/summary_and_query.json (1)

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/utils/history.py : Session history summarization: short every 20 messages, long every 60

tests/bench/tests/summary_trivial.json (1)

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/utils/history.py : Session history summarization: short every 20 messages, long every 60

docs/v2/documentation/core-concepts/summarizer.mdx (1)

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/utils/history.py : Session history summarization: short every 20 messages, long every 60

src/crud/message.py (1)

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/{models,crud}.py : Token counting on messages for usage tracking

src/routers/sessions.py (1)

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/utils/history.py : Session history summarization: short every 20 messages, long every 60

src/utils/summarizer.py (1)

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/utils/history.py : Session history summarization: short every 20 messages, long every 60

src/config.py (2)

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/{models,crud}.py : Token counting on messages for usage tracking

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/utils/history.py : Session history summarization: short every 20 messages, long every 60

🧬 Code Graph Analysis (1)
src/routers/sessions.py (2)
src/utils/summarizer.py (1)
  • get_both_summaries (524-547)
src/crud/message.py (1)
  • get_messages_id_range (205-250)
🪛 Ruff (0.12.2)
tests/bench/run_tests.py

520-520: Trailing comma missing

Add trailing comma

(COM812)

src/crud/message.py

20-20: Trailing comma missing

Add trailing comma

(COM812)


55-55: Unnecessary assignment to stmt before return statement

Remove unnecessary assignment

(RET504)

src/routers/sessions.py

376-376: Boolean-typed positional argument in function definition

(FBT001)


377-377: Boolean positional value in function call

(FBT003)

src/utils/summarizer.py

332-332: Trailing comma missing

Add trailing comma

(COM812)


347-347: Trailing comma missing

Add trailing comma

(COM812)

🪛 LanguageTool
docs/v2/documentation/core-concepts/summarizer.mdx

[grammar] ~7-~7: Use correct spacing
Context: ...n such a way that the resulting context is: * Exhaustive: the combination of recent me...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~13-~13: Use correct spacing
Context: ... build, so Honcho comes with a built-in solution. ### Creating Summaries Honcho already has a...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~15-~15: Use correct spacing
Context: ...with a built-in solution. ### Creating Summaries Honcho already has an asynchronous task ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~17-~17: Use correct spacing
Context: ...ly, Honcho has two configurable summary types: * Short summaries: by default, enqueued ev...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~19-~19: Use proper capitalization
Context: ... 20 messages and given a token limit of 1000 * Long summaries: by default, enqueued eve...

(QB_NEW_EN_OTHER_ERROR_IDS_6)


[grammar] ~20-~20: There might be a mistake here.
Context: ... 60 messages and given a token limit of 4000 Both summaries are designed to be exhaus...

(QB_NEW_EN_OTHER)


[grammar] ~22-~22: Use correct spacing
Context: ...essages while still covering the entire conversation. For example, if message 160 in a convers...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~24-~24: Use correct spacing
Context: ...summary type: new summaries replace old ones. It's important to keep in mind that summ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~26-~26: Use correct spacing
Context: ...40, and so on, in our desired recursive way. ### Retrieving Summaries Summaries are retr...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~28-~28: Use correct spacing
Context: ... desired recursive way. ### Retrieving Summaries Summaries are retrieved from the session...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~30-~30: Use correct spacing
Context: ...et_contextmethod. This method has two parameters: *summary`: A boolean indicating whether to includ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~33-~33: Use modal and auxiliary verbs correctly
Context: ...ontext` will retrieve as many tokens as is required to create exhaustive conversat...

(QB_NEW_EN_OTHER_ERROR_IDS_24)


[grammar] ~33-~33: There might be a mistake here.
Context: ...uired to create exhaustive conversation coverage.** The return type is simply a list of rece...

(QB_NEW_EN_OTHER)


[grammar] ~35-~35: Use correct spacing
Context: ...ize for recent messages and 40% for the summary. There's a critical trade-off to understa...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~37-~37: Use correct spacing
Context: ... and token usage. Let's go through some scenarios: * If the last message contains more toke...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[typographical] ~39-~39: To join two clauses or set off examples, consider using an em dash.
Context: ... limit, no summary or message list is possible -- both will be empty. * If the *last few ...

(QB_NEW_EN_DASH_RULE_EM)


[grammar] ~39-~39: Use correct spacing
Context: ...essage list is possible -- both will be empty. * If the last few messages contain more ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[typographical] ~41-~41: To join two clauses or set off examples, consider using an em dash.
Context: ... the context token limit, no summary is possible -- the context will only contain the last 1...

(QB_NEW_EN_DASH_RULE_EM)


[grammar] ~41-~41: Use correct spacing
Context: ...t 1 or 2 messages that fit in the token limit. * If the summaries contain more tokens tha...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[style] ~43-~43: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...essages that fit in the token limit. * If the summaries contain more tokens than ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[typographical] ~43-~43: To join two clauses or set off examples, consider using an em dash.
Context: ... the context token limit, no summary is possible -- the context will only contain the X most...

(QB_NEW_EN_DASH_RULE_EM)


[grammar] ~43-~43: Use correct spacing
Context: ...larger than the configured long summary size. The above scenarios indicate where summa...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~45-~45: There might be a mistake here.
Context: ...ios indicate where summarization is not possible -- therefore, the context retrieved will a...

(QB_NEW_EN_OTHER)


[grammar] ~45-~45: Use correct spacing
Context: ...rieved will almost certainly not be exhaustive. Sometimes, gaps in context aren't an iss...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[typographical] ~47-~47: To join two clauses or set off examples, consider using an em dash.
Context: ...ur needs. Other cases demand exhaustive context -- don't pass a token limit and just let Ho...

(QB_NEW_EN_DASH_RULE_EM)


[grammar] ~47-~47: There might be a mistake here.
Context: ...er cases demand exhaustive context -- don't pass a token limit and just let Honcho ...

(QB_NEW_EN_OTHER)


[grammar] ~47-~47: Use correct spacing
Context: ...hat can be retrieved (currently 100,000 tokens). As a final note, remember that summaries...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[style] ~49-~49: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...vailable immediately. If you batch-save a large number of messages, assume that summaries will no...

(LARGE_NUMBER_OF)

🪛 GitHub Check: basedpyright
src/crud/message.py

[warning] 44-44:
Argument type is unknown
  Argument corresponds to parameter "whereclause" in function "where" (reportUnknownArgumentType)


[failure] 20-20:
Expected type arguments for generic class "list" (reportMissingTypeArgument)


[warning] 20-20:
Type of parameter "base_conditions" is partially unknown
  Parameter type is "list[Unknown]" (reportUnknownParameterType)

🪛 GitHub Actions: Static Analysis
src/crud/message.py

[warning] 20-20: Type of parameter "base_conditions" is partially unknown. Parameter type is "list[Unknown]" (reportUnknownParameterType).


[error] 20-20: Expected type arguments for generic class "list" (reportMissingTypeArgument).

🔇 Additional comments (25)
.env.template (1)

14-14: LGTM - Configuration variable renamed and limit increased.

The renaming from GET_CONTEXT_DEFAULT_MAX_TOKENS to GET_CONTEXT_MAX_TOKENS simplifies the variable name, and the increase from 2048 to 100,000 tokens significantly enhances context retrieval capabilities for improved summarization.

config.toml.example (1)

12-12: LGTM - Consistent configuration key update.

The configuration key change aligns with the environment variable template, maintaining consistency across configuration methods. The increased token limit supports enhanced summarization capabilities.

tests/bench/tests/summary_and_query.json (1)

210-214: LGTM - Test data updated for enhanced summarization.

The addition of an unlimited token summary call (without max_tokens) properly tests the new summarization capabilities that support larger context windows, aligning with the increased token limits in configuration.

tests/bench/tests/summary_trivial.json (1)

193-198: LGTM - Consistent test data improvements.

The token limit increase to 2500 and the addition of an unlimited summary call test both bounded and unbounded summarization scenarios, properly validating the enhanced summarization functionality.

docs/v2/contributing/configuration.mdx (1)

117-117: LGTM - Documentation updated to match configuration changes.

The documentation correctly reflects the renamed environment variable and increased token limit, maintaining consistency between code and documentation.

tests/bench/tests/long_summary.json (1)

1-267: Well-structured test data for long conversation summarization.

This comprehensive test file effectively serves its purpose of testing the summarization system with extended conversations. The mix of regular dialogue, Lorem Ipsum content, and creative fictional scenarios (Mercury shopping trip) provides good coverage for testing how the summarizer handles different types of content within long conversations.

tests/bench/tests/summary_multi_session.json (1)

1-251: Excellent multi-session test coverage.

This test file provides valuable coverage for multi-session summarization scenarios. The two sessions with similar but distinct conversation flows will help validate that the summarization system can properly handle and differentiate between multiple conversation contexts.

tests/bench/run_tests.py (3)

496-496: Good type annotation for clarity.

The explicit int | None type annotation clearly communicates that max_tokens can be None, which improves code readability and helps with type checking.


513-513: Improved logging shows actual retrieved summary.

Printing session_context.summary instead of a potentially stale summary string ensures the debug output reflects the actual summary that was retrieved from the session context.


522-522: Proper null check prevents runtime errors.

The null check for max_tokens before comparison prevents TypeError when max_tokens is None, which is good defensive programming.

tests/bench/tests/summary_short_messages.json (1)

1-307: Comprehensive test coverage for short message summarization.

This test file provides excellent coverage for multi-participant conversations with short messages. The realistic office chat scenario with diverse topics (work projects, dining, entertainment, social plans) will effectively test the summarizer's ability to capture key themes and participant interactions across fragmented conversational threads.

tests/bench/tests/summary_long_messages.json (1)

1-259: Comprehensive long-session fixture looks solid

The JSON structure is valid, the data volume (≈60 messages) is a realistic stress-case for the new 100 K token window, and it exercises the “no explicit token limit” path.
No changes needed.

src/crud/message.py (2)

183-186: LGTM! Clean refactoring

Good use of the helper function to eliminate code duplication while maintaining the same functionality.


205-251: Well-implemented token limit support

The addition of token limit functionality to get_messages_id_range is well-executed:

  • Maintains backward compatibility with optional parameter
  • Properly reuses the _apply_token_limit helper
  • Clear docstring update about inclusive behavior
  • Clean refactoring of conditions into a list
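
The behavior of the token-limit helper can be approximated in plain Python. The real `_apply_token_limit` modifies a SQLAlchemy query, so this is only an equivalent sketch under the assumption that messages carry a `token_count` and arrive newest-last: keep the most recent messages whose cumulative token count fits the budget.

```python
def apply_token_limit(messages, max_tokens=None):
    """Keep the newest messages whose cumulative token_count fits the budget.

    Plain-Python analogue of the SQL helper described in the review;
    messages are dicts with a "token_count" key, ordered oldest-first.
    """
    if max_tokens is None:
        return list(messages)  # no limit: return everything
    kept, total = [], 0
    for msg in reversed(messages):  # walk newest-first
        if total + msg["token_count"] > max_tokens:
            break
        kept.append(msg)
        total += msg["token_count"]
    kept.reverse()  # restore chronological order
    return kept
```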
src/routers/sessions.py (2)

368-446: Excellent refactoring of context retrieval logic

The changes improve the endpoint significantly:

  • Proper use of renamed configuration constant
  • Clean integration with get_messages_id_range for token-limited retrieval
  • Appropriate logging level adjustment
  • Maintains backward compatibility while enhancing functionality

397-397: Tuple unpacking confirmed correct — approving changes.

The get_both_summaries function returns (short_summary, long_summary) in that exact order, so the unpacking on line 397 is valid. No further action needed.

src/config.py (3)

231-234: Verify summary provider change is intentional

The summary provider and model have been changed from Google Gemini to OpenAI GPT-4. This may have cost and performance implications.

Please confirm this change is intentional and that any API key requirements have been addressed.


251-253: Significant increase in default context token limit

The default GET_CONTEXT_MAX_TOKENS has increased from 2,048 to 100,000 tokens (approximately 50x). While this aligns with modern LLM capabilities, please ensure:

  1. This doesn't cause performance issues with large contexts
  2. Downstream systems can handle this increased limit
  3. The upper bound of 250,000 is validated against your LLM provider limits

177-259: LGTM! Configuration improvements enhance readability and capabilities

The numeric literal formatting with underscores improves code readability, and the increased token limits align well with the PR's summarization improvements.

src/utils/summarizer.py (6)

57-57: Good practice: Centralized metadata key

Using a constant for the summaries metadata key improves maintainability and prevents typos.


72-108: Excellent improvements to summary generation

The enhancements to create_short_summary are well-designed:

  • Dynamic output length based on input tokens
  • Clear handling of missing previous summaries
  • Improved prompt emphasizing factual content and chronological narrative
  • Appropriate tokens-to-words conversion ratio
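The dynamic sizing described above might look roughly like the sketch below; the 0.75 tokens-to-words ratio, the floor, the cap, and the function name are all assumptions for demonstration, not the actual implementation:

```python
# Illustrative sketch of dynamic summary sizing: scale the summary's word
# budget with the input size. The ratio and limits are assumed values.

def summary_word_budget(input_tokens: int, max_summary_tokens: int = 1000) -> int:
    """Scale the target summary length with the input token count,
    clamped between a floor and the summary token limit, then convert
    tokens to an approximate English word count."""
    target_tokens = min(max_summary_tokens, max(100, input_tokens // 10))
    return int(target_tokens * 0.75)  # roughly 0.75 words per token

print(summary_word_budget(500))     # small input hits the 100-token floor
print(summary_word_budget(50_000))  # large input is capped at 1000 tokens
```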

344-351: Good error detection and logging improvements

The empty summary detection and info-level logging for summary content will help diagnose issues in production.


192-249: Improved observability with info-level logging

The upgrade from debug to info level for summary save operations provides better visibility into the summarization process, which will be valuable for monitoring and debugging.


524-548: Consistent metadata key usage

Good use of the SUMMARIES_KEY constant in both retrieval functions, maintaining consistency with the save operation.


331-333: Add trailing comma for consistency

             response = await create_short_summary(
-                messages, input_tokens, previous_summary_text
+                messages, input_tokens, previous_summary_text,
             )
⛔ Skipped due to learnings
Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/{models,crud}.py : Token counting on messages for usage tracking
Learnt from: VVoruganti
PR: plastic-labs/honcho#164
File: sdks/python/src/honcho/async_client/peer.py:136-136
Timestamp: 2025-07-21T15:53:20.965Z
Learning: The honcho repository should have COM812 (trailing comma missing) rule disabled in the root pyproject.toml Ruff configuration by adding it to the existing ignore list: ignore = ["E501", "B008", "COM812"]. The maintainers do not want trailing comma suggestions in code reviews.
Learnt from: Rajat-Ahuja1997
PR: plastic-labs/honcho#144
File: tests/integration/test_message_embeddings.py:128-187
Timestamp: 2025-06-26T18:35:46.478Z
Learning: The honcho repository does not use trailing commas in function signatures, function calls, or object instantiation. This is consistently applied across the entire codebase including test files, and reviews should not suggest adding trailing commas.
Learnt from: VVoruganti
PR: plastic-labs/honcho#164
File: sdks/python/src/honcho/async_client/peer.py:136-136
Timestamp: 2025-07-21T15:53:20.965Z
Learning: The honcho repository maintainers do not want trailing comma suggestions or complaints from static analysis tools like Ruff. The COM812 rule should be disabled in the Ruff configuration to avoid these suggestions in future reviews.
@dr-frmr dr-frmr requested a review from VVoruganti July 30, 2025 21:37
Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bc29919 and b9a8fb6.

📒 Files selected for processing (2)
  • docs/v2/documentation/core-concepts/summarizer.mdx (1 hunks)
  • src/crud/message.py (5 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
src/**/*.py

📄 CodeRabbit Inference Engine (CLAUDE.md)

src/**/*.py: Follow isort conventions with absolute imports preferred
snake_case for variables/functions; PascalCase for classes
Line length: 88 chars (Black compatible)
Explicit error handling with appropriate exception types
Docstrings: Use Google style docstrings
Use environment variables via python-dotenv (.env)
Use specific exception types (ResourceNotFoundException, ValidationException, etc.)
Proper logging with context instead of print statements

Files:

  • src/crud/message.py
🧠 Learnings (2)
docs/v2/documentation/core-concepts/summarizer.mdx (2)

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/{models,crud}.py : Token counting on messages for usage tracking

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/utils/history.py : Session history summarization: short every 20 messages, long every 60

src/crud/message.py (4)

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/{models,crud}.py : Token counting on messages for usage tracking

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/models.py : Use explicit type hints with SQLAlchemy mapped_column annotations

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/models.py : All tables use text IDs (nanoid format) as primary keys

Learnt from: CR
PR: plastic-labs/honcho#0
File: CLAUDE.md:0-0
Timestamp: 2025-07-21T14:51:09.913Z
Learning: Applies to src/routers/messages.py : Batch message creation supports up to 100 messages

🪛 LanguageTool
docs/v2/documentation/core-concepts/summarizer.mdx

[grammar] ~7-~7: Use correct spacing
Context: ...n such a way that the resulting context is: * Exhaustive: the combination of recent me...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~13-~13: Use correct spacing
Context: ... build, so Honcho comes with a built-in solution. ### Creating Summaries Honcho already has a...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~15-~15: Use correct spacing
Context: ...with a built-in solution. ### Creating Summaries Honcho already has an asynchronous task ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~17-~17: Use correct spacing
Context: ...ly, Honcho has two configurable summary types: * Short summaries: by default, enqueued ev...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~19-~19: Use proper capitalization
Context: ... 20 messages and given a token limit of 1000 * Long summaries: by default, enqueued eve...

(QB_NEW_EN_OTHER_ERROR_IDS_6)


[grammar] ~20-~20: There might be a mistake here.
Context: ... 60 messages and given a token limit of 4000 Both summaries are designed to be exhaus...

(QB_NEW_EN_OTHER)


[grammar] ~22-~22: Use correct spacing
Context: ...essages while still covering the entire conversation. For example, if message 160 in a convers...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~24-~24: Use correct spacing
Context: ...summary type: new summaries replace old ones. It's important to keep in mind that summ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~26-~26: Use correct spacing
Context: ...40, and so on, in our desired recursive way. ### Retrieving Summaries Summaries are retr...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~28-~28: Use correct spacing
Context: ... desired recursive way. ### Retrieving Summaries Summaries are retrieved from the session...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~30-~30: Use correct spacing
Context: ...et_contextmethod. This method has two parameters: *summary`: A boolean indicating whether to includ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~33-~33: There might be a mistake here.
Context: ...uired to create exhaustive conversation coverage.** The return type is simply a list of rece...

(QB_NEW_EN_OTHER)


[grammar] ~35-~35: Use correct spacing
Context: ...ize for recent messages and 40% for the summary. There's a critical trade-off to understa...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~37-~37: Use correct spacing
Context: ... and token usage. Let's go through some scenarios: * If the last message contains more toke...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[typographical] ~39-~39: To join two clauses or set off examples, consider using an em dash.
Context: ... limit, no summary or message list is possible -- both will be empty. * If the *last few ...

(QB_NEW_EN_DASH_RULE_EM)


[grammar] ~39-~39: Use correct spacing
Context: ...essage list is possible -- both will be empty. * If the last few messages contain more ...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[typographical] ~41-~41: To join two clauses or set off examples, consider using an em dash.
Context: ... the context token limit, no summary is possible -- the context will only contain the last 1...

(QB_NEW_EN_DASH_RULE_EM)


[grammar] ~41-~41: Use correct spacing
Context: ...t 1 or 2 messages that fit in the token limit. * If the summaries contain more tokens tha...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[style] ~43-~43: Three successive sentences begin with the same word. Consider rewording the sentence or use a thesaurus to find a synonym.
Context: ...essages that fit in the token limit. * If the summaries contain more tokens than ...

(ENGLISH_WORD_REPEAT_BEGINNING_RULE)


[typographical] ~43-~43: To join two clauses or set off examples, consider using an em dash.
Context: ... the context token limit, no summary is possible -- the context will only contain the X most...

(QB_NEW_EN_DASH_RULE_EM)


[grammar] ~43-~43: Use correct spacing
Context: ...larger than the configured long summary size. The above scenarios indicate where summa...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[grammar] ~45-~45: There might be a mistake here.
Context: ...ios indicate where summarization is not possible -- therefore, the context retrieved will a...

(QB_NEW_EN_OTHER)


[grammar] ~45-~45: Use correct spacing
Context: ...rieved will almost certainly not be exhaustive. Sometimes, gaps in context aren't an iss...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[typographical] ~47-~47: To join two clauses or set off examples, consider using an em dash.
Context: ...ur needs. Other cases demand exhaustive context -- don't pass a token limit and just let Ho...

(QB_NEW_EN_DASH_RULE_EM)


[grammar] ~47-~47: There might be a mistake here.
Context: ...er cases demand exhaustive context -- don't pass a token limit and just let Honcho ...

(QB_NEW_EN_OTHER)


[grammar] ~47-~47: Use correct spacing
Context: ...hat can be retrieved (currently 100,000 tokens). As a final note, remember that summaries...

(QB_NEW_EN_OTHER_ERROR_IDS_5)


[style] ~49-~49: To reduce wordiness, try specifying a number or using “many” or “numerous” instead.
Context: ...vailable immediately. If you batch-save a large number of messages, assume that summaries will no...

(LARGE_NUMBER_OF)

🪛 Ruff (0.12.2)
src/crud/message.py

20-20: Trailing comma missing

Add trailing comma

(COM812)

🔇 Additional comments (8)
src/crud/message.py (4)

19-53: LGTM! Well-designed helper function for token limit logic

The _apply_token_limit helper function effectively encapsulates the token limit logic with:

  • Clear separation of concerns
  • Proper use of window functions for running token sum calculation
  • Good docstring following Google style conventions
  • Correct type annotations with ColumnElement[Any] for SQLAlchemy conditions

The implementation correctly handles the complex token limit logic that was previously embedded inline.
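In plain Python, the running-sum cutoff that the SQL window function computes can be sketched as follows; the function name and tuple shape are hypothetical, shown here only to make the filtering rule concrete:

```python
# Hypothetical sketch of the token-limit logic: keep the newest messages
# whose cumulative token count stays within the limit, mirroring the SQL
# running-sum filter described above. Field names are illustrative.

def apply_token_limit(messages, token_limit):
    """messages: list of (id, token_count) tuples, ascending by id.
    Returns the ids that fit the budget, ascending, newest-first greedy."""
    kept = []
    running = 0
    for msg_id, tokens in reversed(messages):  # walk newest first
        running += tokens
        if running > token_limit:
            break  # budget exhausted; older messages are dropped
        kept.append(msg_id)
    return sorted(kept)

msgs = [(i, 100) for i in range(1, 6)]  # five 100-token messages
print(apply_token_limit(msgs, 250))  # → [4, 5]
```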


182-183: LGTM! Clean refactor to use helper function

The replacement of inline token limit logic with the helper function call improves code maintainability and reusability.


209-209: LGTM! Good parameter addition for consistency

Adding the token_limit parameter aligns get_messages_id_range with the token-aware functionality across the message retrieval functions.


231-248: LGTM! Consistent implementation of token limiting

The refactored logic properly:

  • Builds base conditions as a list for reuse
  • Applies token limit using the helper function when specified
  • Maintains proper ordering (ascending ID for this function)
  • Falls back to standard query when no token limit is provided

The implementation is consistent with the pattern established in get_messages.

docs/v2/documentation/core-concepts/summarizer.mdx (4)

17-26: LGTM! Accurate description of summarization strategy

The documentation correctly describes the asynchronous summarization approach with:

  • Two summary types (short every 20 messages, long every 60 messages)
  • Recursive compression strategy using prior summaries plus new messages
  • Clear explanation of the exhaustive nature through examples

This aligns well with the retrieved learnings about session history summarization patterns.
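The documented cadence (short every 20 messages, long every 60) can be checked with a toy helper; the function name is hypothetical, and the assumption that a long summary takes precedence when both thresholds coincide (e.g. at message 60) is mine, not confirmed by the source:

```python
# Illustrative check of the documented summarization cadence.
# Precedence at multiples of 60 is an assumed detail.

def due_summaries(message_count: int) -> list[str]:
    """Return which summary types would be enqueued at this count."""
    due = []
    if message_count % 60 == 0:
        due.append("long")       # every 60 messages
    elif message_count % 20 == 0:
        due.append("short")      # every 20 messages, unless a long is due
    return due

print([(n, due_summaries(n)) for n in (20, 40, 60, 61)])
```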


30-36: LGTM! Clear explanation of context retrieval

The documentation effectively explains:

  • The get_context method parameters and behavior
  • Dynamic token allocation (60% recent messages, 40% summary)
  • The trade-off between exhaustiveness and token limits

The technical details match the implementation patterns described in the AI summary.
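The 60/40 allocation mentioned above is simple arithmetic; a sketch, where the percentages match the documented defaults but the function name is hypothetical:

```python
def split_context_budget(token_limit: int) -> tuple[int, int]:
    """Split a context token limit per the documented allocation:
    60% for recent messages, 40% for the summary."""
    summary_budget = int(token_limit * 0.4)
    message_budget = token_limit - summary_budget  # remainder goes to messages
    return message_budget, summary_budget

print(split_context_budget(100_000))  # → (60000, 40000)
```

Giving the remainder to the message budget keeps the two parts summing exactly to the limit even when the 40% share rounds down.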


37-46: LGTM! Comprehensive scenario coverage

The documentation thoughtfully covers edge cases where summarization isn't possible:

  • Last message exceeds token limit
  • Recent messages exceed token limit
  • Summaries exceed allocated token budget

This provides valuable guidance for developers implementing context retrieval.


47-49: LGTM! Practical guidance for different use cases

The final section provides clear guidance for:

  • When to use token limits vs. exhaustive context
  • How to handle message-only scenarios
  • Important caveats about asynchronous processing

Given the user's previous feedback about token limit documentation, this section appropriately focuses on usage patterns rather than specific configuration details.

Collaborator

@VVoruganti VVoruganti left a comment


Just add some more comments in the code to explain the logic, but it all looks good

