
metrics for deriver / dialectic input + output tokens#274

Merged
VVoruganti merged 9 commits into main from rajat/DEV-1258
Nov 20, 2025

Conversation

@Rajat-Ahuja1997
Collaborator

@Rajat-Ahuja1997 Rajat-Ahuja1997 commented Nov 18, 2025

Summary by CodeRabbit

  • New Features

    • Streaming responses now emit final-chunk output-token counts.
  • Improvements

    • More granular token metrics: separate input vs. output tracking, per-component labels, and new dialectic/deriver metrics for better observability.
    • Summary and prompt flows now include explicit input-token accounting and prompt-token estimators for more accurate usage reporting.
  • Other

    • Token encoding updated, which may change reported token counts.


@coderabbitai
Contributor

coderabbitai bot commented Nov 18, 2025

Walkthrough

Token accounting was made explicit: added track_input_tokens and per-component input tracking; streaming chunks carry final output_tokens; removed internal estimated_input_tokens state and parameter from critical analysis paths; adjusted prompts, estimators, and Prometheus metrics to record input/output token components.

Changes

Cohort / File(s) Summary
Deriver core
src/deriver/deriver.py
Added track_input_tokens(...); removed estimated_input_tokens parameter from critical_analysis_call and from CertaintyReasoner constructor/usage; updated batch processing to compute prompt/peer/session component tokens and call track_input_tokens; updated metrics to record only response.output_tokens for outputs.
Deriver prompts
src/deriver/prompts.py
Renamed estimate_base_prompt_tokens() → estimate_critical_analysis_prompt_tokens(); added estimate_peer_card_prompt_tokens().
Dialectic prompts & chat
src/dialectic/prompts.py, src/dialectic/chat.py
Added cached estimate_dialectic_prompt_tokens(); dialectic_prompt signature changed (union types, keyword-only observer/observed); chat computes prompt-token estimates and increments Prometheus input/output counters for direct and streaming flows.
Prometheus metrics
src/prometheus.py
Expanded DERIVER_QUEUE_ITEMS_PROCESSED labels to include task_type; renamed/expanded DERIVER_TOKENS_PROCESSED to deriver_tokens_processed_total with labels namespace,task_type,token_type,component; added DIALECTIC_TOKENS_PROCESSED (dialectic_tokens_processed_total).
LLM client streaming
src/utils/clients.py
HonchoLLMCallStreamChunk gains `output_tokens: int | None`, set only on the final chunk; provider streaming branches populate it from usage data.
Token utilities
src/utils/tokens.py
Switched tokenizer encoding from cl100k_base to o200k_base; added track_input_tokens(task_type, components) to increment Prometheus input-token counters per component.
Summarizer helpers
src/utils/summarizer.py
Added short/long summary prompt builders, token-estimators, create_short_summary/create_long_summary, track_input_tokens for summaries; summary creation now returns (Summary, is_fallback) and tracks metrics when real LLM calls occur.
Schemas
src/schemas.py
Message token encoding model changed from cl100k_base to o200k_base in validation.
Tests
tests/test_llm_mock.py, tests/utils/test_clients.py
Removed estimated_input_tokens argument in tests; added PromptRepresentation type annotation; updated streaming mocks to include usage/usage_metadata and final-chunk output_tokens expectations.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant Client
    participant Deriver as Deriver.process_representation_tasks_batch
    participant Prom as Prometheus
    participant Reasoner as CertaintyReasoner.reason
    participant LLM as critical_analysis_call
    participant Provider as LLM Provider (streaming/non-streaming)

    Client->>Deriver: submit tasks
    Deriver->>Deriver: estimate components (peer, rep, prompt, new_turns)
    Deriver->>Deriver: get session_context_tokens
    Deriver->>Prom: track_input_tokens(task_type, {peer, rep, prompt, new_turns, session_context})
    Deriver->>Reasoner: run reason()
    Reasoner->>LLM: critical_analysis_call(...)  -- no estimated_input_tokens param
    LLM->>Provider: call provider
    Provider-->>LLM: stream chunks (final chunk may include output_tokens)
    LLM-->>Reasoner: response (response.output_tokens when available)
    Reasoner->>Prom: increment output metric (token_type="output", component="total")
    Reasoner-->>Deriver: representation result
    Deriver-->>Client: final response

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

  • Pay extra attention to:
    • src/utils/clients.py — provider-specific streaming logic and final output_tokens propagation.
    • src/deriver/deriver.py — removal of estimated_input_tokens, new track_input_tokens usage, and CertaintyReasoner signature changes.
    • src/prometheus.py — new/renamed metrics and label consistency at call sites.
    • src/dialectic/prompts.py & src/dialectic/chat.py — prompt signature changes, cached estimator, and callers.

Possibly related PRs

Suggested reviewers

  • VVoruganti
  • dr-frmr

Poem

🐰 I hopped through prompts and counted each byte,

Input pebbles tucked neat, outputs glowing bright.
Streams now whisper tokens on the last tiny drop,
Prometheus tallies — I give a carrot-hop. 🥕

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning All changes are directly related to token tracking metrics; however, tokenizer model updates (cl100k_base to o200k_base) extend beyond the stated scope of adding metrics. Clarify whether tokenizer model changes in src/schemas.py and src/utils/tokens.py are intentional and necessary for the token tracking implementation, or if they should be addressed separately.
Docstring Coverage ⚠️ Warning Docstring coverage is 76.92% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: implementing metrics for tracking input and output tokens in both the deriver and dialectic components.
Linked Issues check ✅ Passed The PR addresses all coding objectives from DEV-1258: tracking input/output tokens for deriver and dialectic, and confirming LLM provider output token parsing.




Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
tests/test_llm_mock.py (1)

7-13: Fix invalid and ineffective isinstance(..., str | object) checks

Using isinstance(short_result, str | object) / isinstance(long_result, str | object) is problematic:

  • At runtime, str | object is a PEP‑604 union, not a plain type or tuple, so isinstance may raise TypeError depending on the Python version.
  • Even if it were accepted, object matches everything, so the assertion is always true and doesn’t validate anything useful.

Given the subsequent logic already distinguishes between string and “object with content”, a simpler pattern is:

-    # For functions with return_call_response=True, we should get a string or object with content
-    # The existing mock returns a string, so we check if it's a string
-    assert isinstance(short_result, str | object)
-    assert isinstance(long_result, str | object)
-    # If it's not a string, check for content attribute
-    if not isinstance(short_result, str):
-        assert hasattr(short_result, "content")
-    if not isinstance(long_result, str):
-        assert hasattr(long_result, "content")
+    # For functions with return_call_response=True, we should get either a string
+    # or an object with a `content` attribute.
+    assert isinstance(short_result, str) or hasattr(short_result, "content")
+    assert isinstance(long_result, str) or hasattr(long_result, "content")

This avoids runtime issues and directly encodes the behavior you care about.

Also applies to: 23-55, 80-90

src/deriver/deriver.py (1)

166-182: Input token estimation looks good, but task_type label differs between input and output

The input token estimation in process_representation_tasks_batch:

  • uses estimate_base_prompt_tokens() + dynamic components (peer card, working_representation, new turns, session context), and
  • passes their sum into CertaintyReasoner as estimated_input_tokens,

is consistent with the prior deriver estimation approach and should give a reasonable upper bound without double‑counting template overhead. Based on learnings.

In CertaintyReasoner.reason, the subsequent metric increment is:

prometheus.DERIVER_TOKENS_PROCESSED.labels(
    task_type="critical_analysis",
    token_type="input",
).inc(self.estimated_input_tokens)

while the corresponding output tokens for the same logical call are recorded earlier with:

prometheus.DERIVER_TOKENS_PROCESSED.labels(
    task_type="representation",
    token_type="output",
).inc(response.output_tokens)

This means input and output tokens for a single critical‑analysis/representation pass are labeled with different task_type values, which makes per‑task input/output ratios harder to derive.

If the intent is to attribute both input and output tokens to the same logical task, consider aligning the task_type label (e.g., both "representation" or both "critical_analysis"). If they’re meant to be distinct series, a brief comment explaining the distinction would help future readers.

Also applies to: 241-247, 345-348

src/utils/clients.py (1)

882-907: Wire Groq streaming output_tokens by enabling and extracting usage data

Groq's streaming API can return usage tokens on the final chunk, but it requires passing stream_options={"include_usage": true} in the request. The current code at lines 882-907 doesn't include this parameter and doesn't check for chunk.usage on the final chunk, so output tokens are never surfaced.

This creates a parity gap: Anthropic, OpenAI, and Gemini streaming implementations all populate output_tokens on final chunks, allowing dialectic_stream (src/dialectic/chat.py:188) to increment output-token metrics. Groq streaming calls never increment these metrics because output_tokens remains None.

Add stream_options to groq_params and extract completion_tokens from the final chunk's usage field (similar to how OpenAI handles it at line 812), then populate the final chunk's output_tokens field.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 2945f6b and c7e97d5.

📒 Files selected for processing (6)
  • src/deriver/deriver.py (2 hunks)
  • src/dialectic/chat.py (6 hunks)
  • src/dialectic/prompts.py (2 hunks)
  • src/prometheus.py (1 hunks)
  • src/utils/clients.py (5 hunks)
  • tests/test_llm_mock.py (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:169-183
Timestamp: 2025-09-25T19:53:21.360Z
Learning: In the deriver token estimation, the approach uses empty inputs to get the base prompt template token count via `estimate_base_prompt_tokens()`, then manually adds tokens for actual inputs (peer_card, working_representation, new_turns). This avoids double-counting wrapper overhead since the base prompt already includes template structure.
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 144
File: src/crud.py:1250-1250
Timestamp: 2025-06-26T18:39:54.942Z
Learning: Rajat-Ahuja1997 is comfortable exposing token counts in MessageCreate schema as a public property when it improves code maintainability and avoids type checking errors.
📚 Learning: 2025-09-25T19:53:21.360Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:169-183
Timestamp: 2025-09-25T19:53:21.360Z
Learning: In the deriver token estimation, the approach uses empty inputs to get the base prompt template token count via `estimate_base_prompt_tokens()`, then manually adds tokens for actual inputs (peer_card, working_representation, new_turns). This avoids double-counting wrapper overhead since the base prompt already includes template structure.

Applied to files:

  • src/dialectic/chat.py
  • src/deriver/deriver.py
  • src/dialectic/prompts.py
📚 Learning: 2025-09-05T21:35:05.674Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 202
File: src/routers/peers.py:190-193
Timestamp: 2025-09-05T21:35:05.674Z
Learning: In src/routers/peers.py, the streaming chat endpoint yields raw chunk.content from HonchoLLMCallStreamChunk without SSE formatting, and this works correctly with their custom streaming client setup that consumes Anthropic API responses.

Applied to files:

  • src/utils/clients.py
🧬 Code graph analysis (4)
src/dialectic/chat.py (3)
src/dialectic/prompts.py (2)
  • dialectic_prompt (7-144)
  • estimate_dialectic_prompt_tokens (148-167)
src/utils/tokens.py (1)
  • estimate_tokens (6-15)
src/prometheus.py (1)
  • labels (30-34)
src/deriver/deriver.py (1)
src/prometheus.py (1)
  • labels (30-34)
tests/test_llm_mock.py (2)
src/utils/representation.py (1)
  • PromptRepresentation (29-41)
src/deriver/deriver.py (1)
  • critical_analysis_call (37-75)
src/dialectic/prompts.py (1)
src/utils/tokens.py (1)
  • estimate_tokens (6-15)
🪛 Ruff (0.14.5)
src/dialectic/chat.py

104-104: Possible hardcoded password assigned to argument: "token_type"

(S106)


108-108: Possible hardcoded password assigned to argument: "token_type"

(S106)


182-182: Possible hardcoded password assigned to argument: "token_type"

(S106)


186-186: Missing return type annotation for private function log_streaming_response

(ANN202)


190-190: Possible hardcoded password assigned to argument: "token_type"

(S106)

src/deriver/deriver.py

72-72: Possible hardcoded password assigned to argument: "token_type"

(S106)


347-347: Possible hardcoded password assigned to argument: "token_type"

(S106)

src/dialectic/prompts.py

165-165: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (9)
src/utils/clients.py (4)

96-111: Streaming chunk API extension looks consistent

The addition of output_tokens: int | None on HonchoLLMCallStreamChunk with the “only set on the final chunk” contract matches how the provider-specific streaming branches emit a final metadata-only chunk. This should compose cleanly with downstream code that already only cares about content for intermediate chunks.


751-770: Anthropic streaming token extraction is sound

Using anthropic_stream.get_final_message() and pulling usage.output_tokens into the final HonchoLLMCallStreamChunk aligns with Anthropic’s API and keeps intermediate chunks content-only. This should give accurate output token counts without affecting existing consumers that only stream chunk.content.


772-825: Verify stream_options={"include_usage": True} across all AsyncOpenAI backends

The OpenAI streaming branch now unconditionally passes stream_options={"include_usage": True} and expects chunk.usage.completion_tokens on a dedicated usage chunk. That’s correct for first‑party OpenAI, but AsyncOpenAI() is also used for "custom" and "vllm" providers here.

If any OpenAI‑compatible backend doesn’t support stream_options or the usage field on streamed chunks, this path could raise or silently never set output_tokens (only triggering the “stream ended without usage chunk” warning).

I recommend verifying:

  • That each configured AsyncOpenAI backend (OpenAI, custom base_url, vLLM) accepts stream_options.include_usage.
  • That they actually emit a usage chunk with completion_tokens.

If not, we may need a provider‑aware branch (only set stream_options for known‑supporting providers) or a fallback that estimates tokens instead of relying on streamed usage.
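A provider-aware branch could look like the sketch below; the set of supporting providers is illustrative and would need to be verified per backend.

```python
# Only request streamed usage from backends known to support it.
PROVIDERS_WITH_STREAM_USAGE = {"openai"}  # illustrative; verify per backend


def build_stream_kwargs(provider: str) -> dict:
    """Add stream_options only for providers that accept include_usage."""
    kwargs: dict = {"stream": True}
    if provider in PROVIDERS_WITH_STREAM_USAGE:
        kwargs["stream_options"] = {"include_usage": True}
    return kwargs
```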


826-880: Gemini streaming output token handling aligns with non‑streaming, but confirm field shape

The streaming Gemini branch pulls final_chunk.usage_metadata.candidates_token_count into gemini_output_tokens, consistent with the non‑streaming path. Assuming candidates_token_count is an int (or optional int) as in the client library, this will correctly populate output_tokens on the final chunk while leaving earlier chunks untouched.

It’s worth double‑checking against the exact google.genai version that candidates_token_count isn’t a list or nested structure; if it is, we’d need to sum or index it rather than pass it through directly.

src/dialectic/chat.py (1)

63-74: Dialectic input/output token metrics for non‑streaming calls look correct

The pattern of:

  • deriving base_prompt_tokens = estimate_dialectic_prompt_tokens(), then
  • estimating dynamic input tokens from the concatenation of query, working_representation, history, and peer cards, and
  • incrementing DIALECTIC_TOKENS_PROCESSED once for token_type="input" and once for token_type="output"

matches the deriver’s established approach of “base template + dynamic inputs” and should avoid double‑counting prompt wrapper overhead. The metric wiring and label usage also align with the new DIALECTIC_TOKENS_PROCESSED definition.

Based on learnings

Also applies to: 86-110

src/prometheus.py (1)

96-110: New token metrics look consistent; confirm intentional deriver metric rename

The updated metrics definitions are coherent:

  • DERIVER_TOKENS_PROCESSED: "deriver_tokens_processed_total" with labels (namespace, task_type, token_type) matches how deriver code now passes both task_type and token_type.
  • DIALECTIC_TOKENS_PROCESSED: "dialectic_tokens_processed_total" with (namespace, token_type) matches the dialectic usage.

One thing to double‑check: renaming the deriver metric from its previous name (*_tokens_processed_total) is a breaking change for any existing Prometheus rules, dashboards, or alerts. If this rename is intentional, it’s worth noting in release notes or migration docs so observability doesn’t silently regress.

Also applies to: 112-124

src/deriver/deriver.py (1)

36-75: Deriver output token tracking is wired correctly through critical_analysis_call

critical_analysis_call now returns a PromptRepresentation and increments DERIVER_TOKENS_PROCESSED with:

  • task_type="representation"
  • token_type="output"
  • amount response.output_tokens

This matches the new metric signature and keeps the output token accounting close to the actual LLM call site, which is a good place for this responsibility.

src/dialectic/prompts.py (2)

1-1: LGTM: Imports support token estimation functionality.

The new imports correctly support the base prompt token estimation feature, with cache enabling efficient caching and estimate_tokens providing the token counting utility.

Also applies to: 4-5


147-168: LGTM: Token estimation follows established pattern with appropriate error handling.

The implementation correctly follows the deriver's approach of estimating base prompt tokens with empty inputs to avoid double-counting. The @cache decorator is well-suited since the base template only changes on redeploys. The broad exception handling flagged by static analysis is intentionally designed for robustness—it ensures the metrics pipeline doesn't fail on estimation errors and provides a conservative fallback for capacity planning.

Based on learnings.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (5)
src/prometheus.py (1)

84-112: Update DERIVER_TOKENS_PROCESSED doc comment to reflect new usage.

The metric is now incremented from both the deriver (task_type="representation") and the summarizer (task_type="summary"), not just “after the critical analysis call is made”. Consider documenting the broader usage so future readers don’t assume it’s deriver‑only.

tests/utils/test_clients.py (3)

218-265: Consider asserting on output tokens in Anthropic streaming test.

You now mock mock_final_message.usage.output_tokens for the Anthropic stream, but the test only checks content and finish reasons. To fully exercise the new token‑parsing behavior, consider asserting that the final chunk (or aggregate result) exposes the expected output token count derived from usage.


627-679: Add coverage for Google/Gemini streaming output tokens.

The final Google streaming chunk now carries usage_metadata.candidates_token_count=35, but the test doesn’t assert any token value coming out of handle_streaming_response. Adding an assertion that the surfaced token count matches candidates_token_count would better validate the new output‑token parsing for this provider.


950-989: Extend main streaming call test to verify output token propagation.

test_streaming_call now wires mock_final_message.usage.output_tokens=28, but doesn’t assert that this value is available to callers (e.g., on the final chunk or in any aggregation). Given the PR’s focus on token metrics, consider adding an assertion that whatever surface you’ve chosen for streaming output tokens reflects this value.

src/utils/summarizer.py (1)

362-377: Remove debug print statements from _create_and_save_summary.

The print("latest_summary", latest_summary) and related token‑count print calls are debug‑style output and will pollute logs/stdout in production. These should either be removed or converted to logger.debug with an appropriate message key.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c7e97d5 and 6ef8278.

📒 Files selected for processing (4)
  • src/deriver/deriver.py (4 hunks)
  • src/prometheus.py (1 hunks)
  • src/utils/summarizer.py (7 hunks)
  • tests/utils/test_clients.py (3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:169-183
Timestamp: 2025-09-25T19:53:21.360Z
Learning: In the deriver token estimation, the approach uses empty inputs to get the base prompt template token count via `estimate_base_prompt_tokens()`, then manually adds tokens for actual inputs (peer_card, working_representation, new_turns). This avoids double-counting wrapper overhead since the base prompt already includes template structure.
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 144
File: src/crud.py:1250-1250
Timestamp: 2025-06-26T18:39:54.942Z
Learning: Rajat-Ahuja1997 is comfortable exposing token counts in MessageCreate schema as a public property when it improves code maintainability and avoids type checking errors.
📚 Learning: 2025-09-25T19:53:21.360Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:169-183
Timestamp: 2025-09-25T19:53:21.360Z
Learning: In the deriver token estimation, the approach uses empty inputs to get the base prompt template token count via `estimate_base_prompt_tokens()`, then manually adds tokens for actual inputs (peer_card, working_representation, new_turns). This avoids double-counting wrapper overhead since the base prompt already includes template structure.

Applied to files:

  • src/prometheus.py
  • src/deriver/deriver.py
  • src/utils/summarizer.py
📚 Learning: 2025-10-09T20:37:29.932Z
Learnt from: dr-frmr
Repo: plastic-labs/honcho PR: 230
File: src/deriver/deriver.py:439-444
Timestamp: 2025-10-09T20:37:29.932Z
Learning: In src/deriver/deriver.py, the team prefers to use `accumulate_metric` with unit "blob" to log the full peer card content for debugging purposes. The content size is acceptable, and the logging infrastructure is designed to handle blob metrics separately from numeric metrics.

Applied to files:

  • src/deriver/deriver.py
🧬 Code graph analysis (2)
src/deriver/deriver.py (4)
src/utils/summarizer.py (1)
  • track_input_tokens (787-809)
src/prometheus.py (1)
  • labels (30-34)
src/utils/tokens.py (1)
  • estimate_tokens (6-15)
src/utils/representation.py (1)
  • is_empty (129-133)
src/utils/summarizer.py (6)
src/utils/clients.py (5)
  • HonchoLLMCallResponse (80-93)
  • honcho_llm_call (114-130)
  • honcho_llm_call (134-149)
  • honcho_llm_call (153-168)
  • honcho_llm_call (172-324)
src/utils/logging.py (4)
  • accumulate_metric (152-165)
  • conditional_observe (36-38)
  • conditional_observe (42-45)
  • conditional_observe (48-80)
src/utils/tokens.py (1)
  • estimate_tokens (6-15)
src/models.py (1)
  • Message (204-277)
src/deriver/deriver.py (1)
  • track_input_tokens (36-72)
src/prometheus.py (1)
  • labels (30-34)
🪛 Ruff (0.14.5)
src/deriver/deriver.py

46-46: Possible hardcoded password assigned to argument: "token_type"

(S106)


52-52: Possible hardcoded password assigned to argument: "token_type"

(S106)


58-58: Possible hardcoded password assigned to argument: "token_type"

(S106)


64-64: Possible hardcoded password assigned to argument: "token_type"

(S106)


70-70: Possible hardcoded password assigned to argument: "token_type"

(S106)


111-111: Possible hardcoded password assigned to argument: "token_type"

(S106)


249-250: Explicitly concatenated string should be implicitly concatenated

Remove redundant '+' operator to implicitly concatenate

(ISC003)


249-250: Logging statement uses +

(G003)

src/utils/summarizer.py

161-161: Do not catch blind exception: Exception

(BLE001)


177-177: Do not catch blind exception: Exception

(BLE001)


394-394: Possible hardcoded password assigned to argument: "token_type"

(S106)


795-795: Possible hardcoded password assigned to argument: "token_type"

(S106)


801-801: Possible hardcoded password assigned to argument: "token_type"

(S106)


807-807: Possible hardcoded password assigned to argument: "token_type"

(S106)

🔇 Additional comments (4)
src/utils/summarizer.py (1)

787-809: Summary input‑token tracking helper looks consistent with deriver metrics.

The track_input_tokens helper neatly decomposes summary inputs into base_prompt, messages, and previous_summary components and uses task_type="summary" to distinguish from representation. This aligns with the expanded label set on DERIVER_TOKENS_PROCESSED and should make downstream PromQL reasonably straightforward.
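The decomposition described above can be sketched as below; the component names come from this review, but the helper itself is illustrative rather than the repo's code.

```python
def summary_input_components(
    base_prompt_tokens: int,
    message_tokens: list[int],
    previous_summary_tokens: int,
) -> dict[str, int]:
    """Decompose summary input tokens into the three labeled components."""
    return {
        "base_prompt": base_prompt_tokens,
        "messages": sum(message_tokens),
        "previous_summary": previous_summary_tokens,
    }


components = summary_input_components(200, [40, 60], 120)
```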

src/deriver/deriver.py (3)

36-73: Input token tracking helper for representation aligns with prior design.

Breaking representation inputs into peer_card, working_representation, base_prompt, new_turns, and session_context components under task_type="representation" gives you the dimensionality requested in the earlier review while keeping label cardinality bounded. Looks good.


109-113: Output‑token metric on critical_analysis_call is well‑placed.

Incrementing DERIVER_TOKENS_PROCESSED for task_type="representation", token_type="output", component="total" using response.output_tokens cleanly ties LLM output usage to the deriver call. This should be easy to consume alongside the new input‑component metrics.


222-247: Confirm intended use of estimated_input_tokens now that it’s only for logging.

After removing it from critical_analysis_call, estimated_input_tokens is only used in the debug log (and to compute available_context_tokens). That’s fine, but worth confirming there’s no remaining consumer expecting this value elsewhere (e.g., observability dashboards). If nothing external depends on it, the current usage is coherent.

Comment on lines +149 to +180

@cache
def estimate_short_summary_base_prompt_tokens() -> int:
    """Estimate tokens for the base short summary prompt (without messages/previous_summary)."""
    try:
        return estimate_tokens(
            short_summary_prompt(
                messages=[],
                output_words=0,
                previous_summary_text="",
            )
        )
    except Exception:
        # Return a rough estimate if estimation fails
        return 200


@cache
def estimate_long_summary_base_prompt_tokens() -> int:
    """Estimate tokens for the base long summary prompt (without messages/previous_summary)."""
    try:
        return estimate_tokens(
            long_summary_prompt(
                messages=[],
                output_words=0,
                previous_summary_text="",
            )
        )
    except Exception:
        # Return a rough estimate if estimation fails
        return 200

Contributor


⚠️ Potential issue | 🟡 Minor

Narrow or log the broad exception in base‑prompt token estimators.

Both estimate_short_summary_base_prompt_tokens and estimate_long_summary_base_prompt_tokens catch a bare Exception and silently fall back to 200. That’s fine as a last resort, but you might want to either (a) catch only the expected estimation failures or (b) at least log the exception so systematic tokenizer issues don’t go unnoticed.

🧰 Tools
🪛 Ruff (0.14.5)

161-161: Do not catch blind exception: Exception

(BLE001)


177-177: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In src/utils/summarizer.py around lines 149 to 180, the two base-prompt token
estimators swallow all exceptions and silently return 200; update the except
blocks to at least log the caught exception (or narrow to a specific
tokenizer/estimation error if available). Add a module-level logger (import
logging; logger = logging.getLogger(__name__)), then change each except to
except Exception as e: logger.exception("Failed to estimate base prompt tokens,
returning fallback"); return 200 (or replace Exception with the concrete
estimation error type if your tokenizer library exposes one).
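An estimator incorporating that suggestion might look like this sketch; the prompt renderer and token estimator are injected as callables here only to keep the example self-contained, whereas in the repo they are module-level functions.

```python
import logging
from functools import cache

logger = logging.getLogger(__name__)


@cache
def estimate_base_prompt_tokens_safe(render_prompt, estimate_tokens) -> int:
    """Estimate the base prompt's tokens, logging before falling back."""
    try:
        return estimate_tokens(render_prompt())
    except Exception:
        # Surface systematic tokenizer failures instead of swallowing them.
        logger.exception("Base prompt token estimation failed; using fallback")
        return 200


def _broken_renderer() -> str:
    raise RuntimeError("tokenizer unavailable")


ok = estimate_base_prompt_tokens_safe(
    lambda: "one two three four", lambda s: len(s.split())
)
fallback = estimate_base_prompt_tokens_safe(_broken_renderer, len)
```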
Comment on lines +182 to +207
@conditional_observe(name="Create Short Summary")
async def create_short_summary(
    messages: list[models.Message],
    input_tokens: int,
    previous_summary: str | None = None,
) -> HonchoLLMCallResponse[str]:
    # input_tokens indicates how many tokens the message list + previous summary take up
    # we want to optimize short summaries to be smaller than the actual content being summarized
    # so we ask the agent to produce a word count roughly equal to either the input, or the max
    # size if the input is larger. the word/token ratio is roughly 4:3 so we multiply by 0.75.
    # LLMs *seem* to respond better to getting asked for a word count but should workshop this.
    output_words = int(min(input_tokens, settings.SUMMARY.MAX_TOKENS_SHORT) * 0.75)

    if previous_summary:
        previous_summary_text = previous_summary
    else:
        previous_summary_text = "There is no previous summary -- the messages are the beginning of the conversation."

    prompt = short_summary_prompt(messages, output_words, previous_summary_text)

    return await honcho_llm_call(
        llm_settings=settings.SUMMARY,
        prompt=prompt,
        max_tokens=settings.SUMMARY.MAX_TOKENS_SHORT,
    )
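The output_words calculation can be worked through with a hypothetical MAX_TOKENS_SHORT of 1000 (the real value comes from settings.SUMMARY):

```python
MAX_TOKENS_SHORT = 1000  # hypothetical; the real value comes from settings.SUMMARY


def output_words(input_tokens: int) -> int:
    # word/token ratio is roughly 4:3, hence the 0.75 factor
    return int(min(input_tokens, MAX_TOKENS_SHORT) * 0.75)
```

A small input of 400 tokens yields a 300-word target, while any input above the cap is pinned to 750 words.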

Contributor

🧹 Nitpick | 🔵 Trivial

Double‑check how default “no previous summary” text is accounted in token metrics.

When previous_summary is None, you inject a fixed sentence into <previous_summary>…</previous_summary>, but previous_summary_tokens is 0 in that case and estimate_*_base_prompt_tokens() was computed with previous_summary_text="". That default text’s tokens therefore aren’t counted in any of the summary input components. If you care about precise accounting, consider either:

  • Treating that default sentence as part of the base prompt (update the estimator to include it), or
  • Treating it as “previous_summary” and adding its estimated tokens into previous_summary_tokens.

Would you like a small helper to estimate and include those tokens when latest_summary is absent?

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/utils/tokens.py (1)

3-3: Consistent tokenizer change - same verification needed.

The tokenizer has been updated to o200k_base, consistent with the change in src/schemas.py. This module-level change affects all token estimation throughout the codebase, including deriver and dialectic components mentioned in the PR objectives.

This is the same encoding change as in src/schemas.py line 139, so the same verification questions apply here regarding model compatibility and alignment with actual token consumption.

Based on learnings.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6ef8278 and e92bc51.

📒 Files selected for processing (2)
  • src/schemas.py (1 hunks)
  • src/utils/tokens.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:169-183
Timestamp: 2025-09-25T19:53:21.360Z
Learning: In the deriver token estimation, the approach uses empty inputs to get the base prompt template token count via `estimate_base_prompt_tokens()`, then manually adds tokens for actual inputs (peer_card, working_representation, new_turns). This avoids double-counting wrapper overhead since the base prompt already includes template structure.
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 144
File: src/crud.py:1250-1250
Timestamp: 2025-06-26T18:39:54.942Z
Learning: Rajat-Ahuja1997 is comfortable exposing token counts in MessageCreate schema as a public property when it improves code maintainability and avoids type checking errors.
📚 Learning: 2025-09-25T19:52:22.185Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:14-14
Timestamp: 2025-09-25T19:52:22.185Z
Learning: The Honcho project uses a fine-tuned model with a distinct tokenizer that differs from standard tiktoken encodings, so they use cl100k encoding as a consistent approximation for token counting rather than model-specific encodings.

Applied to files:

  • src/utils/tokens.py
📚 Learning: 2025-09-25T19:50:53.145Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/config.py:223-224
Timestamp: 2025-09-25T19:50:53.145Z
Learning: In src/config.py, the MAX_INPUT_TOKENS field has le=23000 constraint which is intentional due to model limitations, not a configuration restriction. The underlying AI model has a hard cap of 23000 tokens for input.

Applied to files:

  • src/utils/tokens.py
🧬 Code graph analysis (1)
src/schemas.py (1)
src/embedding_client.py (1)
  • encoding (389-391)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test-python
Collaborator

@VVoruganti VVoruganti left a comment

Logic all looks good, but can we also instrument the LLM call that makes a new peer card? We can label it as its own task type in Prometheus even though it runs as part of the representation task.

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/utils/tokens.py (1)

3-33: Input-token helper and encoder change look consistent; consider minor hardening

  • Using o200k_base as the shared tokenizer and reusing estimate_tokens here matches the project’s decision to use o200k for LLM/message accounting and keep embedding counting separate, so no change needed on that front. Based on learnings.
  • track_input_tokens correctly fans out per‑component counts into DERIVER_TOKENS_PROCESSED with labels task_type, token_type, and component, which matches the metric’s label schema and NamespacedCounter’s implicit namespace.
  • If there’s any chance a caller might pass a negative token_count (e.g., from subtractive adjustments), Counter.inc will raise; if that’s a realistic future risk, you could defensively skip or clamp negatives here, otherwise a brief comment that counts are guaranteed non‑negative would document the assumption.
  • The Ruff S106 warning on "token_type" is just linter noise for this metric label; if it’s blocking CI, you’ll need a # noqa: S106 rather than # nosec B106, which only affects Bandit.
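A sketch of the defensive clamp suggested above, with a plain dict standing in for the labeled Prometheus counter (the real `track_input_tokens` signature in src/utils/tokens.py may differ):

```python
from collections import defaultdict

# Dict stand-in for DERIVER_TOKENS_PROCESSED; prometheus_client's Counter.inc
# raises ValueError on negative amounts, which this clamp prevents reaching.
TOKENS_PROCESSED: dict[tuple[str, str, str], int] = defaultdict(int)


def track_input_tokens(task_type: str, components: dict[str, int]) -> None:
    for component, token_count in components.items():
        # Counts should be non-negative; clamp so a subtractive adjustment
        # can never raise inside the metrics path.
        TOKENS_PROCESSED[(task_type, "input", component)] += max(0, token_count)


track_input_tokens("representation", {"prompt": 500, "session_context": -3})
```

The clamp keeps the metrics path from raising; whether silently dropping negatives or logging them is preferable depends on whether negative counts ever indicate a real bug upstream.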
♻️ Duplicate comments (2)
src/utils/summarizer.py (1)

150-163: Narrow or log the broad exception in base-prompt token estimators.

Both estimate_short_summary_prompt_tokens and estimate_long_summary_prompt_tokens catch a bare Exception and silently fall back to conservative estimates (200). Consider either catching only expected estimation failures or logging the exception so systematic tokenizer issues don't go unnoticed.

src/deriver/deriver.py (1)

227-236: Minor: Use implicit string concatenation in debug log.

The debug log uses explicit + for string concatenation, which triggers lint warnings (ISC003/G003). Use implicit concatenation instead by placing the string literals adjacent.

Apply this diff:

     logger.debug(
-        "Token estimation - Peer card: %d, Working rep: %d, Base prompt: %d, "
-        + "New turns: %d, Session context: %d, Total estimated: %d",
+        "Token estimation - Peer card: %d, Working rep: %d, Base prompt: %d, "
+        "New turns: %d, Session context: %d, Total estimated: %d",
         peer_card_tokens,
         working_rep_tokens,
         prompt_tokens,
📜 Review details

Configuration used: CodeRabbit UI

Review profile: ASSERTIVE

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between e92bc51 and 5c0234c.

📒 Files selected for processing (7)
  • src/deriver/deriver.py (6 hunks)
  • src/deriver/prompts.py (1 hunks)
  • src/dialectic/chat.py (6 hunks)
  • src/dialectic/prompts.py (2 hunks)
  • src/prometheus.py (2 hunks)
  • src/utils/summarizer.py (8 hunks)
  • src/utils/tokens.py (2 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:169-183
Timestamp: 2025-09-25T19:53:21.360Z
Learning: In the deriver token estimation, the approach uses empty inputs to get the base prompt template token count via `estimate_base_prompt_tokens()`, then manually adds tokens for actual inputs (peer_card, working_representation, new_turns). This avoids double-counting wrapper overhead since the base prompt already includes template structure.
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 274
File: src/schemas.py:139-139
Timestamp: 2025-11-19T18:39:52.512Z
Learning: In the Honcho project, message token counting uses o200k_base (for gpt-4o-mini LLM) while embedding token counting uses cl100k_base (for text-embedding-3-small/gemini-embedding-001). These do not need to align because they serve different purposes: LLM consumption tracking vs embedding API input validation.
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 144
File: src/crud.py:1250-1250
Timestamp: 2025-06-26T18:39:54.942Z
Learning: Rajat-Ahuja1997 is comfortable exposing token counts in MessageCreate schema as a public property when it improves code maintainability and avoids type checking errors.
📚 Learning: 2025-09-25T19:53:21.360Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:169-183
Timestamp: 2025-09-25T19:53:21.360Z
Learning: In the deriver token estimation, the approach uses empty inputs to get the base prompt template token count via `estimate_base_prompt_tokens()`, then manually adds tokens for actual inputs (peer_card, working_representation, new_turns). This avoids double-counting wrapper overhead since the base prompt already includes template structure.

Applied to files:

  • src/dialectic/chat.py
  • src/deriver/prompts.py
  • src/dialectic/prompts.py
  • src/deriver/deriver.py
  • src/utils/summarizer.py
  • src/utils/tokens.py
📚 Learning: 2025-10-09T20:37:29.932Z
Learnt from: dr-frmr
Repo: plastic-labs/honcho PR: 230
File: src/deriver/deriver.py:439-444
Timestamp: 2025-10-09T20:37:29.932Z
Learning: In src/deriver/deriver.py, the team prefers to use `accumulate_metric` with unit "blob" to log the full peer card content for debugging purposes. The content size is acceptable, and the logging infrastructure is designed to handle blob metrics separately from numeric metrics.

Applied to files:

  • src/deriver/deriver.py
📚 Learning: 2025-11-19T18:39:52.512Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 274
File: src/schemas.py:139-139
Timestamp: 2025-11-19T18:39:52.512Z
Learning: In the Honcho project, message token counting uses o200k_base (for gpt-4o-mini LLM) while embedding token counting uses cl100k_base (for text-embedding-3-small/gemini-embedding-001). These do not need to align because they serve different purposes: LLM consumption tracking vs embedding API input validation.

Applied to files:

  • src/utils/tokens.py
📚 Learning: 2025-09-25T19:50:53.145Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/config.py:223-224
Timestamp: 2025-09-25T19:50:53.145Z
Learning: In src/config.py, the MAX_INPUT_TOKENS field has le=23000 constraint which is intentional due to model limitations, not a configuration restriction. The underlying AI model has a hard cap of 23000 tokens for input.

Applied to files:

  • src/utils/tokens.py
📚 Learning: 2025-09-25T19:52:22.185Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:14-14
Timestamp: 2025-09-25T19:52:22.185Z
Learning: The Honcho project uses a fine-tuned model with a distinct tokenizer that differs from standard tiktoken encodings, so they use cl100k encoding as a consistent approximation for token counting rather than model-specific encodings.

Applied to files:

  • src/utils/tokens.py
🧬 Code graph analysis (6)
src/dialectic/chat.py (3)
src/dialectic/prompts.py (2)
  • dialectic_prompt (7-144)
  • estimate_dialectic_prompt_tokens (148-167)
src/utils/tokens.py (1)
  • estimate_tokens (8-17)
src/prometheus.py (1)
  • labels (30-34)
src/deriver/prompts.py (2)
src/utils/representation.py (1)
  • Representation (101-319)
src/utils/tokens.py (1)
  • estimate_tokens (8-17)
src/dialectic/prompts.py (1)
src/utils/tokens.py (1)
  • estimate_tokens (8-17)
src/deriver/deriver.py (4)
src/utils/tokens.py (2)
  • estimate_tokens (8-17)
  • track_input_tokens (20-33)
src/deriver/prompts.py (3)
  • critical_analysis_prompt (16-146)
  • estimate_critical_analysis_prompt_tokens (216-234)
  • estimate_peer_card_prompt_tokens (238-252)
src/utils/representation.py (2)
  • str_no_timestamps (77-79)
  • str_no_timestamps (200-229)
src/prometheus.py (1)
  • labels (30-34)
src/utils/summarizer.py (6)
src/utils/clients.py (5)
  • HonchoLLMCallResponse (80-93)
  • honcho_llm_call (114-130)
  • honcho_llm_call (134-149)
  • honcho_llm_call (153-168)
  • honcho_llm_call (172-324)
src/utils/logging.py (3)
  • conditional_observe (36-38)
  • conditional_observe (42-45)
  • conditional_observe (48-80)
src/utils/tokens.py (2)
  • estimate_tokens (8-17)
  • track_input_tokens (20-33)
src/schemas.py (2)
  • Message (154-168)
  • Summary (236-250)
src/models.py (1)
  • Message (204-277)
src/prometheus.py (1)
  • labels (30-34)
src/utils/tokens.py (1)
src/prometheus.py (1)
  • labels (30-34)
🪛 Ruff (0.14.5)
src/dialectic/chat.py

104-104: Possible hardcoded password assigned to argument: "token_type"

(S106)


108-108: Possible hardcoded password assigned to argument: "token_type"

(S106)


182-182: Possible hardcoded password assigned to argument: "token_type"

(S106)


186-186: Missing return type annotation for private function log_streaming_response

(ANN202)


191-191: Possible hardcoded password assigned to argument: "token_type"

(S106)

src/deriver/prompts.py

232-232: Do not catch blind exception: Exception

(BLE001)


250-250: Do not catch blind exception: Exception

(BLE001)

src/dialectic/prompts.py

165-165: Do not catch blind exception: Exception

(BLE001)

src/deriver/deriver.py

73-73: Possible hardcoded password assigned to argument: "token_type"

(S106)


119-119: Possible hardcoded password assigned to argument: "token_type"

(S106)


228-229: Explicitly concatenated string should be implicitly concatenated

Remove redundant '+' operator to implicitly concatenate

(ISC003)


228-229: Logging statement uses +

(G003)

src/utils/summarizer.py

161-161: Do not catch blind exception: Exception

(BLE001)


177-177: Do not catch blind exception: Exception

(BLE001)


394-394: Possible hardcoded password assigned to argument: "token_type"

(S106)

src/utils/tokens.py

31-31: Possible hardcoded password assigned to argument: "token_type"

(S106)

🔇 Additional comments (14)
src/dialectic/chat.py (2)

139-195: Streaming dialectic token metrics wiring looks correct and consistent

The streaming path now mirrors the non‑streaming estimation (estimate_dialectic_prompt_tokens + estimate_tokens on dynamic inputs) and increments input tokens once before streaming, which is consistent with how you handle deriver token accounting. The wrapper that checks if chunk.is_done and chunk.output_tokens is not None before incrementing output tokens avoids the earlier zero/None edge case, and the TODO about Groq documents the current provider gap without affecting behavior.

No further changes needed here.


63-109: ****

The review comment incorrectly assumes honcho_llm_call() can return a response with output_tokens=None. Examining the actual implementation confirms the opposite: all non-streaming code paths in honcho_llm_call() return output_tokens as an int—either the real token count or 0 as a fallback. The type annotation correctly declares output_tokens: int (non-optional) for HonchoLLMCallResponse. Only streaming chunks, typed separately as HonchoLLMCallStreamChunk, have optional output_tokens (int | None), and the streaming path in dialectic_stream already guards correctly. The non-streaming path in dialectic_call (line 109) is safe and follows the same pattern used in deriver.py.

Likely an incorrect or invalid review comment.

src/prometheus.py (1)

84-126: Remove unfounded breaking-change concern; label schemas are correct

All calls to DERIVER_TOKENS_PROCESSED correctly include the component label (component="total" in deriver.py and summarizer.py, dynamic in tokens.py), and DIALECTIC_TOKENS_PROCESSED calls correctly provide only token_type. No evidence of a metric rename exists in the codebase—this appears to be a new metric addition, not a rename of an existing one. The operational warning about breaking dashboards/alerts applies only to actual renames; remove that concern from the review.

Likely an incorrect or invalid review comment.

src/utils/summarizer.py (6)

84-113: LGTM! Well-structured prompt builder.

The short_summary_prompt function is clear, properly parameterized, and uses cleandoc for clean formatting.


116-147: LGTM! Long summary prompt appropriately comprehensive.

The long_summary_prompt includes additional contextual factors (emotional state, personality traits, themes) that are well-suited for thorough summaries.


182-207: LGTM! Summary creation logic is sound.

The create_short_summary function correctly handles the case where no previous summary exists and appropriately calculates output word targets based on input tokens.

Note: A past review flagged that the default "no previous summary" text at line 198 isn't counted in token estimates. This is a known concern that applies to both short and long summary functions.


209-229: LGTM! Consistent with short summary approach.

The create_long_summary function follows the same pattern as create_short_summary with appropriate adjustments for long-form summaries.


366-396: Excellent fallback handling for token metrics!

The conditional token tracking (if not is_fallback) correctly prevents recording metrics when summaries fall back to the hardcoded 50-token placeholder. This addresses the concern from previous reviews about misleading metrics during fallback scenarios.

The granular component tracking (prompt, messages, previous_summary) provides good observability for token usage analysis.

Based on learnings about the project's token tracking approach.


427-486: LGTM! Return type change enables proper fallback detection.

Returning tuple[Summary, bool] allows the caller to distinguish between real LLM-generated summaries and fallback placeholders, which is essential for accurate token metrics. The is_fallback flag is correctly set only in the exception path.
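The shape of that pattern, sketched with plain strings in place of Summary objects:

```python
def get_summary(llm_ok: bool) -> tuple[str, bool]:
    """Return (summary, is_fallback); callers skip metrics when is_fallback is True."""
    try:
        if not llm_ok:
            raise RuntimeError("LLM call failed")
        return "real summary", False
    except Exception:
        # Hardcoded placeholder; token metrics must not be recorded for it.
        return "placeholder", True
```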

src/deriver/deriver.py (5)

71-75: LGTM! Output-only tracking at call site is appropriate.

Moving input token tracking to the batch level (where all components are available) while keeping output tracking at the call site (where response.output_tokens is available) is a sensible separation of concerns.


106-122: Excellent granular token tracking for peer card generation.

The component breakdown (prompt, old_peer_card, new_observations) provides good observability. The approach follows the established pattern of using cached base prompt estimates plus actual content token counts.

Based on learnings about the deriver token estimation approach.


187-225: LGTM! Comprehensive token estimation across all input components.

The token estimation logic correctly accounts for:

  • Peer card tokens
  • Working representation tokens
  • Base prompt tokens (using cached estimator)
  • New conversation turns tokens
  • Session context tokens (added after retrieval)

The two-phase calculation (initial estimate + session context) is appropriate since session context depends on available token budget.

Based on learnings about the deriver token estimation approach.


239-248: Excellent granular component breakdown for representation task metrics.

The component tracking provides the detailed breakdown requested in earlier discussions (peer_card, working_representation, prompt, new_turns, session_context), enabling precise analysis of token usage patterns.


340-377: LGTM! Removal of estimated_input_tokens parameter simplifies the interface.

The refactoring to track input tokens at the batch level (where all components are known) rather than passing them through the call chain is cleaner. The critical_analysis_call can focus on its core responsibility of generating representations.

Comment on lines 215 to +252
 @cache
-def estimate_base_prompt_tokens() -> int:
-    """Estimate base prompt tokens by calling critical_analysis_prompt with empty values.
+def estimate_critical_analysis_prompt_tokens() -> int:
+    """Estimate critical analysis prompt tokens by calling critical_analysis_prompt with empty values.
 
     This value is cached since it only changes on redeploys when the prompt template changes.
     """
 
     try:
-        base_prompt = critical_analysis_prompt(
+        prompt = critical_analysis_prompt(
             peer_id="",
             peer_card=None,
             message_created_at=datetime.datetime.now(datetime.timezone.utc),
             working_representation=Representation(),
             history="",
             new_turns=[],
         )
-        return estimate_tokens(base_prompt)
+        return estimate_tokens(prompt)
     except Exception:
         # Return a conservative estimate if estimation fails
         return 500
 
 
+@cache
+def estimate_peer_card_prompt_tokens() -> int:
+    """Estimate peer card prompt tokens by calling peer_card_prompt with empty values.
+
+    This value is cached since it only changes on redeploys when the prompt template changes.
+    """
+
+    try:
+        prompt = peer_card_prompt(
+            old_peer_card=None,
+            new_observations="",
+        )
+        return estimate_tokens(prompt)
+    except Exception:
+        # Return a conservative estimate if estimation fails
+        return 400
Contributor

🧹 Nitpick | 🔵 Trivial

New deriver prompt estimators match the base‑prompt pattern; maybe log or annotate broad exceptions

The two estimators follow the same “call prompt with empty values → estimate_tokens → cache” approach you already use elsewhere, so they should avoid double‑counting template overhead and stay stable across redeploys. Based on learnings.

Given these are heuristics, catching Exception and returning conservative fallbacks (500/400) is reasonable, but it does hide any real failures in prompt construction or tokenization. Two small tweaks you might consider:

  • Log the exception at debug/info once before returning the fallback so template issues don’t go completely silent.
  • If Ruff’s BLE001 is enforced, either narrow the exception type or explicitly mark this as intentional with # noqa: BLE001.
🧰 Tools
🪛 Ruff (0.14.5)

232-232: Do not catch blind exception: Exception

(BLE001)


250-250: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In src/deriver/prompts.py around lines 215 to 252, the two estimator functions
swallow all Exceptions and return conservative defaults without any logging or
lint-safety; modify the except blocks to catch Exception as e and log the error
at debug (e.g., process_logger.debug or logging.getLogger(__name__).debug)
including the exception info before returning the fallback, and if Ruff’s BLE001
would complain about broad excepts either narrow the exception type if possible
or keep Exception but append a comment like "# noqa: BLE001" to indicate
intentional broad handling.
Comment on lines +147 to +167
@cache
def estimate_dialectic_prompt_tokens() -> int:
    """Estimate base dialectic prompt tokens by calling dialectic_prompt with empty values.

    This value is cached since it only changes on redeploys when the prompt template changes.
    """
    try:
        prompt = dialectic_prompt(
            query="",
            working_representation="",
            recent_conversation_history=None,
            observer_peer_card=None,
            observed_peer_card=None,
            observer="",
            observed="",
        )

        return estimate_tokens(prompt)
    except Exception:
        # Return a conservative estimate if estimation fails
        return 750
Contributor

🧹 Nitpick | 🔵 Trivial

Cached base dialectic prompt estimator fits the overall token-accounting pattern

estimate_dialectic_prompt_tokens() cleanly centralizes the “empty-input prompt → estimate_tokens → cached” pattern, which lines up with how the deriver handles base prompt overhead and keeps dialectic token estimates consistent between call/stream paths. The 750 fallback also matches the existing hardcoded system‑prompt allowance in chat(), so behavior stays conservative even on failure. Based on learnings.

As with the deriver estimators, the blind except Exception is acceptable for a best‑effort heuristic, but if Ruff’s BLE001 is enforced you may want to either:

  • Log the exception once (e.g., at debug) before returning 750, or
  • Add an explicit # noqa: BLE001 to mark the broad catch as intentional.
🧰 Tools
🪛 Ruff (0.14.5)

165-165: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In src/dialectic/prompts.py around lines 147 to 167 the function uses a bare
except Exception which can trigger Ruff BLE001; fix by either (A) logging the
caught exception once (e.g., import a module logger and call
logger.debug("estimate_dialectic_prompt_tokens failed", exc_info=True) before
returning 750) or (B) explicitly mark the broad catch as intentional by
appending "# noqa: BLE001" to the except line; if you choose logging ensure the
logger is imported/available in the module.
@VVoruganti VVoruganti merged commit 0f1e1de into main Nov 20, 2025
11 checks passed
@VVoruganti VVoruganti deleted the rajat/DEV-1258 branch November 20, 2025 02:20
This was referenced Jan 21, 2026
