metrics for deriver / dialectic input + output tokens #274
Conversation
Walkthrough
Token accounting was made explicit.
Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant Deriver as Deriver.process_representation_tasks_batch
    participant Prom as Prometheus
    participant Reasoner as CertaintyReasoner.reason
    participant LLM as critical_analysis_call
    participant Provider as LLM Provider (streaming/non-streaming)
    Client->>Deriver: submit tasks
    Deriver->>Deriver: estimate components (peer, rep, prompt, new_turns)
    Deriver->>Deriver: get session_context_tokens
    Deriver->>Prom: track_input_tokens(task_type, {peer, rep, prompt, new_turns, session_context})
    Deriver->>Reasoner: run reason()
    Reasoner->>LLM: critical_analysis_call(...) -- no estimated_input_tokens param
    LLM->>Provider: call provider
    Provider-->>LLM: stream chunks (final chunk may include output_tokens)
    LLM-->>Reasoner: response (response.output_tokens when available)
    Reasoner->>Prom: increment output metric (token_type="output", component="total")
    Reasoner-->>Deriver: representation result
    Deriver-->>Client: final response
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Possibly related PRs
Suggested reviewers
Poem
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
tests/test_llm_mock.py (1)
7-13: Fix invalid and ineffective `isinstance(..., str | object)` checks

Using `isinstance(short_result, str | object)` / `isinstance(long_result, str | object)` is problematic:

- At runtime, `str | object` is a PEP 604 union, not a plain type or tuple, so `isinstance` may raise `TypeError` depending on the Python version.
- Even if it were accepted, `object` matches everything, so the assertion is always true and doesn't validate anything useful.

Given the subsequent logic already distinguishes between string and "object with content", a simpler pattern is:
```diff
-    # For functions with return_call_response=True, we should get a string or object with content
-    # The existing mock returns a string, so we check if it's a string
-    assert isinstance(short_result, str | object)
-    assert isinstance(long_result, str | object)
-    # If it's not a string, check for content attribute
-    if not isinstance(short_result, str):
-        assert hasattr(short_result, "content")
-    if not isinstance(long_result, str):
-        assert hasattr(long_result, "content")
+    # For functions with return_call_response=True, we should get either a string
+    # or an object with a `content` attribute.
+    assert isinstance(short_result, str) or hasattr(short_result, "content")
+    assert isinstance(long_result, str) or hasattr(long_result, "content")
```

This avoids runtime issues and directly encodes the behavior you care about.
Also applies to: 23-55, 80-90
src/deriver/deriver.py (1)
166-182: Input token estimation looks good, but the `task_type` label differs between input and output

The input token estimation in `process_representation_tasks_batch`:

- uses `estimate_base_prompt_tokens()` + dynamic components (peer card, working_representation, new turns, session context), and
- passes their sum into `CertaintyReasoner` as `estimated_input_tokens`,

which is consistent with the prior deriver estimation approach and should give a reasonable upper bound without double-counting template overhead. Based on learnings.
In `CertaintyReasoner.reason`, the subsequent metric increment is:

```python
prometheus.DERIVER_TOKENS_PROCESSED.labels(
    task_type="critical_analysis",
    token_type="input",
).inc(self.estimated_input_tokens)
```

while the corresponding output tokens for the same logical call are recorded earlier with:

```python
prometheus.DERIVER_TOKENS_PROCESSED.labels(
    task_type="representation",
    token_type="output",
).inc(response.output_tokens)
```

This means input and output tokens for a single critical-analysis/representation pass are labeled with different `task_type` values, which makes per-task input/output ratios harder to derive.

If the intent is to attribute both input and output tokens to the same logical task, consider aligning the `task_type` label (e.g., both `"representation"` or both `"critical_analysis"`). If they're meant to be distinct series, a brief comment explaining the distinction would help future readers.

Also applies to: 241-247, 345-348
src/utils/clients.py (1)
882-907: Wire Groq streaming `output_tokens` by enabling and extracting usage data

Groq's streaming API can return usage tokens on the final chunk, but it requires passing `stream_options={"include_usage": True}` in the request. The current code at lines 882-907 doesn't include this parameter and doesn't check for `chunk.usage` on the final chunk, so output tokens are never surfaced.

This creates a parity gap: Anthropic, OpenAI, and Gemini streaming implementations all populate `output_tokens` on final chunks, allowing `dialectic_stream` (src/dialectic/chat.py:188) to increment output-token metrics. Groq streaming calls never increment these metrics because `output_tokens` remains `None`.

Add `stream_options` to `groq_params` and extract `completion_tokens` from the final chunk's `usage` field (similar to how OpenAI handles it at line 812), then populate the final chunk's `output_tokens` field.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (6)
- src/deriver/deriver.py (2 hunks)
- src/dialectic/chat.py (6 hunks)
- src/dialectic/prompts.py (2 hunks)
- src/prometheus.py (1 hunks)
- src/utils/clients.py (5 hunks)
- tests/test_llm_mock.py (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:169-183
Timestamp: 2025-09-25T19:53:21.360Z
Learning: In the deriver token estimation, the approach uses empty inputs to get the base prompt template token count via `estimate_base_prompt_tokens()`, then manually adds tokens for actual inputs (peer_card, working_representation, new_turns). This avoids double-counting wrapper overhead since the base prompt already includes template structure.
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 144
File: src/crud.py:1250-1250
Timestamp: 2025-06-26T18:39:54.942Z
Learning: Rajat-Ahuja1997 is comfortable exposing token counts in MessageCreate schema as a public property when it improves code maintainability and avoids type checking errors.
📚 Learning: 2025-09-25T19:53:21.360Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:169-183
Timestamp: 2025-09-25T19:53:21.360Z
Learning: In the deriver token estimation, the approach uses empty inputs to get the base prompt template token count via `estimate_base_prompt_tokens()`, then manually adds tokens for actual inputs (peer_card, working_representation, new_turns). This avoids double-counting wrapper overhead since the base prompt already includes template structure.
Applied to files:
- src/dialectic/chat.py
- src/deriver/deriver.py
- src/dialectic/prompts.py
📚 Learning: 2025-09-05T21:35:05.674Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 202
File: src/routers/peers.py:190-193
Timestamp: 2025-09-05T21:35:05.674Z
Learning: In src/routers/peers.py, the streaming chat endpoint yields raw chunk.content from HonchoLLMCallStreamChunk without SSE formatting, and this works correctly with their custom streaming client setup that consumes Anthropic API responses.
Applied to files:
src/utils/clients.py
🧬 Code graph analysis (4)
src/dialectic/chat.py (3)
src/dialectic/prompts.py (2)
- `dialectic_prompt` (7-144)
- `estimate_dialectic_prompt_tokens` (148-167)

src/utils/tokens.py (1)
- `estimate_tokens` (6-15)

src/prometheus.py (1)
- `labels` (30-34)
src/deriver/deriver.py (1)
src/prometheus.py (1)
- `labels` (30-34)
tests/test_llm_mock.py (2)
src/utils/representation.py (1)
- `PromptRepresentation` (29-41)

src/deriver/deriver.py (1)
- `critical_analysis_call` (37-75)
src/dialectic/prompts.py (1)
src/utils/tokens.py (1)
- `estimate_tokens` (6-15)
🪛 Ruff (0.14.5)
src/dialectic/chat.py
104-104: Possible hardcoded password assigned to argument: "token_type"
(S106)
108-108: Possible hardcoded password assigned to argument: "token_type"
(S106)
182-182: Possible hardcoded password assigned to argument: "token_type"
(S106)
186-186: Missing return type annotation for private function log_streaming_response
(ANN202)
190-190: Possible hardcoded password assigned to argument: "token_type"
(S106)
src/deriver/deriver.py
72-72: Possible hardcoded password assigned to argument: "token_type"
(S106)
347-347: Possible hardcoded password assigned to argument: "token_type"
(S106)
src/dialectic/prompts.py
165-165: Do not catch blind exception: Exception
(BLE001)
🔇 Additional comments (9)
src/utils/clients.py (4)
96-111: Streaming chunk API extension looks consistent

The addition of `output_tokens: int | None` on `HonchoLLMCallStreamChunk` with the "only set on the final chunk" contract matches how the provider-specific streaming branches emit a final metadata-only chunk. This should compose cleanly with downstream code that already only cares about `content` for intermediate chunks.
751-770: Anthropic streaming token extraction is sound

Using `anthropic_stream.get_final_message()` and pulling `usage.output_tokens` into the final `HonchoLLMCallStreamChunk` aligns with Anthropic's API and keeps intermediate chunks content-only. This should give accurate output token counts without affecting existing consumers that only stream `chunk.content`.
772-825: Verify `stream_options={"include_usage": True}` across all AsyncOpenAI backends

The OpenAI streaming branch now unconditionally passes `stream_options={"include_usage": True}` and expects `chunk.usage.completion_tokens` on a dedicated usage chunk. That's correct for first-party OpenAI, but `AsyncOpenAI()` is also used for `"custom"` and `"vllm"` providers here.

If any OpenAI-compatible backend doesn't support `stream_options` or the `usage` field on streamed chunks, this path could raise or silently never set `output_tokens` (only triggering the "stream ended without usage chunk" warning).

I recommend verifying:

- That each configured AsyncOpenAI backend (OpenAI, custom base_url, vLLM) accepts `stream_options.include_usage`.
- That they actually emit a usage chunk with `completion_tokens`.

If not, we may need a provider-aware branch (only set `stream_options` for known-supporting providers) or a fallback that estimates tokens instead of relying on streamed usage.
826-880: Gemini streaming output token handling aligns with non-streaming, but confirm field shape

The streaming Gemini branch pulls `final_chunk.usage_metadata.candidates_token_count` into `gemini_output_tokens`, consistent with the non-streaming path. Assuming `candidates_token_count` is an int (or optional int) as in the client library, this will correctly populate `output_tokens` on the final chunk while leaving earlier chunks untouched.

It's worth double-checking against the exact `google.genai` version that `candidates_token_count` isn't a list or nested structure; if it is, we'd need to sum or index it rather than pass it through directly.

src/dialectic/chat.py (1)
63-74: Dialectic input/output token metrics for non-streaming calls look correct

The pattern of:

- deriving `base_prompt_tokens = estimate_dialectic_prompt_tokens()`, then
- estimating dynamic input tokens from the concatenation of `query`, `working_representation`, history, and peer cards, and
- incrementing `DIALECTIC_TOKENS_PROCESSED` once for `token_type="input"` and once for `token_type="output"`

matches the deriver's established approach of "base template + dynamic inputs" and should avoid double-counting prompt wrapper overhead. The metric wiring and label usage also align with the new `DIALECTIC_TOKENS_PROCESSED` definition. Based on learnings.
Also applies to: 86-110
src/prometheus.py (1)
96-110: New token metrics look consistent; confirm intentional deriver metric rename

The updated metrics definitions are coherent:

- `DERIVER_TOKENS_PROCESSED`: `"deriver_tokens_processed_total"` with labels `(namespace, task_type, token_type)` matches how deriver code now passes both `task_type` and `token_type`.
- `DIALECTIC_TOKENS_PROCESSED`: `"dialectic_tokens_processed_total"` with `(namespace, token_type)` matches the dialectic usage.

One thing to double-check: renaming the deriver metric from its previous name (`*_tokens_processed_total`) is a breaking change for any existing Prometheus rules, dashboards, or alerts. If this rename is intentional, it's worth noting in release notes or migration docs so observability doesn't silently regress.

Also applies to: 112-124
src/deriver/deriver.py (1)
36-75: Deriver output token tracking is wired correctly through `critical_analysis_call`

`critical_analysis_call` now returns a `PromptRepresentation` and increments `DERIVER_TOKENS_PROCESSED` with:

- `task_type="representation"`
- `token_type="output"`
- amount `response.output_tokens`

This matches the new metric signature and keeps the output token accounting close to the actual LLM call site, which is a good place for this responsibility.
src/dialectic/prompts.py (2)
1-1: LGTM: Imports support token estimation functionality.

The new imports correctly support the base prompt token estimation feature, with `cache` enabling efficient caching and `estimate_tokens` providing the token counting utility.

Also applies to: 4-5

147-168: LGTM: Token estimation follows established pattern with appropriate error handling.

The implementation correctly follows the deriver's approach of estimating base prompt tokens with empty inputs to avoid double-counting. The `@cache` decorator is well-suited since the base template only changes on redeploys. The broad exception handling flagged by static analysis is intentionally designed for robustness: it ensures the metrics pipeline doesn't fail on estimation errors and provides a conservative fallback for capacity planning. Based on learnings.
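The "base template with empty inputs, plus dynamic content" pattern reads roughly like this sketch (the helper names and the 4-characters-per-token heuristic are illustrative, not the project's actual implementations):

```python
from functools import cache

def estimate_tokens(text: str) -> int:
    # Stand-in tokenizer: rough 4-characters-per-token heuristic.
    return max(1, len(text) // 4)

def build_prompt(query: str, history: str) -> str:
    # Stand-in for the real prompt template.
    return (
        "<system>Answer carefully.</system>\n"
        f"<query>{query}</query>\n<history>{history}</history>"
    )

@cache
def base_prompt_tokens() -> int:
    # Empty inputs isolate the template-only cost; cached because the
    # template only changes on redeploy.
    return estimate_tokens(build_prompt("", ""))

def estimated_input_tokens(query: str, history: str) -> int:
    # Add dynamic content on top of the cached base, so the template
    # wrapper is never double-counted.
    return base_prompt_tokens() + estimate_tokens(query) + estimate_tokens(history)

print(estimated_input_tokens("what does the user like?", "user: I like tea"))
```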
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
src/prometheus.py (1)
84-112: Update `DERIVER_TOKENS_PROCESSED` doc comment to reflect new usage.

The metric is now incremented from both the deriver (`task_type="representation"`) and the summarizer (`task_type="summary"`), not just "after the critical analysis call is made". Consider documenting the broader usage so future readers don't assume it's deriver-only.
tests/utils/test_clients.py (3)
218-265: Consider asserting on output tokens in Anthropic streaming test.

You now mock `mock_final_message.usage.output_tokens` for the Anthropic stream, but the test only checks content and finish reasons. To fully exercise the new token-parsing behavior, consider asserting that the final chunk (or aggregate result) exposes the expected output token count derived from `usage`.
627-679: Add coverage for Google/Gemini streaming output tokens.

The final Google streaming chunk now carries `usage_metadata.candidates_token_count=35`, but the test doesn't assert any token value coming out of `handle_streaming_response`. Adding an assertion that the surfaced token count matches `candidates_token_count` would better validate the new output-token parsing for this provider.
950-989: Extend main streaming call test to verify output token propagation.

`test_streaming_call` now wires `mock_final_message.usage.output_tokens=28`, but doesn't assert that this value is available to callers (e.g., on the final chunk or in any aggregation). Given the PR's focus on token metrics, consider adding an assertion that whatever surface you've chosen for streaming output tokens reflects this value.

src/utils/summarizer.py (1)
362-377: Remove debug print in `_create_and_save_summary`.

The `print("latest_summary", latest_summary)` and related token-count prints should be replaced with `logger.debug` calls with an appropriate message key.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (4)
- src/deriver/deriver.py (4 hunks)
- src/prometheus.py (1 hunks)
- src/utils/summarizer.py (7 hunks)
- tests/utils/test_clients.py (3 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:169-183
Timestamp: 2025-09-25T19:53:21.360Z
Learning: In the deriver token estimation, the approach uses empty inputs to get the base prompt template token count via `estimate_base_prompt_tokens()`, then manually adds tokens for actual inputs (peer_card, working_representation, new_turns). This avoids double-counting wrapper overhead since the base prompt already includes template structure.
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 144
File: src/crud.py:1250-1250
Timestamp: 2025-06-26T18:39:54.942Z
Learning: Rajat-Ahuja1997 is comfortable exposing token counts in MessageCreate schema as a public property when it improves code maintainability and avoids type checking errors.
📚 Learning: 2025-09-25T19:53:21.360Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:169-183
Timestamp: 2025-09-25T19:53:21.360Z
Learning: In the deriver token estimation, the approach uses empty inputs to get the base prompt template token count via `estimate_base_prompt_tokens()`, then manually adds tokens for actual inputs (peer_card, working_representation, new_turns). This avoids double-counting wrapper overhead since the base prompt already includes template structure.
Applied to files:
- src/prometheus.py
- src/deriver/deriver.py
- src/utils/summarizer.py
📚 Learning: 2025-10-09T20:37:29.932Z
Learnt from: dr-frmr
Repo: plastic-labs/honcho PR: 230
File: src/deriver/deriver.py:439-444
Timestamp: 2025-10-09T20:37:29.932Z
Learning: In src/deriver/deriver.py, the team prefers to use `accumulate_metric` with unit "blob" to log the full peer card content for debugging purposes. The content size is acceptable, and the logging infrastructure is designed to handle blob metrics separately from numeric metrics.
Applied to files:
src/deriver/deriver.py
🧬 Code graph analysis (2)
src/deriver/deriver.py (4)

src/utils/summarizer.py (1)
- `track_input_tokens` (787-809)

src/prometheus.py (1)
- `labels` (30-34)

src/utils/tokens.py (1)
- `estimate_tokens` (6-15)

src/utils/representation.py (1)
- `is_empty` (129-133)

src/utils/summarizer.py (6)

src/utils/clients.py (5)
- `HonchoLLMCallResponse` (80-93)
- `honcho_llm_call` (114-130)
- `honcho_llm_call` (134-149)
- `honcho_llm_call` (153-168)
- `honcho_llm_call` (172-324)

src/utils/logging.py (4)
- `accumulate_metric` (152-165)
- `conditional_observe` (36-38)
- `conditional_observe` (42-45)
- `conditional_observe` (48-80)

src/utils/tokens.py (1)
- `estimate_tokens` (6-15)

src/models.py (1)
- `Message` (204-277)

src/deriver/deriver.py (1)
- `track_input_tokens` (36-72)

src/prometheus.py (1)
- `labels` (30-34)
🪛 Ruff (0.14.5)
src/deriver/deriver.py
46-46: Possible hardcoded password assigned to argument: "token_type"
(S106)
52-52: Possible hardcoded password assigned to argument: "token_type"
(S106)
58-58: Possible hardcoded password assigned to argument: "token_type"
(S106)
64-64: Possible hardcoded password assigned to argument: "token_type"
(S106)
70-70: Possible hardcoded password assigned to argument: "token_type"
(S106)
111-111: Possible hardcoded password assigned to argument: "token_type"
(S106)
249-250: Explicitly concatenated string should be implicitly concatenated
Remove redundant '+' operator to implicitly concatenate
(ISC003)
249-250: Logging statement uses +
(G003)
src/utils/summarizer.py
161-161: Do not catch blind exception: Exception
(BLE001)
177-177: Do not catch blind exception: Exception
(BLE001)
394-394: Possible hardcoded password assigned to argument: "token_type"
(S106)
795-795: Possible hardcoded password assigned to argument: "token_type"
(S106)
801-801: Possible hardcoded password assigned to argument: "token_type"
(S106)
807-807: Possible hardcoded password assigned to argument: "token_type"
(S106)
🔇 Additional comments (4)
src/utils/summarizer.py (1)
787-809: Summary input-token tracking helper looks consistent with deriver metrics.

The `track_input_tokens` helper neatly decomposes summary inputs into `base_prompt`, `messages`, and `previous_summary` components and uses `task_type="summary"` to distinguish from representation. This aligns with the expanded label set on `DERIVER_TOKENS_PROCESSED` and should make downstream PromQL reasonably straightforward.

src/deriver/deriver.py (3)
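The per-component fan-out described above can be pictured like this (a recording callback stands in for the real Prometheus counter; the component names mirror the review, and the token amounts are made up):

```python
# Records each (labels, amount) pair instead of incrementing Prometheus.
recorded: list[tuple[dict[str, str], int]] = []

def inc(labels: dict[str, str], amount: int) -> None:
    recorded.append((labels, amount))

def track_input_tokens(components: dict[str, int]) -> None:
    # One increment per input component, all under task_type="summary",
    # so each component becomes its own labeled series.
    for component, tokens in components.items():
        inc(
            {"task_type": "summary", "token_type": "input", "component": component},
            tokens,
        )

track_input_tokens({"base_prompt": 180, "messages": 950, "previous_summary": 120})
print(len(recorded))  # 3
```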
36-73: Input token tracking helper for representation aligns with prior design.

Breaking representation inputs into `peer_card`, `working_representation`, `base_prompt`, `new_turns`, and `session_context` components under `task_type="representation"` gives you the dimensionality requested in the earlier review while keeping label cardinality bounded. Looks good.
109-113: Output-token metric on `critical_analysis_call` is well-placed.

Incrementing `DERIVER_TOKENS_PROCESSED` for `task_type="representation"`, `token_type="output"`, `component="total"` using `response.output_tokens` cleanly ties LLM output usage to the deriver call. This should be easy to consume alongside the new input-component metrics.
222-247: Confirm intended use of `estimated_input_tokens` now that it's only for logging.

After removing it from `critical_analysis_call`, `estimated_input_tokens` is only used in the debug log (and to compute `available_context_tokens`). That's fine, but worth confirming there's no remaining consumer expecting this value elsewhere (e.g., observability dashboards). If nothing external depends on it, the current usage is coherent.
```python
@cache
def estimate_short_summary_base_prompt_tokens() -> int:
    """Estimate tokens for the base short summary prompt (without messages/previous_summary)."""
    try:
        return estimate_tokens(
            short_summary_prompt(
                messages=[],
                output_words=0,
                previous_summary_text="",
            )
        )
    except Exception:
        # Return a rough estimate if estimation fails
        return 200


@cache
def estimate_long_summary_base_prompt_tokens() -> int:
    """Estimate tokens for the base long summary prompt (without messages/previous_summary)."""
    try:
        return estimate_tokens(
            long_summary_prompt(
                messages=[],
                output_words=0,
                previous_summary_text="",
            )
        )
    except Exception:
        # Return a rough estimate if estimation fails
        return 200
```
Narrow or log the broad exception in base‑prompt token estimators.
Both `estimate_short_summary_base_prompt_tokens` and `estimate_long_summary_base_prompt_tokens` catch a bare `Exception` and silently fall back to 200. That's fine as a last resort, but you might want to either (a) catch only the expected estimation failures or (b) at least log the exception so systematic tokenizer issues don't go unnoticed.
🧰 Tools
🪛 Ruff (0.14.5)
161-161: Do not catch blind exception: Exception
(BLE001)
177-177: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In src/utils/summarizer.py around lines 149 to 180, the two base-prompt token
estimators swallow all exceptions and silently return 200; update the except
blocks to at least log the caught exception (or narrow to a specific
tokenizer/estimation error if available). Add a module-level logger (import
logging; logger = logging.getLogger(__name__)), then change each except to
except Exception as e: logger.exception("Failed to estimate base prompt tokens,
returning fallback"); return 200 (or replace Exception with the concrete
estimation error type if your tokenizer library exposes one).
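The prompt above amounts to something like this sketch (the helper names and the use of `len` as a stand-in tokenizer are illustrative):

```python
import logging

logger = logging.getLogger(__name__)

def estimate_base_prompt_tokens(build_prompt, estimate_tokens, fallback: int = 200) -> int:
    """Estimate template-only tokens, logging (not swallowing) any failure."""
    try:
        return estimate_tokens(build_prompt())
    except Exception:
        # Keep the conservative fallback, but make the failure visible so
        # systematic tokenizer issues don't go unnoticed.
        logger.exception("Failed to estimate base prompt tokens; returning fallback")
        return fallback

def broken_prompt() -> str:
    raise RuntimeError("tokenizer unavailable")

print(estimate_base_prompt_tokens(broken_prompt, len))       # 200 (logged)
print(estimate_base_prompt_tokens(lambda: "x" * 40, len))    # 40
```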
```python
@conditional_observe(name="Create Short Summary")
async def create_short_summary(
    messages: list[models.Message],
    input_tokens: int,
    previous_summary: str | None = None,
) -> HonchoLLMCallResponse[str]:
    # input_tokens indicates how many tokens the message list + previous summary take up
    # we want to optimize short summaries to be smaller than the actual content being summarized
    # so we ask the agent to produce a word count roughly equal to either the input, or the max
    # size if the input is larger. the word/token ratio is roughly 4:3 so we multiply by 0.75.
    # LLMs *seem* to respond better to getting asked for a word count but should workshop this.
    output_words = int(min(input_tokens, settings.SUMMARY.MAX_TOKENS_SHORT) * 0.75)

    if previous_summary:
        previous_summary_text = previous_summary
    else:
        previous_summary_text = "There is no previous summary -- the messages are the beginning of the conversation."

    prompt = short_summary_prompt(messages, output_words, previous_summary_text)

    return await honcho_llm_call(
        llm_settings=settings.SUMMARY,
        prompt=prompt,
        max_tokens=settings.SUMMARY.MAX_TOKENS_SHORT,
    )
```
🧹 Nitpick | 🔵 Trivial
Double‑check how default “no previous summary” text is accounted in token metrics.
When `previous_summary` is `None`, you inject a fixed sentence into `<previous_summary>…</previous_summary>`, but `previous_summary_tokens` is 0 in that case and `estimate_*_base_prompt_tokens()` was computed with `previous_summary_text=""`. That default text's tokens therefore aren't counted in any of the summary input components. If you care about precise accounting, consider either:
- Treating that default sentence as part of the base prompt (update the estimator to include it), or
- Treating it as "previous_summary" and adding its estimated tokens into `previous_summary_tokens`.
Would you like a small helper to estimate and include those tokens when latest_summary is absent?
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/utils/tokens.py (1)
3-3: Consistent tokenizer change - same verification needed.

The tokenizer has been updated to `o200k_base`, consistent with the change in `src/schemas.py`. This module-level change affects all token estimation throughout the codebase, including deriver and dialectic components mentioned in the PR objectives.

This is the same encoding change as in `src/schemas.py` line 139, so the same verification questions apply here regarding model compatibility and alignment with actual token consumption.

Based on learnings.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
- src/schemas.py (1 hunks)
- src/utils/tokens.py (1 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:169-183
Timestamp: 2025-09-25T19:53:21.360Z
Learning: In the deriver token estimation, the approach uses empty inputs to get the base prompt template token count via `estimate_base_prompt_tokens()`, then manually adds tokens for actual inputs (peer_card, working_representation, new_turns). This avoids double-counting wrapper overhead since the base prompt already includes template structure.
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 144
File: src/crud.py:1250-1250
Timestamp: 2025-06-26T18:39:54.942Z
Learning: Rajat-Ahuja1997 is comfortable exposing token counts in MessageCreate schema as a public property when it improves code maintainability and avoids type checking errors.
📚 Learning: 2025-09-25T19:52:22.185Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:14-14
Timestamp: 2025-09-25T19:52:22.185Z
Learning: The Honcho project uses a fine-tuned model with a distinct tokenizer that differs from standard tiktoken encodings, so they use cl100k encoding as a consistent approximation for token counting rather than model-specific encodings.
Applied to files:
src/utils/tokens.py
📚 Learning: 2025-09-25T19:50:53.145Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/config.py:223-224
Timestamp: 2025-09-25T19:50:53.145Z
Learning: In src/config.py, the MAX_INPUT_TOKENS field has le=23000 constraint which is intentional due to model limitations, not a configuration restriction. The underlying AI model has a hard cap of 23000 tokens for input.
Applied to files:
src/utils/tokens.py
🧬 Code graph analysis (1)
src/schemas.py (1)
src/embedding_client.py (1)
- `encoding` (389-391)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test-python
VVoruganti
left a comment
Logic all looks good but can we also instrument the llm call that makes a new peer card. Can label it as its own task type in prometheus even though it runs as a part of the representation task.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/utils/tokens.py (1)
3-33: Input-token helper and encoder change look consistent; consider minor hardening
- Using `o200k_base` as the shared tokenizer and reusing `estimate_tokens` here matches the project's decision to use o200k for LLM/message accounting and keep embedding counting separate, so no change needed on that front. Based on learnings.
- `track_input_tokens` correctly fans out per-component counts into `DERIVER_TOKENS_PROCESSED` with labels `task_type`, `token_type`, and `component`, which matches the metric's label schema and NamespacedCounter's implicit `namespace`.
- If there's any chance a caller might pass a negative `token_count` (e.g., from subtractive adjustments), `Counter.inc` will raise; if that's a realistic future risk, you could defensively skip or clamp negatives here, otherwise a brief comment that counts are guaranteed non-negative would document the assumption.
- The Ruff S106 warning on `"token_type"` is just linter noise for this metric label; if it's blocking CI, you'll need a `# noqa: S106` rather than `# nosec B106`, which only affects Bandit.
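The defensive clamp could be as small as the following (an illustrative helper, not the project's code; the component names are made up):

```python
def safe_component_counts(components: dict[str, int]) -> dict[str, int]:
    # Prometheus counters reject negative increments, so clamp them to zero
    # before incrementing; zero increments are harmless no-ops.
    return {name: max(0, count) for name, count in components.items()}

print(safe_component_counts({"peer_card": 120, "adjustment": -15}))
# {'peer_card': 120, 'adjustment': 0}
```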
♻️ Duplicate comments (2)
src/utils/summarizer.py (1)
150-163: Narrow or log the broad exception in base-prompt token estimators.

Both `estimate_short_summary_prompt_tokens` and `estimate_long_summary_prompt_tokens` catch a bare `Exception` and silently fall back to conservative estimates (200). Consider either catching only expected estimation failures or logging the exception so systematic tokenizer issues don't go unnoticed.

src/deriver/deriver.py (1)
227-236: Minor: Use implicit string concatenation in debug log.

The debug log uses explicit `+` for string concatenation, which triggers lint warnings (ISC003/G003). Use implicit concatenation instead by placing the string literals adjacent.

Apply this diff:

```diff
     logger.debug(
-        "Token estimation - Peer card: %d, Working rep: %d, Base prompt: %d, "
-        + "New turns: %d, Session context: %d, Total estimated: %d",
+        "Token estimation - Peer card: %d, Working rep: %d, Base prompt: %d, "
+        "New turns: %d, Session context: %d, Total estimated: %d",
         peer_card_tokens,
         working_rep_tokens,
         prompt_tokens,
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: ASSERTIVE
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
- `src/deriver/deriver.py` (6 hunks)
- `src/deriver/prompts.py` (1 hunks)
- `src/dialectic/chat.py` (6 hunks)
- `src/dialectic/prompts.py` (2 hunks)
- `src/prometheus.py` (2 hunks)
- `src/utils/summarizer.py` (8 hunks)
- `src/utils/tokens.py` (2 hunks)
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:169-183
Timestamp: 2025-09-25T19:53:21.360Z
Learning: In the deriver token estimation, the approach uses empty inputs to get the base prompt template token count via `estimate_base_prompt_tokens()`, then manually adds tokens for actual inputs (peer_card, working_representation, new_turns). This avoids double-counting wrapper overhead since the base prompt already includes template structure.
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 274
File: src/schemas.py:139-139
Timestamp: 2025-11-19T18:39:52.512Z
Learning: In the Honcho project, message token counting uses o200k_base (for gpt-4o-mini LLM) while embedding token counting uses cl100k_base (for text-embedding-3-small/gemini-embedding-001). These do not need to align because they serve different purposes: LLM consumption tracking vs embedding API input validation.
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 144
File: src/crud.py:1250-1250
Timestamp: 2025-06-26T18:39:54.942Z
Learning: Rajat-Ahuja1997 is comfortable exposing token counts in MessageCreate schema as a public property when it improves code maintainability and avoids type checking errors.
📚 Learning: 2025-09-25T19:53:21.360Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:169-183
Timestamp: 2025-09-25T19:53:21.360Z
Learning: In the deriver token estimation, the approach uses empty inputs to get the base prompt template token count via `estimate_base_prompt_tokens()`, then manually adds tokens for actual inputs (peer_card, working_representation, new_turns). This avoids double-counting wrapper overhead since the base prompt already includes template structure.
Applied to files:
- `src/dialectic/chat.py`
- `src/deriver/prompts.py`
- `src/dialectic/prompts.py`
- `src/deriver/deriver.py`
- `src/utils/summarizer.py`
- `src/utils/tokens.py`
📚 Learning: 2025-10-09T20:37:29.932Z
Learnt from: dr-frmr
Repo: plastic-labs/honcho PR: 230
File: src/deriver/deriver.py:439-444
Timestamp: 2025-10-09T20:37:29.932Z
Learning: In src/deriver/deriver.py, the team prefers to use `accumulate_metric` with unit "blob" to log the full peer card content for debugging purposes. The content size is acceptable, and the logging infrastructure is designed to handle blob metrics separately from numeric metrics.
Applied to files:
src/deriver/deriver.py
📚 Learning: 2025-11-19T18:39:52.512Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 274
File: src/schemas.py:139-139
Timestamp: 2025-11-19T18:39:52.512Z
Learning: In the Honcho project, message token counting uses o200k_base (for gpt-4o-mini LLM) while embedding token counting uses cl100k_base (for text-embedding-3-small/gemini-embedding-001). These do not need to align because they serve different purposes: LLM consumption tracking vs embedding API input validation.
Applied to files:
src/utils/tokens.py
📚 Learning: 2025-09-25T19:50:53.145Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/config.py:223-224
Timestamp: 2025-09-25T19:50:53.145Z
Learning: In src/config.py, the MAX_INPUT_TOKENS field has le=23000 constraint which is intentional due to model limitations, not a configuration restriction. The underlying AI model has a hard cap of 23000 tokens for input.
Applied to files:
src/utils/tokens.py
📚 Learning: 2025-09-25T19:52:22.185Z
Learnt from: Rajat-Ahuja1997
Repo: plastic-labs/honcho PR: 218
File: src/deriver/deriver.py:14-14
Timestamp: 2025-09-25T19:52:22.185Z
Learning: The Honcho project uses a fine-tuned model with a distinct tokenizer that differs from standard tiktoken encodings, so they use cl100k encoding as a consistent approximation for token counting rather than model-specific encodings.
Applied to files:
src/utils/tokens.py
🧬 Code graph analysis (6)
src/dialectic/chat.py (3)
src/dialectic/prompts.py (2)
- `dialectic_prompt` (7-144)
- `estimate_dialectic_prompt_tokens` (148-167)

src/utils/tokens.py (1)
- `estimate_tokens` (8-17)

src/prometheus.py (1)
- `labels` (30-34)
src/deriver/prompts.py (2)
src/utils/representation.py (1)
- `Representation` (101-319)

src/utils/tokens.py (1)
- `estimate_tokens` (8-17)
src/dialectic/prompts.py (1)
src/utils/tokens.py (1)
- `estimate_tokens` (8-17)
src/deriver/deriver.py (4)
src/utils/tokens.py (2)
- `estimate_tokens` (8-17)
- `track_input_tokens` (20-33)

src/deriver/prompts.py (3)
- `critical_analysis_prompt` (16-146)
- `estimate_critical_analysis_prompt_tokens` (216-234)
- `estimate_peer_card_prompt_tokens` (238-252)

src/utils/representation.py (2)
- `str_no_timestamps` (77-79)
- `str_no_timestamps` (200-229)

src/prometheus.py (1)
- `labels` (30-34)
src/utils/summarizer.py (6)
src/utils/clients.py (5)
- `HonchoLLMCallResponse` (80-93)
- `honcho_llm_call` (114-130)
- `honcho_llm_call` (134-149)
- `honcho_llm_call` (153-168)
- `honcho_llm_call` (172-324)

src/utils/logging.py (3)
- `conditional_observe` (36-38)
- `conditional_observe` (42-45)
- `conditional_observe` (48-80)

src/utils/tokens.py (2)
- `estimate_tokens` (8-17)
- `track_input_tokens` (20-33)

src/schemas.py (2)
- `Message` (154-168)
- `Summary` (236-250)

src/models.py (1)
- `Message` (204-277)

src/prometheus.py (1)
- `labels` (30-34)
src/utils/tokens.py (1)
src/prometheus.py (1)
- `labels` (30-34)
🪛 Ruff (0.14.5)
src/dialectic/chat.py
104-104: Possible hardcoded password assigned to argument: "token_type"
(S106)
108-108: Possible hardcoded password assigned to argument: "token_type"
(S106)
182-182: Possible hardcoded password assigned to argument: "token_type"
(S106)
186-186: Missing return type annotation for private function log_streaming_response
(ANN202)
191-191: Possible hardcoded password assigned to argument: "token_type"
(S106)
src/deriver/prompts.py
232-232: Do not catch blind exception: Exception
(BLE001)
250-250: Do not catch blind exception: Exception
(BLE001)
src/dialectic/prompts.py
165-165: Do not catch blind exception: Exception
(BLE001)
src/deriver/deriver.py
73-73: Possible hardcoded password assigned to argument: "token_type"
(S106)
119-119: Possible hardcoded password assigned to argument: "token_type"
(S106)
228-229: Explicitly concatenated string should be implicitly concatenated
Remove redundant '+' operator to implicitly concatenate
(ISC003)
228-229: Logging statement uses +
(G003)
src/utils/summarizer.py
161-161: Do not catch blind exception: Exception
(BLE001)
177-177: Do not catch blind exception: Exception
(BLE001)
394-394: Possible hardcoded password assigned to argument: "token_type"
(S106)
src/utils/tokens.py
31-31: Possible hardcoded password assigned to argument: "token_type"
(S106)
🔇 Additional comments (14)
src/dialectic/chat.py (2)
139-195: Streaming dialectic token metrics wiring looks correct and consistent

The streaming path now mirrors the non-streaming estimation (`estimate_dialectic_prompt_tokens` + `estimate_tokens` on dynamic inputs) and increments input tokens once before streaming, which is consistent with how you handle deriver token accounting. The wrapper that checks `if chunk.is_done and chunk.output_tokens is not None` before incrementing output tokens avoids the earlier zero/None edge case, and the TODO about Groq documents the current provider gap without affecting behavior.

No further changes needed here.
63-109: The review comment incorrectly assumes `honcho_llm_call()` can return a response with `output_tokens=None`. Examining the actual implementation confirms the opposite: all non-streaming code paths in `honcho_llm_call()` return `output_tokens` as an `int`, either the real token count or 0 as a fallback. The type annotation correctly declares `output_tokens: int` (non-optional) for `HonchoLLMCallResponse`. Only streaming chunks, typed separately as `HonchoLLMCallStreamChunk`, have optional `output_tokens` (`int | None`), and the streaming path in `dialectic_stream` already guards correctly. The non-streaming path in `dialectic_call` (line 109) is safe and follows the same pattern used in `deriver.py`.

Likely an incorrect or invalid review comment.
src/prometheus.py (1)
84-126: Remove unfounded breaking-change concern; label schemas are correct

All calls to `DERIVER_TOKENS_PROCESSED` correctly include the `component` label (`component="total"` in deriver.py and summarizer.py, dynamic in tokens.py), and `DIALECTIC_TOKENS_PROCESSED` calls correctly provide only `token_type`. No evidence of a metric rename exists in the codebase; this appears to be a new metric addition, not a rename of an existing one. The operational warning about breaking dashboards/alerts applies only to actual renames; remove that concern from the review.

Likely an incorrect or invalid review comment.
src/utils/summarizer.py (6)
84-113: LGTM! Well-structured prompt builder. The `short_summary_prompt` function is clear, properly parameterized, and uses `cleandoc` for clean formatting.
116-147: LGTM! Long summary prompt appropriately comprehensive. The `long_summary_prompt` includes additional contextual factors (emotional state, personality traits, themes) that are well-suited for thorough summaries.
182-207: LGTM! Summary creation logic is sound. The `create_short_summary` function correctly handles the case where no previous summary exists and appropriately calculates output word targets based on input tokens.

Note: A past review flagged that the default "no previous summary" text at line 198 isn't counted in token estimates. This is a known concern that applies to both short and long summary functions.
209-229: LGTM! Consistent with short summary approach. The `create_long_summary` function follows the same pattern as `create_short_summary` with appropriate adjustments for long-form summaries.
366-396: Excellent fallback handling for token metrics! The conditional token tracking (`if not is_fallback`) correctly prevents recording metrics when summaries fall back to the hardcoded 50-token placeholder. This addresses the concern from previous reviews about misleading metrics during fallback scenarios.

The granular component tracking (prompt, messages, previous_summary) provides good observability for token usage analysis.
Based on learnings about the project's token tracking approach.
427-486: LGTM! Return type change enables proper fallback detection. Returning `tuple[Summary, bool]` allows the caller to distinguish between real LLM-generated summaries and fallback placeholders, which is essential for accurate token metrics. The `is_fallback` flag is correctly set only in the exception path.

src/deriver/deriver.py (5)
71-75: LGTM! Output-only tracking at call site is appropriate. Moving input token tracking to the batch level (where all components are available) while keeping output tracking at the call site (where `response.output_tokens` is available) is a sensible separation of concerns.
106-122: Excellent granular token tracking for peer card generation. The component breakdown (prompt, old_peer_card, new_observations) provides good observability. The approach follows the established pattern of using cached base prompt estimates plus actual content token counts.
Based on learnings about the deriver token estimation approach.
187-225: LGTM! Comprehensive token estimation across all input components. The token estimation logic correctly accounts for:
- Peer card tokens
- Working representation tokens
- Base prompt tokens (using cached estimator)
- New conversation turns tokens
- Session context tokens (added after retrieval)
The two-phase calculation (initial estimate + session context) is appropriate since session context depends on available token budget.
Based on learnings about the deriver token estimation approach.
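The two-phase calculation can be sketched as follows (the component numbers are hypothetical; `MAX_INPUT_TOKENS = 23000` reflects the hard model cap noted in src/config.py):

```python
def estimate_initial_tokens(peer_card: int, working_rep: int, prompt: int, new_turns: int) -> int:
    # Phase 1: sum everything known before session context is retrieved.
    return peer_card + working_rep + prompt + new_turns

MAX_INPUT_TOKENS = 23000  # hard model input cap

initial = estimate_initial_tokens(peer_card=80, working_rep=1200, prompt=500, new_turns=300)
budget = MAX_INPUT_TOKENS - initial  # remaining room for session context

# Phase 2: session context is fetched within the remaining budget,
# then added to produce the total estimate fed to the metrics.
session_context = min(4000, budget)  # hypothetical retrieved size
total = initial + session_context
print(total)  # prints 6080
```

Phase 2 cannot run first because the amount of session context to fetch depends on how much budget the fixed components leave behind.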
239-248: Excellent granular component breakdown for representation task metrics. The component tracking provides the detailed breakdown requested in earlier discussions (peer_card, working_representation, prompt, new_turns, session_context), enabling precise analysis of token usage patterns.
340-377: LGTM! Removal of `estimated_input_tokens` parameter simplifies the interface. The refactoring to track input tokens at the batch level (where all components are known) rather than passing them through the call chain is cleaner. The `critical_analysis_call` can focus on its core responsibility of generating representations.
```diff
 @cache
-def estimate_base_prompt_tokens() -> int:
-    """Estimate base prompt tokens by calling critical_analysis_prompt with empty values.
+def estimate_critical_analysis_prompt_tokens() -> int:
+    """Estimate critical analysis prompt tokens by calling critical_analysis_prompt with empty values.

     This value is cached since it only changes on redeploys when the prompt template changes.
     """

     try:
-        base_prompt = critical_analysis_prompt(
+        prompt = critical_analysis_prompt(
             peer_id="",
             peer_card=None,
             message_created_at=datetime.datetime.now(datetime.timezone.utc),
             working_representation=Representation(),
             history="",
             new_turns=[],
         )
-        return estimate_tokens(base_prompt)
+        return estimate_tokens(prompt)
     except Exception:
         # Return a conservative estimate if estimation fails
         return 500


+@cache
+def estimate_peer_card_prompt_tokens() -> int:
+    """Estimate peer card prompt tokens by calling peer_card_prompt with empty values.
+
+    This value is cached since it only changes on redeploys when the prompt template changes.
+    """
+
+    try:
+        prompt = peer_card_prompt(
+            old_peer_card=None,
+            new_observations="",
+        )
+        return estimate_tokens(prompt)
+    except Exception:
+        # Return a conservative estimate if estimation fails
+        return 400
```
🧹 Nitpick | 🔵 Trivial
New deriver prompt estimators match the base‑prompt pattern; maybe log or annotate broad exceptions
The two estimators follow the same “call prompt with empty values → estimate_tokens → cache” approach you already use elsewhere, so they should avoid double‑counting template overhead and stay stable across redeploys. Based on learnings.
Given these are heuristics, catching Exception and returning conservative fallbacks (500/400) is reasonable, but it does hide any real failures in prompt construction or tokenization. Two small tweaks you might consider:
- Log the exception at debug/info once before returning the fallback so template issues don’t go completely silent.
- If Ruff's BLE001 is enforced, either narrow the exception type or explicitly mark this as intentional with `# noqa: BLE001`.
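The logging tweak could look roughly like this (the logger usage and chars-per-token heuristic are illustrative; the real code uses `estimate_tokens` and the actual prompt builders):

```python
import logging
from typing import Callable

logger = logging.getLogger(__name__)

def estimate_prompt_tokens_safe(build_prompt: Callable[[], str], fallback: int = 500) -> int:
    """Best-effort estimator: fall back conservatively on any failure, but
    leave a debug trail so systematic template/tokenizer breakage is visible."""
    try:
        prompt = build_prompt()
        return len(prompt) // 4  # crude stand-in for estimate_tokens
    except Exception:  # noqa: BLE001 - intentional broad catch for a heuristic
        logger.debug("prompt token estimation failed", exc_info=True)
        return fallback

print(estimate_prompt_tokens_safe(lambda: "x" * 400))  # prints 100
print(estimate_prompt_tokens_safe(lambda: 1 / 0))  # logs at debug, prints 500
```

Logging at debug keeps the fallback silent in normal operation while still surfacing repeated failures when verbose logging is enabled.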
🧰 Tools
🪛 Ruff (0.14.5)
232-232: Do not catch blind exception: Exception
(BLE001)
250-250: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In src/deriver/prompts.py around lines 215 to 252, the two estimator functions
swallow all Exceptions and return conservative defaults without any logging or
lint-safety; modify the except blocks to catch Exception as e and log the error
at debug (e.g., process_logger.debug or logging.getLogger(__name__).debug)
including the exception info before returning the fallback, and if Ruff’s BLE001
would complain about broad excepts either narrow the exception type if possible
or keep Exception but append a comment like "# noqa: BLE001" to indicate
intentional broad handling.
```python
@cache
def estimate_dialectic_prompt_tokens() -> int:
    """Estimate base dialectic prompt tokens by calling dialectic_prompt with empty values.

    This value is cached since it only changes on redeploys when the prompt template changes.
    """
    try:
        prompt = dialectic_prompt(
            query="",
            working_representation="",
            recent_conversation_history=None,
            observer_peer_card=None,
            observed_peer_card=None,
            observer="",
            observed="",
        )

        return estimate_tokens(prompt)
    except Exception:
        # Return a conservative estimate if estimation fails
        return 750
```
🧹 Nitpick | 🔵 Trivial
Cached base dialectic prompt estimator fits the overall token-accounting pattern
estimate_dialectic_prompt_tokens() cleanly centralizes the “empty-input prompt → estimate_tokens → cached” pattern, which lines up with how the deriver handles base prompt overhead and keeps dialectic token estimates consistent between call/stream paths. The 750 fallback also matches the existing hardcoded system‑prompt allowance in chat(), so behavior stays conservative even on failure. Based on learnings.
As with the deriver estimators, the blind except Exception is acceptable for a best‑effort heuristic, but if Ruff’s BLE001 is enforced you may want to either:
- Log the exception once (e.g., at debug) before returning 750, or
- Add an explicit `# noqa: BLE001` to mark the broad catch as intentional.
🧰 Tools
🪛 Ruff (0.14.5)
165-165: Do not catch blind exception: Exception
(BLE001)
🤖 Prompt for AI Agents
In src/dialectic/prompts.py around lines 147 to 167 the function uses a bare
except Exception which can trigger Ruff BLE001; fix by either (A) logging the
caught exception once (e.g., import a module logger and call
logger.debug("estimate_dialectic_prompt_tokens failed", exc_info=True) before
returning 750) or (B) explicitly mark the broad catch as intentional by
appending "# noqa: BLE001" to the except line; if you choose logging ensure the
logger is imported/available in the module.
Summary by CodeRabbit
New Features
Improvements
Other