Skip to content

feat: enhance dialectic agent with two-phase search->synth#338

Open
dr-frmr wants to merge 6 commits intomainfrom
ben/dialectic-two-phase-cost
Open

feat: enhance dialectic agent with two-phase search->synth#338
dr-frmr wants to merge 6 commits intomainfrom
ben/dialectic-two-phase-cost

Conversation

@dr-frmr
Copy link
Copy Markdown
Contributor

@dr-frmr dr-frmr commented Jan 22, 2026

  • Introduced a new two-phase cost calculation method for dialectic reasoning, allowing separate search and synthesis phases.
  • Updated the dialectic configuration to support synthesis model settings, enabling more flexible model usage.
  • Enhanced the DialecticAgent to handle both single-model and two-phase modes, improving cost estimation accuracy.
  • Added telemetry metrics for detailed analysis of search and synthesis phases, including input/output token tracking and tool call counts.
  • Updated related documentation and usage notes to reflect the new capabilities and configurations.

Benchmarking

Two-phase dialectic at medium config scores 0.644 on BEAM 100K -- results attached

beam_100K_20260123_105206.json

Summary by CodeRabbit

  • New Features

    • Two-phase dialectic workflow (search + synthesis) with per-phase cost breakdown and mode display.
    • Per-level synthesis model configuration to enable two-model reasoning.
  • Improvements

    • Cost reporting shows realistic vs worst-case, includes serialization overhead, and separates search vs synthesis costs.
    • Conversation history preserved across phases and streaming; tables show Mode, Search Model, and Synthesis Model.
    • Telemetry emits per-phase metrics; event schema version bumped.
  • Tests

    • Updated telemetry test to expect the new schema version.
  • Documentation

    • Minor formatting/EOF adjustments.

✏️ Tip: You can customize this high-level summary in your review settings.

- Introduced a new two-phase cost calculation method for dialectic reasoning, allowing separate search and synthesis phases.
- Updated the dialectic configuration to support synthesis model settings, enabling more flexible model usage.
- Enhanced the `DialecticAgent` to handle both single-model and two-phase modes, improving cost estimation accuracy.
- Added telemetry metrics for detailed analysis of search and synthesis phases, including input/output token tracking and tool call counts.
- Updated related documentation and usage notes to reflect the new capabilities and configurations.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 22, 2026

Walkthrough

Adds optional two‑phase dialectic (search with tools + synthesis without tools), per‑level synthesis model config, two‑phase cost accounting and UI/reporting, telemetry schema/events for per‑phase metrics, message propagation through clients, and related validators.

Changes

Cohort / File(s) Summary
Configuration & Models
src/config.py
Adds SynthesisModelSettings, new SYNTHESIS field on DialecticLevelSettings, validators for backups and token budgets, and updates level defaults (providers/models, thinking budgets, max tool iterations).
Two‑Phase Execution Core
src/dialectic/core.py
Implements search (tool-enabled) + synthesis (tool-less) flows, adds _stringify_tool_result_content, _build_synthesis_messages, _log_two_phase_metrics, two‑phase streaming/fallbacks, and emits per‑phase telemetry.
Telemetry Schema & Exports
src/telemetry/events/dialectic.py, src/telemetry/events/__init__.py, tests/telemetry/test_events.py
Adds DialecticPhaseMetrics, bumps DialecticCompletedEvent schema v1→v2 with two_phase_mode and phases list; updates exports and tests.
Client API & Providers
src/utils/clients.py
Threads messages through honcho_llm_call/streaming, adds messages to response/stream types, and registers/validates synthesis providers and backups per level.
Cost Calculator & Reporting
scripts/dialectic_cost_calculator.py
Adds two‑phase cost pathway (calculate_two_phase_cost, dispatcher calculate_level_cost, calculate_single_model_cost), TEXT_SERIALIZATION_OVERHEAD = 1.20, new model pricing entries, and UI/table columns for Mode/Search/Synthesis costs.
Docs
docs/v3/documentation/introduction/overview.mdx
Minor whitespace/formatting change (no semantic change).

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant SearchPhase as Search Phase
    participant Tools
    participant SynthesisPhase as Synthesis Phase
    participant LLM
    participant Telemetry

    Client->>SearchPhase: Start request (level, input)
    SearchPhase->>LLM: Call search model (messages, tool-capable)
    LLM-->>SearchPhase: Search model response
    SearchPhase->>Tools: Execute tool calls (if any)
    Tools-->>SearchPhase: Tool results
    SearchPhase->>SearchPhase: Aggregate conversation & metrics

    alt SYNTHESIS configured
        SearchPhase->>SynthesisPhase: Build synthesis messages (serialize search context)
        SynthesisPhase->>LLM: Call synthesis model (no tools, include thinking tokens)
        LLM-->>SynthesisPhase: Synthesis response
        SynthesisPhase->>Telemetry: Log per-phase metrics (search + synthesis)
        SynthesisPhase-->>Client: Final response (with per-phase metrics/costs)
    else single-model
        SearchPhase->>Telemetry: Log single-phase metrics
        SearchPhase-->>Client: Final response
    end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested reviewers

  • VVoruganti
  • Rajat-Ahuja1997

Poem

🐇
I nibbled tokens, stitched a two‑step art,
Search then synth — two phases, one heart,
Metrics hummed, costs tallied in light,
A rabbit's hop turned code into bright,
Hop—done! 🥕

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'feat: enhance dialectic agent with two-phase search->synth' accurately describes the main change: introducing two-phase cost calculation for dialectic reasoning with separate search and synthesis phases.
Docstring Coverage ✅ Passed Docstring coverage is 86.96% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/dialectic/core.py (1)

630-805: Ensure search max_tokens never dips below thinking_budget.

In Line 706-711 and Line 898-902, search_max_tokens falls back to 1024 when MAX_OUTPUT_TOKENS is unset. If a config sets THINKING_BUDGET_TOKENS > 1024, the search call can violate provider constraints. Guard the fallback with a max against the thinking budget (apply in both answer and answer_stream).

🛠️ Proposed fix (apply to both answer + answer_stream)
-        search_max_tokens = (
-            level_settings.MAX_OUTPUT_TOKENS
-            if level_settings.MAX_OUTPUT_TOKENS is not None
-            else 1024  # Lower default for search - just tool call JSON
-        )
+        search_max_tokens = level_settings.MAX_OUTPUT_TOKENS or 1024
+        search_max_tokens = max(
+            search_max_tokens, level_settings.THINKING_BUDGET_TOKENS
+        )

Also applies to: 811-1015

src/utils/clients.py (1)

490-512: Populate messages for non-tool calls to avoid empty histories.

With the new messages field, responses from the non-tool path can still return an empty history, even when messages were supplied. This breaks two‑phase flows that rely on the full conversation when no tools are called. Consider setting messages right after the decorated call when returning a HonchoLLMCallResponse.

🔧 Proposed fix
@@
     if not tools or not tool_executor:
         result: (
             HonchoLLMCallResponse[Any] | AsyncIterator[HonchoLLMCallStreamChunk]
         ) = await decorated()
+        if isinstance(result, HonchoLLMCallResponse) and not result.messages:
+            result.messages = (
+                messages.copy()
+                if messages is not None
+                else [{"role": "user", "content": prompt}]
+            )
         if trace_name and isinstance(result, HonchoLLMCallResponse):
             log_reasoning_trace(
🤖 Fix all issues with AI agents
In `@scripts/dialectic_cost_calculator.py`:
- Around line 710-742: The printed output uses the multiplication character '×'
which triggers linting; update the two f-strings that build the output token
explanations (the lines referencing estimates.realistic_thinking_per_tool +
estimates.realistic_tool_call_output and the line using
estimates.max_output_tokens) to replace '×' with the ASCII letter 'x' (e.g., "
{max_result['max_iterations'] - 1} x {estimates.realistic_thinking_per_tool +
estimates.realistic_tool_call_output}" and "{max_result['max_iterations']} x
{max_result['thinking_tokens'] + estimates.max_output_tokens}"), keeping spacing
and formatting consistent; these changes are in the console.print calls that
reference max_result, estimates, and MODEL_PRICING.

In `@src/config.py`:
- Around line 302-345: Add a post-validation on SynthesisModelSettings that
ensures THINKING_BUDGET_TOKENS does not exceed the effective max output token
cap: compute effective_max = self.MAX_OUTPUT_TOKENS if set else the global
DIALECTIC.MAX_OUTPUT_TOKENS (or the project global constant used for dialectic
max), then raise a ValueError if THINKING_BUDGET_TOKENS > effective_max;
implement this as a `@model_validator`(mode="after") method (e.g.,
_validate_thinking_vs_output) referencing the SynthesisModelSettings class and
the fields THINKING_BUDGET_TOKENS and MAX_OUTPUT_TOKENS so per-level override or
global fallback is enforced.

In `@src/dialectic/core.py`:
- Around line 375-464: In _build_synthesis_messages, the user branch currently
only extracts list blocks with type "tool_result" and drops text blocks; update
the user message handling (same area where role == "user") to iterate list
content like the assistant branch, extracting blocks with type "text" and
appending "[USER]: <text>" and also handling "tool_result" blocks by robustly
stringifying their content (e.g., json.dumps for non-str payloads) so
images/attachments or dicts aren't lost; follow the patterns used in the
assistant/system handlers for Anthropic-style blocks and reuse the existing
cast/json approach to ensure stable string output.
- Updated the `_validate_token_budgets` method to ensure effective output token limits exceed thinking budgets for both levels and synthesis settings.
- Improved error messages to provide clearer feedback on validation failures.
- Introduced a new method `_stringify_tool_result_content` in DialecticAgent for consistent string representation of tool result payloads, accommodating various content types.
- Enhanced handling of Anthropic-style content blocks in synthesis message construction to ensure proper formatting and representation.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/dialectic/core.py`:
- Around line 771-789: The except block that wraps the honcho_llm_call assigning
to search_response currently swallows asyncio.CancelledError; modify the
exception handling so cancellation is re-raised before applying the fallback
(e.g., replace "except Exception as e:" with first checking for
asyncio.CancelledError and re-raising it, or use "except asyncio.CancelledError:
raise" followed by "except Exception as e:"), and apply the identical change to
the other try/except around the second honcho_llm_call (the block at lines
~962-981) so both search_response and the later LLM call do not swallow task
cancellations; keep the existing logger.warning/fallback logic for
non-cancellation exceptions.
🧹 Nitpick comments (1)
src/dialectic/core.py (1)

517-521: Use the same stringifier for role == "tool" payloads.

This keeps formatting consistent and avoids raw reprs if tool content is non-string.

♻️ Suggested change
-            elif role == "tool":
-                # OpenAI-style tool result
-                tool_id = msg.get("tool_call_id", "unknown")
-                content = msg.get("content", "")
-                search_context_parts.append(f"[TOOL RESULT ({tool_id})]: {content}")
+            elif role == "tool":
+                # OpenAI-style tool result
+                tool_id = msg.get("tool_call_id", "unknown")
+                content = self._stringify_tool_result_content(msg.get("content"))
+                if content:
+                    search_context_parts.append(
+                        f"[TOOL RESULT ({tool_id})]: {content}"
+                    )
@dr-frmr dr-frmr changed the title feat: enhance dialectic cost calculator with two-phase support Jan 23, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/dialectic/core.py`:
- Around line 433-436: The docstring for _build_synthesis_messages incorrectly
documents a non-existent parameter synthesis_provider; update the docstring in
the _build_synthesis_messages definition to remove the synthesis_provider entry
(or replace it with any actual parameters used, e.g., search_messages) so the
Args section matches the function signature and only documents real parameters
like search_messages and any others present in the function.
🧹 Nitpick comments (3)
src/config.py (1)

325-345: Prefer domain-specific ValidationException for settings validation. The new validators raise ValueError; the repo guideline calls for custom exceptions in src/exceptions.py, and this also addresses Ruff TRY003. Consider switching these new raises to ValidationException (and import it once at the top). As per coding guidelines, ...

♻️ Proposed refactor
+from src.exceptions import ValidationException
@@
-            raise ValueError(
+            raise ValidationException(
                 "BACKUP_PROVIDER and BACKUP_MODEL must both be set or both be None"
             )
@@
-                raise ValueError(
+                raise ValidationException(
                     f"Effective SYNTHESIS.MAX_OUTPUT_TOKENS ({effective_synthesis_max_output_tokens}) must be greater than SYNTHESIS.THINKING_BUDGET_TOKENS ({synthesis_settings.THINKING_BUDGET_TOKENS}) for level '{level}'"
                 )

Also applies to: 473-505

src/dialectic/core.py (2)

396-421: Narrow the exception handler to expected serialization errors.
Catching Exception here will also mask unrelated failures. Limit the handler to known json.dumps errors so genuine bugs surface.

♻️ Suggested change
-        except Exception:
-            return str(cast(object, content))
+        except (TypeError, ValueError):
+            return str(cast(object, content))
As per coding guidelines, use explicit exception types.

517-521: Normalize OpenAI tool results with the shared stringifier.
Tool messages can carry dict/list payloads; using _stringify_tool_result_content keeps synthesis context consistent and readable.

♻️ Suggested change
-                content = msg.get("content", "")
-                search_context_parts.append(f"[TOOL RESULT ({tool_id})]: {content}")
+                content = self._stringify_tool_result_content(msg.get("content"))
+                if content:
+                    search_context_parts.append(
+                        f"[TOOL RESULT ({tool_id})]: {content}"
+                    )
@plastic-labs plastic-labs deleted a comment Jan 28, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/dialectic/core.py`:
- Around line 376-422: The except Exception in _stringify_tool_result_content
silently swallows errors; update the handler to log the caught exception before
returning the fallback string so it matches the class pattern (see other
handlers that call logger.warning). Specifically, inside
_stringify_tool_result_content's except block call logger.warning with a
descriptive message and include the exception info (e.g., exception instance or
exc_info) and then return str(cast(object, content)) as before.
🧹 Nitpick comments (1)
src/dialectic/core.py (1)

517-521: Normalize OpenAI tool result content before embedding.
Use _stringify_tool_result_content so dict/list outputs render consistently with Anthropic tool results.

♻️ Proposed tweak
-                tool_id = msg.get("tool_call_id", "unknown")
-                content = msg.get("content", "")
-                search_context_parts.append(f"[TOOL RESULT ({tool_id})]: {content}")
+                tool_id = msg.get("tool_call_id", "unknown")
+                content = self._stringify_tool_result_content(msg.get("content"))
+                if content:
+                    search_context_parts.append(
+                        f"[TOOL RESULT ({tool_id})]: {content}"
+                    )
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

1 participant