feat: enhance dialectic agent with two-phase search->synth #338
- Introduced a new two-phase cost calculation method for dialectic reasoning, allowing separate search and synthesis phases.
- Updated the dialectic configuration to support synthesis model settings, enabling more flexible model usage.
- Enhanced the `DialecticAgent` to handle both single-model and two-phase modes, improving cost estimation accuracy.
- Added telemetry metrics for detailed analysis of search and synthesis phases, including input/output token tracking and tool call counts.
- Updated related documentation and usage notes to reflect the new capabilities and configurations.
**Walkthrough**

Adds optional two-phase dialectic (search with tools + synthesis without tools), per-level synthesis model config, two-phase cost accounting and UI/reporting, telemetry schema/events for per-phase metrics, message propagation through clients, and related validators.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    participant Client
    participant SearchPhase as Search Phase
    participant Tools
    participant SynthesisPhase as Synthesis Phase
    participant LLM
    participant Telemetry
    Client->>SearchPhase: Start request (level, input)
    SearchPhase->>LLM: Call search model (messages, tool-capable)
    LLM-->>SearchPhase: Search model response
    SearchPhase->>Tools: Execute tool calls (if any)
    Tools-->>SearchPhase: Tool results
    SearchPhase->>SearchPhase: Aggregate conversation & metrics
    alt SYNTHESIS configured
        SearchPhase->>SynthesisPhase: Build synthesis messages (serialize search context)
        SynthesisPhase->>LLM: Call synthesis model (no tools, include thinking tokens)
        LLM-->>SynthesisPhase: Synthesis response
        SynthesisPhase->>Telemetry: Log per-phase metrics (search + synthesis)
        SynthesisPhase-->>Client: Final response (with per-phase metrics/costs)
    else single-model
        SearchPhase->>Telemetry: Log single-phase metrics
        SearchPhase-->>Client: Final response
    end
```
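The flow above can be sketched as a minimal async orchestration. This is an illustrative sketch, not the actual `DialecticAgent` code: `run_search`, `run_synthesis`, `PhaseMetrics`, and `DialecticResult` are hypothetical stand-ins for the real internals and their metrics plumbing.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional

@dataclass
class PhaseMetrics:
    input_tokens: int = 0
    output_tokens: int = 0
    tool_calls: int = 0

@dataclass
class DialecticResult:
    text: str
    search: PhaseMetrics
    synthesis: Optional[PhaseMetrics] = None

async def run_search(query: str) -> tuple[list[dict], PhaseMetrics]:
    # Stand-in for the tool-capable search call: returns the accumulated
    # conversation plus per-phase metrics.
    messages = [
        {"role": "user", "content": query},
        {"role": "tool", "tool_call_id": "t1", "content": "observation"},
    ]
    return messages, PhaseMetrics(input_tokens=120, output_tokens=40, tool_calls=1)

async def run_synthesis(messages: list[dict]) -> tuple[str, PhaseMetrics]:
    # Stand-in for the tool-free synthesis call over the serialized search context.
    return f"answer from {len(messages)} messages", PhaseMetrics(200, 80, 0)

async def answer(query: str, two_phase: bool) -> DialecticResult:
    messages, search_metrics = await run_search(query)
    if not two_phase:
        # Single-model mode: the search response is the final answer.
        return DialecticResult(messages[-1]["content"], search_metrics)
    text, synth_metrics = await run_synthesis(messages)
    return DialecticResult(text, search_metrics, synth_metrics)
```

In single-model mode only the search-phase metrics are reported; in two-phase mode the result carries both phases, matching the per-phase telemetry in the diagram.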
**Estimated code review effort:** 🎯 4 (Complex) | ⏱️ ~45 minutes
🚥 Pre-merge checks: ✅ 3 passed
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/dialectic/core.py (1)
**630-805: Ensure search `max_tokens` never dips below `thinking_budget`.**

In lines 706-711 and 898-902, `search_max_tokens` falls back to `1024` when `MAX_OUTPUT_TOKENS` is unset. If a config sets `THINKING_BUDGET_TOKENS > 1024`, the search call can violate provider constraints. Guard the fallback with a max against the thinking budget (apply in both `answer` and `answer_stream`).

🛠️ Proposed fix (apply to both `answer` + `answer_stream`):

```diff
-        search_max_tokens = (
-            level_settings.MAX_OUTPUT_TOKENS
-            if level_settings.MAX_OUTPUT_TOKENS is not None
-            else 1024  # Lower default for search - just tool call JSON
-        )
+        search_max_tokens = level_settings.MAX_OUTPUT_TOKENS or 1024
+        search_max_tokens = max(
+            search_max_tokens, level_settings.THINKING_BUDGET_TOKENS
+        )
```

Also applies to: 811-1015
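The proposed guard reduces to a one-line invariant; here is a minimal sketch of that rule with the settings passed as plain parameters (names assumed from the snippet above):

```python
def effective_search_max_tokens(max_output_tokens, thinking_budget_tokens):
    # Fall back to a small cap sized for tool-call JSON, but never let the
    # cap dip below the thinking budget, which providers typically require
    # max_tokens to exceed.
    search_max = max_output_tokens if max_output_tokens is not None else 1024
    return max(search_max, thinking_budget_tokens)
```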
src/utils/clients.py (1)
**490-512: Populate `messages` for non-tool calls to avoid empty histories.**

With the new `messages` field, responses from the non-tool path can still return an empty history, even when `messages` were supplied. This breaks two-phase flows that rely on the full conversation when no tools are called. Consider setting `messages` right after the decorated call when returning a `HonchoLLMCallResponse`.

🔧 Proposed fix:

```diff
@@
     if not tools or not tool_executor:
         result: (
             HonchoLLMCallResponse[Any] | AsyncIterator[HonchoLLMCallStreamChunk]
         ) = await decorated()
+        if isinstance(result, HonchoLLMCallResponse) and not result.messages:
+            result.messages = (
+                messages.copy()
+                if messages is not None
+                else [{"role": "user", "content": prompt}]
+            )
         if trace_name and isinstance(result, HonchoLLMCallResponse):
             log_reasoning_trace(
```
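The backfill logic in the fix can be isolated as a small helper; a sketch under the assumption that the response's `messages` list and the caller-supplied `messages`/`prompt` have the shapes shown in the diff (the helper name is hypothetical):

```python
def backfill_messages(result_messages, messages, prompt):
    # If the response already carries a history, keep it; otherwise fall
    # back to a copy of the caller-supplied messages, else wrap the bare
    # prompt as a single user turn.
    if result_messages:
        return result_messages
    if messages is not None:
        return messages.copy()
    return [{"role": "user", "content": prompt}]
```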
🤖 Fix all issues with AI agents
In `@scripts/dialectic_cost_calculator.py`:
- Around line 710-742: The printed output uses the multiplication character '×'
which triggers linting; update the two f-strings that build the output token
explanations (the lines referencing estimates.realistic_thinking_per_tool +
estimates.realistic_tool_call_output and the line using
estimates.max_output_tokens) to replace '×' with the ASCII letter 'x' (e.g., "
{max_result['max_iterations'] - 1} x {estimates.realistic_thinking_per_tool +
estimates.realistic_tool_call_output}" and "{max_result['max_iterations']} x
{max_result['thinking_tokens'] + estimates.max_output_tokens}"), keeping spacing
and formatting consistent; these changes are in the console.print calls that
reference max_result, estimates, and MODEL_PRICING.
In `@src/config.py`:
- Around line 302-345: Add a post-validation on SynthesisModelSettings that
ensures THINKING_BUDGET_TOKENS does not exceed the effective max output token
cap: compute effective_max = self.MAX_OUTPUT_TOKENS if set else the global
DIALECTIC.MAX_OUTPUT_TOKENS (or the project global constant used for dialectic
max), then raise a ValueError if THINKING_BUDGET_TOKENS > effective_max;
implement this as a `@model_validator(mode="after")` method (e.g.,
_validate_thinking_vs_output) referencing the SynthesisModelSettings class and
the fields THINKING_BUDGET_TOKENS and MAX_OUTPUT_TOKENS so per-level override or
global fallback is enforced.
In `@src/dialectic/core.py`:
- Around line 375-464: In _build_synthesis_messages, the user branch currently
only extracts list blocks with type "tool_result" and drops text blocks; update
the user message handling (same area where role == "user") to iterate list
content like the assistant branch, extracting blocks with type "text" and
appending "[USER]: <text>" and also handling "tool_result" blocks by robustly
stringifying their content (e.g., json.dumps for non-str payloads) so
images/attachments or dicts aren't lost; follow the patterns used in the
assistant/system handlers for Anthropic-style blocks and reuse the existing
cast/json approach to ensure stable string output.
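The user-branch handling the comment asks for can be sketched as follows. This is an illustrative stand-alone function, not the actual `_build_synthesis_messages` code; the block shapes follow Anthropic-style content blocks and the `tool_use_id` key is assumed:

```python
import json

def serialize_user_blocks(content):
    # Anthropic-style user content is either a plain string or a list of
    # typed blocks; keep both "text" and "tool_result" blocks instead of
    # dropping the text.
    if isinstance(content, str):
        return [f"[USER]: {content}"] if content else []
    parts = []
    for block in content:
        btype = block.get("type")
        if btype == "text":
            parts.append(f"[USER]: {block.get('text', '')}")
        elif btype == "tool_result":
            payload = block.get("content", "")
            if not isinstance(payload, str):
                # Robustly stringify dict/list payloads rather than losing them.
                payload = json.dumps(payload, default=str)
            tool_id = block.get("tool_use_id", "unknown")
            parts.append(f"[TOOL RESULT ({tool_id})]: {payload}")
    return parts
```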
- Updated the `_validate_token_budgets` method to ensure effective output token limits exceed thinking budgets for both levels and synthesis settings.
- Improved error messages to provide clearer feedback on validation failures.
- Introduced a new method `_stringify_tool_result_content` in `DialecticAgent` for consistent string representation of tool result payloads, accommodating various content types.
- Enhanced handling of Anthropic-style content blocks in synthesis message construction to ensure proper formatting and representation.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/dialectic/core.py`:
- Around line 771-789: The except block that wraps the honcho_llm_call assigning
to search_response currently swallows asyncio.CancelledError; modify the
exception handling so cancellation is re-raised before applying the fallback
(e.g., replace "except Exception as e:" with first checking for
asyncio.CancelledError and re-raising it, or use "except asyncio.CancelledError:
raise" followed by "except Exception as e:"), and apply the identical change to
the other try/except around the second honcho_llm_call (the block at lines
~962-981) so both search_response and the later LLM call do not swallow task
cancellations; keep the existing logger.warning/fallback logic for
non-cancellation exceptions.
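The cancellation-safe fallback pattern the comment asks for looks like this in isolation (a sketch with a hypothetical `call_with_fallback` wrapper, not the actual `honcho_llm_call` try/except):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def call_with_fallback(call, fallback):
    try:
        return await call()
    except asyncio.CancelledError:
        raise  # re-raise so task cancellation is never swallowed
    except Exception as e:
        # Keep the existing warning/fallback behavior for real failures.
        logger.warning("LLM call failed, using fallback: %s", e)
        return fallback
```

Ordering matters: the `asyncio.CancelledError` clause must come before the broad `except Exception`, otherwise (on Python 3.8+, where `CancelledError` subclasses `BaseException`, the broad clause would not catch it anyway, but on older code paths or with `except BaseException` it would be swallowed).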
🧹 Nitpick comments (1)
src/dialectic/core.py (1)
**517-521: Use the same stringifier for `role == "tool"` payloads.**

This keeps formatting consistent and avoids raw reprs if tool content is non-string.

♻️ Suggested change:

```diff
-            elif role == "tool":
-                # OpenAI-style tool result
-                tool_id = msg.get("tool_call_id", "unknown")
-                content = msg.get("content", "")
-                search_context_parts.append(f"[TOOL RESULT ({tool_id})]: {content}")
+            elif role == "tool":
+                # OpenAI-style tool result
+                tool_id = msg.get("tool_call_id", "unknown")
+                content = self._stringify_tool_result_content(msg.get("content"))
+                if content:
+                    search_context_parts.append(
+                        f"[TOOL RESULT ({tool_id})]: {content}"
+                    )
```
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/dialectic/core.py`:
- Around line 433-436: The docstring for _build_synthesis_messages incorrectly
documents a non-existent parameter synthesis_provider; update the docstring in
the _build_synthesis_messages definition to remove the synthesis_provider entry
(or replace it with any actual parameters used, e.g., search_messages) so the
Args section matches the function signature and only documents real parameters
like search_messages and any others present in the function.
🧹 Nitpick comments (3)
src/config.py (1)
**325-345: Prefer a domain-specific `ValidationException` for settings validation.**

The new validators raise `ValueError`; the repo guideline calls for custom exceptions in `src/exceptions.py`, and this also addresses Ruff TRY003. Consider switching these new raises to `ValidationException` (and import it once at the top). As per coding guidelines, ...

♻️ Proposed refactor:

```diff
+from src.exceptions import ValidationException
@@
-            raise ValueError(
+            raise ValidationException(
                 "BACKUP_PROVIDER and BACKUP_MODEL must both be set or both be None"
             )
@@
-            raise ValueError(
+            raise ValidationException(
                 f"Effective SYNTHESIS.MAX_OUTPUT_TOKENS ({effective_synthesis_max_output_tokens}) must be greater than SYNTHESIS.THINKING_BUDGET_TOKENS ({synthesis_settings.THINKING_BUDGET_TOKENS}) for level '{level}'"
             )
```

Also applies to: 473-505
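The budget rule being validated is simple enough to isolate. The real code implements it as a pydantic `@model_validator(mode="after")` on `SynthesisModelSettings` (and the review suggests raising `ValidationException`); this stdlib-only sketch shows just the effective-cap computation with a plain `ValueError`:

```python
def effective_synthesis_max(max_output_tokens, global_max_output_tokens,
                            thinking_budget_tokens):
    # Per-level MAX_OUTPUT_TOKENS override wins; otherwise fall back to the
    # global dialectic cap. The thinking budget must fit under that cap.
    effective_max = (
        max_output_tokens if max_output_tokens is not None
        else global_max_output_tokens
    )
    if thinking_budget_tokens > effective_max:
        raise ValueError(
            f"THINKING_BUDGET_TOKENS ({thinking_budget_tokens}) exceeds "
            f"effective MAX_OUTPUT_TOKENS ({effective_max})"
        )
    return effective_max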
src/dialectic/core.py (2)
**396-421: Narrow the exception handler to expected serialization errors.**

Catching `Exception` here will also mask unrelated failures. Limit the handler to known `json.dumps` errors so genuine bugs surface. As per coding guidelines, use explicit exception types.

♻️ Suggested change:

```diff
-        except Exception:
-            return str(cast(object, content))
+        except (TypeError, ValueError):
+            return str(cast(object, content))
```
**517-521: Normalize OpenAI tool results with the shared stringifier.**

Tool messages can carry dict/list payloads; using `_stringify_tool_result_content` keeps synthesis context consistent and readable.

♻️ Suggested change:

```diff
-                content = msg.get("content", "")
-                search_context_parts.append(f"[TOOL RESULT ({tool_id})]: {content}")
+                content = self._stringify_tool_result_content(msg.get("content"))
+                if content:
+                    search_context_parts.append(
+                        f"[TOOL RESULT ({tool_id})]: {content}"
+                    )
```
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/dialectic/core.py`:
- Around line 376-422: The except Exception in _stringify_tool_result_content
silently swallows errors; update the handler to log the caught exception before
returning the fallback string so it matches the class pattern (see other
handlers that call logger.warning). Specifically, inside
_stringify_tool_result_content's except block call logger.warning with a
descriptive message and include the exception info (e.g., exception instance or
exc_info) and then return str(cast(object, content)) as before.
🧹 Nitpick comments (1)
src/dialectic/core.py (1)
**517-521: Normalize OpenAI tool result content before embedding.**

Use `_stringify_tool_result_content` so dict/list outputs render consistently with Anthropic tool results.

♻️ Proposed tweak:

```diff
-                tool_id = msg.get("tool_call_id", "unknown")
-                content = msg.get("content", "")
-                search_context_parts.append(f"[TOOL RESULT ({tool_id})]: {content}")
+                tool_id = msg.get("tool_call_id", "unknown")
+                content = self._stringify_tool_result_content(msg.get("content"))
+                if content:
+                    search_context_parts.append(
+                        f"[TOOL RESULT ({tool_id})]: {content}"
+                    )
```
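Putting the stringifier feedback together (JSON-encode structured payloads, narrow the except clause, and log instead of silently swallowing), a stand-alone sketch of what `_stringify_tool_result_content` converges on might look like this (illustrative, not the actual method):

```python
import json
import logging

logger = logging.getLogger(__name__)

def stringify_tool_result_content(content):
    # Strings pass through; structured payloads are JSON-encoded; encoding
    # failures are logged before falling back to str().
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    try:
        return json.dumps(content, ensure_ascii=False, default=str)
    except (TypeError, ValueError) as e:
        logger.warning("Failed to JSON-encode tool result content: %s", e)
        return str(content)
```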
**Benchmarking**
Two-phase dialectic at medium config scores 0.644 on BEAM 100K -- results attached
beam_100K_20260123_105206.json