Refactor clients.py to add modern features and more flexible configuration #459

Open
VVoruganti wants to merge 11 commits into main from
vineeth/dev-1454-full

Conversation

Collaborator

@VVoruganti VVoruganti commented Mar 28, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Unified LLM configuration structure with provider-agnostic model settings.
    • Added support for advanced LLM features: thinking tokens, structured outputs, and prompt caching across multiple providers.
    • Introduced reasoning effort configuration for compatible models.
  • Configuration

    • Updated environment variables and TOML configuration format for LLM settings.
    • Removed legacy OpenAI-compatible and vLLM provider options.
  • Tests

    • Added live LLM integration tests and expanded unit test coverage.
VVoruganti and others added 3 commits March 26, 2026 16:24
… and Gemini thinking budget support

LengthFinishReasonError from OpenAI-compatible providers (custom, openai, groq) was crashing the deriver
with 14k+ occurrences in production. The vLLM path already had repair logic but it was gated on
provider=="vllm", unreachable when routing through litellm as a custom provider.

- Extract shared _repair_response_model_json() helper for all providers
- Catch LengthFinishReasonError in OpenAI/custom parse() path and repair truncated JSON
- Add repair fallback to Anthropic and Gemini response_model paths
- Add repair fallback to Groq response_model path
- Pass thinking_budget_tokens to Gemini 2.5 models via thinking_config
- Add 14 tests covering repair paths for all providers and Gemini thinking budget

Fixes HONCHO-YC

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
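The repair helper itself is not shown in this thread. As a rough illustration of what a shared `_repair_response_model_json()`-style helper has to do, here is a minimal stdlib sketch of truncation repair: close any string, array, or object left open when the model hit its token limit, then parse. The function name and exact heuristics are assumptions, not taken from the diff.

```python
import json
from typing import Any


def repair_truncated_json(text: str) -> dict[str, Any]:
    """Best-effort repair of JSON that was cut off at a token limit."""
    stack: list[str] = []  # open '{' / '[' delimiters, innermost last
    in_string = False
    escape = False
    for ch in text:
        if in_string:
            if escape:
                escape = False
            elif ch == "\\":
                escape = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append(ch)
        elif ch in "}]" and stack:
            stack.pop()

    repaired = text
    if in_string:
        repaired += '"'  # terminate a string cut mid-value
    repaired = repaired.rstrip()
    if repaired.endswith(":"):
        repaired += " null"  # key was emitted but its value was lost
    elif repaired.endswith(","):
        repaired = repaired[:-1]  # drop the dangling separator
    for opener in reversed(stack):
        repaired += "}" if opener == "{" else "]"
    return json.loads(repaired)
```

A real implementation also has to cope with truncation mid-escape-sequence and should re-validate the result against the expected response model rather than returning a raw dict.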
Contributor

coderabbitai bot commented Mar 28, 2026

Walkthrough

The PR introduces a unified LLM provider backend architecture: it replaces flat provider/model settings with a nested MODEL_CONFIG structure that separates transport selection from the model ID, implements a ProviderBackend protocol with Anthropic/OpenAI/Gemini/Groq implementations, refactors configuration to use nested keys, adds credential/capability resolution, and updates all LLM call sites to use the new model_config parameter.

Changes

Cohort / File(s) Summary
Configuration Files
.env.template, config.toml.example, docs/v3/contributing/configuration.mdx
Restructured LLM configuration from flat PROVIDER/MODEL/BACKUP_PROVIDER fields to nested MODEL_CONFIG__TRANSPORT/MODEL_CONFIG__MODEL/MODEL_CONFIG__FALLBACK_TRANSPORT keys; replaced OpenAI-compatible/vLLM settings with transport-specific base URL defaults (LLM_*_BASE_URL); updated all DERIVER/DIALECTIC/SUMMARY/DREAM feature configs and specialist models to use nested structure.
Core Configuration Schema
src/config.py
Added Pydantic models for ModelTransport, ModelOverrideSettings, ConfiguredModelSettings, and ModelConfig; implemented legacy normalization for "provider/model" strings and thinking_effort aliasing; refactored DeriverSettings, DialecticLevelSettings, SummarySettings, and DreamSettings to use nested MODEL_CONFIG instead of flat provider/model/backup fields; added resolve_model_config() for runtime resolution.
LLM Backend Protocol & Factory
src/llm/backend.py, src/llm/__init__.py
Introduced ProviderBackend protocol defining async complete() and stream() methods; added normalized result types: ToolCallResult, CompletionResult, StreamChunk; implemented get_backend(config: ModelConfig) factory that resolves credentials and selects provider-specific backend; exported backend construction and credential/capability APIs.
Provider Backends
src/llm/backends/anthropic.py, src/llm/backends/openai.py, src/llm/backends/gemini.py, src/llm/backends/groq.py, src/llm/backends/__init__.py
Implemented four provider-specific ProviderBackend subclasses with native SDK integration: Anthropic handles thinking blocks/caching control; OpenAI manages structured output parsing and reasoning content; Gemini supports cached-content caching and function-call parsing; Groq wraps OpenAI-compatible API; each normalizes responses into common CompletionResult/StreamChunk format.
LLM Utilities
src/llm/credentials.py, src/llm/capabilities.py, src/llm/history_adapters.py, src/llm/caching.py
Added credential resolution from ModelConfig with transport-specific defaults; defined ModelCapabilities for transport-specific feature metadata; implemented provider-specific HistoryAdapter for message formatting (tool calls, thinking, reasoning); added prompt caching primitives (PromptCachePolicy, cache key building, in-memory Gemini cache).
Request Execution Layer
src/llm/request_builder.py, src/llm/structured_output.py
Added execute_completion() and execute_stream() functions that build requests via resolved backend, merging model config parameters (max_output_tokens, thinking budget, temperature) with caller-provided arguments; implemented structured output validation/repair pipeline with failure policies and JSON repair fallback.
Feature Integration
src/deriver/deriver.py, src/dialectic/core.py, src/dreamer/specialists.py
Updated LLM call sites to pass model_config parameter instead of llm_settings; added _get_*_model_config() helpers to extract feature-specific configs; refactored specialist model selection from get_model() to get_model_config() returning ConfiguredModelSettings.
LLM Call Site Updates
src/utils/clients.py, src/utils/summarizer.py, src/telemetry/reasoning_traces.py
Replaced llm_settings parameter with model_config: ModelConfig | ConfiguredModelSettings in honcho_llm_call; refactored backend selection, credential resolution, and provider client management to be per-call instead of pre-initialized; integrated history adapters and backend abstraction; updated reasoning trace logging to derive provider from transport.
Cleanup
src/embedding_client.py, src/utils/types.py, pyproject.toml
Removed openrouter provider initialization and legacy custom/vllm from SupportedProviders; added pytest markers for live LLM and provider-specific tests.
Test Infrastructure
tests/conftest.py, tests/__init__.py, pyproject.toml
Added --live-llm pytest CLI flag and auto-skip mechanism; introduced clear_live_gemini_cache_store fixture; extended mock blocklist for LLM-specific tests; created tests/unit/llm/conftest.py with FakeBackend test double.
Live LLM Tests
tests/live_llm/*
Added comprehensive live integration tests for Anthropic, OpenAI, and Gemini covering structured output, prompt caching, thinking blocks, tool replay, and reasoning capabilities; introduced model_matrix.py for typed model specification and conftest.py with backend/config construction helpers.
Unit Tests
tests/unit/llm/test_*.py, tests/*/test_model_config_usage.py, tests/utils/test_*.py
Added 900+ lines of unit tests covering: backend implementations (request/response normalization), credential/capability resolution, history adapters, request building, structured output repair, model config validation, and feature-level integration (deriver/dialectic/dreamer/summarizer model config usage).

Sequence Diagram(s)

sequenceDiagram
    actor Caller as Caller (e.g., Deriver)
    participant HLC as honcho_llm_call
    participant RB as request_builder.execute_completion
    participant CR as credentials.resolve_credentials
    participant CB as llm.__init__.get_backend
    participant PB as ProviderBackend<br/>(Anthropic/OpenAI/Gemini/Groq)
    participant API as LLM Provider API

    Caller->>HLC: honcho_llm_call(model_config, messages, ...)
    HLC->>RB: execute_completion(backend, config, ...)
    RB->>CR: resolve_credentials(model_config)
    CR-->>RB: {api_key, api_base}
    RB->>CB: get_backend(config)
    CB->>CB: Determine transport (anthropic/openai/gemini/groq)
    CB-->>RB: ProviderBackend instance
    RB->>PB: complete(model=..., messages=..., api_key=..., api_base=...)
    PB->>API: Native SDK call (e.g., messages.create, generate_content)
    API-->>PB: Raw provider response
    PB->>PB: _normalize_response(): Extract content,<br/>tool_calls, thinking, reasoning
    PB-->>RB: CompletionResult(content, tool_calls, ...)
    RB-->>HLC: CompletionResult
    HLC-->>Caller: HonchoLLMCallResponse

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Suggested reviewers

  • Rajat-Ahuja1997

Poem

🐰 Whiskers twitch with glee,
Backends now unified—no more provider scatter!
Transport and model, neatly nested, oh what a sight,
One protocol to rule them all: Anthropic, OpenAI bright,
Gemini, Groq, and more, dancing in the light!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 22.32%, below the required 80.00% threshold. Resolution: write docstrings for the functions that are missing them.
✅ Passed checks (2 passed)
  • Description Check ✅ Passed: check skipped because CodeRabbit's high-level summary is enabled.
  • Title check ✅ Passed: the title accurately reflects the main refactoring work: restructuring configuration from a per-provider/model approach to a nested MODEL_CONFIG schema and adding new LLM provider backends and capabilities.



@VVoruganti VVoruganti marked this pull request as ready for review March 30, 2026 15:13

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 16

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
src/embedding_client.py (1)

29-51: ⚠️ Potential issue | 🟠 Major

Reject unsupported embedding providers instead of silently routing them to OpenAI.

Lines 32-51 and 365-369 currently treat every non-gemini value as OpenAI. A typo or a lingering legacy provider value will therefore instantiate the OpenAI client with the wrong credentials/base URL and fail much later than the actual config mistake.

🐛 Suggested fix
+from src.exceptions import ValidationException
+
 from .config import settings
-        else:  # openai
+        elif self.provider == "openai":
             if api_key is None:
                 api_key = settings.LLM.OPENAI_API_KEY
             if not api_key:
                 raise ValueError("OpenAI API key is required")
             self.client = AsyncOpenAI(api_key=api_key)
             self.model = "text-embedding-3-small"
             self.max_embedding_tokens = settings.MAX_EMBEDDING_TOKENS
             self.max_batch_size = 2048  # OpenAI batch limit
+        else:
+            raise ValidationException(
+                f"Unsupported embedding provider: {self.provider}"
+            )
-                    if provider == "gemini":
-                        api_key = settings.LLM.GEMINI_API_KEY
-                    else:
-                        api_key = settings.LLM.OPENAI_API_KEY
+                    if provider == "gemini":
+                        api_key = settings.LLM.GEMINI_API_KEY
+                    elif provider == "openai":
+                        api_key = settings.LLM.OPENAI_API_KEY
+                    else:
+                        raise ValidationException(
+                            f"Unsupported embedding provider: {provider}"
+                        )

As per coding guidelines, "Use explicit error handling with appropriate exception types" and "Use custom exceptions defined in src/exceptions.py with specific exception types (ResourceNotFoundException, ValidationException, etc.)".

Also applies to: 362-373

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/embedding_client.py` around lines 29 - 51, The constructor (__init__)
currently treats any non-"gemini" provider as OpenAI, which hides configuration
typos; change __init__ to explicitly accept only supported providers ("gemini"
and "openai") and raise a ValidationException (from src.exceptions import
ValidationException) for any other value of self.provider, ensuring you do not
silently instantiate AsyncOpenAI for unknown providers; keep the existing gemini
and openai branches (symbols: self.provider, genai.Client, AsyncOpenAI,
self.model, self.client, self.max_embedding_tokens, self.max_batch_size) but add
the explicit else that raises ValidationException with a clear message. Also
apply the same explicit-provider validation and exception raising to the other
provider-handling block in this file (the second location handling provider
around lines ~362-373) so both paths reject unsupported providers instead of
defaulting to OpenAI.
src/dreamer/specialists.py (1)

207-228: ⚠️ Potential issue | 🟠 Major

Configured specialist token caps are still a no-op here.

This path always passes an explicit max_tokens=self.get_max_tokens(), so the new DREAM_MODEL_CONFIG.max_output_tokens / specialist inherited overrides never influence the actual LLM request.

🔧 Minimal fix
         model_config = self.get_model_config()
+        max_tokens = model_config.max_output_tokens or self.get_max_tokens()
 
         # Track iterations via callback
         iteration_count = 0
@@
         response: HonchoLLMCallResponse[str] = await honcho_llm_call(
             model_config=model_config,
             prompt="",  # Ignored since we pass messages
-            max_tokens=self.get_max_tokens(),
+            max_tokens=max_tokens,
             tools=self.get_tools(peer_card_enabled=peer_card_enabled),
             tool_choice=None,
             tool_executor=tool_executor,
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/dreamer/specialists.py` around lines 207 - 228, The code always passes
max_tokens=self.get_max_tokens() into honcho_llm_call, which ignores per-model
or specialist overrides (e.g. DREAM_MODEL_CONFIG.max_output_tokens); update the
honcho_llm_call invocation in this function so max_tokens is derived from the
resolved model_config (use model_config.max_output_tokens when set, falling back
to self.get_max_tokens()), thereby ensuring get_model_config() and any
specialist overrides actually control the LLM request.
🧹 Nitpick comments (10)
tests/unit/llm/test_backends/test_openai.py (1)

65-68: Unusual return type annotation formatting.

The function return type annotation is split across multiple lines in an unconventional way. While technically valid Python, it's non-standard.

-async def test_openai_backend_passes_thinking_effort_through_for_non_gpt5_models() -> (
-    None
-):
+async def test_openai_backend_passes_thinking_effort_through_for_non_gpt5_models() -> None:
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/llm/test_backends/test_openai.py` around lines 65 - 68, The test
function test_openai_backend_passes_thinking_effort_through_for_non_gpt5_models
has its return type annotation split across lines; consolidate the annotation so
the signature is on a single line (e.g., move "-> None" to the same line as the
def) to match standard Python formatting and improve readability.
tests/unit/llm/conftest.py (1)

9-26: Clean test double implementation.

The FakeBackend provides a useful test helper for capturing call arguments and serving canned responses. One consideration: if tests don't queue enough responses, next(self._responses) will raise StopIteration, which will bubble up as-is. This is generally acceptable for tests (failing clearly), but a more descriptive error could aid debugging.

💡 Optional: wrap with descriptive error
     async def complete(self, **kwargs: Any) -> CompletionResult:
         self.calls.append(kwargs)
-        return next(self._responses)
+        try:
+            return next(self._responses)
+        except StopIteration:
+            raise RuntimeError("FakeBackend: no more queued responses") from None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/llm/conftest.py` around lines 9 - 26, The FakeBackend test double
currently calls next(self._responses) in both complete and stream which will
raise StopIteration with no context if tests run out of canned responses; update
FakeBackend (constructor: _responses, methods: complete and stream) to catch
StopIteration around next(self._responses) and raise a more descriptive error
(e.g. RuntimeError or AssertionError) that includes the method name and a hint
that no more fake responses were queued, so test failures clearly indicate
missing canned responses.
src/llm/__init__.py (1)

55-80: Don't memoize override clients by raw credential tuples.

@cache keeps every (base_url, api_key) pair for the full process lifetime. If these overrides are test-, tenant-, or rotation-specific, this turns into an unbounded client cache and keeps secret-bearing keys resident in memory longer than necessary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/__init__.py` around lines 55 - 80, The override client factory
functions (_get_openai_override_client, _get_anthropic_override_client,
_get_groq_override_client, _get_gemini_override_client) must not be memoized by
raw (base_url, api_key) tuples because that caches secret-bearing keys for the
process lifetime; remove the `@cache` decorator from those functions and make them
return a fresh client per call (or replace with a bounded/ephemeral cache if you
truly need caching), and update callers to create/use/close clients per
operation rather than relying on a global persistent cache.
src/utils/clients.py (1)

616-638: Groq provider uses OpenAIHistoryAdapter by default.

_history_adapter_for_provider returns OpenAIHistoryAdapter for both "openai" and "groq" providers (via the fallback return). This is likely intentional since Groq uses OpenAI-compatible APIs, but consider adding an explicit case for clarity.

💡 Explicit groq handling
 def _history_adapter_for_provider(
     provider: SupportedProviders,
 ) -> AnthropicHistoryAdapter | GeminiHistoryAdapter | OpenAIHistoryAdapter:
     if provider == "anthropic":
         return AnthropicHistoryAdapter()
     if provider == "google":
         return GeminiHistoryAdapter()
+    # Groq uses OpenAI-compatible message format
     return OpenAIHistoryAdapter()
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/clients.py` around lines 616 - 638, The
_history_adapter_for_provider function currently falls through to
OpenAIHistoryAdapter for both "openai" and "groq"; make the Groq handling
explicit by adding an if branch for provider == "groq" that returns
OpenAIHistoryAdapter(), keeping the existing return type and behavior; update
the function body in _history_adapter_for_provider to include the explicit
"groq" case (so callers like _history_adapter_for_provider("groq") clearly
return OpenAIHistoryAdapter).
src/llm/request_builder.py (1)

102-152: Consider extracting shared parameter-building logic to reduce duplication.

execute_stream duplicates the entire parameter-building logic from execute_completion. This could be extracted into a shared helper.

♻️ Suggested extraction (conceptual)
def _build_request_params(
    config: ModelConfig,
    max_tokens: int,
    cache_policy: PromptCachePolicy | None,
    extra_params: dict[str, Any] | None,
) -> tuple[ModelCapabilities, dict[str, Any], int, int | None, dict[str, Any]]:
    capabilities = get_model_capabilities(config)
    credentials = resolve_credentials(config)
    thinking_budget_tokens = (
        config.thinking_budget_tokens
        if config.thinking_budget_tokens is not None
        and config.thinking_budget_tokens > 0
        else None
    )
    requested_output_tokens = config.max_output_tokens or max_tokens
    effective_max_tokens = _adjust_max_tokens_for_explicit_budget(
        capabilities, requested_output_tokens, thinking_budget_tokens
    )
    merged_extra_params = {
        **_build_config_extra_params(config),
        **(extra_params or {}),
    }
    if cache_policy is not None:
        merged_extra_params["cache_policy"] = cache_policy
    return capabilities, credentials, effective_max_tokens, thinking_budget_tokens, merged_extra_params
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/request_builder.py` around lines 102 - 152, execute_stream duplicates
the parameter-building logic from execute_completion; extract that shared logic
into a helper (e.g., _build_request_params) that returns capabilities,
credentials, effective_max_tokens, thinking_budget_tokens, and
merged_extra_params. Replace the duplicated block in execute_stream with a call
to the new helper and pass returned values into backend.stream; do the same
refactor in execute_completion so both functions reuse the helper and avoid
divergent logic (refer to execute_stream, execute_completion,
_adjust_max_tokens_for_explicit_budget, _build_config_extra_params, and
resolve_credentials).
src/llm/backends/openai.py (1)

263-274: Model-specific branching uses string containment check.

The "gpt-5" in model check is fragile and may match unintended models (e.g., hypothetical "not-gpt-5-model"). Consider a more explicit check or extracting this to a model capabilities lookup.

💡 Alternative approach
GPT5_MODEL_PREFIXES = ("gpt-5", "o1-", "o3-")  # Models requiring max_completion_tokens

def _is_reasoning_model(model: str) -> bool:
    return any(model.startswith(prefix) for prefix in GPT5_MODEL_PREFIXES)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/backends/openai.py` around lines 263 - 274, The model check using
`"gpt-5" in model` is fragile; replace it with an explicit capability lookup
helper (e.g., add a GPT5_MODEL_PREFIXES tuple and implement a function like
_is_reasoning_model(model: str) that returns any(model.startswith(prefix) for
prefix in GPT5_MODEL_PREFIXES)) and then use that helper in the openai.py
branching where the code sets params["max_completion_tokens"] vs
params["max_tokens"] (the block containing the "gpt-5" in model check and the
thinking_effort/verbosity handling); update references to
extra_params/temperature logic to use the same helper so only intended models
use max_completion_tokens and reasoning flags.
src/llm/backends/groq.py (1)

58-76: Add defensive check for empty choices array.

Line 59 accesses response.choices[0] without verifying that choices exist. While unlikely in normal operation, a malformed response could cause an IndexError.

🛡️ Defensive check
         response = await self._client.chat.completions.create(**params)
-        if response.choices[0].message.content is None:
+        if not response.choices:
+            raise ValueError("No choices in response")
+        if response.choices[0].message.content is None:
             raise ValueError("No content in response")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/backends/groq.py` around lines 58 - 76, Add a defensive check that
response.choices is non-empty before accessing response.choices[0] in the chat
completion handling after calling self._client.chat.completions.create; if
choices is empty or None, raise a clear ValueError (or handle gracefully) with
context (e.g., include response metadata) instead of indexing into it. Update
the block that reads response.choices[0].message.content,
response.choices[0].finish_reason, and the JSON parsing/repair code paths
(response_format.model_validate and repair_response_model_json) to only run
after this validation so you never trigger IndexError on a malformed response.
tests/live_llm/test_live_gemini.py (1)

146-157: Consider iterable unpacking for cleaner list construction.

The static analysis tool flags this concatenation pattern. Using unpacking is slightly more idiomatic and avoids intermediate list creation.

♻️ Suggested refactor
-    replay_messages = initial_messages + [
-        adapter.format_assistant_tool_message(first),
-        *adapter.format_tool_results(
-            [
-                {
-                    "tool_id": tool_call.id,
-                    "tool_name": tool_call.name,
-                    "result": tool_result,
-                }
-            ]
-        ),
-    ]
+    replay_messages = [
+        *initial_messages,
+        adapter.format_assistant_tool_message(first),
+        *adapter.format_tool_results(
+            [
+                {
+                    "tool_id": tool_call.id,
+                    "tool_name": tool_call.name,
+                    "result": tool_result,
+                }
+            ]
+        ),
+    ]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/live_llm/test_live_gemini.py` around lines 146 - 157, Replace the
current list concatenation when building replay_messages with iterable unpacking
to avoid creating intermediate lists: spread initial_messages, then include
adapter.format_assistant_tool_message(first) as an element, and spread the
result of adapter.format_tool_results(...) so the final construction uses
[*initial_messages, adapter.format_assistant_tool_message(first),
*adapter.format_tool_results(...)] while keeping the same input dict for
tool_call (tool_call.id, tool_call.name, tool_result) to preserve behavior.
.env.template (1)

97-111: Minor: Variable ordering could be more consistent.

DERIVER_DEDUPLICATE (line 100) is interleaved within the MODEL_CONFIG variables, breaking the logical grouping. Consider moving it before or after the MODEL_CONFIG block for better readability.

📝 Suggested reordering
 # DERIVER_MODEL_CONFIG__TRANSPORT=gemini
 # DERIVER_MODEL_CONFIG__MODEL=gemini-2.5-flash-lite
 # DERIVER_MODEL_CONFIG__TEMPERATURE=
-# DERIVER_DEDUPLICATE=true
 # DERIVER_MODEL_CONFIG__MAX_OUTPUT_TOKENS=4096
 # DERIVER_MODEL_CONFIG__THINKING_BUDGET_TOKENS=1024
+# DERIVER_DEDUPLICATE=true
 # DERIVER_LOG_OBSERVATIONS=false
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.env.template around lines 97 - 111, Move the DERIVER_DEDUPLICATE variable
out of the middle of the DERIVER_MODEL_CONFIG__* block so related MODEL_CONFIG
keys stay grouped; specifically, relocate DERIVER_DEDUPLICATE (currently between
DERIVER_MODEL_CONFIG__TEMPERATURE and DERIVER_MODEL_CONFIG__MAX_OUTPUT_TOKENS)
to either just before the DERIVER_MODEL_CONFIG__* block or immediately after it,
keeping all DERIVER_MODEL_CONFIG__* entries contiguous for clearer grouping and
ordering.
src/llm/backends/anthropic.py (1)

74-90: Consider extracting duplicated schema injection logic.

The schema injection (schema_json creation and appending to message content) is duplicated in lines 80-84 and 86-90. While the conditions differ, the core logic is identical.

♻️ Optional refactor to reduce duplication
+        # Inject JSON schema into the last message if response_format is a Pydantic model
+        if response_format and isinstance(response_format, type):
+            schema_json = json.dumps(response_format.model_json_schema(), indent=2)
+            params["messages"][-1]["content"] += (
+                f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
+            )
+
         use_json_prefill = (
             bool(response_format or self._json_mode(extra_params))
             and not thinking_budget_tokens
             and self._supports_assistant_prefill(model)
         )
         if use_json_prefill:
-            if response_format and isinstance(response_format, type):
-                schema_json = json.dumps(response_format.model_json_schema(), indent=2)
-                params["messages"][-1]["content"] += (
-                    f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
-                )
             params["messages"].append({"role": "assistant", "content": "{"})
-        elif response_format and isinstance(response_format, type):
-            schema_json = json.dumps(response_format.model_json_schema(), indent=2)
-            params["messages"][-1]["content"] += (
-                f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
-            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/backends/anthropic.py` around lines 74 - 90, Move the duplicated
schema injection into a small helper and call it from both branches: create a
helper (e.g., _append_schema_to_last_message or build_and_append_schema) that
takes response_format and params (and uses response_format.model_json_schema(),
json.dumps, and appends to params["messages"][-1]["content"]); then replace the
duplicate blocks in the use_json_prefill branch and the else branch with calls
to that helper. Ensure the helper is only invoked when response_format is truthy
and a type, and keep the existing use_json_prefill logic (including appending
the assistant "{" when applicable) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/v3/contributing/configuration.mdx`:
- Line 402: The example line "DERIVER_MODEL_CONFIG__TEMPERATURE=  # Optional
temperature override (unset by default)" currently assigns an empty string;
change the docs to avoid a blank env assignment by either removing the
assignment or showing it commented out and providing a concrete example value —
e.g. replace with a note that the variable is unset by default and show
"DERIVER_MODEL_CONFIG__TEMPERATURE=0.7" as an example, or present "#
DERIVER_MODEL_CONFIG__TEMPERATURE=" to indicate omission; ensure the symbol
DERIVER_MODEL_CONFIG__TEMPERATURE is referenced and the text explicitly says
"omit the variable to leave it unset" rather than showing a blank assignment.

In `@src/config.py`:
- Around line 601-604: The issue is that nested MODEL_CONFIG blocks get fully
replaced by TOML/env partial overrides instead of being merged, causing
validation failures when only a few nested keys (e.g., MAX_TOOL_ITERATIONS or
TOOL_CHOICE) are supplied. Fix by giving MODEL_CONFIG a proper default instance
so Pydantic will merge values rather than drop the whole object: change the
annotation to a concrete field with a default_factory that constructs a
ConfiguredModelSettings (e.g., replace MODEL_CONFIG:
Annotated[ConfiguredModelSettings, Field(validation_alias="model_config")] with
a field like MODEL_CONFIG: ConfiguredModelSettings =
Field(default_factory=ConfiguredModelSettings, validation_alias="model_config"),
keeping the uppercase field name because model_config is reserved in Pydantic v2)
and apply the same pattern for other nested setting blocks referenced in
AppSettings to ensure partial overrides merge correctly.
- Around line 20-21: The dotenv loading currently calls
load_dotenv(override=True) which lets .env values overwrite existing environment
variables; update the call in src/config.py (the load_dotenv invocation guarded
by os.getenv("PYTHON_DOTENV_DISABLED")) to use override=False so that existing
environment variables take precedence over .env file entries, preserving
deployment-time configuration order.

In `@src/llm/backends/anthropic.py`:
- Around line 82-85: The code mutates the incoming request_messages by directly
editing params["messages"][-1]["content"] and appending to params["messages"],
which can produce side effects; fix it by creating a copy of the messages before
modification (e.g., make params["messages"] = [m.copy() for m in
request_messages] or use copy.deepcopy(request_messages)) and then modify the
copied last message content and append the assistant message to the copied list
so request_messages remains unchanged; update the logic around
params["messages"] and any use of request_messages to operate on the copied list
instead.
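A sketch of the non-mutating variant, with the prefill logic pulled into a hypothetical helper so the copy is easy to see (function name and schema payload are illustrative):

```python
def build_prefill_messages(
    request_messages: list[dict], schema_json: str
) -> list[dict]:
    # Copy each message dict so the caller's list and its dicts stay untouched.
    messages = [dict(m) for m in request_messages]
    messages[-1]["content"] += (
        f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
    )
    messages.append({"role": "assistant", "content": "{"})
    return messages

original = [{"role": "user", "content": "Summarize the session."}]
prefilled = build_prefill_messages(original, '{"type": "object"}')
print(original[0]["content"])  # Summarize the session. -- caller data unchanged
print(len(prefilled))          # 2
```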
- Around line 149-153: The stream() implementation uses a different JSON-prefill
check than complete(): update stream() to compute use_json_prefill the same way
as complete() by calling self._json_mode(extra_params) (in addition to checking
response_format, thinking_budget_tokens, and
self._supports_assistant_prefill(model)), and ensure extra_params is not deleted
before this check (or pass the original extra_params into _json_mode) so JSON
prefill triggers when extra_params={"json_mode": True}; reference the stream()
and complete() functions, the use_json_prefill variable, and the helper
_json_mode(extra_params) when making the change.

In `@src/llm/backends/gemini.py`:
- Around line 192-200: The code currently treats thinking_budget_tokens and
thinking_effort with truthiness checks which drop explicit zero values (e.g.,
thinking_budget_tokens = 0) so Gemini's thinking_config is omitted; update the
conditional logic in gemini.py around thinking_config, thinking_budget_tokens,
and thinking_effort to test for presence explicitly (e.g., use "is not None" or
explicit comparisons) rather than truthiness, so zeros are preserved when set,
and keep the existing mutual-exclusion check (len(thinking_config) > 1) intact.
- Around line 68-86: When cache_policy is a PromptCachePolicy (detected in
cache_policy and handled by _attach_cached_content), avoid re-sending the full
cached prompt to self._client.aio.models.generate_content: after
_attach_cached_content returns (it provides the cached_content handle), filter
or replace the original contents passed to generate_content so it contains only
the new/suffix messages (not the already-cached messages). Update the flow
around the cache check in complete() to detect PromptCachePolicy, obtain the
cached_content from _attach_cached_content, compute the suffix/new messages from
the original contents, and pass only that suffix as contents to generate_content
(keeping config and the cached_content handle as needed).

In `@src/llm/backends/groq.py`:
- Around line 38-42: Replace the silent deletion of tools and tool_choice in the
Groq backend with an explicit error: in src/llm/backends/groq.py where variables
tools and tool_choice are currently deleted (and alongside the existing check
for thinking_budget_tokens/thinking_effort), add a guard that raises
ValueError("Groq backend does not support tool calling") when either tools is
not None or tool_choice is not None; keep the existing ValueError for
thinking_budget_tokens/thinking_effort intact (look for the GroqBackend
constructor or the function handling these kwargs to locate the spot).
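The guard the review asks for can be sketched as a small validator (hypothetical helper name; the real code would raise from wherever the kwargs are handled):

```python
def validate_groq_params(
    tools=None, tool_choice=None, thinking_budget_tokens=None, thinking_effort=None
) -> None:
    # Fail loudly instead of silently deleting unsupported parameters.
    if tools is not None or tool_choice is not None:
        raise ValueError("Groq backend does not support tool calling")
    if thinking_budget_tokens is not None or thinking_effort is not None:
        raise ValueError("Groq backend does not support thinking parameters")

try:
    validate_groq_params(tools=[{"name": "search"}])
except ValueError as e:
    print(e)  # Groq backend does not support tool calling
```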

In `@src/llm/backends/openai.py`:
- Around line 146-159: The current bare except around the fallback path masks
error details; update the handler in the block that calls
_create_structured_response, _parse_or_repair_structured_content, and
_normalize_response to either catch specific exceptions you expect (e.g.,
ValueError, JSONDecodeError, OpenAIError) or, if you must catch all exceptions,
change it to except Exception as e: and log the exception (type and message)
using the module/logger (e.g., self.logger.exception or process_logger.error)
before proceeding to build the fallback_response so failures are visible in
logs.
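A minimal sketch of the logged-fallback pattern, reduced to plain JSON parsing (the helper and fallback shape are illustrative, not the backend's actual API):

```python
import json
import logging

logger = logging.getLogger("llm.backends.openai")

def parse_structured_content(raw: str, fallback: dict) -> dict:
    # Catch the specific failures we expect and log type + message before
    # falling back, rather than a bare `except:` that hides the root cause.
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, ValueError) as e:
        logger.warning("structured parse failed (%s): %s", type(e).__name__, e)
        return fallback

print(parse_structured_content('{"ok": true}', fallback={}))  # {'ok': True}
print(parse_structured_content("truncated {", fallback={}))   # {}
```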

In `@src/llm/caching.py`:
- Around line 45-65: The InMemoryGeminiCacheStore leaks expired entries because
eviction only happens in get(); update the set method (and/or constructor) to
purge expired handles before inserting to prevent unbounded growth: inside
InMemoryGeminiCacheStore.set (and while holding self._lock) iterate through
self._handles and remove any entries whose expires_at <=
datetime.now(timezone.utc) before assigning self._handles[handle.key] = handle
(ensure you use the same timezone-aware comparison as in get); this minimal
change reclaims expired entries on writes and prevents memory leaks for keys
that are never read again.
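A sketch of write-time purging under the same lock, using a simplified store (the class here is a stand-in, not the actual InMemoryGeminiCacheStore):

```python
from datetime import datetime, timedelta, timezone
from threading import Lock

class InMemoryCacheStore:
    def __init__(self) -> None:
        self._handles: dict[str, tuple[datetime, str]] = {}
        self._lock = Lock()

    def set(self, key: str, value: str, ttl: timedelta) -> None:
        now = datetime.now(timezone.utc)
        with self._lock:
            # Purge expired entries on every write so keys that are never
            # read again still get reclaimed (get()-only eviction leaks them).
            for stale in [k for k, (exp, _) in self._handles.items() if exp <= now]:
                del self._handles[stale]
            self._handles[key] = (now + ttl, value)

store = InMemoryCacheStore()
store.set("old", "x", timedelta(seconds=-1))  # inserted already expired
store.set("new", "y", timedelta(minutes=5))   # this write purges "old"
print(sorted(store._handles))                 # ['new']
```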

In `@src/llm/credentials.py`:
- Around line 3-25: The function _default_transport_credentials currently raises
a bare ValueError for unknown transports; change this to raise the project's
ValidationException (imported from src.exceptions) so misconfigurations use the
repo's typed exception system—update the imports at the top of
src/llm/credentials.py to include ValidationException, and replace the final
raise ValueError(f"Unknown transport: {transport}") with raise
ValidationException(f"Unknown transport: {transport}") so resolve_credentials
and callers receive the correct exception type.

In `@src/llm/structured_output.py`:
- Around line 82-92: The repair path currently only catches
StructuredOutputError in attempt_structured_output_repair when calling
repair_response_model_json, so pydantic ValidationError can still bubble and
bypass failure_policy; import and catch pydantic's ValidationError (e.g., from
pydantic import ValidationError) alongside StructuredOutputError and return None
on either exception so the failure_policy branches still run, and make the
identical change to the other equivalent try/except block later in the file that
follows the same pattern.

In `@src/utils/clients.py`:
- Around line 1542-1601: Remove the dead helper _repair_response_model_json (and
its pyright ignore) because the project uses the public
repair_response_model_json instead; delete the entire function definition for
_repair_response_model_json and any exclusively-used local references (and clean
up now-unused imports if they become unused after removal), ensuring all callers
use repair_response_model_json (from structured_output) instead.

In `@tests/live_llm/conftest.py`:
- Around line 38-45: The require_provider_key helper raises KeyError for
model_spec.provider == "groq" because the mapping only includes
"anthropic"/"openai"/"gemini"; update require_provider_key to include an
explicit "groq": bool(settings.LLM.GROQ_API_KEY) entry (and optionally handle
unknown providers by calling pytest.skip with a clear message) so that
require_provider_key and its use of model_spec.provider no longer blow up for
Groq LiveModelSpec.
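The shape of the fixed helper, sketched against a plain dict instead of the real settings object (in the actual fixture the unknown-provider branch would call pytest.skip rather than raise):

```python
def require_provider_key(provider: str, settings: dict) -> None:
    available = {
        "anthropic": bool(settings.get("ANTHROPIC_API_KEY")),
        "openai": bool(settings.get("OPENAI_API_KEY")),
        "gemini": bool(settings.get("GEMINI_API_KEY")),
        "groq": bool(settings.get("GROQ_API_KEY")),  # previously missing -> KeyError
    }
    if provider not in available:
        raise LookupError(f"unknown live-test provider: {provider}")
    if not available[provider]:
        raise LookupError(f"no API key configured for provider: {provider}")

require_provider_key("groq", {"GROQ_API_KEY": "gsk-test"})  # no longer blows up
print("groq ok")
```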

In `@tests/unit/llm/test_model_config.py`:
- Around line 254-267: The test test_env_template_uses_nested_model_config_keys
is asserting the wrong env var prefix for nested dialectic settings; update the
assertions to expect the double-underscore nested form
(DIALECTIC__LEVELS__minimal__MODEL_CONFIG__MODEL and
DIALECTIC__LEVELS__minimal__PROVIDER=) instead of DIALECTIC_LEVELS__... so the
test validates the correct nested override key shape for
settings.DIALECTIC.LEVELS; change both occurrences in that test to use
DIALECTIC__LEVELS__minimal__... to match the new per-level provider
configuration naming convention.

In `@tests/unit/llm/test_request_builder.py`:
- Around line 13-32: The async test functions (e.g.,
test_gemini_explicit_budget_adjusts_transport_max_tokens and the other three
async def tests in tests/unit/llm/test_request_builder.py) are missing the
`@pytest.mark.asyncio` decorator; add `@pytest.mark.asyncio` above each async test
function and ensure pytest.mark is imported (import pytest) at the top of the
file so pytest will properly await the coroutines during test runs.
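Why the marker matters: calling an `async def` only creates a coroutine object; without pytest-asyncio driving the event loop, the test body (and its assertions) never executes. A minimal illustration:

```python
import asyncio

async def check_budget() -> str:
    # Stand-in for an async test body; nothing here runs until awaited.
    return "assertions ran"

coro = check_budget()
print(type(coro).__name__)  # coroutine -- the body has not executed yet
print(asyncio.run(coro))    # assertions ran -- only now does the body run
```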

---

Outside diff comments:
In `@src/dreamer/specialists.py`:
- Around line 207-228: The code always passes max_tokens=self.get_max_tokens()
into honcho_llm_call, which ignores per-model or specialist overrides (e.g.
DREAM_MODEL_CONFIG.max_output_tokens); update the honcho_llm_call invocation in
this function so max_tokens is derived from the resolved model_config (use
model_config.max_output_tokens when set, falling back to self.get_max_tokens()),
thereby ensuring get_model_config() and any specialist overrides actually
control the LLM request.

In `@src/embedding_client.py`:
- Around line 29-51: The constructor (__init__) currently treats any
non-"gemini" provider as OpenAI, which hides configuration typos; change
__init__ to explicitly accept only supported providers ("gemini" and "openai")
and raise a ValidationException (from src.exceptions import ValidationException)
for any other value of self.provider, ensuring you do not silently instantiate
AsyncOpenAI for unknown providers; keep the existing gemini and openai branches
(symbols: self.provider, genai.Client, AsyncOpenAI, self.model, self.client,
self.max_embedding_tokens, self.max_batch_size) but add the explicit else that
raises ValidationException with a clear message. Also apply the same
explicit-provider validation and exception raising to the other
provider-handling block in this file (the second location handling provider
around lines ~362-373) so both paths reject unsupported providers instead of
defaulting to OpenAI.
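A sketch of the explicit-provider branch with the real clients replaced by placeholders (the exception class here is a stand-in for src.exceptions.ValidationException):

```python
class ValidationException(Exception):
    """Stand-in for src.exceptions.ValidationException."""

class EmbeddingClient:
    def __init__(self, provider: str) -> None:
        self.provider = provider
        if provider == "gemini":
            self.client = "genai.Client()"   # placeholder for the real client
        elif provider == "openai":
            self.client = "AsyncOpenAI()"    # placeholder for the real client
        else:
            # A typo no longer silently falls through to OpenAI.
            raise ValidationException(
                f"unsupported embedding provider: {provider!r}"
            )

try:
    EmbeddingClient("openai-compatible")
except ValidationException as e:
    print(e)
```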

---

Nitpick comments:
In @.env.template:
- Around line 97-111: Move the DERIVER_DEDUPLICATE variable out of the middle of
the DERIVER_MODEL_CONFIG__* block so related MODEL_CONFIG keys stay grouped;
specifically, relocate DERIVER_DEDUPLICATE (currently between
DERIVER_MODEL_CONFIG__TEMPERATURE and DERIVER_MODEL_CONFIG__MAX_OUTPUT_TOKENS)
to either just before the DERIVER_MODEL_CONFIG__* block or immediately after it,
keeping all DERIVER_MODEL_CONFIG__* entries contiguous for clearer grouping and
ordering.

In `@src/llm/__init__.py`:
- Around line 55-80: The override client factory functions
(_get_openai_override_client, _get_anthropic_override_client,
_get_groq_override_client, _get_gemini_override_client) must not be memoized by
raw (base_url, api_key) tuples because that caches secret-bearing keys for the
process lifetime; remove the `@cache` decorator from those functions and make them
return a fresh client per call (or replace with a bounded/ephemeral cache if you
truly need caching), and update callers to create/use/close clients per
operation rather than relying on a global persistent cache.

In `@src/llm/backends/anthropic.py`:
- Around line 74-90: Move the duplicated schema injection into a small helper
and call it from both branches: create a helper (e.g.,
_append_schema_to_last_message or build_and_append_schema) that takes
response_format and params (and uses response_format.model_json_schema(),
json.dumps, and appends to params["messages"][-1]["content"]); then replace the
duplicate blocks in the use_json_prefill branch and the else branch with calls
to that helper. Ensure the helper is only invoked when response_format is truthy
and a type, and keep the existing use_json_prefill logic (including appending
the assistant "{" when applicable) unchanged.

In `@src/llm/backends/groq.py`:
- Around line 58-76: Add a defensive check that response.choices is non-empty
before accessing response.choices[0] in the chat completion handling after
calling self._client.chat.completions.create; if choices is empty or None, raise
a clear ValueError (or handle gracefully) with context (e.g., include response
metadata) instead of indexing into it. Update the block that reads
response.choices[0].message.content, response.choices[0].finish_reason, and the
JSON parsing/repair code paths (response_format.model_validate and
repair_response_model_json) to only run after this validation so you never
trigger IndexError on a malformed response.
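The defensive check can be isolated into a small accessor, sketched here with SimpleNamespace stand-ins for the SDK response objects:

```python
from types import SimpleNamespace

def first_choice(response):
    # Validate before indexing so a malformed response raises a clear
    # error with context instead of a bare IndexError.
    choices = getattr(response, "choices", None)
    if not choices:
        raise ValueError(
            f"Groq returned no choices (model={getattr(response, 'model', '?')})"
        )
    return choices[0]

good = SimpleNamespace(model="llama-3.3-70b", choices=[SimpleNamespace(message="hi")])
print(first_choice(good).message)  # hi

bad = SimpleNamespace(model="llama-3.3-70b", choices=[])
try:
    first_choice(bad)
except ValueError as e:
    print(e)
```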

In `@src/llm/backends/openai.py`:
- Around line 263-274: The model check using `"gpt-5" in model` is fragile;
replace it with an explicit capability lookup helper (e.g., add a
GPT5_MODEL_PREFIXES tuple and implement a function like
_is_reasoning_model(model: str) that returns any(model.startswith(prefix) for
prefix in GPT5_MODEL_PREFIXES)) and then use that helper in the openai.py
branching where the code sets params["max_completion_tokens"] vs
params["max_tokens"] (the block containing the "gpt-5" in model check and the
thinking_effort/verbosity handling); update references to
extra_params/temperature logic to use the same helper so only intended models
use max_completion_tokens and reasoning flags.

In `@src/llm/request_builder.py`:
- Around line 102-152: execute_stream duplicates the parameter-building logic
from execute_completion; extract that shared logic into a helper (e.g.,
_build_request_params) that returns capabilities, credentials,
effective_max_tokens, thinking_budget_tokens, and merged_extra_params. Replace
the duplicated block in execute_stream with a call to the new helper and pass
returned values into backend.stream; do the same refactor in execute_completion
so both functions reuse the helper and avoid divergent logic (refer to
execute_stream, execute_completion, _adjust_max_tokens_for_explicit_budget,
_build_config_extra_params, and resolve_credentials).

In `@src/utils/clients.py`:
- Around line 616-638: The _history_adapter_for_provider function currently
falls through to OpenAIHistoryAdapter for both "openai" and "groq"; make the
Groq handling explicit by adding an if branch for provider == "groq" that
returns OpenAIHistoryAdapter(), keeping the existing return type and behavior;
update the function body in _history_adapter_for_provider to include the
explicit "groq" case (so callers like _history_adapter_for_provider("groq")
clearly return OpenAIHistoryAdapter).

In `@tests/live_llm/test_live_gemini.py`:
- Around line 146-157: Replace the current list concatenation when building
replay_messages with iterable unpacking to avoid creating intermediate lists:
spread initial_messages, then include
adapter.format_assistant_tool_message(first) as an element, and spread the
result of adapter.format_tool_results(...) so the final construction uses
[*initial_messages, adapter.format_assistant_tool_message(first),
*adapter.format_tool_results(...)] while keeping the same input dict for
tool_call (tool_call.id, tool_call.name, tool_result) to preserve behavior.

In `@tests/unit/llm/conftest.py`:
- Around line 9-26: The FakeBackend test double currently calls
next(self._responses) in both complete and stream which will raise StopIteration
with no context if tests run out of canned responses; update FakeBackend
(constructor: _responses, methods: complete and stream) to catch StopIteration
around next(self._responses) and raise a more descriptive error (e.g.
RuntimeError or AssertionError) that includes the method name and a hint that no
more fake responses were queued, so test failures clearly indicate missing
canned responses.
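A sketch of the friendlier exhaustion error (simplified to one method; the real double would do the same in both complete and stream):

```python
class FakeBackend:
    def __init__(self, responses):
        self._responses = iter(responses)

    def complete(self):
        try:
            return next(self._responses)
        except StopIteration:
            # Surface a descriptive failure instead of a bare StopIteration.
            raise AssertionError(
                "FakeBackend.complete() called but no more fake responses "
                "were queued -- add another canned response to the test"
            ) from None

backend = FakeBackend(["first response"])
print(backend.complete())  # first response
try:
    backend.complete()
except AssertionError as e:
    print(str(e).split(" -- ")[0])
```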

In `@tests/unit/llm/test_backends/test_openai.py`:
- Around line 65-68: The test function
test_openai_backend_passes_thinking_effort_through_for_non_gpt5_models has its
return type annotation split across lines; consolidate the annotation so the
signature is on a single line (e.g., move "-> None" to the same line as the def)
to match standard Python formatting and improve readability.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 250f06dc-3d67-4269-844d-c6a531e7eeb6

📥 Commits

Reviewing files that changed from the base of the PR and between ef3426d and f9726ca.

📒 Files selected for processing (50)
  • .env.template
  • config.toml.example
  • docs/v3/contributing/configuration.mdx
  • pyproject.toml
  • src/config.py
  • src/deriver/deriver.py
  • src/dialectic/core.py
  • src/dreamer/specialists.py
  • src/embedding_client.py
  • src/llm/__init__.py
  • src/llm/backend.py
  • src/llm/backends/__init__.py
  • src/llm/backends/anthropic.py
  • src/llm/backends/gemini.py
  • src/llm/backends/groq.py
  • src/llm/backends/openai.py
  • src/llm/caching.py
  • src/llm/capabilities.py
  • src/llm/credentials.py
  • src/llm/history_adapters.py
  • src/llm/request_builder.py
  • src/llm/structured_output.py
  • src/telemetry/reasoning_traces.py
  • src/utils/clients.py
  • src/utils/summarizer.py
  • src/utils/types.py
  • tests/__init__.py
  • tests/conftest.py
  • tests/deriver/test_deriver_processing.py
  • tests/dialectic/test_model_config_usage.py
  • tests/dreamer/test_model_config_usage.py
  • tests/live_llm/README.md
  • tests/live_llm/__init__.py
  • tests/live_llm/conftest.py
  • tests/live_llm/model_matrix.py
  • tests/live_llm/test_live_anthropic.py
  • tests/live_llm/test_live_gemini.py
  • tests/live_llm/test_live_openai.py
  • tests/unit/llm/conftest.py
  • tests/unit/llm/test_backends/test_anthropic.py
  • tests/unit/llm/test_backends/test_gemini.py
  • tests/unit/llm/test_backends/test_openai.py
  • tests/unit/llm/test_capabilities.py
  • tests/unit/llm/test_credentials.py
  • tests/unit/llm/test_history_adapters.py
  • tests/unit/llm/test_model_config.py
  • tests/unit/llm/test_request_builder.py
  • tests/utils/test_clients.py
  • tests/utils/test_length_finish_reason.py
  • tests/utils/test_summarizer.py
DERIVER_MODEL_CONFIG__THINKING_BUDGET_TOKENS=1024
DERIVER_MAX_INPUT_TOKENS=23000 # Maximum input tokens for deriver
DERIVER_TEMPERATURE= # Optional temperature override (unset by default)
DERIVER_MODEL_CONFIG__TEMPERATURE= # Optional temperature override (unset by default)
⚠️ Potential issue | 🟡 Minor

Don't use a blank env assignment to mean "unset".

In .env/shell syntax this still assigns a value, so users copying it verbatim won't get the "leave it unset" behavior the comment describes.

📝 Suggested doc tweak
-DERIVER_MODEL_CONFIG__TEMPERATURE=  # Optional temperature override (unset by default)
+# DERIVER_MODEL_CONFIG__TEMPERATURE=0.2  # Optional temperature override
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
DERIVER_MODEL_CONFIG__TEMPERATURE= # Optional temperature override (unset by default)
# DERIVER_MODEL_CONFIG__TEMPERATURE=0.2 # Optional temperature override
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/v3/contributing/configuration.mdx` at line 402, The example line
"DERIVER_MODEL_CONFIG__TEMPERATURE=  # Optional temperature override (unset by
default)" currently assigns an empty string; change the docs to avoid a blank
env assignment by either removing the assignment or showing it commented out and
providing a concrete example value — e.g. replace with a note that the variable
is unset by default and show "DERIVER_MODEL_CONFIG__TEMPERATURE=0.7" as an
example, or present "# DERIVER_MODEL_CONFIG__TEMPERATURE=" to indicate omission;
ensure the symbol DERIVER_MODEL_CONFIG__TEMPERATURE is referenced and the text
explicitly says "omit the variable to leave it unset" rather than showing a
blank assignment.
Comment on lines +20 to +21
if not os.getenv("PYTHON_DOTENV_DISABLED"):
load_dotenv(override=True)
⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# Check the current implementation in src/config.py
head -100 src/config.py | cat -n

Repository: plastic-labs/honcho

Length of output: 3593


🏁 Script executed:

# Search for all load_dotenv calls in the codebase
rg "load_dotenv" -A 2 -B 2

Repository: plastic-labs/honcho

Length of output: 15102


🏁 Script executed:

# Check if there's any documentation or comments about config precedence
rg "precedence|override|environment" src/config.py -i -A 2 -B 2 | head -50

Repository: plastic-labs/honcho

Length of output: 1710


🌐 Web query:

python-dotenv load_dotenv override parameter behavior documentation

💡 Result:

In python-dotenv, load_dotenv(..., override=...) controls whether values from your .env file are allowed to replace variables that are already set in the process environment (os.environ):

  • override=False (default): existing environment variables win; .env entries are only used for variables that are not already set. [1]
  • override=True: variables loaded from the .env file override/replace existing environment variables. [1][2]

Sources: python-dotenv reference docs and package docs. [1][2]


Use override=False to respect environment variable precedence.

The override=True parameter allows .env values to replace deployment-time environment variables, violating the intended configuration precedence where environment variables should have highest priority. Change to override=False so that existing environment variables take precedence over .env file entries.

Suggested fix
 if not os.getenv("PYTHON_DOTENV_DISABLED"):
-    load_dotenv(override=True)
+    load_dotenv(override=False)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if not os.getenv("PYTHON_DOTENV_DISABLED"):
load_dotenv(override=True)
if not os.getenv("PYTHON_DOTENV_DISABLED"):
load_dotenv(override=False)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config.py` around lines 20 - 21, The dotenv loading currently calls
load_dotenv(override=True) which lets .env values overwrite existing environment
variables; update the call in src/config.py (the load_dotenv invocation guarded
by os.getenv("PYTHON_DOTENV_DISABLED")) to use override=False so that existing
environment variables take precedence over .env file entries, preserving
deployment-time configuration order.
Comment on lines +601 to 604
MODEL_CONFIG: Annotated[
ConfiguredModelSettings,
Field(validation_alias="model_config"),
]
⚠️ Potential issue | 🔴 Critical

Partial dialectic.levels.* overrides now break settings bootstrap.

The defaults are defined here, but incoming TOML/env level blocks replace each nested level instead of layering on top of it. Any config that only overrides MAX_TOOL_ITERATIONS / TOOL_CHOICE now fails validation because MODEL_CONFIG is missing, which matches the current AppSettings() pipeline failure.

Also applies to: 639-685

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/config.py` around lines 601 - 604, The issue is that nested MODEL_CONFIG
blocks get fully replaced by TOML/env partial overrides instead of being merged,
causing validation failures when only a few nested keys (e.g.,
MAX_TOOL_ITERATIONS or TOOL_CHOICE) are supplied. Fix by giving MODEL_CONFIG a
proper default instance so Pydantic will merge values rather than drop the whole
object: change the annotation to a concrete field with a default_factory that
constructs a ConfiguredModelSettings (e.g., replace MODEL_CONFIG:
Annotated[ConfiguredModelSettings, Field(validation_alias="model_config")] with
a field like model_config: ConfiguredModelSettings =
Field(default_factory=ConfiguredModelSettings, validation_alias="model_config")
and apply the same pattern for other nested setting blocks referenced in
AppSettings to ensure partial overrides merge correctly.
Comment on lines +82 to +85
params["messages"][-1]["content"] += (
f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
)
params["messages"].append({"role": "assistant", "content": "{"})
⚠️ Potential issue | 🟡 Minor

Mutating the input message list may cause unintended side effects.

The code modifies params["messages"][-1]["content"] in-place and appends to params["messages"]. Since params["messages"] is assigned from request_messages (the filtered messages list), mutations here could affect the caller's data if they reuse the messages list.

Consider making a deep copy of the messages before modification, or at minimum copying the last message dict before mutating its content.

🛡️ Proposed fix to avoid mutation
         use_json_prefill = (
             bool(response_format or self._json_mode(extra_params))
             and not thinking_budget_tokens
             and self._supports_assistant_prefill(model)
         )
         if use_json_prefill:
             if response_format and isinstance(response_format, type):
                 schema_json = json.dumps(response_format.model_json_schema(), indent=2)
-                params["messages"][-1]["content"] += (
+                # Copy the last message to avoid mutating the caller's data
+                last_msg = dict(params["messages"][-1])
+                last_msg["content"] = last_msg["content"] + (
                     f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
                 )
+                params["messages"][-1] = last_msg
             params["messages"].append({"role": "assistant", "content": "{"})
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
params["messages"][-1]["content"] += (
f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
)
params["messages"].append({"role": "assistant", "content": "{"})
# Copy the last message to avoid mutating the caller's data
last_msg = dict(params["messages"][-1])
last_msg["content"] = last_msg["content"] + (
f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
)
params["messages"][-1] = last_msg
params["messages"].append({"role": "assistant", "content": "{"})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/backends/anthropic.py` around lines 82 - 85, The code mutates the
incoming request_messages by directly editing params["messages"][-1]["content"]
and appending to params["messages"], which can produce side effects; fix it by
creating a copy of the messages before modification (e.g., make
params["messages"] = [m.copy() for m in request_messages] or use
copy.deepcopy(request_messages)) and then modify the copied last message content
and append the assistant message to the copied list so request_messages remains
unchanged; update the logic around params["messages"] and any use of
request_messages to operate on the copied list instead.
Comment on lines +149 to +153
use_json_prefill = (
response_format is not None
and not thinking_budget_tokens
and self._supports_assistant_prefill(model)
)
⚠️ Potential issue | 🟡 Minor

Inconsistent JSON mode handling between complete() and stream().

In complete() (line 74-78), use_json_prefill considers self._json_mode(extra_params), but in stream() it only checks response_format is not None. Additionally, extra_params is deleted at line 120, making _json_mode inaccessible.

This means stream() won't apply JSON prefill when extra_params={"json_mode": True} is passed without an explicit response_format.

🔧 Proposed fix for consistency
-    async def stream(
-        ...
-        extra_params: dict[str, Any] | None = None,
-    ) -> AsyncIterator[StreamChunk]:
-        del max_output_tokens, api_key, api_base, extra_params
+    async def stream(
+        ...
+        extra_params: dict[str, Any] | None = None,
+    ) -> AsyncIterator[StreamChunk]:
+        del max_output_tokens, api_key, api_base
         ...
         use_json_prefill = (
-            response_format is not None
+            bool(response_format or self._json_mode(extra_params))
             and not thinking_budget_tokens
             and self._supports_assistant_prefill(model)
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/backends/anthropic.py` around lines 149 - 153, The stream()
implementation uses a different JSON-prefill check than complete(): update
stream() to compute use_json_prefill the same way as complete() by calling
self._json_mode(extra_params) (in addition to checking response_format,
thinking_budget_tokens, and self._supports_assistant_prefill(model)), and ensure
extra_params is not deleted before this check (or pass the original extra_params
into _json_mode) so JSON prefill triggers when extra_params={"json_mode": True};
reference the stream() and complete() functions, the use_json_prefill variable,
and the helper _json_mode(extra_params) when making the change.
Comment on lines +82 to +92
def attempt_structured_output_repair(
content: object,
response_model: type[BaseModel],
model: str,
) -> BaseModel | None:
if not isinstance(content, str):
return None
try:
return repair_response_model_json(content, response_model, model)
except StructuredOutputError:
return None
⚠️ Potential issue | 🟠 Major

Catch ValidationError during repair so failure_policy still applies.

repair_response_model_json() raises ValidationError for irreparable string payloads, but attempt_structured_output_repair() only swallows StructuredOutputError. That means a bad JSON string can skip the repair_then_empty / repair_then_raise branches and bubble a raw validation error instead.

🩹 Suggested fix
 def attempt_structured_output_repair(
     content: object,
     response_model: type[BaseModel],
     model: str,
 ) -> BaseModel | None:
     if not isinstance(content, str):
         return None
     try:
         return repair_response_model_json(content, response_model, model)
-    except StructuredOutputError:
+    except (StructuredOutputError, ValidationError):
         return None

Also applies to: 101-132

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/llm/structured_output.py` around lines 82 - 92, The repair path currently
only catches StructuredOutputError in attempt_structured_output_repair when
calling repair_response_model_json, so pydantic ValidationError can still bubble
and bypass failure_policy; import and catch pydantic's ValidationError (e.g.,
from pydantic import ValidationError) alongside StructuredOutputError and return
None on either exception so the failure_policy branches still run, and make the
identical change to the other equivalent try/except block later in the file that
follows the same pattern.
Comment on lines +1542 to +1601
def _repair_response_model_json(  # pyright: ignore[reportUnusedFunction]
    raw_content: str,
    response_model: type[BaseModel],
    model: str,
) -> BaseModel:
    """Attempt to repair truncated/malformed JSON and validate against response_model.

    Used by all provider paths when structured output parsing fails.
    For PromptRepresentation, falls back to an empty instance.
    For other models, re-raises ValidationError.
    """
    try:
        final = validate_and_repair_json(raw_content)
        repaired_data = json.loads(final)

        # Schema-aware repair for PromptRepresentation
        if (
            response_model is PromptRepresentation
            and "deductive" in repaired_data
            and isinstance(repaired_data["deductive"], list)
        ):
            for i, item in enumerate(repaired_data["deductive"]):
                if isinstance(item, dict):
                    if "conclusion" not in item and "premises" in item:
                        logger.warning(
                            f"Deductive observation {i} missing conclusion, adding placeholder"
                        )
                        if item["premises"]:
                            item["conclusion"] = (
                                f"[Incomplete reasoning from premises: {item['premises'][0][:100]}...]"
                            )
                        else:
                            item["conclusion"] = (
                                "[Incomplete reasoning - conclusion missing]"
                            )
                    if "premises" not in item:
                        item["premises"] = []

        final = json.dumps(repaired_data)
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as repair_err:
        final = ""
        logger.warning(
            f"Could not perform JSON repair on truncated output from {model}: {repair_err}"
        )

    try:
        return response_model.model_validate_json(final)
    except ValidationError as ve:
        logger.error(
            f"Validation error after repair of truncated output from {model}: {ve}"
        )
        logger.debug(f"Problematic JSON: {final}")

        if response_model is PromptRepresentation:
            logger.warning(
                "Using fallback empty Representation due to truncated output"
            )
            return PromptRepresentation(explicit=[])
        else:
            raise
Contributor

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check if _repair_response_model_json is called anywhere
rg -n "_repair_response_model_json" --type=py
# Check for similar repair functions
rg -n "repair.*json" --type=py -l

Repository: plastic-labs/honcho

Length of output: 346


🏁 Script executed:

#!/bin/bash
# Check for repair_response_model_json (without underscore) in structured_output.py
echo "=== Checking for repair_response_model_json in structured_output.py ==="
rg -n "def repair_response_model_json" src/llm/structured_output.py

# Check all occurrences of repair functions
echo -e "\n=== All repair function definitions ==="
rg -n "^def.*repair.*json" --type=py

# Check git history for _repair_response_model_json
echo -e "\n=== Git history (last 5 commits) for the function ==="
cd src/utils && git log --oneline -5 -- clients.py

Repository: plastic-labs/honcho

Length of output: 1107


🏁 Script executed:

#!/bin/bash
# Check the actual content around the function in structured_output.py
echo "=== Checking structured_output.py for repair functions ==="
rg -n "repair.*json" src/llm/structured_output.py -A 3

Repository: plastic-labs/honcho

Length of output: 734


🏁 Script executed:

#!/bin/bash
# Extract the full implementation of repair_response_model_json from structured_output.py
echo "=== repair_response_model_json from structured_output.py ==="
sed -n '25,88p' src/llm/structured_output.py

# Extract the full implementation of _repair_response_model_json from clients.py
echo -e "\n=== _repair_response_model_json from clients.py ==="
sed -n '1542,1601p' src/utils/clients.py

Repository: plastic-labs/honcho

Length of output: 4793


🏁 Script executed:

#!/bin/bash
# Check where repair_response_model_json is called from
echo "=== Calls to repair_response_model_json ==="
rg -n "repair_response_model_json" --type=py -B 2 -A 2

# Check if _repair_response_model_json was ever imported or used
echo -e "\n=== Any imports or references to _repair_response_model_json ==="
rg "_repair_response_model_json" --type=py

Repository: plastic-labs/honcho

Length of output: 4750


Remove _repair_response_model_json – it is dead code superseded by repair_response_model_json in src/llm/structured_output.py.

The underscore-prefixed function is never imported or called anywhere in the codebase. All LLM backends (anthropic, gemini, groq, openai) use the public repair_response_model_json from src/llm/structured_output.py instead. The # pyright: ignore[reportUnusedFunction] comment confirms this is intentionally suppressed dead code and should be deleted.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/clients.py` around lines 1542 - 1601, Remove the dead helper
_repair_response_model_json (and its pyright ignore) because the project uses
the public repair_response_model_json instead; delete the entire function
definition for _repair_response_model_json and any exclusively-used local
references (and clean up now-unused imports if they become unused after
removal), ensuring all callers use repair_response_model_json (from
structured_output) instead.
Comment on lines +38 to +45
def require_provider_key(model_spec: LiveModelSpec) -> None:
    key_present = {
        "anthropic": bool(settings.LLM.ANTHROPIC_API_KEY),
        "openai": bool(settings.LLM.OPENAI_API_KEY),
        "gemini": bool(settings.LLM.GEMINI_API_KEY),
    }[model_spec.provider]
    if not key_present:
        pytest.skip(f"Missing API key for live provider {model_spec.provider}")
Contributor

⚠️ Potential issue | 🟡 Minor

Handle Groq explicitly in the live provider-key guard.

This helper only covers Anthropic/OpenAI/Gemini. A Groq LiveModelSpec will currently raise KeyError instead of producing a clear skip/failure path.

🔧 Suggested fix
 def require_provider_key(model_spec: LiveModelSpec) -> None:
     key_present = {
         "anthropic": bool(settings.LLM.ANTHROPIC_API_KEY),
         "openai": bool(settings.LLM.OPENAI_API_KEY),
         "gemini": bool(settings.LLM.GEMINI_API_KEY),
-    }[model_spec.provider]
+        "groq": bool(settings.LLM.GROQ_API_KEY),
+    }.get(model_spec.provider)
+    if key_present is None:
+        raise AssertionError(f"Unsupported live provider {model_spec.provider}")
     if not key_present:
         pytest.skip(f"Missing API key for live provider {model_spec.provider}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/live_llm/conftest.py` around lines 38 - 45, The require_provider_key
helper raises KeyError for model_spec.provider == "groq" because the mapping
only includes "anthropic"/"openai"/"gemini"; update require_provider_key to
include an explicit "groq": bool(settings.LLM.GROQ_API_KEY) entry (and
optionally handle unknown providers by calling pytest.skip with a clear message)
so that require_provider_key and its use of model_spec.provider no longer blow
up for Groq LiveModelSpec.
Comment on lines +254 to +267
def test_env_template_uses_nested_model_config_keys() -> None:
    env_template_path = Path(__file__).resolve().parents[3] / ".env.template"
    env_template = env_template_path.read_text()

    assert "DERIVER_MODEL_CONFIG__MODEL" in env_template
    assert "DIALECTIC_LEVELS__minimal__MODEL_CONFIG__MODEL" in env_template
    assert "SUMMARY_MODEL_CONFIG__MODEL" in env_template
    assert "DREAM_MODEL_CONFIG__MODEL" in env_template
    assert "DREAM_DEDUCTION_MODEL_CONFIG__MODEL" in env_template

    assert "DERIVER_PROVIDER=" not in env_template
    assert "SUMMARY_PROVIDER=" not in env_template
    assert "DIALECTIC_LEVELS__minimal__PROVIDER=" not in env_template
    assert "DREAM_PROVIDER=" not in env_template
Contributor

⚠️ Potential issue | 🟠 Major

Use the DIALECTIC__LEVELS__... env path here.

This test is currently asserting DIALECTIC_LEVELS__..., so it will still pass if .env.template uses the wrong override key shape. The dialectic nested settings need the extra __ hop after DIALECTIC.

🧪 Suggested fix
-    assert "DIALECTIC_LEVELS__minimal__MODEL_CONFIG__MODEL" in env_template
+    assert "DIALECTIC__LEVELS__minimal__MODEL_CONFIG__MODEL" in env_template
@@
-    assert "DIALECTIC_LEVELS__minimal__PROVIDER=" not in env_template
+    assert "DIALECTIC__LEVELS__minimal__PROVIDER=" not in env_template

Based on learnings: "Use new per-level provider configuration under settings.DIALECTIC.LEVELS. In CI, overrides must use nested environment variables following the double-underscore naming convention: DIALECTIC__LEVELS____PROVIDER."

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/llm/test_model_config.py` around lines 254 - 267, The test
test_env_template_uses_nested_model_config_keys is asserting the wrong env var
prefix for nested dialectic settings; update the assertions to expect the
double-underscore nested form (DIALECTIC__LEVELS__minimal__MODEL_CONFIG__MODEL
and DIALECTIC__LEVELS__minimal__PROVIDER=) instead of DIALECTIC_LEVELS__... so
the test validates the correct nested override key shape for
settings.DIALECTIC.LEVELS; change both occurrences in that test to use
DIALECTIC__LEVELS__minimal__... to match the new per-level provider
configuration naming convention.
Comment on lines +13 to +32
async def test_gemini_explicit_budget_adjusts_transport_max_tokens(
    fake_backend: FakeBackend,
) -> None:
    config = ModelConfig(
        model="gemini-2.5-flash",
        transport="gemini",
        thinking_budget_tokens=256,
    )

    await execute_completion(
        fake_backend,
        config,
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=100,
    )

    call = fake_backend.calls[0]
    assert call["max_output_tokens"] == 100
    assert call["max_tokens"] == 356
    assert call["thinking_budget_tokens"] == 256
Contributor

⚠️ Potential issue | 🔴 Critical

Missing @pytest.mark.asyncio decorator on async tests.

All four test functions in this file are async def but lack the @pytest.mark.asyncio decorator. Without it, pytest will not properly await the coroutines, causing tests to appear to pass while the actual async code never executes.

🐛 Proposed fix
+import pytest
+
 from pydantic import BaseModel

 from src.config import ModelConfig
 from src.llm.caching import PromptCachePolicy
 from src.llm.request_builder import execute_completion
 from tests.unit.llm.conftest import FakeBackend


 class SampleResponse(BaseModel):
     answer: str


+@pytest.mark.asyncio
 async def test_gemini_explicit_budget_adjusts_transport_max_tokens(
     fake_backend: FakeBackend,
 ) -> None:

Apply the same decorator to all four test functions.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/unit/llm/test_request_builder.py` around lines 13 - 32, The async test
functions (e.g., test_gemini_explicit_budget_adjusts_transport_max_tokens and
the other three async def tests in tests/unit/llm/test_request_builder.py) are
missing the `@pytest.mark.asyncio` decorator; add `@pytest.mark.asyncio` above each
async test function and ensure pytest.mark is imported (import pytest) at the
top of the file so pytest will properly await the coroutines during test runs.
            self.max_batch_size = 2048  # Same as OpenAI
        else:  # openai
            if api_key is None:
                api_key = settings.LLM.OPENAI_API_KEY
Collaborator

this should accept the OPENAI_BASE_URL so we can have configurable openai-compatible embedding providers

    model_config = SettingsConfigDict(env_prefix="DERIVER_", extra="ignore")  # pyright: ignore
class DeriverSettings(HonchoSettings):
    model_config = SettingsConfigDict(  # pyright: ignore
        env_prefix="DERIVER_", env_nested_delimiter="__", extra="ignore"
Collaborator

one thing to note is that env_nested_delimiter="__" doesn't merge overrides with the default model config, so for example setting DERIVER_MODEL_CONFIG__THINKING_BUDGET_TOKENS=2048 replaces the whole object instead of patching the default deriver config. this might be intentional but just flagging
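
To illustrate the replace-vs-patch distinction in plain dicts (field names and defaults here are hypothetical, not the actual deriver config): the nested env override is validated as a whole new object, so unspecified keys fall back to the nested model's own field defaults rather than being patched onto the customized default instance.

```python
import copy

# Defaults declared on the nested model itself (what a replace falls back to)
NESTED_MODEL_DEFAULTS = {"model": "gpt-4o-mini", "thinking_budget_tokens": 0}
# Customized default instance set on the parent settings class
CUSTOM_DERIVER_DEFAULT = {"model": "claude-sonnet", "thinking_budget_tokens": 1024}


def apply_env_override_replace(override: dict) -> dict:
    # roughly what env_nested_delimiter does today: build a fresh object,
    # ignoring the customized default instance entirely
    return {**NESTED_MODEL_DEFAULTS, **override}


def apply_env_override_merge(override: dict) -> dict:
    # the "patch the default deriver config" behavior one might expect
    merged = copy.deepcopy(CUSTOM_DERIVER_DEFAULT)
    merged.update(override)
    return merged


override = {"thinking_budget_tokens": 2048}
replaced = apply_env_override_replace(override)  # "model" reverts to "gpt-4o-mini"
merged = apply_env_override_merge(override)      # "model" stays "claude-sonnet"
```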

    Raises:
        ValueError: If provider is not configured
    """
    runtime_model_config = _resolve_runtime_model_config(model_config)
Collaborator

i think there's a regression here. previously we resolved reasoning params per-attempt, but now we are resolving them upfront. if our final retry switches transports, we can end up passing invalid params to the fallback transport
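
A sketch of the per-attempt shape (transport names, param shapes, and `resolve_reasoning_params` are hypothetical stand-ins for the real resolution logic): resolving inside the retry loop keys the params to the transport actually being tried.

```python
from typing import Any


def resolve_reasoning_params(transport: str, thinking_budget: int) -> dict[str, Any]:
    """Map one transport to its own reasoning params; never mix shapes."""
    if transport == "anthropic":
        return {"thinking": {"type": "enabled", "budget_tokens": thinking_budget}}
    if transport == "gemini":
        return {"thinking_config": {"thinking_budget": thinking_budget}}
    return {"reasoning_effort": "medium"}  # openai-style


def run_with_fallback(transports: list[str], thinking_budget: int) -> list[dict[str, Any]]:
    attempts: list[dict[str, Any]] = []
    for transport in transports:
        # resolved inside the loop, not hoisted above it, so a fallback
        # transport never inherits the previous transport's params
        params = resolve_reasoning_params(transport, thinking_budget)
        attempts.append({"transport": transport, **params})
    return attempts
```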

    """Best-effort local cache for Gemini cached-content handles."""

    def __init__(self) -> None:
        self._handles: dict[str, GeminiCacheHandle] = {}
Collaborator

looks like this grows unbounded

Comment on lines +146 to +155
        except Exception:
            fallback_response = await self._create_structured_response(
                params=params,
                response_format=response_format,
            )
            content = self._parse_or_repair_structured_content(
                fallback_response,
                response_format,
                model,
            )
Collaborator

this is broad, I think we want some errors to propagate normally instead of trying to repair in all cases
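
A sketch of the narrowing (the `StructuredOutputError` name mirrors the project's exception; the other members of the tuple are illustrative guesses): only take the repair path for errors that plausibly mean "the model returned malformed structured output", and re-raise everything else, such as auth or transport failures.

```python
import json


class StructuredOutputError(Exception):
    """Stand-in for the project's structured-output exception."""


# Errors worth a repair attempt; anything else propagates to the caller
REPAIRABLE = (StructuredOutputError, json.JSONDecodeError, ValueError)


def should_attempt_repair(exc: Exception) -> bool:
    # True -> try the fallback/repair path; False -> re-raise
    return isinstance(exc, REPAIRABLE)
```

The `except Exception:` above would then become `except REPAIRABLE:` (or an explicit `if not should_attempt_repair(exc): raise`), so rate limits and connection errors still surface to retry logic.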

        ):
            system_messages.append(message["content"])
        else:
            non_system_messages.append(message)
Collaborator

we should copy the message dict before storing it, since we later modify it via `params["messages"][-1]["content"] += "\n\nRespond with valid JSON matching this schema:..."`; otherwise we mutate the caller's original message list
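
A minimal demonstration of the aliasing bug (helper names are mine; the `+=` mirrors the schema-suffix append described above): storing the caller's dicts and appending to `"content"` mutates the caller's list, while a shallow `dict(m)` copy per message is enough to isolate it.

```python
def split_messages_aliasing(messages: list[dict]) -> list[dict]:
    # stores references to the caller's dicts -> later mutation leaks out
    non_system = [m for m in messages if m.get("role") != "system"]
    non_system[-1]["content"] += "\n\nRespond with valid JSON matching this schema: ..."
    return non_system


def split_messages_copying(messages: list[dict]) -> list[dict]:
    # shallow-copies each dict, so the appended suffix stays local
    non_system = [dict(m) for m in messages if m.get("role") != "system"]
    non_system[-1]["content"] += "\n\nRespond with valid JSON matching this schema: ..."
    return non_system
```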
