Refactor clients.py to add modern features and more flexible configuration #459

VVoruganti wants to merge 11 commits into main from
Conversation
… and Gemini thinking budget support

LengthFinishReasonError from OpenAI-compatible providers (custom, openai, groq) was crashing the deriver, with 14k+ occurrences in production. The vLLM path already had repair logic, but it was gated on provider == "vllm" and therefore unreachable when routing through litellm as a custom provider.

- Extract shared _repair_response_model_json() helper for all providers
- Catch LengthFinishReasonError in the OpenAI/custom parse() path and repair truncated JSON
- Add repair fallback to Anthropic and Gemini response_model paths
- Add repair fallback to Groq response_model path
- Pass thinking_budget_tokens to Gemini 2.5 models via thinking_config
- Add 14 tests covering repair paths for all providers and Gemini thinking budget

Fixes HONCHO-YC

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
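The shared repair path can be sketched roughly as follows. This is an illustrative stand-in, not the PR's actual `_repair_response_model_json()` implementation; the function name and heuristics here (closing a dangling string, dropping a trailing separator, appending missing closers) are assumptions.

```python
import json
from typing import Any


def repair_truncated_json(text: str) -> dict[str, Any]:
    """Best-effort repair of JSON cut off by a length/finish limit.

    Tracks string state and open braces/brackets, trims a dangling
    separator, then appends the missing closers before parsing.
    """
    stack: list[str] = []
    in_string = False
    escaped = False
    for ch in text:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            continue
        if ch == '"':
            in_string = True
        elif ch in "{[":
            stack.append("}" if ch == "{" else "]")
        elif ch in "}]" and stack:
            stack.pop()
    candidate = text
    if in_string:
        candidate += '"'  # close a string cut off mid-token
    candidate = candidate.rstrip()
    if candidate.endswith(","):
        candidate = candidate[:-1]  # drop a trailing separator
    elif candidate.endswith(":"):
        candidate += " null"  # a key that never received its value
    candidate += "".join(reversed(stack))
    return json.loads(candidate)
```

A repair along these lines turns output truncated mid-array or mid-string back into parseable JSON instead of crashing the deriver.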
Walkthrough

The PR introduces a unified LLM provider backend architecture, replacing flat provider/model settings with nested model configuration blocks.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Caller as Caller (e.g., Deriver)
    participant HLC as honcho_llm_call
    participant RB as request_builder.execute_completion
    participant CR as credentials.resolve_credentials
    participant CB as llm.__init__.get_backend
    participant PB as ProviderBackend<br/>(Anthropic/OpenAI/Gemini/Groq)
    participant API as LLM Provider API

    Caller->>HLC: honcho_llm_call(model_config, messages, ...)
    HLC->>RB: execute_completion(backend, config, ...)
    RB->>CR: resolve_credentials(model_config)
    CR-->>RB: {api_key, api_base}
    RB->>CB: get_backend(config)
    CB->>CB: Determine transport (anthropic/openai/gemini/groq)
    CB-->>RB: ProviderBackend instance
    RB->>PB: complete(model=..., messages=..., api_key=..., api_base=...)
    PB->>API: Native SDK call (e.g., messages.create, generate_content)
    API-->>PB: Raw provider response
    PB->>PB: _normalize_response(): Extract content,<br/>tool_calls, thinking, reasoning
    PB-->>RB: CompletionResult(content, tool_calls, ...)
    RB-->>HLC: CompletionResult
    HLC-->>Caller: HonchoLLMCallResponse
```
Estimated code review effort: 🎯 5 (Critical) | ⏱️ ~120 minutes
🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Actionable comments posted: 16
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/embedding_client.py (1)
29-51: ⚠️ Potential issue | 🟠 Major: Reject unsupported embedding providers instead of silently routing them to OpenAI.

Lines 32-51 and 365-369 currently treat every non-`gemini` value as OpenAI. A typo or a lingering legacy provider value will therefore instantiate the OpenAI client with the wrong credentials/base URL and fail much later than the actual config mistake.

🐛 Suggested fix

```diff
+from src.exceptions import ValidationException
+
 from .config import settings
```

```diff
-        else:  # openai
+        elif self.provider == "openai":
             if api_key is None:
                 api_key = settings.LLM.OPENAI_API_KEY
             if not api_key:
                 raise ValueError("OpenAI API key is required")
             self.client = AsyncOpenAI(api_key=api_key)
             self.model = "text-embedding-3-small"
             self.max_embedding_tokens = settings.MAX_EMBEDDING_TOKENS
             self.max_batch_size = 2048  # OpenAI batch limit
+        else:
+            raise ValidationException(
+                f"Unsupported embedding provider: {self.provider}"
+            )
```

```diff
-    if provider == "gemini":
-        api_key = settings.LLM.GEMINI_API_KEY
-    else:
-        api_key = settings.LLM.OPENAI_API_KEY
+    if provider == "gemini":
+        api_key = settings.LLM.GEMINI_API_KEY
+    elif provider == "openai":
+        api_key = settings.LLM.OPENAI_API_KEY
+    else:
+        raise ValidationException(
+            f"Unsupported embedding provider: {provider}"
+        )
```

As per coding guidelines, "Use explicit error handling with appropriate exception types" and "Use custom exceptions defined in src/exceptions.py with specific exception types (ResourceNotFoundException, ValidationException, etc.)".
Also applies to: 362-373
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/embedding_client.py` around lines 29 - 51, The constructor (__init__) currently treats any non-"gemini" provider as OpenAI, which hides configuration typos; change __init__ to explicitly accept only supported providers ("gemini" and "openai") and raise a ValidationException (from src.exceptions import ValidationException) for any other value of self.provider, ensuring you do not silently instantiate AsyncOpenAI for unknown providers; keep the existing gemini and openai branches (symbols: self.provider, genai.Client, AsyncOpenAI, self.model, self.client, self.max_embedding_tokens, self.max_batch_size) but add the explicit else that raises ValidationException with a clear message. Also apply the same explicit-provider validation and exception raising to the other provider-handling block in this file (the second location handling provider around lines ~362-373) so both paths reject unsupported providers instead of defaulting to OpenAI.

src/dreamer/specialists.py (1)
207-228: ⚠️ Potential issue | 🟠 Major: Configured specialist token caps are still a no-op here.

This path always passes an explicit max_tokens=self.get_max_tokens(), so the new DREAM_MODEL_CONFIG.max_output_tokens / specialist-inherited overrides never influence the actual LLM request.

🔧 Minimal fix

```diff
     model_config = self.get_model_config()
+    max_tokens = model_config.max_output_tokens or self.get_max_tokens()

     # Track iterations via callback
     iteration_count = 0
@@
     response: HonchoLLMCallResponse[str] = await honcho_llm_call(
         model_config=model_config,
         prompt="",  # Ignored since we pass messages
-        max_tokens=self.get_max_tokens(),
+        max_tokens=max_tokens,
         tools=self.get_tools(peer_card_enabled=peer_card_enabled),
         tool_choice=None,
         tool_executor=tool_executor,
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/dreamer/specialists.py` around lines 207 - 228, The code always passes max_tokens=self.get_max_tokens() into honcho_llm_call, which ignores per-model or specialist overrides (e.g. DREAM_MODEL_CONFIG.max_output_tokens); update the honcho_llm_call invocation in this function so max_tokens is derived from the resolved model_config (use model_config.max_output_tokens when set, falling back to self.get_max_tokens()), thereby ensuring get_model_config() and any specialist overrides actually control the LLM request.
🧹 Nitpick comments (10)
tests/unit/llm/test_backends/test_openai.py (1)
65-68: Unusual return type annotation formatting.

The function return type annotation is split across multiple lines in an unconventional way. While technically valid Python, it's non-standard.

```diff
-async def test_openai_backend_passes_thinking_effort_through_for_non_gpt5_models() -> (
-    None
-):
+async def test_openai_backend_passes_thinking_effort_through_for_non_gpt5_models() -> None:
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/llm/test_backends/test_openai.py` around lines 65 - 68, The test function test_openai_backend_passes_thinking_effort_through_for_non_gpt5_models has its return type annotation split across lines; consolidate the annotation so the signature is on a single line (e.g., move "-> None" to the same line as the def) to match standard Python formatting and improve readability.

tests/unit/llm/conftest.py (1)
9-26: Clean test double implementation.

The FakeBackend provides a useful test helper for capturing call arguments and serving canned responses. One consideration: if tests don't queue enough responses, next(self._responses) will raise StopIteration, which will bubble up as-is. This is generally acceptable for tests (failing clearly), but a more descriptive error could aid debugging.

💡 Optional: wrap with descriptive error

```diff
     async def complete(self, **kwargs: Any) -> CompletionResult:
         self.calls.append(kwargs)
-        return next(self._responses)
+        try:
+            return next(self._responses)
+        except StopIteration:
+            raise RuntimeError("FakeBackend: no more queued responses") from None
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/unit/llm/conftest.py` around lines 9 - 26, The FakeBackend test double currently calls next(self._responses) in both complete and stream which will raise StopIteration with no context if tests run out of canned responses; update FakeBackend (constructor: _responses, methods: complete and stream) to catch StopIteration around next(self._responses) and raise a more descriptive error (e.g. RuntimeError or AssertionError) that includes the method name and a hint that no more fake responses were queued, so test failures clearly indicate missing canned responses.

src/llm/__init__.py (1)
55-80: Don't memoize override clients by raw credential tuples.

`@cache` keeps every (base_url, api_key) pair for the full process lifetime. If these overrides are test-, tenant-, or rotation-specific, this turns into an unbounded client cache and keeps secret-bearing keys resident in memory longer than necessary.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llm/__init__.py` around lines 55 - 80, The override client factory functions (_get_openai_override_client, _get_anthropic_override_client, _get_groq_override_client, _get_gemini_override_client) must not be memoized by raw (base_url, api_key) tuples because that caches secret-bearing keys for the process lifetime; remove the `@cache` decorator from those functions and make them return a fresh client per call (or replace with a bounded/ephemeral cache if you truly need caching), and update callers to create/use/close clients per operation rather than relying on a global persistent cache.

src/utils/clients.py (1)
616-638: Groq provider uses OpenAIHistoryAdapter by default.

_history_adapter_for_provider returns OpenAIHistoryAdapter for both "openai" and "groq" providers (via the fallback return). This is likely intentional, since Groq uses OpenAI-compatible APIs, but consider adding an explicit case for clarity.

💡 Explicit groq handling

```diff
 def _history_adapter_for_provider(
     provider: SupportedProviders,
 ) -> AnthropicHistoryAdapter | GeminiHistoryAdapter | OpenAIHistoryAdapter:
     if provider == "anthropic":
         return AnthropicHistoryAdapter()
     if provider == "google":
         return GeminiHistoryAdapter()
+    # Groq uses OpenAI-compatible message format
     return OpenAIHistoryAdapter()
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/utils/clients.py` around lines 616 - 638, The _history_adapter_for_provider function currently falls through to OpenAIHistoryAdapter for both "openai" and "groq"; make the Groq handling explicit by adding an if branch for provider == "groq" that returns OpenAIHistoryAdapter(), keeping the existing return type and behavior; update the function body in _history_adapter_for_provider to include the explicit "groq" case (so callers like _history_adapter_for_provider("groq") clearly return OpenAIHistoryAdapter).

src/llm/request_builder.py (1)
102-152: Consider extracting shared parameter-building logic to reduce duplication.

execute_stream duplicates the entire parameter-building logic from execute_completion. This could be extracted into a shared helper.

♻️ Suggested extraction (conceptual)

```python
def _build_request_params(
    config: ModelConfig,
    max_tokens: int,
    cache_policy: PromptCachePolicy | None,
    extra_params: dict[str, Any] | None,
) -> tuple[ModelCapabilities, dict[str, Any], int, int | None, dict[str, Any]]:
    capabilities = get_model_capabilities(config)
    credentials = resolve_credentials(config)
    thinking_budget_tokens = (
        config.thinking_budget_tokens
        if config.thinking_budget_tokens is not None
        and config.thinking_budget_tokens > 0
        else None
    )
    requested_output_tokens = config.max_output_tokens or max_tokens
    effective_max_tokens = _adjust_max_tokens_for_explicit_budget(
        capabilities, requested_output_tokens, thinking_budget_tokens
    )
    merged_extra_params = {
        **_build_config_extra_params(config),
        **(extra_params or {}),
    }
    if cache_policy is not None:
        merged_extra_params["cache_policy"] = cache_policy
    return (
        capabilities,
        credentials,
        effective_max_tokens,
        thinking_budget_tokens,
        merged_extra_params,
    )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llm/request_builder.py` around lines 102 - 152, execute_stream duplicates the parameter-building logic from execute_completion; extract that shared logic into a helper (e.g., _build_request_params) that returns capabilities, credentials, effective_max_tokens, thinking_budget_tokens, and merged_extra_params. Replace the duplicated block in execute_stream with a call to the new helper and pass returned values into backend.stream; do the same refactor in execute_completion so both functions reuse the helper and avoid divergent logic (refer to execute_stream, execute_completion, _adjust_max_tokens_for_explicit_budget, _build_config_extra_params, and resolve_credentials).

src/llm/backends/openai.py (1)
263-274: Model-specific branching uses string containment check.

The "gpt-5" in model check is fragile and may match unintended models (e.g., a hypothetical "not-gpt-5-model"). Consider a more explicit check, or extract this into a model-capabilities lookup.

💡 Alternative approach

```python
GPT5_MODEL_PREFIXES = ("gpt-5", "o1-", "o3-")  # Models requiring max_completion_tokens


def _is_reasoning_model(model: str) -> bool:
    return any(model.startswith(prefix) for prefix in GPT5_MODEL_PREFIXES)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llm/backends/openai.py` around lines 263 - 274, The model check using `"gpt-5" in model` is fragile; replace it with an explicit capability lookup helper (e.g., add a GPT5_MODEL_PREFIXES tuple and implement a function like _is_reasoning_model(model: str) that returns any(model.startswith(prefix) for prefix in GPT5_MODEL_PREFIXES)) and then use that helper in the openai.py branching where the code sets params["max_completion_tokens"] vs params["max_tokens"] (the block containing the "gpt-5" in model check and the thinking_effort/verbosity handling); update references to extra_params/temperature logic to use the same helper so only intended models use max_completion_tokens and reasoning flags.

src/llm/backends/groq.py (1)
58-76: Add defensive check for empty choices array.

Line 59 accesses response.choices[0] without verifying that choices exist. While unlikely in normal operation, a malformed response could cause an IndexError.

🛡️ Defensive check

```diff
         response = await self._client.chat.completions.create(**params)

+        if not response.choices:
+            raise ValueError("No choices in response")
         if response.choices[0].message.content is None:
             raise ValueError("No content in response")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llm/backends/groq.py` around lines 58 - 76, Add a defensive check that response.choices is non-empty before accessing response.choices[0] in the chat completion handling after calling self._client.chat.completions.create; if choices is empty or None, raise a clear ValueError (or handle gracefully) with context (e.g., include response metadata) instead of indexing into it. Update the block that reads response.choices[0].message.content, response.choices[0].finish_reason, and the JSON parsing/repair code paths (response_format.model_validate and repair_response_model_json) to only run after this validation so you never trigger IndexError on a malformed response.

tests/live_llm/test_live_gemini.py (1)
146-157: Consider iterable unpacking for cleaner list construction.

The static analysis tool flags this concatenation pattern. Using unpacking is slightly more idiomatic and avoids intermediate list creation.
♻️ Suggested refactor
```diff
-    replay_messages = initial_messages + [
-        adapter.format_assistant_tool_message(first),
-        *adapter.format_tool_results(
-            [
-                {
-                    "tool_id": tool_call.id,
-                    "tool_name": tool_call.name,
-                    "result": tool_result,
-                }
-            ]
-        ),
-    ]
+    replay_messages = [
+        *initial_messages,
+        adapter.format_assistant_tool_message(first),
+        *adapter.format_tool_results(
+            [
+                {
+                    "tool_id": tool_call.id,
+                    "tool_name": tool_call.name,
+                    "result": tool_result,
+                }
+            ]
+        ),
+    ]
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/live_llm/test_live_gemini.py` around lines 146 - 157, Replace the current list concatenation when building replay_messages with iterable unpacking to avoid creating intermediate lists: spread initial_messages, then include adapter.format_assistant_tool_message(first) as an element, and spread the result of adapter.format_tool_results(...) so the final construction uses [*initial_messages, adapter.format_assistant_tool_message(first), *adapter.format_tool_results(...)] while keeping the same input dict for tool_call (tool_call.id, tool_call.name, tool_result) to preserve behavior.

.env.template (1)
97-111: Minor: Variable ordering could be more consistent.
DERIVER_DEDUPLICATE (line 100) is interleaved within the MODEL_CONFIG variables, breaking the logical grouping. Consider moving it before or after the MODEL_CONFIG block for better readability.

📝 Suggested reordering
```diff
 # DERIVER_MODEL_CONFIG__TRANSPORT=gemini
 # DERIVER_MODEL_CONFIG__MODEL=gemini-2.5-flash-lite
 # DERIVER_MODEL_CONFIG__TEMPERATURE=
-# DERIVER_DEDUPLICATE=true
 # DERIVER_MODEL_CONFIG__MAX_OUTPUT_TOKENS=4096
 # DERIVER_MODEL_CONFIG__THINKING_BUDGET_TOKENS=1024
+# DERIVER_DEDUPLICATE=true
 # DERIVER_LOG_OBSERVATIONS=false
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.env.template around lines 97 - 111, Move the DERIVER_DEDUPLICATE variable out of the middle of the DERIVER_MODEL_CONFIG__* block so related MODEL_CONFIG keys stay grouped; specifically, relocate DERIVER_DEDUPLICATE (currently between DERIVER_MODEL_CONFIG__TEMPERATURE and DERIVER_MODEL_CONFIG__MAX_OUTPUT_TOKENS) to either just before the DERIVER_MODEL_CONFIG__* block or immediately after it, keeping all DERIVER_MODEL_CONFIG__* entries contiguous for clearer grouping and ordering.

src/llm/backends/anthropic.py (1)
74-90: Consider extracting duplicated schema injection logic.

The schema injection (schema_json creation and appending to message content) is duplicated in lines 80-84 and 86-90. While the conditions differ, the core logic is identical.

♻️ Optional refactor to reduce duplication

```diff
+        # Inject JSON schema into the last message if response_format is a Pydantic model
+        if response_format and isinstance(response_format, type):
+            schema_json = json.dumps(response_format.model_json_schema(), indent=2)
+            params["messages"][-1]["content"] += (
+                f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
+            )
+
         use_json_prefill = (
             bool(response_format or self._json_mode(extra_params))
             and not thinking_budget_tokens
             and self._supports_assistant_prefill(model)
         )
         if use_json_prefill:
-            if response_format and isinstance(response_format, type):
-                schema_json = json.dumps(response_format.model_json_schema(), indent=2)
-                params["messages"][-1]["content"] += (
-                    f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
-                )
             params["messages"].append({"role": "assistant", "content": "{"})
-        elif response_format and isinstance(response_format, type):
-            schema_json = json.dumps(response_format.model_json_schema(), indent=2)
-            params["messages"][-1]["content"] += (
-                f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
-            )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/llm/backends/anthropic.py` around lines 74 - 90, Move the duplicated schema injection into a small helper and call it from both branches: create a helper (e.g., _append_schema_to_last_message or build_and_append_schema) that takes response_format and params (and uses response_format.model_json_schema(), json.dumps, and appends to params["messages"][-1]["content"]); then replace the duplicate blocks in the use_json_prefill branch and the else branch with calls to that helper. Ensure the helper is only invoked when response_format is truthy and a type, and keep the existing use_json_prefill logic (including appending the assistant "{" when applicable) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/v3/contributing/configuration.mdx`:
- Line 402: The example line "DERIVER_MODEL_CONFIG__TEMPERATURE= # Optional
temperature override (unset by default)" currently assigns an empty string;
change the docs to avoid a blank env assignment by either removing the
assignment or showing it commented out and providing a concrete example value —
e.g. replace with a note that the variable is unset by default and show
"DERIVER_MODEL_CONFIG__TEMPERATURE=0.7" as an example, or present "#
DERIVER_MODEL_CONFIG__TEMPERATURE=" to indicate omission; ensure the symbol
DERIVER_MODEL_CONFIG__TEMPERATURE is referenced and the text explicitly says
"omit the variable to leave it unset" rather than showing a blank assignment.
In `@src/config.py`:
- Around line 601-604: The issue is that nested MODEL_CONFIG blocks get fully
replaced by TOML/env partial overrides instead of being merged, causing
validation failures when only a few nested keys (e.g., MAX_TOOL_ITERATIONS or
TOOL_CHOICE) are supplied. Fix by giving MODEL_CONFIG a proper default instance
so Pydantic will merge values rather than drop the whole object: change the
annotation to a concrete field with a default_factory that constructs a
ConfiguredModelSettings (e.g., replace MODEL_CONFIG:
Annotated[ConfiguredModelSettings, Field(validation_alias="model_config")] with
a field like model_config: ConfiguredModelSettings =
Field(default_factory=ConfiguredModelSettings, validation_alias="model_config")
and apply the same pattern for other nested setting blocks referenced in
AppSettings to ensure partial overrides merge correctly.
- Around line 20-21: The dotenv loading currently calls
load_dotenv(override=True) which lets .env values overwrite existing environment
variables; update the call in src/config.py (the load_dotenv invocation guarded
by os.getenv("PYTHON_DOTENV_DISABLED")) to use override=False so that existing
environment variables take precedence over .env file entries, preserving
deployment-time configuration order.
In `@src/llm/backends/anthropic.py`:
- Around line 82-85: The code mutates the incoming request_messages by directly
editing params["messages"][-1]["content"] and appending to params["messages"],
which can produce side effects; fix it by creating a copy of the messages before
modification (e.g., make params["messages"] = [m.copy() for m in
request_messages] or use copy.deepcopy(request_messages]) and then modify the
copied last message content and append the assistant message to the copied list
so request_messages remains unchanged; update the logic around
params["messages"] and any use of request_messages to operate on the copied list
instead.
- Around line 149-153: The stream() implementation uses a different JSON-prefill
check than complete(): update stream() to compute use_json_prefill the same way
as complete() by calling self._json_mode(extra_params) (in addition to checking
response_format, thinking_budget_tokens, and
self._supports_assistant_prefill(model)), and ensure extra_params is not deleted
before this check (or pass the original extra_params into _json_mode) so JSON
prefill triggers when extra_params={"json_mode": True}; reference the stream()
and complete() functions, the use_json_prefill variable, and the helper
_json_mode(extra_params) when making the change.
In `@src/llm/backends/gemini.py`:
- Around line 192-200: The code currently treats thinking_budget_tokens and
thinking_effort with truthiness checks which drop explicit zero values (e.g.,
thinking_budget_tokens = 0) so Gemini's thinking_config is omitted; update the
conditional logic in gemini.py around thinking_config, thinking_budget_tokens,
and thinking_effort to test for presence explicitly (e.g., use "is not None" or
explicit comparisons) rather than truthiness, so zeros are preserved when set,
and keep the existing mutual-exclusion check (len(thinking_config) > 1) intact.
- Around line 68-86: When cache_policy is a PromptCachePolicy (detected in
cache_policy and handled by _attach_cached_content), avoid re-sending the full
cached prompt to self._client.aio.models.generate_content: after
_attach_cached_content returns (it provides the cached_content handle), filter
or replace the original contents passed to generate_content so it contains only
the new/suffix messages (not the already-cached messages). Update the flow
around the cache check in complete() to detect PromptCachePolicy, obtain the
cached_content from _attach_cached_content, compute the suffix/new messages from
the original contents, and pass only that suffix as contents to generate_content
(keeping config and the cached_content handle as needed).
In `@src/llm/backends/groq.py`:
- Around line 38-42: Replace the silent deletion of tools and tool_choice in the
Groq backend with an explicit error: in src/llm/backends/groq.py where variables
tools and tool_choice are currently deleted (and alongside the existing check
for thinking_budget_tokens/thinking_effort), add a guard that raises
ValueError("Groq backend does not support tool calling") when either tools is
not None or tool_choice is not None; keep the existing ValueError for
thinking_budget_tokens/thinking_effort intact (look for the GroqBackend
constructor or the function handling these kwargs to locate the spot).
In `@src/llm/backends/openai.py`:
- Around line 146-159: The current bare except around the fallback path masks
error details; update the handler in the block that calls
_create_structured_response, _parse_or_repair_structured_content, and
_normalize_response to either catch specific exceptions you expect (e.g.,
ValueError, JSONDecodeError, OpenAIError) or, if you must catch all exceptions,
change it to except Exception as e: and log the exception (type and message)
using the module/logger (e.g., self.logger.exception or process_logger.error)
before proceeding to build the fallback_response so failures are visible in
logs.
In `@src/llm/caching.py`:
- Around line 45-65: The InMemoryGeminiCacheStore leaks expired entries because
eviction only happens in get(); update the set method (and/or constructor) to
purge expired handles before inserting to prevent unbounded growth: inside
InMemoryGeminiCacheStore.set (and while holding self._lock) iterate through
self._handles and remove any entries whose expires_at <=
datetime.now(timezone.utc) before assigning self._handles[handle.key] = handle
(ensure you use the same timezone-aware comparison as in get); this minimal
change reclaims expired entries on writes and prevents memory leaks for keys
that are never read again.
In `@src/llm/credentials.py`:
- Around line 3-25: The function _default_transport_credentials currently raises
a bare ValueError for unknown transports; change this to raise the project's
ValidationException (imported from src.exceptions) so misconfigurations use the
repo's typed exception system—update the imports at the top of
src/llm/credentials.py to include ValidationException, and replace the final
raise ValueError(f"Unknown transport: {transport}") with raise
ValidationException(f"Unknown transport: {transport}") so resolve_credentials
and callers receive the correct exception type.
In `@src/llm/structured_output.py`:
- Around line 82-92: The repair path currently only catches
StructuredOutputError in attempt_structured_output_repair when calling
repair_response_model_json, so pydantic ValidationError can still bubble and
bypass failure_policy; import and catch pydantic's ValidationError (e.g., from
pydantic import ValidationError) alongside StructuredOutputError and return None
on either exception so the failure_policy branches still run, and make the
identical change to the other equivalent try/except block later in the file that
follows the same pattern.
In `@src/utils/clients.py`:
- Around line 1542-1601: Remove the dead helper _repair_response_model_json (and
its pyright ignore) because the project uses the public
repair_response_model_json instead; delete the entire function definition for
_repair_response_model_json and any exclusively-used local references (and clean
up now-unused imports if they become unused after removal), ensuring all callers
use repair_response_model_json (from structured_output) instead.
In `@tests/live_llm/conftest.py`:
- Around line 38-45: The require_provider_key helper raises KeyError for
model_spec.provider == "groq" because the mapping only includes
"anthropic"/"openai"/"gemini"; update require_provider_key to include an
explicit "groq": bool(settings.LLM.GROQ_API_KEY) entry (and optionally handle
unknown providers by calling pytest.skip with a clear message) so that
require_provider_key and its use of model_spec.provider no longer blow up for
Groq LiveModelSpec.
In `@tests/unit/llm/test_model_config.py`:
- Around line 254-267: The test test_env_template_uses_nested_model_config_keys
is asserting the wrong env var prefix for nested dialectic settings; update the
assertions to expect the double-underscore nested form
(DIALECTIC__LEVELS__minimal__MODEL_CONFIG__MODEL and
DIALECTIC__LEVELS__minimal__PROVIDER=) instead of DIALECTIC_LEVELS__... so the
test validates the correct nested override key shape for
settings.DIALECTIC.LEVELS; change both occurences in that test to use
DIALECTIC__LEVELS__minimal__... to match the new per-level provider
configuration naming convention.
In `@tests/unit/llm/test_request_builder.py`:
- Around line 13-32: The async test functions (e.g.,
test_gemini_explicit_budget_adjusts_transport_max_tokens and the other three
async def tests in tests/unit/llm/test_request_builder.py) are missing the
`@pytest.mark.asyncio` decorator; add `@pytest.mark.asyncio` above each async test
function and ensure pytest.mark is imported (import pytest) at the top of the
file so pytest will properly await the coroutines during test runs.
---
Outside diff comments:
In `@src/dreamer/specialists.py`:
- Around line 207-228: The code always passes max_tokens=self.get_max_tokens()
into honcho_llm_call, which ignores per-model or specialist overrides (e.g.
DREAM_MODEL_CONFIG.max_output_tokens); update the honcho_llm_call invocation in
this function so max_tokens is derived from the resolved model_config (use
model_config.max_output_tokens when set, falling back to self.get_max_tokens()),
thereby ensuring get_model_config() and any specialist overrides actually
control the LLM request.
In `@src/embedding_client.py`:
- Around line 29-51: The constructor (__init__) currently treats any
non-"gemini" provider as OpenAI, which hides configuration typos; change
__init__ to explicitly accept only supported providers ("gemini" and "openai")
and raise a ValidationException (from src.exceptions import ValidationException)
for any other value of self.provider, ensuring you do not silently instantiate
AsyncOpenAI for unknown providers; keep the existing gemini and openai branches
(symbols: self.provider, genai.Client, AsyncOpenAI, self.model, self.client,
self.max_embedding_tokens, self.max_batch_size) but add the explicit else that
raises ValidationException with a clear message. Also apply the same
explicit-provider validation and exception raising to the other
provider-handling block in this file (the second location handling provider
around lines ~362-373) so both paths reject unsupported providers instead of
defaulting to OpenAI.
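The explicit-provider branch could look like this sketch; ValidationException and the returned labels are stand-ins for the real src.exceptions class and SDK client objects:

```python
class ValidationException(Exception):
    """Stand-in for src.exceptions.ValidationException."""


def make_embedding_client(provider: str) -> str:
    # Only the two supported providers are accepted; anything else --
    # including a typo like "openia" -- fails loudly instead of silently
    # falling back to the OpenAI client.
    if provider == "gemini":
        return "gemini-client"
    elif provider == "openai":
        return "openai-client"
    else:
        raise ValidationException(
            f"Unsupported embedding provider {provider!r}; expected 'gemini' or 'openai'"
        )
```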
---
Nitpick comments:
In @.env.template:
- Around line 97-111: Move the DERIVER_DEDUPLICATE variable out of the middle of
the DERIVER_MODEL_CONFIG__* block so related MODEL_CONFIG keys stay grouped;
specifically, relocate DERIVER_DEDUPLICATE (currently between
DERIVER_MODEL_CONFIG__TEMPERATURE and DERIVER_MODEL_CONFIG__MAX_OUTPUT_TOKENS)
to either just before the DERIVER_MODEL_CONFIG__* block or immediately after it,
keeping all DERIVER_MODEL_CONFIG__* entries contiguous for clearer grouping and
ordering.
In `@src/llm/__init__.py`:
- Around line 55-80: The override client factory functions
(_get_openai_override_client, _get_anthropic_override_client,
_get_groq_override_client, _get_gemini_override_client) must not be memoized by
raw (base_url, api_key) tuples because that caches secret-bearing keys for the
process lifetime; remove the `@cache` decorator from those functions and make them
return a fresh client per call (or replace with a bounded/ephemeral cache if you
truly need caching), and update callers to create/use/close clients per
operation rather than relying on a global persistent cache.
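The contrast between the two patterns, sketched with plain dicts instead of real SDK clients:

```python
import functools


def make_override_client(base_url: str, api_key: str) -> dict:
    """Fresh client per call: the key lives only as long as the caller holds it."""
    return {"base_url": base_url, "api_key": api_key}


# The pattern under review: functools.cache keeps every (base_url, api_key)
# tuple -- secrets included -- alive in the cache for the process lifetime.
@functools.cache
def make_cached_client(base_url: str, api_key: str) -> tuple[str, str]:
    return (base_url, api_key)
```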
In `@src/llm/backends/anthropic.py`:
- Around line 74-90: Move the duplicated schema injection into a small helper
and call it from both branches: create a helper (e.g.,
_append_schema_to_last_message or build_and_append_schema) that takes
response_format and params (and uses response_format.model_json_schema(),
json.dumps, and appends to params["messages"][-1]["content"]); then replace the
duplicate blocks in the use_json_prefill branch and the else branch with calls
to that helper. Ensure the helper is only invoked when response_format is truthy
and a type, and keep the existing use_json_prefill logic (including appending
the assistant "{" when applicable) unchanged.
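One possible shape for that helper; FakeResponseFormat stands in for a pydantic model class, and only its model_json_schema() classmethod is assumed:

```python
import json


class FakeResponseFormat:
    """Stand-in for a pydantic model class with model_json_schema()."""

    @staticmethod
    def model_json_schema() -> dict:
        return {"type": "object", "properties": {"answer": {"type": "string"}}}


def append_schema_to_last_message(response_format, params: dict) -> None:
    """Append the JSON-schema instruction to the final message, in place."""
    if not (response_format and isinstance(response_format, type)):
        return
    schema_json = json.dumps(response_format.model_json_schema(), indent=2)
    params["messages"][-1]["content"] += (
        f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
    )


params = {"messages": [{"role": "user", "content": "Hi"}]}
append_schema_to_last_message(FakeResponseFormat, params)
```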
In `@src/llm/backends/groq.py`:
- Around line 58-76: Add a defensive check that response.choices is non-empty
before accessing response.choices[0] in the chat completion handling after
calling self._client.chat.completions.create; if choices is empty or None, raise
a clear ValueError (or handle gracefully) with context (e.g., include response
metadata) instead of indexing into it. Update the block that reads
response.choices[0].message.content, response.choices[0].finish_reason, and the
JSON parsing/repair code paths (response_format.model_validate and
repair_response_model_json) to only run after this validation so you never
trigger IndexError on a malformed response.
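A small sketch of that guard, using SimpleNamespace objects in place of the Groq SDK response type:

```python
from types import SimpleNamespace


def first_choice(response):
    """Validate choices before indexing, instead of risking a bare IndexError."""
    choices = getattr(response, "choices", None)
    if not choices:
        raise ValueError(
            f"LLM response contained no choices (id={getattr(response, 'id', None)!r})"
        )
    return choices[0]
```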
In `@src/llm/backends/openai.py`:
- Around line 263-274: The model check using `"gpt-5" in model` is fragile;
replace it with an explicit capability lookup helper (e.g., add a
GPT5_MODEL_PREFIXES tuple and implement a function like
_is_reasoning_model(model: str) that returns any(model.startswith(prefix) for
prefix in GPT5_MODEL_PREFIXES)) and then use that helper in the openai.py
branching where the code sets params["max_completion_tokens"] vs
params["max_tokens"] (the block containing the "gpt-5" in model check and the
thinking_effort/verbosity handling); update references to
extra_params/temperature logic to use the same helper so only intended models
use max_completion_tokens and reasoning flags.
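A sketch of the prefix-based helper; the contents of GPT5_MODEL_PREFIXES are an assumption, not the project's actual list:

```python
GPT5_MODEL_PREFIXES = ("gpt-5",)  # assumed prefix list; extend as needed


def is_reasoning_model(model: str) -> bool:
    return any(model.startswith(prefix) for prefix in GPT5_MODEL_PREFIXES)


def token_param(model: str, max_tokens: int) -> dict:
    # startswith() avoids the substring trap: a proxied model named
    # "my-gpt-5-proxy" would match `"gpt-5" in model` but not this check.
    if is_reasoning_model(model):
        return {"max_completion_tokens": max_tokens}
    return {"max_tokens": max_tokens}
```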
In `@src/llm/request_builder.py`:
- Around line 102-152: execute_stream duplicates the parameter-building logic
from execute_completion; extract that shared logic into a helper (e.g.,
_build_request_params) that returns capabilities, credentials,
effective_max_tokens, thinking_budget_tokens, and merged_extra_params. Replace
the duplicated block in execute_stream with a call to the new helper and pass
returned values into backend.stream; do the same refactor in execute_completion
so both functions reuse the helper and avoid divergent logic (refer to
execute_stream, execute_completion, _adjust_max_tokens_for_explicit_budget,
_build_config_extra_params, and resolve_credentials).
In `@src/utils/clients.py`:
- Around line 616-638: The _history_adapter_for_provider function currently
falls through to OpenAIHistoryAdapter for both "openai" and "groq"; make the
Groq handling explicit by adding an if branch for provider == "groq" that
returns OpenAIHistoryAdapter(), keeping the existing return type and behavior;
update the function body in _history_adapter_for_provider to include the
explicit "groq" case (so callers like _history_adapter_for_provider("groq")
clearly return OpenAIHistoryAdapter).
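The explicit branch might look like this sketch, with empty classes standing in for the real adapters:

```python
class OpenAIHistoryAdapter:
    """Stand-in for the OpenAI-format history adapter."""


class AnthropicHistoryAdapter:
    """Stand-in for the Anthropic-format history adapter."""


def history_adapter_for_provider(provider: str):
    if provider == "anthropic":
        return AnthropicHistoryAdapter()
    if provider == "groq":
        # Groq speaks the OpenAI chat format; spell the case out rather
        # than relying on the fall-through default
        return OpenAIHistoryAdapter()
    return OpenAIHistoryAdapter()
```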
In `@tests/live_llm/test_live_gemini.py`:
- Around line 146-157: Replace the current list concatenation when building
replay_messages with iterable unpacking to avoid creating intermediate lists:
spread initial_messages, then include
adapter.format_assistant_tool_message(first) as an element, and spread the
result of adapter.format_tool_results(...) so the final construction uses
[*initial_messages, adapter.format_assistant_tool_message(first),
*adapter.format_tool_results(...)] while keeping the same input dict for
tool_call (tool_call.id, tool_call.name, tool_result) to preserve behavior.
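The unpacking form, shown with placeholder message dicts in place of the real adapter output:

```python
initial_messages = [{"role": "user", "content": "hi"}]
assistant_msg = {"role": "assistant", "content": "calling tool"}
tool_results = [{"role": "tool", "content": "42"}]

# Concatenation (a + [b] + c) builds throwaway intermediate lists;
# iterable unpacking constructs the final list directly.
replay_messages = [*initial_messages, assistant_msg, *tool_results]
```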
In `@tests/unit/llm/conftest.py`:
- Around line 9-26: The FakeBackend test double currently calls
next(self._responses) in both complete and stream which will raise StopIteration
with no context if tests run out of canned responses; update FakeBackend
(constructor: _responses, methods: complete and stream) to catch StopIteration
around next(self._responses) and raise a more descriptive error (e.g.
RuntimeError or AssertionError) that includes the method name and a hint that no
more fake responses were queued, so test failures clearly indicate missing
canned responses.
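A sketch of the descriptive-error variant; the class mirrors the FakeBackend shape described above but is simplified:

```python
class FakeBackend:
    """Test double that fails loudly when its canned responses run out."""

    def __init__(self, responses):
        self._responses = iter(responses)
        self.calls = []

    def _next_response(self, method: str):
        try:
            return next(self._responses)
        except StopIteration:
            raise AssertionError(
                f"FakeBackend.{method}() called but no more fake responses "
                "were queued; add another canned response to the test"
            ) from None

    async def complete(self, **kwargs):
        self.calls.append(kwargs)
        return self._next_response("complete")
```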
In `@tests/unit/llm/test_backends/test_openai.py`:
- Around line 65-68: The test function
test_openai_backend_passes_thinking_effort_through_for_non_gpt5_models has its
return type annotation split across lines; consolidate the annotation so the
signature is on a single line (e.g., move "-> None" to the same line as the def)
to match standard Python formatting and improve readability.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 250f06dc-3d67-4269-844d-c6a531e7eeb6
📒 Files selected for processing (50)
.env.template, config.toml.example, docs/v3/contributing/configuration.mdx, pyproject.toml, src/config.py, src/deriver/deriver.py, src/dialectic/core.py, src/dreamer/specialists.py, src/embedding_client.py, src/llm/__init__.py, src/llm/backend.py, src/llm/backends/__init__.py, src/llm/backends/anthropic.py, src/llm/backends/gemini.py, src/llm/backends/groq.py, src/llm/backends/openai.py, src/llm/caching.py, src/llm/capabilities.py, src/llm/credentials.py, src/llm/history_adapters.py, src/llm/request_builder.py, src/llm/structured_output.py, src/telemetry/reasoning_traces.py, src/utils/clients.py, src/utils/summarizer.py, src/utils/types.py, tests/__init__.py, tests/conftest.py, tests/deriver/test_deriver_processing.py, tests/dialectic/test_model_config_usage.py, tests/dreamer/test_model_config_usage.py, tests/live_llm/README.md, tests/live_llm/__init__.py, tests/live_llm/conftest.py, tests/live_llm/model_matrix.py, tests/live_llm/test_live_anthropic.py, tests/live_llm/test_live_gemini.py, tests/live_llm/test_live_openai.py, tests/unit/llm/conftest.py, tests/unit/llm/test_backends/test_anthropic.py, tests/unit/llm/test_backends/test_gemini.py, tests/unit/llm/test_backends/test_openai.py, tests/unit/llm/test_capabilities.py, tests/unit/llm/test_credentials.py, tests/unit/llm/test_history_adapters.py, tests/unit/llm/test_model_config.py, tests/unit/llm/test_request_builder.py, tests/utils/test_clients.py, tests/utils/test_length_finish_reason.py, tests/utils/test_summarizer.py
DERIVER_MODEL_CONFIG__THINKING_BUDGET_TOKENS=1024
DERIVER_MAX_INPUT_TOKENS=23000 # Maximum input tokens for deriver
DERIVER_TEMPERATURE= # Optional temperature override (unset by default)
DERIVER_MODEL_CONFIG__TEMPERATURE= # Optional temperature override (unset by default)
Don't use a blank env assignment to mean "unset".
In .env/shell syntax this still assigns a value, so users copying it verbatim won't get the "leave it unset" behavior the comment describes.
📝 Suggested doc tweak
-DERIVER_MODEL_CONFIG__TEMPERATURE= # Optional temperature override (unset by default)
+# DERIVER_MODEL_CONFIG__TEMPERATURE=0.2 # Optional temperature override
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@docs/v3/contributing/configuration.mdx` at line 402, The example line
"DERIVER_MODEL_CONFIG__TEMPERATURE= # Optional temperature override (unset by
default)" currently assigns an empty string; change the docs to avoid a blank
env assignment by either removing the assignment or showing it commented out and
providing a concrete example value — e.g. replace with a note that the variable
is unset by default and show "DERIVER_MODEL_CONFIG__TEMPERATURE=0.7" as an
example, or present "# DERIVER_MODEL_CONFIG__TEMPERATURE=" to indicate omission;
ensure the symbol DERIVER_MODEL_CONFIG__TEMPERATURE is referenced and the text
explicitly says "omit the variable to leave it unset" rather than showing a
blank assignment.
| if not os.getenv("PYTHON_DOTENV_DISABLED"): | ||
| load_dotenv(override=True) |
🧩 Analysis chain
🏁 Script executed:
# Check the current implementation in src/config.py
head -100 src/config.py | cat -n
Repository: plastic-labs/honcho
Length of output: 3593
🏁 Script executed:
# Search for all load_dotenv calls in the codebase
rg "load_dotenv" -A 2 -B 2
Repository: plastic-labs/honcho
Length of output: 15102
🏁 Script executed:
# Check if there's any documentation or comments about config precedence
rg "precedence|override|environment" src/config.py -i -A 2 -B 2 | head -50
Repository: plastic-labs/honcho
Length of output: 1710
🌐 Web query:
python-dotenv load_dotenv override parameter behavior documentation
💡 Result:
In python-dotenv, load_dotenv(..., override=...) controls whether values from your .env file are allowed to replace variables that are already set in the process environment (os.environ):
- override=False (default): existing environment variables win; .env entries are only used for variables that are not already set. [1]
- override=True: variables loaded from the .env file override/replace existing environment variables. [1][2]
Sources: python-dotenv reference docs and package docs. [1][2]
Use override=False to respect environment variable precedence.
The override=True parameter allows .env values to replace deployment-time environment variables, violating the intended configuration precedence where environment variables should have highest priority. Change to override=False so that existing environment variables take precedence over .env file entries.
Suggested fix
 if not os.getenv("PYTHON_DOTENV_DISABLED"):
-    load_dotenv(override=True)
+    load_dotenv(override=False)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/config.py` around lines 20 - 21, The dotenv loading currently calls
load_dotenv(override=True) which lets .env values overwrite existing environment
variables; update the call in src/config.py (the load_dotenv invocation guarded
by os.getenv("PYTHON_DOTENV_DISABLED")) to use override=False so that existing
environment variables take precedence over .env file entries, preserving
deployment-time configuration order.
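A toy model of that precedence rule; apply_dotenv is not the python-dotenv API, just an illustration of how override=False preserves deployment-time values:

```python
import os


def apply_dotenv(values: dict[str, str], override: bool) -> None:
    """Mimic load_dotenv precedence over os.environ."""
    for key, value in values.items():
        # With override=False, a key already present in the environment wins
        if override or key not in os.environ:
            os.environ[key] = value


os.environ["HONCHO_DEMO_DB_HOST"] = "deploy-host"  # set at deploy time
apply_dotenv({"HONCHO_DEMO_DB_HOST": "localhost"}, override=False)
```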
MODEL_CONFIG: Annotated[
    ConfiguredModelSettings,
    Field(validation_alias="model_config"),
]
Partial dialectic.levels.* overrides now break settings bootstrap.
The defaults are defined here, but incoming TOML/env level blocks replace each nested level instead of layering on top of it. Any config that only overrides MAX_TOOL_ITERATIONS / TOOL_CHOICE now fails validation because MODEL_CONFIG is missing, which matches the current AppSettings() pipeline failure.
Also applies to: 639-685
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/config.py` around lines 601 - 604, The issue is that nested MODEL_CONFIG
blocks get fully replaced by TOML/env partial overrides instead of being merged,
causing validation failures when only a few nested keys (e.g.,
MAX_TOOL_ITERATIONS or TOOL_CHOICE) are supplied. Fix by giving MODEL_CONFIG a
proper default instance so Pydantic will merge values rather than drop the whole
object: change the annotation to a concrete field with a default_factory that
constructs a ConfiguredModelSettings (e.g., replace MODEL_CONFIG:
Annotated[ConfiguredModelSettings, Field(validation_alias="model_config")] with
a field like model_config: ConfiguredModelSettings =
Field(default_factory=ConfiguredModelSettings, validation_alias="model_config")
and apply the same pattern for other nested setting blocks referenced in
AppSettings to ensure partial overrides merge correctly.
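A pydantic v2 sketch of that default_factory pattern; the field and class names here are illustrative, not the project's exact definitions:

```python
from pydantic import BaseModel, Field


class ConfiguredModelSettings(BaseModel):
    model: str = "claude-sonnet-4"
    max_output_tokens: int = 2048


class LevelSettings(BaseModel):
    MAX_TOOL_ITERATIONS: int = 5
    # default_factory supplies a full default MODEL_CONFIG, so a partial
    # override that omits model_config entirely still validates
    MODEL_CONFIG: ConfiguredModelSettings = Field(
        default_factory=ConfiguredModelSettings, validation_alias="model_config"
    )


# A partial override that only sets MAX_TOOL_ITERATIONS no longer fails:
level = LevelSettings.model_validate({"MAX_TOOL_ITERATIONS": 3})
```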
params["messages"][-1]["content"] += (
    f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
)
params["messages"].append({"role": "assistant", "content": "{"})
Mutating the input message list may cause unintended side effects.
The code modifies params["messages"][-1]["content"] in-place and appends to params["messages"]. Since params["messages"] is assigned from request_messages (the filtered messages list), mutations here could affect the caller's data if they reuse the messages list.
Consider making a deep copy of the messages before modification, or at minimum copying the last message dict before mutating its content.
🛡️ Proposed fix to avoid mutation
use_json_prefill = (
bool(response_format or self._json_mode(extra_params))
and not thinking_budget_tokens
and self._supports_assistant_prefill(model)
)
if use_json_prefill:
if response_format and isinstance(response_format, type):
schema_json = json.dumps(response_format.model_json_schema(), indent=2)
- params["messages"][-1]["content"] += (
+ # Copy the last message to avoid mutating the caller's data
+ last_msg = dict(params["messages"][-1])
+ last_msg["content"] = last_msg["content"] + (
f"\n\nRespond with valid JSON matching this schema:\n{schema_json}"
)
+ params["messages"][-1] = last_msg
        params["messages"].append({"role": "assistant", "content": "{"})
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/llm/backends/anthropic.py` around lines 82 - 85, The code mutates the
incoming request_messages by directly editing params["messages"][-1]["content"]
and appending to params["messages"], which can produce side effects; fix it by
creating a copy of the messages before modification (e.g., make
params["messages"] = [m.copy() for m in request_messages] or use
copy.deepcopy(request_messages)) and then modify the copied last message content
and append the assistant message to the copied list so request_messages remains
unchanged; update the logic around params["messages"] and any use of
request_messages to operate on the copied list instead.
use_json_prefill = (
    response_format is not None
    and not thinking_budget_tokens
    and self._supports_assistant_prefill(model)
)
Inconsistent JSON mode handling between complete() and stream().
In complete() (line 74-78), use_json_prefill considers self._json_mode(extra_params), but in stream() it only checks response_format is not None. Additionally, extra_params is deleted at line 120, making _json_mode inaccessible.
This means stream() won't apply JSON prefill when extra_params={"json_mode": True} is passed without an explicit response_format.
🔧 Proposed fix for consistency
- async def stream(
- ...
- extra_params: dict[str, Any] | None = None,
- ) -> AsyncIterator[StreamChunk]:
- del max_output_tokens, api_key, api_base, extra_params
+ async def stream(
+ ...
+ extra_params: dict[str, Any] | None = None,
+ ) -> AsyncIterator[StreamChunk]:
+ del max_output_tokens, api_key, api_base
...
use_json_prefill = (
- response_format is not None
+ bool(response_format or self._json_mode(extra_params))
and not thinking_budget_tokens
and self._supports_assistant_prefill(model)
    )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/llm/backends/anthropic.py` around lines 149 - 153, The stream()
implementation uses a different JSON-prefill check than complete(): update
stream() to compute use_json_prefill the same way as complete() by calling
self._json_mode(extra_params) (in addition to checking response_format,
thinking_budget_tokens, and self._supports_assistant_prefill(model)), and ensure
extra_params is not deleted before this check (or pass the original extra_params
into _json_mode) so JSON prefill triggers when extra_params={"json_mode": True};
reference the stream() and complete() functions, the use_json_prefill variable,
and the helper _json_mode(extra_params) when making the change.
def attempt_structured_output_repair(
    content: object,
    response_model: type[BaseModel],
    model: str,
) -> BaseModel | None:
    if not isinstance(content, str):
        return None
    try:
        return repair_response_model_json(content, response_model, model)
    except StructuredOutputError:
        return None
Catch ValidationError during repair so failure_policy still applies.
repair_response_model_json() raises ValidationError for irreparable string payloads, but attempt_structured_output_repair() only swallows StructuredOutputError. That means a bad JSON string can skip the repair_then_empty / repair_then_raise branches and bubble a raw validation error instead.
🩹 Suggested fix
def attempt_structured_output_repair(
content: object,
response_model: type[BaseModel],
model: str,
) -> BaseModel | None:
if not isinstance(content, str):
return None
try:
return repair_response_model_json(content, response_model, model)
- except StructuredOutputError:
+ except (StructuredOutputError, ValidationError):
        return None
Also applies to: 101-132
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/llm/structured_output.py` around lines 82 - 92, The repair path currently
only catches StructuredOutputError in attempt_structured_output_repair when
calling repair_response_model_json, so pydantic ValidationError can still bubble
and bypass failure_policy; import and catch pydantic's ValidationError (e.g.,
from pydantic import ValidationError) alongside StructuredOutputError and return
None on either exception so the failure_policy branches still run, and make the
identical change to the other equivalent try/except block later in the file that
follows the same pattern.
def _repair_response_model_json(  # pyright: ignore[reportUnusedFunction]
    raw_content: str,
    response_model: type[BaseModel],
    model: str,
) -> BaseModel:
    """Attempt to repair truncated/malformed JSON and validate against response_model.

    Used by all provider paths when structured output parsing fails.
    For PromptRepresentation, falls back to an empty instance.
    For other models, re-raises ValidationError.
    """
    try:
        final = validate_and_repair_json(raw_content)
        repaired_data = json.loads(final)

        # Schema-aware repair for PromptRepresentation
        if (
            response_model is PromptRepresentation
            and "deductive" in repaired_data
            and isinstance(repaired_data["deductive"], list)
        ):
            for i, item in enumerate(repaired_data["deductive"]):
                if isinstance(item, dict):
                    if "conclusion" not in item and "premises" in item:
                        logger.warning(
                            f"Deductive observation {i} missing conclusion, adding placeholder"
                        )
                        if item["premises"]:
                            item["conclusion"] = (
                                f"[Incomplete reasoning from premises: {item['premises'][0][:100]}...]"
                            )
                        else:
                            item["conclusion"] = (
                                "[Incomplete reasoning - conclusion missing]"
                            )
                    if "premises" not in item:
                        item["premises"] = []

        final = json.dumps(repaired_data)
    except (json.JSONDecodeError, KeyError, TypeError, ValueError) as repair_err:
        final = ""
        logger.warning(
            f"Could not perform JSON repair on truncated output from {model}: {repair_err}"
        )

    try:
        return response_model.model_validate_json(final)
    except ValidationError as ve:
        logger.error(
            f"Validation error after repair of truncated output from {model}: {ve}"
        )
        logger.debug(f"Problematic JSON: {final}")

        if response_model is PromptRepresentation:
            logger.warning(
                "Using fallback empty Representation due to truncated output"
            )
            return PromptRepresentation(explicit=[])
        else:
            raise
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if _repair_response_model_json is called anywhere
rg -n "_repair_response_model_json" --type=py
# Check for similar repair functions
rg -n "repair.*json" --type=py -l
Repository: plastic-labs/honcho
Length of output: 346
🏁 Script executed:
#!/bin/bash
# Check for repair_response_model_json (without underscore) in structured_output.py
echo "=== Checking for repair_response_model_json in structured_output.py ==="
rg -n "def repair_response_model_json" src/llm/structured_output.py
# Check all occurrences of repair functions
echo -e "\n=== All repair function definitions ==="
rg -n "^def.*repair.*json" --type=py
# Check git history for _repair_response_model_json
echo -e "\n=== Git history (last 5 commits) for the function ==="
cd src/utils && git log --oneline -5 -- clients.py
Repository: plastic-labs/honcho
Length of output: 1107
🏁 Script executed:
#!/bin/bash
# Check the actual content around the function in structured_output.py
echo "=== Checking structured_output.py for repair functions ==="
rg -n "repair.*json" src/llm/structured_output.py -A 3
Repository: plastic-labs/honcho
Length of output: 734
🏁 Script executed:
#!/bin/bash
# Extract the full implementation of repair_response_model_json from structured_output.py
echo "=== repair_response_model_json from structured_output.py ==="
sed -n '25,88p' src/llm/structured_output.py
# Extract the full implementation of _repair_response_model_json from clients.py
echo -e "\n=== _repair_response_model_json from clients.py ==="
sed -n '1542,1601p' src/utils/clients.py
Repository: plastic-labs/honcho
Length of output: 4793
🏁 Script executed:
#!/bin/bash
# Check where repair_response_model_json is called from
echo "=== Calls to repair_response_model_json ==="
rg -n "repair_response_model_json" --type=py -B 2 -A 2
# Check if _repair_response_model_json was ever imported or used
echo -e "\n=== Any imports or references to _repair_response_model_json ==="
rg "_repair_response_model_json" --type=py
Repository: plastic-labs/honcho
Length of output: 4750
Remove _repair_response_model_json – it is dead code superseded by repair_response_model_json in src/llm/structured_output.py.
The underscore-prefixed function is never imported or called anywhere in the codebase. All LLM backends (anthropic, gemini, groq, openai) use the public repair_response_model_json from src/llm/structured_output.py instead. The # pyright: ignore[reportUnusedFunction] comment confirms this is intentionally suppressed dead code and should be deleted.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/utils/clients.py` around lines 1542 - 1601, Remove the dead helper
_repair_response_model_json (and its pyright ignore) because the project uses
the public repair_response_model_json instead; delete the entire function
definition for _repair_response_model_json and any exclusively-used local
references (and clean up now-unused imports if they become unused after
removal), ensuring all callers use repair_response_model_json (from
structured_output) instead.
def require_provider_key(model_spec: LiveModelSpec) -> None:
    key_present = {
        "anthropic": bool(settings.LLM.ANTHROPIC_API_KEY),
        "openai": bool(settings.LLM.OPENAI_API_KEY),
        "gemini": bool(settings.LLM.GEMINI_API_KEY),
    }[model_spec.provider]
    if not key_present:
        pytest.skip(f"Missing API key for live provider {model_spec.provider}")
Handle Groq explicitly in the live provider-key guard.
This helper only covers Anthropic/OpenAI/Gemini. A Groq LiveModelSpec will currently raise KeyError instead of producing a clear skip/failure path.
🔧 Suggested fix
def require_provider_key(model_spec: LiveModelSpec) -> None:
key_present = {
"anthropic": bool(settings.LLM.ANTHROPIC_API_KEY),
"openai": bool(settings.LLM.OPENAI_API_KEY),
"gemini": bool(settings.LLM.GEMINI_API_KEY),
- }[model_spec.provider]
+ "groq": bool(settings.LLM.GROQ_API_KEY),
+ }.get(model_spec.provider)
+ if key_present is None:
+ raise AssertionError(f"Unsupported live provider {model_spec.provider}")
if not key_present:
pytest.skip(f"Missing API key for live provider {model_spec.provider}")📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def require_provider_key(model_spec: LiveModelSpec) -> None: | |
| key_present = { | |
| "anthropic": bool(settings.LLM.ANTHROPIC_API_KEY), | |
| "openai": bool(settings.LLM.OPENAI_API_KEY), | |
| "gemini": bool(settings.LLM.GEMINI_API_KEY), | |
| }[model_spec.provider] | |
| if not key_present: | |
| pytest.skip(f"Missing API key for live provider {model_spec.provider}") | |
| def require_provider_key(model_spec: LiveModelSpec) -> None: | |
| key_present = { | |
| "anthropic": bool(settings.LLM.ANTHROPIC_API_KEY), | |
| "openai": bool(settings.LLM.OPENAI_API_KEY), | |
| "gemini": bool(settings.LLM.GEMINI_API_KEY), | |
| "groq": bool(settings.LLM.GROQ_API_KEY), | |
| }.get(model_spec.provider) | |
| if key_present is None: | |
| raise AssertionError(f"Unsupported live provider {model_spec.provider}") | |
| if not key_present: | |
| pytest.skip(f"Missing API key for live provider {model_spec.provider}") |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/live_llm/conftest.py` around lines 38 - 45, The require_provider_key
helper raises KeyError for model_spec.provider == "groq" because the mapping
only includes "anthropic"/"openai"/"gemini"; update require_provider_key to
include an explicit "groq": bool(settings.LLM.GROQ_API_KEY) entry (and
optionally handle unknown providers by calling pytest.skip with a clear message)
so that require_provider_key and its use of model_spec.provider no longer blow
up for Groq LiveModelSpec.
def test_env_template_uses_nested_model_config_keys() -> None:
    env_template_path = Path(__file__).resolve().parents[3] / ".env.template"
    env_template = env_template_path.read_text()

    assert "DERIVER_MODEL_CONFIG__MODEL" in env_template
    assert "DIALECTIC_LEVELS__minimal__MODEL_CONFIG__MODEL" in env_template
    assert "SUMMARY_MODEL_CONFIG__MODEL" in env_template
    assert "DREAM_MODEL_CONFIG__MODEL" in env_template
    assert "DREAM_DEDUCTION_MODEL_CONFIG__MODEL" in env_template

    assert "DERIVER_PROVIDER=" not in env_template
    assert "SUMMARY_PROVIDER=" not in env_template
    assert "DIALECTIC_LEVELS__minimal__PROVIDER=" not in env_template
    assert "DREAM_PROVIDER=" not in env_template
Use the DIALECTIC__LEVELS__... env path here.
This test is currently asserting DIALECTIC_LEVELS__..., so it will still pass if .env.template uses the wrong override key shape. The dialectic nested settings need the extra __ hop after DIALECTIC.
🧪 Suggested fix
- assert "DIALECTIC_LEVELS__minimal__MODEL_CONFIG__MODEL" in env_template
+ assert "DIALECTIC__LEVELS__minimal__MODEL_CONFIG__MODEL" in env_template
@@
- assert "DIALECTIC_LEVELS__minimal__PROVIDER=" not in env_template
+    assert "DIALECTIC__LEVELS__minimal__PROVIDER=" not in env_template
Based on learnings: "Use new per-level provider configuration under settings.DIALECTIC.LEVELS. In CI, overrides must use nested environment variables following the double-underscore naming convention: DIALECTIC__LEVELS____PROVIDER."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/llm/test_model_config.py` around lines 254 - 267, The test
test_env_template_uses_nested_model_config_keys is asserting the wrong env var
prefix for nested dialectic settings; update the assertions to expect the
double-underscore nested form (DIALECTIC__LEVELS__minimal__MODEL_CONFIG__MODEL
and DIALECTIC__LEVELS__minimal__PROVIDER=) instead of DIALECTIC_LEVELS__... so
the test validates the correct nested override key shape for
settings.DIALECTIC.LEVELS; change both occurrences in that test to use
DIALECTIC__LEVELS__minimal__... to match the new per-level provider
configuration naming convention.
async def test_gemini_explicit_budget_adjusts_transport_max_tokens(
    fake_backend: FakeBackend,
) -> None:
    config = ModelConfig(
        model="gemini-2.5-flash",
        transport="gemini",
        thinking_budget_tokens=256,
    )

    await execute_completion(
        fake_backend,
        config,
        messages=[{"role": "user", "content": "Hello"}],
        max_tokens=100,
    )

    call = fake_backend.calls[0]
    assert call["max_output_tokens"] == 100
    assert call["max_tokens"] == 356
    assert call["thinking_budget_tokens"] == 256
Missing @pytest.mark.asyncio decorator on async tests.
All four test functions in this file are async def but lack the @pytest.mark.asyncio decorator. Without it, pytest will not properly await the coroutines, causing tests to appear to pass while the actual async code never executes.
🐛 Proposed fix

```diff
+import pytest
+
 from pydantic import BaseModel

 from src.config import ModelConfig
 from src.llm.caching import PromptCachePolicy
 from src.llm.request_builder import execute_completion
 from tests.unit.llm.conftest import FakeBackend


 class SampleResponse(BaseModel):
     answer: str


+@pytest.mark.asyncio
 async def test_gemini_explicit_budget_adjusts_transport_max_tokens(
     fake_backend: FakeBackend,
 ) -> None:
```

Apply the same decorator to all four test functions.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/unit/llm/test_request_builder.py` around lines 13 - 32, The async test
functions (e.g., test_gemini_explicit_budget_adjusts_transport_max_tokens and
the other three async def tests in tests/unit/llm/test_request_builder.py) are
missing the `@pytest.mark.asyncio` decorator; add `@pytest.mark.asyncio` above each
async test function and ensure pytest is imported (`import pytest`) at the
top of the file so pytest will properly await the coroutines during test runs.
```python
        self.max_batch_size = 2048  # Same as OpenAI
    else:  # openai
        if api_key is None:
            api_key = settings.LLM.OPENAI_API_KEY
```
this should also accept OPENAI_BASE_URL so we can have configurable OpenAI-compatible embedding providers
```python
    model_config = SettingsConfigDict(env_prefix="DERIVER_", extra="ignore")  # pyright: ignore


class DeriverSettings(HonchoSettings):
    model_config = SettingsConfigDict(  # pyright: ignore
        env_prefix="DERIVER_", env_nested_delimiter="__", extra="ignore"
    )
```
one thing to note: `env_nested_delimiter="__"` doesn't merge overrides with the default model config. For example, setting `DERIVER_MODEL_CONFIG__THINKING_BUDGET_TOKENS=2048` replaces the whole object instead of patching the default deriver config. This might be intentional, but flagging it just in case.
```python
    Raises:
        ValueError: If provider is not configured
    """
    runtime_model_config = _resolve_runtime_model_config(model_config)
```
i think there's a regression here: previously we resolved reasoning params per-attempt, but now we resolve them upfront. If the final retry switches transports, we can end up passing invalid params to the fallback transport.
```python
    """Best-effort local cache for Gemini cached-content handles."""

    def __init__(self) -> None:
        self._handles: dict[str, GeminiCacheHandle] = {}
```
looks like this grows unbounded; handles are added but never evicted
```python
        except Exception:
            fallback_response = await self._create_structured_response(
                params=params,
                response_format=response_format,
            )
            content = self._parse_or_repair_structured_content(
                fallback_response,
                response_format,
                model,
            )
```
this `except Exception` is too broad; I think we want some errors to propagate normally instead of trying to repair in all cases
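One way to narrow it, sketched with assumed names (the toy repair below is not the PR's `_repair_response_model_json` helper): only parse/validation failures trigger a repair pass, while transport, auth, and rate-limit errors bubble up to the caller's retry logic.

```python
import json

from pydantic import BaseModel, ValidationError


class Answer(BaseModel):
    answer: str


# Only malformed model output is worth a repair pass; everything else
# (network errors, auth failures, rate limits) should propagate.
REPAIRABLE_ERRORS = (json.JSONDecodeError, ValidationError)


def parse_with_repair(raw: str) -> Answer:
    try:
        return Answer.model_validate_json(raw)
    except REPAIRABLE_ERRORS:
        # Toy repair for a truncated JSON object; a real helper would be
        # more careful about closing quotes, braces, and brackets.
        return Answer.model_validate_json(raw + '"}')


result = parse_with_repair('{"answer": "hi')
```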
```python
            ):
                system_messages.append(message["content"])
            else:
                non_system_messages.append(message)
```
we should copy the message dict before storing it, since we modify it later via `params["messages"][-1]["content"] += "\n\nRespond with valid JSON matching this schema: ..."`; otherwise we would mutate the caller's original message list
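A minimal illustration of the aliasing fix, with assumed variable names; a shallow copy per message dict is enough here because only the top-level `content` key is mutated:

```python
# Without the copy, the schema suffix would leak into the caller's list,
# since params["messages"] would hold references to the same dicts.
messages = [{"role": "user", "content": "Hello"}]

params = {"messages": [dict(m) for m in messages]}  # shallow-copy each dict
params["messages"][-1]["content"] += (
    "\n\nRespond with valid JSON matching this schema: ..."
)
```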
Summary by CodeRabbit
Release Notes
New Features
Configuration
Tests