Conversation
Walkthrough: Adds optional generation controls (`top_p`, `repetition_penalty`, `no_repeat_ngram_size`).
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Deriver as Deriver
    participant Clients as Clients.honcho_llm_call
    participant Provider as LLM Provider (OpenAI / vLLM)
    participant Observers as Observers / Storage
    Deriver->>Clients: call honcho_llm_call(messages, top_p, repetition_penalty, ...)
    Clients->>Provider: dispatch request (include top_p in openai_params / repetition_penalty in extra_body)
    Provider-->>Clients: stream/complete response (tokens, finish_reason, raw_json)
    Clients->>Deriver: return response (messages, observations)
    Clients->>Observers: log verbose response (finish_reasons, token counts, thinking, raw output)
    Deriver->>Observers: persist observations / synthesized output
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed (1 warning, 1 inconclusive)
Code review: No issues found. Checked for bugs and CLAUDE.md compliance.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/utils/clients.py`:
- Around line 575-578: The streaming path in honcho_llm_call_inner is not adding
the sampling controls to the params dict before calling
handle_streaming_response, so top_p, repetition_penalty and no_repeat_ngram_size
are dropped; update honcho_llm_call_inner to add params["top_p"] = top_p,
params["repetition_penalty"] = repetition_penalty and
params["no_repeat_ngram_size"] = no_repeat_ngram_size (mirroring how temperature
is added) before the streaming call so handle_streaming_response receives them,
and ensure you do not pass no_repeat_ngram_size to vLLM providers via extra_body
(add a short inline comment in the vLLM branch documenting that vLLM’s
OpenAI-compatible API does not support no_repeat_ngram_size).
dr-frmr left a comment:
made my review in the form of a commit, happy with all of this now
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/utils/clients.py (1)
1283-1343: ⚠️ Potential issue | 🟡 Minor — Docstring missing new parameters.
Please add `top_p` and `repetition_penalty` to the Args section for the public `honcho_llm_call` API so usage stays accurate. As per coding guidelines: "Use Google style docstrings in Python code."
🧹 Nitpick comments (1)
src/config.py (1)
253-255: Add bounds to avoid invalid sampler settings.
Without constraints, misconfigured values can slip through to providers. Consider mirroring other fields' validation style.
Proposed adjustment

```diff
- TOP_P: float | None = None
- REPETITION_PENALTY: float | None = None
+ TOP_P: Annotated[float | None, Field(ge=0.0, le=1.0)] = None
+ REPETITION_PENALTY: Annotated[float | None, Field(ge=0.0)] = None
```
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/deriver/deriver.py`:
- Around line 180-208: The logging emits raw user-derived content at WARNING
(e.g., logger.warning calls that print response.content.explicit,
response.thinking_content, and response.content.model_dump_json()), which can
leak PII; wrap those content-bearing warnings behind the same gate used
elsewhere (settings.DERIVER.LOG_OBSERVATIONS) or downgrade them to DEBUG and
ensure only metadata (response.finish_reasons, response.output_tokens,
response.input_tokens, len(...) of lists) remains at WARNING; specifically
modify the blocks referencing response.thinking_content,
response.content.explicit, and response.content.model_dump_json() so they check
settings.DERIVER.LOG_OBSERVATIONS (or change to logger.debug) before emitting
raw observation text.
🧹 Nitpick comments (2)
src/config.py (2)
254-255: Add validation bounds for `TOP_P` and `REPETITION_PENALTY`.
Other numeric settings in this file use `Annotated[..., Field(...)]` with bounds (e.g., `TEMPERATURE` aside, `TRACES_SAMPLE_RATE` has `ge=0.0, le=1.0`). These LLM sampling parameters have well-known valid ranges — `top_p` is typically `[0.0, 1.0]` and `repetition_penalty` is typically `> 0`. Without constraints, invalid values will only surface as opaque API errors downstream.
Proposed fix

```diff
- TOP_P: float | None = None
- REPETITION_PENALTY: float | None = None
+ TOP_P: Annotated[float | None, Field(default=None, ge=0.0, le=1.0)] = None
+ REPETITION_PENALTY: Annotated[float | None, Field(default=None, gt=0.0)] = None
```
429-440: Potential mixed-case key collision during merge with TOML-sourced overrides.
When a user provides level overrides via TOML (lowercase keys like `provider`), they're merged with `_DEFAULT_LEVELS` (UPPERCASE keys like `PROVIDER`) via `{**default_config, **user_config}`. This produces a dict with both casings present (e.g., `{"PROVIDER": "google", "provider": "openai"}`). Pydantic resolves this correctly today (alias wins), but it's fragile and could surprise future maintainers. Consider normalizing keys to a single casing before merging:
Proposed normalization

```diff
 elif isinstance(user_config, dict):
     # Merge user config over defaults
-    merged[level_name] = {**default_config, **user_config}
+    # Normalize user keys to UPPER to match default_config casing
+    normalized = {k.upper(): v for k, v in user_config.items()}
+    merged[level_name] = {**default_config, **normalized}
```
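The collision and the proposed normalization can be demonstrated with plain dicts (the provider and model values here are illustrative, not the project's actual defaults):

```python
default_config = {"PROVIDER": "google", "MODEL": "example-model"}
user_config = {"provider": "openai"}  # lowercase keys, as loaded from TOML

# Naive merge keeps both casings, so two keys describe one setting
naive = {**default_config, **user_config}
assert "PROVIDER" in naive and "provider" in naive

# Normalizing user keys first yields a single authoritative key
normalized = {k.upper(): v for k, v in user_config.items()}
merged = {**default_config, **normalized}
assert merged == {"PROVIDER": "openai", "MODEL": "example-model"}
```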
```python
logger.warning(
    "LLM response debug — finish_reasons=%s, output_tokens=%d, input_tokens=%d, "
    + "raw_explicit_count=%d, message_ids_count=%d, formatted_messages_len=%d",
    response.finish_reasons,
    response.output_tokens,
    response.input_tokens,
    len(response.content.explicit),
    len(message_ids),
    len(formatted_messages),
)
if response.thinking_content:
    logger.warning(
        "LLM thinking content present (%d chars): %.500s",
        len(response.thinking_content),
        response.thinking_content,
    )
if response.content.explicit:
    logger.warning(
        "Raw explicit observations before conversion: %s",
        [obs.content[:200] for obs in response.content.explicit],
    )
else:
    logger.warning(
        "LLM returned no explicit observations. Raw content type: %s",
        type(response.content),
    )
logger.warning(
    "Raw LLM output (parsed model JSON): %s", response.content.model_dump_json()
)
```
Ungated logging of observation content may leak PII into production logs.
Lines 196-208 log raw observation text and the full model_dump_json() at WARNING level with no guard. These contain user-derived content. Contrast with the existing logging at lines 248-261, which is gated behind settings.DERIVER.LOG_OBSERVATIONS. This new block should apply a similar gate around the content-bearing lines, or at minimum reduce them to DEBUG level, to avoid writing user data into production log aggregators.
Token counts and finish reasons (lines 180-189) are safe metadata and can remain at WARNING.
Proposed fix — gate content logging

```diff
 logger.warning(
     "LLM response debug — finish_reasons=%s, output_tokens=%d, input_tokens=%d, "
-    + "raw_explicit_count=%d, message_ids_count=%d, formatted_messages_len=%d",
+    "raw_explicit_count=%d, message_ids_count=%d, formatted_messages_len=%d",
     response.finish_reasons,
     response.output_tokens,
     response.input_tokens,
     len(response.content.explicit),
     len(message_ids),
     len(formatted_messages),
 )
-if response.thinking_content:
-    logger.warning(
-        "LLM thinking content present (%d chars): %.500s",
-        len(response.thinking_content),
-        response.thinking_content,
-    )
-if response.content.explicit:
-    logger.warning(
-        "Raw explicit observations before conversion: %s",
-        [obs.content[:200] for obs in response.content.explicit],
-    )
-else:
-    logger.warning(
-        "LLM returned no explicit observations. Raw content type: %s",
-        type(response.content),
-    )
-logger.warning(
-    "Raw LLM output (parsed model JSON): %s", response.content.model_dump_json()
-)
+if settings.DERIVER.LOG_OBSERVATIONS:
+    if response.thinking_content:
+        logger.warning(
+            "LLM thinking content present (%d chars): %.500s",
+            len(response.thinking_content),
+            response.thinking_content,
+        )
+    if response.content.explicit:
+        logger.warning(
+            "Raw explicit observations before conversion: %s",
+            [obs.content[:200] for obs in response.content.explicit],
+        )
+    else:
+        logger.warning(
+            "LLM returned no explicit observations. Raw content type: %s",
+            type(response.content),
+        )
+    logger.warning(
+        "Raw LLM output (parsed model JSON): %s",
+        response.content.model_dump_json(),
+    )
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
logger.warning(
    "LLM response debug — finish_reasons=%s, output_tokens=%d, input_tokens=%d, "
    "raw_explicit_count=%d, message_ids_count=%d, formatted_messages_len=%d",
    response.finish_reasons,
    response.output_tokens,
    response.input_tokens,
    len(response.content.explicit),
    len(message_ids),
    len(formatted_messages),
)
if settings.DERIVER.LOG_OBSERVATIONS:
    if response.thinking_content:
        logger.warning(
            "LLM thinking content present (%d chars): %.500s",
            len(response.thinking_content),
            response.thinking_content,
        )
    if response.content.explicit:
        logger.warning(
            "Raw explicit observations before conversion: %s",
            [obs.content[:200] for obs in response.content.explicit],
        )
    else:
        logger.warning(
            "LLM returned no explicit observations. Raw content type: %s",
            type(response.content),
        )
    logger.warning(
        "Raw LLM output (parsed model JSON): %s",
        response.content.model_dump_json(),
    )
```
Update deriver prompt and add additional sampling parameters for new Neuromancer models.
Summary by CodeRabbit
New Features
Documentation
Improvements