
Update Deriver + Sampling Parameters#370

Draft
3un01a wants to merge 6 commits into main from yuya/AI-130

Conversation


@3un01a 3un01a commented Feb 4, 2026

Update deriver prompt and add additional sampling parameters for new Neuromancer models.

Summary by CodeRabbit

  • New Features

    • Added TOP_P and REPETITION_PENALTY controls for finer LLM generation tuning; these now apply across single-call, streaming, and tool-loop workflows.
    • Replaced fact-extraction prompt with a structured, step-by-step "molecular facts" workflow for more precise, atomic claims.
  • Documentation

    • Updated configuration examples and docs to document the new deriver parameters and template entries.
  • Improvements

    • Expanded verbose LLM response logging; removed a small set of legacy debug logs.
@3un01a 3un01a requested a review from dr-frmr February 4, 2026 20:39

coderabbitai bot commented Feb 4, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

Adds optional generation controls (top_p, repetition_penalty) to deriver configuration and threads them through LLM call paths; merges partial per-level Dialectic settings with defaults and validates token budgets; replaces the facts-extraction prompt with a structured "molecular facts" workflow and removes two debug logs.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration & Defaults<br>`src/config.py` | Add TOP_P and REPETITION_PENALTY to DeriverSettings; introduce DialecticSettings._DEFAULT_LEVELS, merge-before-validation of partial LEVELS, add token/limit defaults and validators ensuring budgets and presence of all reasoning levels. |
| LLM Client Plumbing<br>`src/utils/clients.py` | Add top_p and repetition_penalty parameters across honcho_llm_call, honcho_llm_call_inner, _stream_final_response, _execute_tool_loop, and _call_with_messages, and propagate them for OpenAI/vLLM flows (including provider-specific forwarding via openai_params/extra_body). |
| Deriver Integration & Logging<br>`src/deriver/deriver.py` | Forward top_p and repetition_penalty in the minimal deriver batch single-LLM call path; add verbose LLM response logging (finish reasons, token counts, thinking content, raw outputs), including when no observations are produced. |
| Prompts<br>`src/deriver/prompts.py` | Replace the explicit-facts prompt with a structured "molecular facts" extraction process (Identify Candidate Claims → Verify Atomicity → Decontextualize → Strip Excess Detail) and require a minimal fact+source per line. |
| Env, Examples & Docs<br>`.env.template`, `config.toml.example`, `docs/v3/contributing/configuration.mdx` | Expose DERIVER_TOP_P and DERIVER_REPETITION_PENALTY placeholders; update commented example values for TEMPERATURE; add commented TOP_P/REPETITION_PENALTY examples; document the new env vars. |
| Logging Cleanup<br>`src/utils/agent_tools.py` | Remove two debug logging statements around execute_tool invocation/results. |

Sequence Diagram(s)

sequenceDiagram
    participant Deriver as Deriver
    participant Clients as Clients.honcho_llm_call
    participant Provider as LLM Provider (OpenAI / vLLM)
    participant Observers as Observers / Storage

    Deriver->>Clients: call honcho_llm_call(messages, top_p, repetition_penalty, ...)
    Clients->>Provider: dispatch request (include top_p in openai_params / repetition_penalty in extra_body)
    Provider-->>Clients: stream/complete response (tokens, finish_reason, raw_json)
    Clients->>Deriver: return response (messages, observations)
    Clients->>Observers: log verbose response (finish_reasons, token counts, thinking, raw output)
    Deriver->>Observers: persist observations / synthesized output

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested reviewers

  • Rajat-Ahuja1997
  • VVoruganti
  • danibalcells

Poem

🐇 I hop through code with sampler bright,
top_p tucked in, penalty light,
Molecules of facts I prune and pare,
Atomic claims, tidy and fair.
Logs sing out the model's thought at night.

🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 47.06%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Out of Scope Changes check | ❓ Inconclusive | Additional changes include config handling improvements (DialecticSettings defaults), verbose logging, and removal of debug statements, which extend beyond the stated objectives but support the primary changes. | Consider clarifying whether the config flexibility improvements and additional logging are intentional or should be separated into a follow-up PR for clearer scope management. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change, updating the deriver and sampling parameters, which aligns with the primary objectives of adding sampling parameters and updating the deriver prompt. |
| Linked Issues check | ✅ Passed | The PR addresses both objectives from AI-130: it updates the deriver prompt (src/deriver/prompts.py) and adds sampling parameters (TOP_P, REPETITION_PENALTY) throughout the codebase. |




@3un01a 3un01a self-assigned this Feb 4, 2026

claude bot commented Feb 4, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/utils/clients.py`:
- Around lines 575-578: the streaming path in honcho_llm_call_inner does not add the sampling controls to the params dict before calling handle_streaming_response, so top_p, repetition_penalty, and no_repeat_ngram_size are dropped. Set params["top_p"], params["repetition_penalty"], and params["no_repeat_ngram_size"] (mirroring how temperature is added) before the streaming call so handle_streaming_response receives them. Also ensure no_repeat_ngram_size is not passed to vLLM providers via extra_body, and add a short inline comment in the vLLM branch documenting that vLLM's OpenAI-compatible API does not support no_repeat_ngram_size.
dr-frmr previously approved these changes Feb 4, 2026

@dr-frmr dr-frmr left a comment


made my review in the form of a commit, happy with all of this now


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/utils/clients.py (1)

1283-1343: ⚠️ Potential issue | 🟡 Minor

Docstring missing new parameters.
Please add top_p and repetition_penalty to the Args section for the public honcho_llm_call API so usage stays accurate.

As per coding guidelines: “Use Google style docstrings in Python code.”

🧹 Nitpick comments (1)
src/config.py (1)

253-255: Add bounds to avoid invalid sampler settings.
Without constraints, misconfigured values can slip through to providers. Consider mirroring other fields’ validation style.

Proposed adjustment
-    TOP_P: float | None = None
-    REPETITION_PENALTY: float | None = None
+    TOP_P: Annotated[float | None, Field(ge=0.0, le=1.0)] = None
+    REPETITION_PENALTY: Annotated[float | None, Field(ge=0.0)] = None
@dr-frmr dr-frmr marked this pull request as draft February 6, 2026 19:11

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/deriver/deriver.py`:
- Around lines 180-208: the logging emits raw user-derived content at WARNING level (logger.warning calls that print response.content.explicit, response.thinking_content, and response.content.model_dump_json()), which can leak PII. Wrap those content-bearing warnings behind the same gate used elsewhere (settings.DERIVER.LOG_OBSERVATIONS), or downgrade them to DEBUG, so that only metadata (response.finish_reasons, response.output_tokens, response.input_tokens, len(...) of lists) remains at WARNING. Specifically, make the blocks referencing response.thinking_content, response.content.explicit, and response.content.model_dump_json() check settings.DERIVER.LOG_OBSERVATIONS (or use logger.debug) before emitting raw observation text.
🧹 Nitpick comments (2)
src/config.py (2)

254-255: Add validation bounds for TOP_P and REPETITION_PENALTY.

Other numeric settings in this file use Annotated[..., Field(...)] with bounds (e.g., TRACES_SAMPLE_RATE has ge=0.0, le=1.0). These LLM sampling parameters have well-known valid ranges: top_p is typically [0.0, 1.0] and repetition_penalty is typically > 0. Without constraints, invalid values only surface downstream as opaque API errors.

Proposed fix
-    TOP_P: float | None = None
-    REPETITION_PENALTY: float | None = None
+    TOP_P: Annotated[float | None, Field(default=None, ge=0.0, le=1.0)] = None
+    REPETITION_PENALTY: Annotated[float | None, Field(default=None, gt=0.0)] = None

429-440: Potential mixed-case key collision during merge with TOML-sourced overrides.

When a user provides level overrides via TOML (lowercase keys like provider), they're merged with _DEFAULT_LEVELS (UPPERCASE keys like PROVIDER) via {**default_config, **user_config}. This produces a dict with both casings present (e.g., {"PROVIDER": "google", "provider": "openai"}). Pydantic resolves this correctly today (alias wins), but it's fragile and could surprise future maintainers.

Consider normalizing keys to a single casing before merging:

Proposed normalization
             elif isinstance(user_config, dict):
                 # Merge user config over defaults
-                merged[level_name] = {**default_config, **user_config}
+                # Normalize user keys to UPPER to match default_config casing
+                normalized = {k.upper(): v for k, v in user_config.items()}
+                merged[level_name] = {**default_config, **normalized}
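The suggested normalization can be illustrated standalone; the level values below are hypothetical examples, not the real contents of _DEFAULT_LEVELS.

```python
from typing import Any


def merge_level_config(
    default_config: dict[str, Any], user_config: dict[str, Any]
) -> dict[str, Any]:
    """Merge user overrides over defaults, upper-casing user keys first
    so the merged dict never holds both casings of the same key."""
    normalized = {k.upper(): v for k, v in user_config.items()}
    return {**default_config, **normalized}
```

For example, `merge_level_config({"PROVIDER": "google", "MODEL": "gemini"}, {"provider": "openai"})` yields `{"PROVIDER": "openai", "MODEL": "gemini"}`, with no lowercase duplicate left behind for Pydantic's alias resolution to disambiguate.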
Comment on lines +180 to +208

        logger.warning(
            "LLM response debug — finish_reasons=%s, output_tokens=%d, input_tokens=%d, "
            + "raw_explicit_count=%d, message_ids_count=%d, formatted_messages_len=%d",
            response.finish_reasons,
            response.output_tokens,
            response.input_tokens,
            len(response.content.explicit),
            len(message_ids),
            len(formatted_messages),
        )
        if response.thinking_content:
            logger.warning(
                "LLM thinking content present (%d chars): %.500s",
                len(response.thinking_content),
                response.thinking_content,
            )
        if response.content.explicit:
            logger.warning(
                "Raw explicit observations before conversion: %s",
                [obs.content[:200] for obs in response.content.explicit],
            )
        else:
            logger.warning(
                "LLM returned no explicit observations. Raw content type: %s",
                type(response.content),
            )
        logger.warning(
            "Raw LLM output (parsed model JSON): %s", response.content.model_dump_json()
        )

⚠️ Potential issue | 🟠 Major

Ungated logging of observation content may leak PII into production logs.

Lines 196-208 log raw observation text and the full model_dump_json() at WARNING level with no guard. These contain user-derived content. Contrast with the existing logging at lines 248-261, which is gated behind settings.DERIVER.LOG_OBSERVATIONS. This new block should apply a similar gate around the content-bearing lines, or at minimum reduce them to DEBUG level, to avoid writing user data into production log aggregators.

Token counts and finish reasons (lines 180-189) are safe metadata and can remain at WARNING.

Proposed fix — gate content logging
         logger.warning(
             "LLM response debug — finish_reasons=%s, output_tokens=%d, input_tokens=%d, "
-            + "raw_explicit_count=%d, message_ids_count=%d, formatted_messages_len=%d",
+            "raw_explicit_count=%d, message_ids_count=%d, formatted_messages_len=%d",
             response.finish_reasons,
             response.output_tokens,
             response.input_tokens,
             len(response.content.explicit),
             len(message_ids),
             len(formatted_messages),
         )
-        if response.thinking_content:
-            logger.warning(
-                "LLM thinking content present (%d chars): %.500s",
-                len(response.thinking_content),
-                response.thinking_content,
-            )
-        if response.content.explicit:
-            logger.warning(
-                "Raw explicit observations before conversion: %s",
-                [obs.content[:200] for obs in response.content.explicit],
-            )
-        else:
-            logger.warning(
-                "LLM returned no explicit observations. Raw content type: %s",
-                type(response.content),
-            )
-        logger.warning(
-            "Raw LLM output (parsed model JSON): %s", response.content.model_dump_json()
-        )
+        if settings.DERIVER.LOG_OBSERVATIONS:
+            if response.thinking_content:
+                logger.warning(
+                    "LLM thinking content present (%d chars): %.500s",
+                    len(response.thinking_content),
+                    response.thinking_content,
+                )
+            if response.content.explicit:
+                logger.warning(
+                    "Raw explicit observations before conversion: %s",
+                    [obs.content[:200] for obs in response.content.explicit],
+                )
+            else:
+                logger.warning(
+                    "LLM returned no explicit observations. Raw content type: %s",
+                    type(response.content),
+                )
+            logger.warning(
+                "Raw LLM output (parsed model JSON): %s",
+                response.content.model_dump_json(),
+            )

claude bot commented Feb 6, 2026

Test

