
Update Deriver + Sampling Parameters#370

Draft
3un01a wants to merge 6 commits into main from yuya/AI-130

Conversation


@3un01a 3un01a commented Feb 4, 2026

Update deriver prompt and add additional sampling parameters for new Neuromancer models.

Summary by CodeRabbit

  • New Features

    • Added TOP_P and REPETITION_PENALTY controls for finer LLM generation tuning; these now apply across single-call, streaming, and tool-loop workflows.
    • Replaced fact-extraction prompt with a structured, step-by-step "molecular facts" workflow for more precise, atomic claims.
  • Documentation

    • Updated configuration examples and docs to document the new deriver parameters and template entries.
  • Improvements

    • Expanded verbose LLM response logging; removed a small set of legacy debug logs.
@3un01a 3un01a requested a review from dr-frmr February 4, 2026 20:39

coderabbitai bot commented Feb 4, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.


Walkthrough

Adds optional generation controls (top_p, repetition_penalty) to deriver configuration and threads them through LLM call paths; merges partial per-level Dialectic settings with defaults and validates token budgets; replaces the facts-extraction prompt with a structured "molecular facts" workflow and removes two debug logs.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Configuration & Defaults<br>`src/config.py` | Add TOP_P and REPETITION_PENALTY to DeriverSettings; introduce DialecticSettings._DEFAULT_LEVELS, merge-before-validation of partial LEVELS, add token/limit defaults and validators ensuring budgets and presence of all reasoning levels. |
| LLM Client Plumbing<br>`src/utils/clients.py` | Add top_p and repetition_penalty parameters across honcho_llm_call, honcho_llm_call_inner, _stream_final_response, _execute_tool_loop, and _call_with_messages, and propagate them for OpenAI/vLLM flows (including provider-specific forwarding via openai_params/extra_body). |
| Deriver Integration & Logging<br>`src/deriver/deriver.py` | Forward top_p and repetition_penalty in the minimal deriver batch single-LLM call path; add verbose LLM response logging (finish reasons, token counts, thinking content, raw outputs), including when no observations are produced. |
| Prompts<br>`src/deriver/prompts.py` | Replace the explicit-facts prompt with a structured "molecular facts" extraction process (Identify Candidate Claims → Verify Atomicity → Decontextualize → Strip Excess Detail) and require a minimal fact+source per line. |
| Env, Examples & Docs<br>`.env.template`, `config.toml.example`, `docs/v3/contributing/configuration.mdx` | Expose DERIVER_TOP_P and DERIVER_REPETITION_PENALTY placeholders; update commented example values for TEMPERATURE; add commented TOP_P/REPETITION_PENALTY examples; document the new env vars. |
| Logging Cleanup<br>`src/utils/agent_tools.py` | Remove two debug logging statements around execute_tool invocation/results. |

Sequence Diagram(s)

sequenceDiagram
    participant Deriver as Deriver
    participant Clients as Clients.honcho_llm_call
    participant Provider as LLM Provider (OpenAI / vLLM)
    participant Observers as Observers / Storage

    Deriver->>Clients: call honcho_llm_call(messages, top_p, repetition_penalty, ...)
    Clients->>Provider: dispatch request (include top_p in openai_params / repetition_penalty in extra_body)
    Provider-->>Clients: stream/complete response (tokens, finish_reason, raw_json)
    Clients->>Deriver: return response (messages, observations)
    Clients->>Observers: log verbose response (finish_reasons, token counts, thinking, raw output)
    Deriver->>Observers: persist observations / synthesized output

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested reviewers

  • Rajat-Ahuja1997
  • VVoruganti
  • danibalcells

Poem

🐇 I hop through code with sampler bright,
top_p tucked in, penalty light,
Molecules of facts I prune and pare,
Atomic claims, tidy and fair.
Logs sing out the model's thought at night.

🚥 Pre-merge checks: ✅ 3 passed | ❌ 2 failed

❌ Failed checks (1 warning, 1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 47.06%, below the required threshold of 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
| Out of Scope Changes check | ❓ Inconclusive | Additional changes include config handling improvements (DialecticSettings defaults), verbose logging, and removal of debug statements, which extend beyond the stated objectives but support the primary changes. | Consider clarifying whether the config flexibility improvements and additional logging are intentional or should be separated into a follow-up PR for clearer scope management. |

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped: CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change, updating the deriver and sampling parameters, which aligns with the primary objectives of adding sampling parameters and updating the deriver prompt. |
| Linked Issues check | ✅ Passed | The PR addresses both objectives from AI-130: it updates the deriver prompt (src/deriver/prompts.py) and adds sampling parameters (TOP_P, REPETITION_PENALTY) throughout the codebase. |




@3un01a 3un01a self-assigned this Feb 4, 2026

claude bot commented Feb 4, 2026

Code review

No issues found. Checked for bugs and CLAUDE.md compliance.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/utils/clients.py`:
- Around lines 575-578: the streaming path in honcho_llm_call_inner does not add the sampling controls to the params dict before calling handle_streaming_response, so top_p, repetition_penalty, and no_repeat_ngram_size are dropped. Set params["top_p"], params["repetition_penalty"], and params["no_repeat_ngram_size"] (mirroring how temperature is added) before the streaming call so handle_streaming_response receives them. Also ensure no_repeat_ngram_size is not passed to vLLM providers via extra_body, and add a short inline comment in the vLLM branch documenting that vLLM's OpenAI-compatible API does not support no_repeat_ngram_size.
dr-frmr previously approved these changes Feb 4, 2026

@dr-frmr dr-frmr left a comment


made my review in the form of a commit, happy with all of this now


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/utils/clients.py (1)

1283-1343: ⚠️ Potential issue | 🟡 Minor

Docstring missing new parameters.
Please add top_p and repetition_penalty to the Args section for the public honcho_llm_call API so usage stays accurate.

As per coding guidelines: “Use Google style docstrings in Python code.”

🧹 Nitpick comments (1)
src/config.py (1)

253-255: Add bounds to avoid invalid sampler settings.
Without constraints, misconfigured values can slip through to providers. Consider mirroring other fields’ validation style.

Proposed adjustment
-    TOP_P: float | None = None
-    REPETITION_PENALTY: float | None = None
+    TOP_P: Annotated[float | None, Field(ge=0.0, le=1.0)] = None
+    REPETITION_PENALTY: Annotated[float | None, Field(ge=0.0)] = None
@dr-frmr dr-frmr marked this pull request as draft February 6, 2026 19:11

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Fix all issues with AI agents
In `@src/deriver/deriver.py`:
- Around lines 180-208: the logging emits raw user-derived content at WARNING level (logger.warning calls that print response.content.explicit, response.thinking_content, and response.content.model_dump_json()), which can leak PII. Wrap those content-bearing warnings behind the same gate used elsewhere (settings.DERIVER.LOG_OBSERVATIONS), or downgrade them to DEBUG, so that only metadata (response.finish_reasons, response.output_tokens, response.input_tokens, len(...) of lists) remains at WARNING. Specifically, make the blocks referencing response.thinking_content, response.content.explicit, and response.content.model_dump_json() check settings.DERIVER.LOG_OBSERVATIONS (or use logger.debug) before emitting raw observation text.
🧹 Nitpick comments (2)
src/config.py (2)

254-255: Add validation bounds for TOP_P and REPETITION_PENALTY.

Other numeric settings in this file use Annotated[..., Field(...)] with bounds (e.g., TRACES_SAMPLE_RATE has ge=0.0, le=1.0). These LLM sampling parameters have well-known valid ranges: top_p is typically [0.0, 1.0] and repetition_penalty is typically > 0. Without constraints, invalid values only surface downstream as opaque API errors.

Proposed fix
-    TOP_P: float | None = None
-    REPETITION_PENALTY: float | None = None
+    TOP_P: Annotated[float | None, Field(default=None, ge=0.0, le=1.0)] = None
+    REPETITION_PENALTY: Annotated[float | None, Field(default=None, gt=0.0)] = None

429-440: Potential mixed-case key collision during merge with TOML-sourced overrides.

When a user provides level overrides via TOML (lowercase keys like provider), they're merged with _DEFAULT_LEVELS (UPPERCASE keys like PROVIDER) via {**default_config, **user_config}. This produces a dict with both casings present (e.g., {"PROVIDER": "google", "provider": "openai"}). Pydantic resolves this correctly today (alias wins), but it's fragile and could surprise future maintainers.

Consider normalizing keys to a single casing before merging:

Proposed normalization
             elif isinstance(user_config, dict):
                 # Merge user config over defaults
-                merged[level_name] = {**default_config, **user_config}
+                # Normalize user keys to UPPER to match default_config casing
+                normalized = {k.upper(): v for k, v in user_config.items()}
+                merged[level_name] = {**default_config, **normalized}
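The suggested normalization can be illustrated standalone; the level values below are hypothetical examples, not the real contents of _DEFAULT_LEVELS.

```python
from typing import Any


def merge_level_config(
    default_config: dict[str, Any], user_config: dict[str, Any]
) -> dict[str, Any]:
    """Merge user overrides over defaults, upper-casing user keys first
    so the merged dict never holds both casings of the same key."""
    normalized = {k.upper(): v for k, v in user_config.items()}
    return {**default_config, **normalized}
```

For example, `merge_level_config({"PROVIDER": "google", "MODEL": "gemini"}, {"provider": "openai"})` yields `{"PROVIDER": "openai", "MODEL": "gemini"}`, with no lowercase duplicate left behind for Pydantic's alias resolution to disambiguate.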
Comment on lines +180 to +208

        logger.warning(
            "LLM response debug — finish_reasons=%s, output_tokens=%d, input_tokens=%d, "
            + "raw_explicit_count=%d, message_ids_count=%d, formatted_messages_len=%d",
            response.finish_reasons,
            response.output_tokens,
            response.input_tokens,
            len(response.content.explicit),
            len(message_ids),
            len(formatted_messages),
        )
        if response.thinking_content:
            logger.warning(
                "LLM thinking content present (%d chars): %.500s",
                len(response.thinking_content),
                response.thinking_content,
            )
        if response.content.explicit:
            logger.warning(
                "Raw explicit observations before conversion: %s",
                [obs.content[:200] for obs in response.content.explicit],
            )
        else:
            logger.warning(
                "LLM returned no explicit observations. Raw content type: %s",
                type(response.content),
            )
        logger.warning(
            "Raw LLM output (parsed model JSON): %s", response.content.model_dump_json()
        )

⚠️ Potential issue | 🟠 Major

Ungated logging of observation content may leak PII into production logs.

Lines 196-208 log raw observation text and the full model_dump_json() at WARNING level with no guard. These contain user-derived content. Contrast with the existing logging at lines 248-261, which is gated behind settings.DERIVER.LOG_OBSERVATIONS. This new block should apply a similar gate around the content-bearing lines, or at minimum reduce them to DEBUG level, to avoid writing user data into production log aggregators.

Token counts and finish reasons (lines 180-189) are safe metadata and can remain at WARNING.

Proposed fix — gate content logging
         logger.warning(
             "LLM response debug — finish_reasons=%s, output_tokens=%d, input_tokens=%d, "
-            + "raw_explicit_count=%d, message_ids_count=%d, formatted_messages_len=%d",
+            "raw_explicit_count=%d, message_ids_count=%d, formatted_messages_len=%d",
             response.finish_reasons,
             response.output_tokens,
             response.input_tokens,
             len(response.content.explicit),
             len(message_ids),
             len(formatted_messages),
         )
-        if response.thinking_content:
-            logger.warning(
-                "LLM thinking content present (%d chars): %.500s",
-                len(response.thinking_content),
-                response.thinking_content,
-            )
-        if response.content.explicit:
-            logger.warning(
-                "Raw explicit observations before conversion: %s",
-                [obs.content[:200] for obs in response.content.explicit],
-            )
-        else:
-            logger.warning(
-                "LLM returned no explicit observations. Raw content type: %s",
-                type(response.content),
-            )
-        logger.warning(
-            "Raw LLM output (parsed model JSON): %s", response.content.model_dump_json()
-        )
+        if settings.DERIVER.LOG_OBSERVATIONS:
+            if response.thinking_content:
+                logger.warning(
+                    "LLM thinking content present (%d chars): %.500s",
+                    len(response.thinking_content),
+                    response.thinking_content,
+                )
+            if response.content.explicit:
+                logger.warning(
+                    "Raw explicit observations before conversion: %s",
+                    [obs.content[:200] for obs in response.content.explicit],
+                )
+            else:
+                logger.warning(
+                    "LLM returned no explicit observations. Raw content type: %s",
+                    type(response.content),
+                )
+            logger.warning(
+                "Raw LLM output (parsed model JSON): %s",
+                response.content.model_dump_json(),
+            )

claude bot commented Feb 6, 2026

Test

