fix: enable json_mode for vllm provider and add explicit JSON schema to deriver prompt#466

Open
paralizeer wants to merge 2 commits into plastic-labs:main from paralizeer:fix/vllm-json-mode-and-deriver-prompt

Conversation

@paralizeer paralizeer commented Mar 30, 2026

Problem

Two related issues when using the vllm provider (OpenAI-compatible endpoints like MiniMax, Mistral, Llama variants):

1. vllm excluded from json_object mode

if json_mode and provider != "vllm":
    openai_params["response_format"] = {"type": "json_object"}

The vllm provider is excluded from json_object response format injection. When json_mode=True and no response_model is set, requests routed through the vllm provider receive no JSON format instruction, so the model returns plain text, causing the deriver's PromptRepresentation validation to fail.
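The downstream failure is easy to reproduce in isolation. A minimal sketch, assuming the deriver's PromptRepresentation is roughly a pydantic model matching the {"explicit": [{"content": "<observation>"}]} schema described in this PR; the model definition here is a hypothetical reconstruction, not the project's actual class:

```python
from pydantic import BaseModel, ValidationError

# Hypothetical reconstruction of the deriver's expected schema:
# {"explicit": [{"content": "<observation>"}]}
class Observation(BaseModel):
    content: str

class PromptRepresentation(BaseModel):
    explicit: list[Observation]

# With no response_format instruction, a vllm-routed model may answer in prose:
raw = "Sure! Here are the observations I extracted: the user likes tea."

try:
    PromptRepresentation.model_validate_json(raw)
except ValidationError:
    print("validation failed: response is not valid JSON")

# The same call succeeds once the model actually emits the JSON structure:
parsed = PromptRepresentation.model_validate_json(
    '{"explicit": [{"content": "user likes tea"}]}'
)
print(parsed.explicit[0].content)  # user likes tea
```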

2. Deriver prompt has no explicit JSON schema

The minimal deriver prompt instructs the model to extract observations but doesn't specify the expected JSON structure. OpenAI and Anthropic infer this from response_format, but other providers need the schema stated explicitly in the prompt itself.

Fix

  1. clients.py: Remove vllm from the exclusion when json_mode=True and no response_model is set
  2. prompts.py: Append the exact expected JSON structure to the minimal deriver prompt
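Sketched together, the two changes look roughly like the following. This is a paraphrase rather than the actual diff: the helper names and the exact wording of the appended instruction are illustrative.

```python
# clients.py (sketch): json_object now applies to every provider,
# including vllm, whenever json_mode is on and no response_model is set.
def build_response_format(json_mode, response_model):
    if json_mode and response_model is None:
        return {"type": "json_object"}  # previously skipped for provider == "vllm"
    return None

# prompts.py (sketch): append the exact expected structure so providers
# that do not infer it from response_format still emit valid JSON.
JSON_SUFFIX = (
    "Respond with ONLY a JSON object matching this exact structure, "
    "with no extra text, markdown, or explanation:\n"
    '{"explicit": [{"content": "<observation>"}]}'
)

def with_json_schema(minimal_deriver_prompt):
    return minimal_deriver_prompt + "\n\n" + JSON_SUFFIX
```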

Result

The deriver now works correctly with:

  • MiniMax M2.7 (via vllm provider pointing to https://api.minimax.io/v1)
  • Any Ollama model that supports JSON mode
  • Other OpenAI-compatible providers

Tested: deriver successfully extracts and stores observations using MiniMax M2.7.

Summary by CodeRabbit

  • Bug Fixes

    • Resolved a conflict that could force an incorrect JSON output format when structured responses were requested.
  • Improvements

    • Enforced stricter JSON-only output for minimal extraction prompts to ensure predictable, machine-readable responses.
    • Left structured response handling unchanged so models respect explicit schema-based outputs.
Two related fixes for using OpenAI-compatible providers (vllm, custom)
with json_mode=True:

1. clients.py: vllm was excluded from json_object response_format injection
   when json_mode=True. This caused providers like MiniMax M2.7 (routed
   via the vllm provider config) to return plain text instead of JSON,
   failing the PromptRepresentation pydantic validation.
   Fix: remove vllm from the exclusion list when no response_model is set.

2. prompts.py: the minimal deriver prompt had no explicit JSON schema
   instruction. OpenAI and Anthropic infer the correct format from
   json_mode + the pydantic schema, but other providers (MiniMax, Mistral,
   Llama variants) need the schema stated explicitly in the prompt.
   Fix: append the exact expected JSON structure to the prompt.

These two fixes together enable the deriver to work correctly with
non-OpenAI/Anthropic providers including MiniMax M2.7, Ollama-served
models, and other OpenAI-compatible endpoints via the vllm provider.
coderabbitai bot commented Mar 30, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between 1a4129c and 2a0774a.

📒 Files selected for processing (1)
  • src/utils/clients.py
✅ Files skipped from review due to trivial changes (1)
  • src/utils/clients.py

Walkthrough

This PR tightens JSON output constraints: the minimal deriver prompt now requires responses that are strictly a JSON object matching {"explicit": [{"content": "<observation>"}]}, and the OpenAI response_format is only forced to json_object when json_mode is true and no response_model is provided.

Changes

  • src/deriver/prompts.py (Prompt Output Format Clarification): Added an explicit requirement to minimal_deriver_prompt that the model must return only a JSON object following the schema {"explicit": [{"content": "<observation>"}]} (no extra text/markdown/explanation).
  • src/utils/clients.py (Client JSON Mode Condition Refinement): Changed the logic so openai_params["response_format"] = {"type": "json_object"} is set only when json_mode is true AND response_model is not provided; response_format is left unset when a response_model is used (including vllm's own handling).
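The refined selection logic can be sketched as a single function. The json_schema payload shape follows the common OpenAI-compatible structured-output convention; the function and field values here are illustrative, not copied from the diff:

```python
def select_response_format(provider, json_mode, response_model):
    # Generic path: any provider (vllm included) gets json_object
    # when JSON mode is requested without a schema.
    if json_mode and response_model is None:
        return {"type": "json_object"}
    # vllm structured-output path, gated on response_model being set.
    if provider == "vllm" and response_model is not None:
        return {
            "type": "json_schema",
            "json_schema": {
                "name": response_model.__name__,
                "schema": response_model.model_json_schema(),
            },
        }
    # Otherwise response_format stays unset.
    return None
```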

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐇 I nibble on schemas, tidy and neat,
JSON-only carrots are quite the treat,
Prompts aligned and clients tuned right,
Observations hop into the light,
Hooray — structured outputs take flight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check ✅: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅: the title accurately captures the two main changes, enabling json_mode for the vllm provider and adding an explicit JSON schema to the deriver prompt, matching the PR's core objectives.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/utils/clients.py`:
- Around line 1898-1899: The current conditional prevents injecting JSON mode
into vllm-backed providers: modify the check around the openai_params assignment
so that when json_mode is True and there is no response_model the code sets
openai_params["response_format"] = {"type": "json_object"} regardless of
provider (remove the provider != "vllm" exclusion); update the logic near the
variables json_mode, provider, response_model and the openai_params assignment
so vllm-routed OpenAI-compatible models receive the JSON response_format.

📥 Commits

Reviewing files that changed from the base of the PR and between a5423b5 and 1a4129c.

📒 Files selected for processing (2)
  • src/deriver/prompts.py
  • src/utils/clients.py
Remove the provider != 'vllm' exclusion so vllm-routed OpenAI-compatible
backends (e.g. MiniMax via vllm) receive response_format: json_object
when json_mode=True. The existing vllm response_model path (json_schema)
is unaffected since it is gated on response_model being set.

Co-authored-by: CodeRabbit <coderabbit@coderabbit.ai>
@paralizeer
Author

Applied the fix from @coderabbitai — removed the provider not in ("vllm",) exclusion so vllm-routed OpenAI-compatible backends receive response_format: {"type": "json_object"} when json_mode=True.

The existing vllm response_model path (which produces json_schema format) is unaffected since it's gated on response_model being set — the two branches don't conflict.

Context: this was found while running the deriver against MiniMax M2.7 via a vllm-compatible endpoint. Without this fix, the deriver falls back to unstructured output and observation extraction breaks silently.
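A quick way to check the no-conflict claim is to enumerate the gates. This standalone sketch (names illustrative) confirms that at most one of the two response_format branches can fire for any input combination:

```python
from itertools import product

def fired_branches(provider, json_mode, has_response_model):
    fired = []
    if json_mode and not has_response_model:
        fired.append("json_object")   # generic path, now includes vllm
    if provider == "vllm" and has_response_model:
        fired.append("json_schema")   # vllm structured-output path
    return fired

# Exhaustively check every combination: the branches never overlap.
for provider, json_mode, has_model in product(
    ["vllm", "openai", "anthropic"], [True, False], [True, False]
):
    assert len(fired_branches(provider, json_mode, has_model)) <= 1
print("no combination fires both branches")
```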


coderabbitai bot commented Mar 30, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me quickly verify the actual change in clients.py to make sure the implementation matches the stated intent.


🏁 Script executed:

#!/bin/bash
# Check the current state of clients.py around the json_mode / vllm condition
rg -n "vllm|json_mode|json_object|response_format|response_model" --type=py src/utils/clients.py -A2 -B2



@paralizeer Thanks for the context — I verified the implementation in clients.py and it looks exactly right.

What the code now does (Lines 1898–1915):

# Generic json_object path — now applies to ALL providers including vllm
if json_mode and not response_model:
    openai_params["response_format"] = {"type": "json_object"}

# vllm-specific json_schema path — only fires when response_model is set
if provider == "vllm" and response_model:
    openai_params["response_format"] = {
        "type": "json_schema",
        "json_schema": { ... }
    }
    ...

The two branches are cleanly mutually exclusive:

  • json_mode=True, no response_model → json_object (now reaches vllm too ✓)
  • provider == "vllm", response_model set → json_schema structured output path (unchanged ✓)
  • response_model set, non-vllm → falls through to the response_model path at Line 1990 ✓

The fix is well-structured and the silent failure mode (deriver returning unstructured text and PromptRepresentation validation breaking) is properly addressed. Good catch with MiniMax M2.7.


✏️ Learnings added
Learnt from: paralizeer
URL: https://github.com/plastic-labs/honcho/pull/466

Timestamp: 2026-03-30T17:35:51.248Z
Learning: In src/utils/clients.py (`honcho_llm_call_inner`), the vllm provider (OpenAI-compatible endpoint) previously was excluded from receiving `response_format: {"type": "json_object"}` when `json_mode=True`. The fix removes that exclusion. The vllm-specific `json_schema` structured-output path (used when `response_model` is set) is a separate branch gated on `provider == "vllm" and response_model`, so the two branches don't conflict. Without this fix, the deriver falls back to unstructured output and `PromptRepresentation` validation breaks silently for vllm-routed providers (e.g. MiniMax M2.7, Mistral, Llama variants).

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
