fix: enable json_mode for vllm provider and add explicit JSON schema to deriver prompt#466

Open
paralizeer wants to merge 2 commits into plastic-labs:main from paralizeer:fix/vllm-json-mode-and-deriver-prompt

Conversation

@paralizeer paralizeer commented Mar 30, 2026

Problem

Two related issues when using the vllm provider (OpenAI-compatible endpoints like MiniMax, Mistral, Llama variants):

1. vllm excluded from json_object mode

if json_mode and provider != "vllm":
    openai_params["response_format"] = {"type": "json_object"}

The vllm provider is excluded from json_object response format injection. When json_mode=True and no response_model is set, requests routed through the vllm provider receive no JSON format instruction, so the model returns plain text, causing the deriver's PromptRepresentation validation to fail.
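The downstream failure is easy to reproduce in isolation. A minimal sketch, assuming the deriver's PromptRepresentation is roughly a pydantic model matching the {"explicit": [{"content": "<observation>"}]} schema described in this PR; the model definition here is a hypothetical reconstruction, not the project's actual class:

```python
from pydantic import BaseModel, ValidationError

# Hypothetical reconstruction of the deriver's expected schema:
# {"explicit": [{"content": "<observation>"}]}
class Observation(BaseModel):
    content: str

class PromptRepresentation(BaseModel):
    explicit: list[Observation]

# With no response_format instruction, a vllm-routed model may answer in prose:
raw = "Sure! Here are the observations I extracted: the user likes tea."

try:
    PromptRepresentation.model_validate_json(raw)
except ValidationError:
    print("validation failed: response is not valid JSON")

# The same call succeeds once the model actually emits the JSON structure:
parsed = PromptRepresentation.model_validate_json(
    '{"explicit": [{"content": "user likes tea"}]}'
)
print(parsed.explicit[0].content)  # user likes tea
```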

2. Deriver prompt has no explicit JSON schema

The minimal deriver prompt instructs the model to extract observations but doesn't specify the expected JSON structure. OpenAI and Anthropic infer this from response_format, but other providers need the schema stated explicitly in the prompt itself.

Fix

  1. clients.py: Remove vllm from the exclusion when json_mode=True and no response_model is set
  2. prompts.py: Append the exact expected JSON structure to the minimal deriver prompt
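Sketched together, the two changes look roughly like the following. This is a paraphrase rather than the actual diff: the helper names and the exact wording of the appended instruction are illustrative.

```python
# clients.py (sketch): json_object now applies to every provider,
# including vllm, whenever json_mode is on and no response_model is set.
def build_response_format(json_mode, response_model):
    if json_mode and response_model is None:
        return {"type": "json_object"}  # previously skipped for provider == "vllm"
    return None

# prompts.py (sketch): append the exact expected structure so providers
# that do not infer it from response_format still emit valid JSON.
JSON_SUFFIX = (
    "Respond with ONLY a JSON object matching this exact structure, "
    "with no extra text, markdown, or explanation:\n"
    '{"explicit": [{"content": "<observation>"}]}'
)

def with_json_schema(minimal_deriver_prompt):
    return minimal_deriver_prompt + "\n\n" + JSON_SUFFIX
```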

Result

The deriver now works correctly with:

  • MiniMax M2.7 (via vllm provider pointing to https://api.minimax.io/v1)
  • Any Ollama model that supports JSON mode
  • Other OpenAI-compatible providers

Tested: deriver successfully extracts and stores observations using MiniMax M2.7.

Summary by CodeRabbit

  • Bug Fixes

    • Resolved a conflict that could force an incorrect JSON output format when structured responses were requested.
  • Improvements

    • Enforced stricter JSON-only output for minimal extraction prompts to ensure predictable, machine-readable responses.
    • Left structured response handling unchanged so models respect explicit schema-based outputs.
Two related fixes for using OpenAI-compatible providers (vllm, custom)
with json_mode=True:

1. clients.py: vllm was excluded from json_object response_format injection
   when json_mode=True. This caused providers like MiniMax M2.7 (routed
   via the vllm provider config) to return plain text instead of JSON,
   failing the PromptRepresentation pydantic validation.
   Fix: remove vllm from the exclusion list when no response_model is set.

2. prompts.py: the minimal deriver prompt had no explicit JSON schema
   instruction. OpenAI and Anthropic infer the correct format from
   json_mode + the pydantic schema, but other providers (MiniMax, Mistral,
   Llama variants) need the schema stated explicitly in the prompt.
   Fix: append the exact expected JSON structure to the prompt.

These two fixes together enable the deriver to work correctly with
non-OpenAI/Anthropic providers including MiniMax M2.7, Ollama-served
models, and other OpenAI-compatible endpoints via the vllm provider.
coderabbitai bot commented Mar 30, 2026

No actionable comments were generated in the recent review. 🎉

📥 Commits

Reviewing files that changed from the base of the PR and between 1a4129c and 2a0774a.

📒 Files selected for processing (1)
  • src/utils/clients.py
✅ Files skipped from review due to trivial changes (1)
  • src/utils/clients.py

Walkthrough

This PR tightens JSON output constraints: the minimal deriver prompt now requires responses that are strictly a JSON object matching {"explicit": [{"content": "<observation>"}]}, and the OpenAI response_format is only forced to json_object when json_mode is true and no response_model is provided.

Changes

  • src/deriver/prompts.py (Prompt Output Format Clarification): Added an explicit requirement to minimal_deriver_prompt that the model must return only a JSON object following the schema {"explicit": [{"content": "<observation>"}]} (no extra text/markdown/explanation).
  • src/utils/clients.py (Client JSON Mode Condition Refinement): Changed the logic so openai_params["response_format"] = {"type": "json_object"} is set only when json_mode is true AND response_model is not provided; response_format is left unset when a response_model is used (including vllm's own handling).
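The refined selection logic can be sketched as a single function. The json_schema payload shape follows the common OpenAI-compatible structured-output convention; the function and field values here are illustrative, not copied from the diff:

```python
def select_response_format(provider, json_mode, response_model):
    # Generic path: any provider (vllm included) gets json_object
    # when JSON mode is requested without a schema.
    if json_mode and response_model is None:
        return {"type": "json_object"}
    # vllm structured-output path, gated on response_model being set.
    if provider == "vllm" and response_model is not None:
        return {
            "type": "json_schema",
            "json_schema": {
                "name": response_model.__name__,
                "schema": response_model.model_json_schema(),
            },
        }
    # Otherwise response_format stays unset.
    return None
```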

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Poem

🐇 I nibble on schemas, tidy and neat,
JSON-only carrots are quite the treat,
Prompts aligned and clients tuned right,
Observations hop into the light,
Hooray — structured outputs take flight!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage ⚠️ Warning: docstring coverage is 50.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check ✅: check skipped because CodeRabbit's high-level summary is enabled.
  • Title Check ✅: the title accurately captures the two main changes, enabling json_mode for the vllm provider and adding an explicit JSON schema to the deriver prompt, matching the PR's core objectives.


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/utils/clients.py`:
- Around line 1898-1899: The current conditional prevents injecting JSON mode
into vllm-backed providers: modify the check around the openai_params assignment
so that when json_mode is True and there is no response_model the code sets
openai_params["response_format"] = {"type": "json_object"} regardless of
provider (remove the provider != "vllm" exclusion); update the logic near the
variables json_mode, provider, response_model and the openai_params assignment
so vllm-routed OpenAI-compatible models receive the JSON response_format.

📥 Commits

Reviewing files that changed from the base of the PR and between a5423b5 and 1a4129c.

📒 Files selected for processing (2)
  • src/deriver/prompts.py
  • src/utils/clients.py
Remove the provider != 'vllm' exclusion so vllm-routed OpenAI-compatible
backends (e.g. MiniMax via vllm) receive response_format: json_object
when json_mode=True. The existing vllm response_model path (json_schema)
is unaffected since it is gated on response_model being set.

Co-authored-by: CodeRabbit <coderabbit@coderabbit.ai>
@paralizeer
Author

Applied the fix from @coderabbitai — removed the provider not in ("vllm",) exclusion so vllm-routed OpenAI-compatible backends receive response_format: {"type": "json_object"} when json_mode=True.

The existing vllm response_model path (which produces json_schema format) is unaffected since it's gated on response_model being set — the two branches don't conflict.

Context: this was found while running the deriver against MiniMax M2.7 via a vllm-compatible endpoint. Without this fix, the deriver falls back to unstructured output and observation extraction breaks silently.
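A quick way to check the no-conflict claim is to enumerate the gates. This standalone sketch (names illustrative) confirms that at most one of the two response_format branches can fire for any input combination:

```python
from itertools import product

def fired_branches(provider, json_mode, has_response_model):
    fired = []
    if json_mode and not has_response_model:
        fired.append("json_object")   # generic path, now includes vllm
    if provider == "vllm" and has_response_model:
        fired.append("json_schema")   # vllm structured-output path
    return fired

# Exhaustively check every combination: the branches never overlap.
for provider, json_mode, has_model in product(
    ["vllm", "openai", "anthropic"], [True, False], [True, False]
):
    assert len(fired_branches(provider, json_mode, has_model)) <= 1
print("no combination fires both branches")
```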


coderabbitai bot commented Mar 30, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

Let me quickly verify the actual change in clients.py to make sure the implementation matches the stated intent.


🏁 Script executed:

#!/bin/bash
# Check the current state of clients.py around the json_mode / vllm condition
rg -n "vllm|json_mode|json_object|response_format|response_model" --type=py src/utils/clients.py -A2 -B2



@paralizeer Thanks for the context — I verified the implementation in clients.py and it looks exactly right.

What the code now does (Lines 1898–1915):

# Generic json_object path — now applies to ALL providers including vllm
if json_mode and not response_model:
    openai_params["response_format"] = {"type": "json_object"}

# vllm-specific json_schema path — only fires when response_model is set
if provider == "vllm" and response_model:
    openai_params["response_format"] = {
        "type": "json_schema",
        "json_schema": { ... }
    }
    ...

The two branches are cleanly mutually exclusive:

  • json_mode=True, no response_model → json_object (now reaches vllm too ✓)
  • provider == "vllm", response_model set → json_schema structured output path (unchanged ✓)
  • response_model set, non-vllm → falls through to the response_model path at Line 1990 ✓

The fix is well-structured and the silent failure mode (deriver returning unstructured text and PromptRepresentation validation breaking) is properly addressed. Good catch with MiniMax M2.7.


✏️ Learnings added
Learnt from: paralizeer
URL: https://github.com/plastic-labs/honcho/pull/466

Timestamp: 2026-03-30T17:35:51.248Z
Learning: In src/utils/clients.py (`honcho_llm_call_inner`), the vllm provider (OpenAI-compatible endpoint) previously was excluded from receiving `response_format: {"type": "json_object"}` when `json_mode=True`. The fix removes that exclusion. The vllm-specific `json_schema` structured-output path (used when `response_model` is set) is a separate branch gated on `provider == "vllm" and response_model`, so the two branches don't conflict. Without this fix, the deriver falls back to unstructured output and `PromptRepresentation` validation breaks silently for vllm-routed providers (e.g. MiniMax M2.7, Mistral, Llama variants).

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
