[Bug]: model.max_tokens in config.yaml has no effect — setting is never passed to AIAgent #4404
Description
Bug Description
Setting max_tokens under the model key in config.yaml does not increase the output token limit. Responses are silently truncated mid-generation for moderately long tasks. The config value is read and merged but never extracted and forwarded to the AIAgent constructor, making the setting completely ineffective.
```yaml
model:
  default: MiniMax-M2.7
  provider: custom
  base_url: https://api.minimax.io/v1
  max_tokens: 8192  # ← this does nothing
```
Steps to Reproduce
- Set `model.max_tokens: 8192` in `config.yaml`
- Run `hermes chat` and engage in a conversation requiring a long response
- Observe that the response gets truncated with no error message
- Check the API request — `max_tokens` is not included
Expected Behavior
model.max_tokens in config.yaml should be passed to the AIAgent constructor, which then sends it to the API.
Actual Behavior
The AIAgent constructor accepts a `max_tokens` parameter (run_agent.py#L660), but callers never provide it:
- cli.py#L2100: `self.agent = AIAgent(...)` — called without `max_tokens`
- gateway/run.py#L781-#L789: `_resolve_turn_agent_config()` builds a `primary` dict without `max_tokens`, so it never flows through to AIAgent
The `_build_api_kwargs` method (run_agent.py#L4864) only adds `max_tokens` to the API request when `self.max_tokens` is not `None`:

```python
if self.max_tokens is not None:
    api_kwargs.update(self._max_tokens_param(self.max_tokens))
elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
    # Band-aid: use hardcoded per-model limits
    _model_output_limit = _get_anthropic_max_output(self.model)
    api_kwargs["max_tokens"] = _model_output_limit
```

Since callers never pass `max_tokens`, it is always `None` and the parameter is never sent — except for the hardcoded OpenRouter + Claude band-aid (see below).
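The dead branch can be reduced to a few lines. Here is a simplified stand-in (not the real method — the helper names and the OpenRouter limit are illustrative) that shows why nothing is ever sent:

```python
# Simplified stand-in for _build_api_kwargs. Because no caller ever passes
# max_tokens, it defaults to None and the first branch never fires.
def build_api_kwargs(model, base_url, max_tokens=None):
    api_kwargs = {}
    if max_tokens is not None:
        # Would fire if config were passed through -- but it never is.
        api_kwargs["max_tokens"] = max_tokens
    elif "openrouter" in base_url and "claude" in (model or "").lower():
        # Band-aid path: hardcoded limit, ignoring user config entirely.
        api_kwargs["max_tokens"] = 8192  # stand-in for _get_anthropic_max_output()
    return api_kwargs

# As called today: the user's config value silently vanishes.
print(build_api_kwargs("MiniMax-M2.7", "https://api.minimax.io/v1"))  # {}
# As it would behave with the config passed through:
print(build_api_kwargs("MiniMax-M2.7", "https://api.minimax.io/v1",
                       max_tokens=8192))  # {'max_tokens': 8192}
```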
Root Cause Analysis (confirmed with live API intercept)
I patched the installed hermes-agent to log what _build_api_kwargs() sends to the API. Two tests were run:
Test 1 — current behavior (bug):

```
[DEBUG max_tokens] self.max_tokens=None | api_kwargs max_token keys={} | model=MiniMax-M2.7
🚨 BUG CONFIRMED: self.max_tokens is None AND no max_token in API request!
```

Test 2 — with `max_tokens=50` passed to AIAgent:

```
[DEBUG max_tokens] self.max_tokens=50 | api_kwargs max_token keys={'max_tokens': 50} | model=MiniMax-M2.7
✅ FIX CONFIRMED: max_tokens IS being sent to the API!
```
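For reproducibility, the intercept can be sketched as a small wrapper around `_build_api_kwargs`. This uses a stand-in class, since the real method takes more arguments; only the logging technique is the point:

```python
import functools

class Agent:
    """Stand-in for AIAgent, reduced to the fields the log line reads."""
    def __init__(self, model, max_tokens=None):
        self.model = model
        self.max_tokens = max_tokens

    def _build_api_kwargs(self):
        kw = {}
        if self.max_tokens is not None:
            kw["max_tokens"] = self.max_tokens
        return kw

def with_debug_log(fn):
    """Wrap the kwargs builder to log what would be sent to the API."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        kw = fn(self, *args, **kwargs)
        keys = {k: v for k, v in kw.items() if "max_token" in k}
        print(f"[DEBUG max_tokens] self.max_tokens={self.max_tokens} | "
              f"api_kwargs max_token keys={keys} | model={self.model}")
        return kw
    return wrapper

# Monkeypatch the installed method, then exercise both cases.
Agent._build_api_kwargs = with_debug_log(Agent._build_api_kwargs)
Agent("MiniMax-M2.7")._build_api_kwargs()      # logs keys={}  -> the bug
Agent("MiniMax-M2.7", 50)._build_api_kwargs()  # logs keys={'max_tokens': 50}
```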
MiniMax API respects `max_tokens` (verified with a live API call):

| Setting | completion_tokens | finish_reason |
|---|---|---|
| `max_tokens=50` | 50 | `length` (truncated) |
| No `max_tokens` (default) | 787 | `stop` (complete) |

With `max_tokens=50`, the API returned exactly 50 tokens and set `finish_reason=length`, confirming the parameter is respected.
Note on OpenRouter + Claude Band-Aid
run_agent.py#L4865 has a hardcoded band-aid for OpenRouter + Claude because Anthropic's API requires `max_tokens`:

```python
elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
    _model_output_limit = _get_anthropic_max_output(self.model)
    api_kwargs["max_tokens"] = _model_output_limit
```

This uses a hardcoded lookup table (`_get_anthropic_max_output`) and ignores the user's `model.max_tokens` config entirely. It was added as a workaround for the missing config passthrough, not a proper fix. Our fix makes the config work for all providers, including OpenRouter + Claude.
Provider Compatibility
This fix is safe for all providers:
- If `max_tokens` is NOT in config.yaml → the fix extracts `None` → behavior is identical to before (no change)
- If `max_tokens` IS in config.yaml → the user explicitly configured it for their provider/model
- Most OpenAI-compatible APIs ignore unsupported parameters rather than erroring
- The existing band-aid for OpenRouter + Claude confirms that passing `max_tokens` to providers that support it is the intended behavior
Proposed Fix
Proof-of-concept on fix/model-max-tokens-config branch: https://github.com/shokollm/hermes-agent/tree/fix/model-max-tokens-config
Changes:
- cli.py: Extract `max_tokens = CLI_CONFIG["model"].get("max_tokens")` and pass it to AIAgent
- gateway/run.py: Add a `user_config` parameter to `_resolve_turn_agent_config()`, extract `max_tokens` from `user_config.get("model", {}).get("max_tokens")`, and include it in the `primary` dict so it flows through to AIAgent
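The extraction itself is tiny; a hedged sketch of the cli.py side (the `CLI_CONFIG` shape is assumed from the config.yaml above, and the AIAgent call is elided):

```python
def extract_max_tokens(cli_config):
    # Chained .get() calls mean a missing key yields None, which reproduces
    # today's behavior exactly for users who never set the option.
    return cli_config.get("model", {}).get("max_tokens")

CLI_CONFIG = {"model": {"default": "MiniMax-M2.7", "max_tokens": 8192}}
max_tokens = extract_max_tokens(CLI_CONFIG)
print(max_tokens)  # 8192
# self.agent = AIAgent(..., max_tokens=max_tokens)  # passthrough to the agent

print(extract_max_tokens({"model": {}}))  # None -> unchanged behavior
```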
Affected Component
- CLI (interactive chat)
- Gateway (Telegram/Discord/Slack/WhatsApp)
- Configuration (config.yaml, .env, hermes setup)
Are you willing to submit a PR for this?
- I'd like to fix this myself and submit a PR