[Bug]: model.max_tokens in config.yaml has no effect — setting is never passed to AIAgent #4404

@shokollm

Bug Description

Setting max_tokens under the model key in config.yaml does not increase the output token limit. Responses are silently truncated mid-generation for moderately long tasks. The config value is read and merged but never extracted and forwarded to the AIAgent constructor, making the setting completely ineffective.

model:
  default: MiniMax-M2.7
  provider: custom
  base_url: https://api.minimax.io/v1
  max_tokens: 8192  # ← this does nothing

Steps to Reproduce

  1. Set model.max_tokens: 8192 in config.yaml
  2. Run hermes chat and engage in a conversation requiring a long response
  3. Observe the response gets truncated with no error message
  4. Check the API request — max_tokens is not included

Expected Behavior

model.max_tokens in config.yaml should be passed to the AIAgent constructor, which then sends it to the API.

Actual Behavior

The AIAgent constructor accepts a max_tokens parameter (run_agent.py#L660), but callers never provide it:

  • cli.py#L2100: self.agent = AIAgent(...) — called without max_tokens
  • gateway/run.py#L781-#L789: _resolve_turn_agent_config() builds a primary dict without max_tokens, so it never flows through to AIAgent

The _build_api_kwargs method (run_agent.py#L4864) only adds max_tokens to the API request when self.max_tokens is not None:

if self.max_tokens is not None:
    api_kwargs.update(self._max_tokens_param(self.max_tokens))
elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
    # Band-aid: use hardcoded per-model limits
    _model_output_limit = _get_anthropic_max_output(self.model)
    api_kwargs["max_tokens"] = _model_output_limit

Since callers never pass max_tokens, it is always None and the parameter is never sent — except for a hardcoded band-aid for OpenRouter + Claude (see below).
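The broken call path can be sketched with a minimal, self-contained reconstruction (class and method names mirror the issue; bodies are illustrative, not the real run_agent.py code):

```python
# Hypothetical minimal reconstruction of the call path described above.
class AIAgent:
    def __init__(self, model, max_tokens=None):  # run_agent.py#L660 accepts it...
        self.model = model
        self.max_tokens = max_tokens             # ...but callers never pass it

    def _build_api_kwargs(self):
        api_kwargs = {"model": self.model}
        if self.max_tokens is not None:          # never true today
            api_kwargs["max_tokens"] = self.max_tokens
        return api_kwargs

# cli.py#L2100 today: constructed without max_tokens, so the guard never fires
agent = AIAgent(model="MiniMax-M2.7")
print(agent._build_api_kwargs())  # no 'max_tokens' key in the request
```

Passing `max_tokens=50` to the same constructor makes the key appear, which matches the Test 2 intercept below.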

Root Cause Analysis (confirmed with live API intercept)

I patched the installed hermes-agent to log what _build_api_kwargs() sends to the API. Two tests were run:

Test 1 — Current behavior (BUG):

[DEBUG max_tokens] self.max_tokens=None | api_kwargs max_token keys={} | model=MiniMax-M2.7
🚨 BUG CONFIRMED: self.max_tokens is None AND no max_token in API request!

Test 2 — With max_tokens=50 passed to AIAgent:

[DEBUG max_tokens] self.max_tokens=50 | api_kwargs max_token keys={'max_tokens': 50} | model=MiniMax-M2.7
✅ FIX CONFIRMED: max_tokens IS being sent to the API!

MiniMax API respects max_tokens (verified with live API call):

Setting                    completion_tokens   finish_reason
max_tokens=50              50                  length (truncated)
No max_tokens (default)    787                 stop (complete)

With max_tokens=50, the API returned exactly 50 tokens and set finish_reason=length, confirming the parameter is respected.

Note on OpenRouter + Claude Band-Aid

run_agent.py#L4865 has a hardcoded band-aid for OpenRouter + Claude because Anthropic's API requires max_tokens:

elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
    _model_output_limit = _get_anthropic_max_output(self.model)
    api_kwargs["max_tokens"] = _model_output_limit

This uses a hardcoded lookup table (_get_anthropic_max_output) and ignores the user's model.max_tokens config entirely. It was added as a workaround for the missing config passthrough, not a proper fix. Our fix makes config work for all providers including OpenRouter + Claude.

Provider Compatibility

This fix is safe for all providers:

  • If max_tokens is NOT in config.yaml → the fix extracts None → behavior is identical to before (no change)
  • If max_tokens IS in config.yaml → user explicitly configured it for their provider/model
  • Most OpenAI-compatible APIs ignore unsupported parameters rather than erroring
  • The existing band-aid for OpenRouter + Claude confirms that passing max_tokens to providers that support it is the intended behavior
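The no-op case in the first bullet follows directly from how the extraction behaves on a config without the key (config shape assumed from the example at the top of this issue):

```python
# Sketch: extracting max_tokens from a config that does not define it.
config = {"model": {"default": "MiniMax-M2.7", "provider": "custom"}}

max_tokens = config.get("model", {}).get("max_tokens")
print(max_tokens)  # None — AIAgent then behaves exactly as before the fix
```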

Proposed Fix

Proof-of-concept on fix/model-max-tokens-config branch: https://github.com/shokollm/hermes-agent/tree/fix/model-max-tokens-config

Changes:

  1. cli.py: Extract max_tokens = CLI_CONFIG["model"].get("max_tokens") and pass to AIAgent
  2. gateway/run.py: Add user_config parameter to _resolve_turn_agent_config(), extract max_tokens from user_config.get("model", {}).get("max_tokens"), include it in the primary dict so it flows through to AIAgent

Affected Component

  • CLI (interactive chat)
  • Gateway (Telegram/Discord/Slack/WhatsApp)
  • Configuration (config.yaml, .env, hermes setup)

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR
