[Bug]: model.max_tokens in config.yaml has no effect — setting is never passed to AIAgent #4404

@shokollm

Bug Description

Setting max_tokens under the model key in config.yaml does not increase the output token limit. Responses are silently truncated mid-generation for moderately long tasks. The config value is read and merged but never extracted and forwarded to the AIAgent constructor, making the setting completely ineffective.

model:
  default: MiniMax-M2.7
  provider: custom
  base_url: https://api.minimax.io/v1
  max_tokens: 8192  # ← this does nothing

Steps to Reproduce

  1. Set model.max_tokens: 8192 in config.yaml
  2. Run hermes chat and engage in a conversation requiring a long response
  3. Observe the response gets truncated with no error message
  4. Check the API request — max_tokens is not included

Expected Behavior

model.max_tokens in config.yaml should be passed to the AIAgent constructor, which then sends it to the API.

Actual Behavior

The AIAgent constructor accepts a max_tokens parameter (run_agent.py#L660), but callers never provide it:

  • cli.py#L2100: self.agent = AIAgent(...) — called without max_tokens
  • gateway/run.py#L781-#L789: _resolve_turn_agent_config() builds a primary dict without max_tokens, so it never flows through to AIAgent

The _build_api_kwargs method (run_agent.py#L4864) only adds max_tokens to the API request when self.max_tokens is not None:

if self.max_tokens is not None:
    api_kwargs.update(self._max_tokens_param(self.max_tokens))
elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
    # Band-aid: use hardcoded per-model limits
    _model_output_limit = _get_anthropic_max_output(self.model)
    api_kwargs["max_tokens"] = _model_output_limit

Since callers never pass max_tokens, it is always None and the parameter is never sent — except for a hardcoded band-aid for OpenRouter + Claude (see below).
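The broken call path can be sketched with a minimal, self-contained reconstruction (class and method names mirror the issue; bodies are illustrative, not the real run_agent.py code):

```python
# Hypothetical minimal reconstruction of the call path described above.
class AIAgent:
    def __init__(self, model, max_tokens=None):  # run_agent.py#L660 accepts it...
        self.model = model
        self.max_tokens = max_tokens             # ...but callers never pass it

    def _build_api_kwargs(self):
        api_kwargs = {"model": self.model}
        if self.max_tokens is not None:          # never true today
            api_kwargs["max_tokens"] = self.max_tokens
        return api_kwargs

# cli.py#L2100 today: constructed without max_tokens, so the guard never fires
agent = AIAgent(model="MiniMax-M2.7")
print(agent._build_api_kwargs())  # no 'max_tokens' key in the request
```

Passing `max_tokens=50` to the same constructor makes the key appear, which matches the Test 2 intercept below.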

Root Cause Analysis (confirmed with live API intercept)

I patched the installed hermes-agent to log what _build_api_kwargs() sends to the API. Two tests were run:

Test 1 — Current behavior (BUG):

[DEBUG max_tokens] self.max_tokens=None | api_kwargs max_token keys={} | model=MiniMax-M2.7
🚨 BUG CONFIRMED: self.max_tokens is None AND no max_token in API request!

Test 2 — With max_tokens=50 passed to AIAgent:

[DEBUG max_tokens] self.max_tokens=50 | api_kwargs max_token keys={'max_tokens': 50} | model=MiniMax-M2.7
✅ FIX CONFIRMED: max_tokens IS being sent to the API!

MiniMax API respects max_tokens (verified with live API call):

Setting                    completion_tokens   finish_reason
max_tokens=50              50                  length (truncated)
No max_tokens (default)    787                 stop (complete)

With max_tokens=50, the API returned exactly 50 tokens and set finish_reason=length, confirming the parameter is respected.

Note on OpenRouter + Claude Band-Aid

run_agent.py#L4865 has a hardcoded band-aid for OpenRouter + Claude because Anthropic's API requires max_tokens:

elif self._is_openrouter_url() and "claude" in (self.model or "").lower():
    _model_output_limit = _get_anthropic_max_output(self.model)
    api_kwargs["max_tokens"] = _model_output_limit

This uses a hardcoded lookup table (_get_anthropic_max_output) and ignores the user's model.max_tokens config entirely. It was added as a workaround for the missing config passthrough, not a proper fix. Our fix makes config work for all providers including OpenRouter + Claude.

Provider Compatibility

This fix is safe for all providers:

  • If max_tokens is NOT in config.yaml → the fix extracts None → behavior is identical to before (no change)
  • If max_tokens IS in config.yaml → user explicitly configured it for their provider/model
  • Most OpenAI-compatible APIs ignore unsupported parameters rather than erroring
  • The existing band-aid for OpenRouter + Claude confirms that passing max_tokens to providers that support it is the intended behavior
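The no-op case in the first bullet follows directly from how the extraction behaves on a config without the key (config shape assumed from the example at the top of this issue):

```python
# Sketch: extracting max_tokens from a config that does not define it.
config = {"model": {"default": "MiniMax-M2.7", "provider": "custom"}}

max_tokens = config.get("model", {}).get("max_tokens")
print(max_tokens)  # None — AIAgent then behaves exactly as before the fix
```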

Proposed Fix

Proof-of-concept on fix/model-max-tokens-config branch: https://github.com/shokollm/hermes-agent/tree/fix/model-max-tokens-config

Changes:

  1. cli.py: Extract max_tokens = CLI_CONFIG["model"].get("max_tokens") and pass to AIAgent
  2. gateway/run.py: Add user_config parameter to _resolve_turn_agent_config(), extract max_tokens from user_config.get("model", {}).get("max_tokens"), include it in the primary dict so it flows through to AIAgent

Affected Component

  • CLI (interactive chat)
  • Gateway (Telegram/Discord/Slack/WhatsApp)
  • Configuration (config.yaml, .env, hermes setup)

Are you willing to submit a PR for this?

  • I'd like to fix this myself and submit a PR
