bug: setup flow writes provider to config.yaml before model selection, causing gateway race condition #1182

@teknium1

Summary

When a user runs hermes setup or hermes model while the gateway is running (e.g. on Home Assistant, Telegram, Discord), there's a race condition that can cause the gateway to send requests with an incompatible model name to the new provider.

Root Cause

_update_config_for_provider() in hermes_cli/auth.py (line ~1544) writes the new provider and base_url to config.yaml immediately — before model selection happens.

In the MiniMax setup flow (setup.py lines 1036-1037):

_update_config_for_provider("minimax", pconfig.inference_base_url)  # writes to disk NOW
_set_model_provider(config, "minimax", pconfig.inference_base_url)
# ... model selection happens much later at line ~1278

Since the gateway re-reads config.yaml per-message, this creates a window where:

  • model.provider = minimax, model.base_url = https://api.minimax.io/v1
  • model.default = anthropic/claude-opus-4.6 (unchanged from previous provider)

The gateway then sends anthropic/claude-opus-4.6 to MiniMax's API, which doesn't serve that model.
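The window can be replayed with plain dictionaries. This is a minimal sketch, not the Hermes code: write_provider and gateway_snapshot are hypothetical stand-ins for the early disk write and the gateway's per-message read, and the dict stands in for config.yaml.

```python
import copy

# Stand-in for config.yaml; the values come from the issue text.
config_on_disk = {
    "model": {
        "provider": "openrouter",
        "default": "anthropic/claude-opus-4.6",
    }
}


def write_provider(provider, base_url):
    """Hypothetical stand-in for the early write: provider and base_url only."""
    config_on_disk["model"]["provider"] = provider
    config_on_disk["model"]["base_url"] = base_url


def gateway_snapshot():
    """Hypothetical stand-in for the gateway re-reading config for one message."""
    return copy.deepcopy(config_on_disk["model"])


write_provider("minimax", "https://api.minimax.io/v1")
seen = gateway_snapshot()  # a message arrives before model selection finishes
print(seen["provider"], seen["default"])  # prints: minimax anthropic/claude-opus-4.6
```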

Reproduction

  1. Configure OpenRouter with anthropic/claude-opus-4.6 (default)
  2. Start the gateway (hermes gateway run)
  3. Chat via Home Assistant / Telegram — works fine
  4. In a separate terminal, run hermes setup or hermes model
  5. Select MiniMax as provider, enter API key
  6. Before completing model selection, send a message via the gateway
  7. The gateway picks up MiniMax as provider but still uses the Claude model name → fails

Even without the race: if the user selects "Keep current" at the model selection step, the model stays as anthropic/claude-opus-4.6 permanently — which is always wrong for non-OpenRouter providers.

Affected Providers

Any non-OpenRouter provider where the model name format differs: MiniMax, MiniMax-CN, Z.AI, Kimi, Anthropic (native). The OpenRouter-formatted model name (anthropic/claude-opus-4.6) won't work on these endpoints.

Suggested Fixes

Option A (minimal): Don't write config.yaml until the full setup flow completes. Buffer provider + base_url changes in memory, only flush to disk after model selection.
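One way Option A could look, as a sketch under stated assumptions: PendingModelConfig is a hypothetical helper, not an existing Hermes class, and JSON is used in place of YAML only to keep the example standard-library-only. The setup flow would record each choice in memory and commit once, writing through a temporary file and os.replace so a concurrent reader sees either the old file or the new one.

```python
import json
import os
import tempfile
from pathlib import Path


class PendingModelConfig:
    """Hypothetical helper: collects setup choices and writes them in one step."""

    def __init__(self, config_path: Path):
        self.config_path = config_path
        self.pending: dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        # Only the in-memory buffer changes; the file on disk is untouched.
        self.pending[key] = value

    def commit(self) -> None:
        # Merge the buffered keys into the existing "model" section.
        current = {}
        if self.config_path.exists():
            current = json.loads(self.config_path.read_text())
        model = dict(current.get("model", {}))
        model.update(self.pending)
        current["model"] = model

        # Write to a temp file in the same directory, then swap it into place.
        fd, tmp = tempfile.mkstemp(dir=self.config_path.parent, suffix=".tmp")
        try:
            with os.fdopen(fd, "w") as f:
                json.dump(current, f)
            os.replace(tmp, self.config_path)
        except BaseException:
            os.unlink(tmp)
            raise
        self.pending.clear()
```

In this sketch the flow would call set("provider", ...) and set("base_url", ...) when the provider is chosen, set("default", ...) after model selection, and commit() only at the end. If the user aborts midway, nothing has been written.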

Option B (defensive): When _update_config_for_provider changes the provider, also set model.default to the first model in that provider's default list (e.g. MiniMax-M2.5 for minimax). The user can still change it during model selection, but at least the intermediate state is valid.
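A rough sketch of Option B, assuming the config is handled as a nested dict. PROVIDER_DEFAULT_MODELS and apply_provider_change are hypothetical names; the only table entry taken from this issue is MiniMax-M2.5 for minimax, and a real table would come from each provider's existing default list.

```python
# Hypothetical lookup table; only the minimax entry is taken from the issue.
PROVIDER_DEFAULT_MODELS = {
    "minimax": ["MiniMax-M2.5"],
}


def apply_provider_change(config: dict, provider: str, base_url: str) -> dict:
    """Set provider and base_url, and reset model.default when the provider changes."""
    model = config.setdefault("model", {})
    provider_changed = model.get("provider") != provider

    model["provider"] = provider
    model["base_url"] = base_url

    # Replace the model only on an actual provider switch, and only when a
    # default is known; otherwise the existing value is left alone.
    defaults = PROVIDER_DEFAULT_MODELS.get(provider, [])
    if provider_changed and defaults:
        model["default"] = defaults[0]
    return config
```

Because the three keys change together, any intermediate read sees a model name that belongs to the new provider. The same reset would also cover the "Keep current" case, provided "Keep current" is taken to mean the value written at this step.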

Option C (validation): Add a gateway-side sanity check: if the resolved provider is not openrouter and the model name contains a / (the OpenRouter vendor/model format), log a warning and refuse to start the agent until the config is fixed.
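The check in Option C might be sketched as below. is_model_compatible and the "gateway" logger name are hypothetical, and the "/" test is only the heuristic proposed in this issue; it would wrongly reject any non-OpenRouter provider whose own model names contain a slash.

```python
import logging

logger = logging.getLogger("gateway")  # hypothetical logger name


def is_model_compatible(provider: str, model_name: str) -> bool:
    """Return False when an OpenRouter-style name is paired with another provider."""
    if provider != "openrouter" and "/" in model_name:
        logger.warning(
            "model %r looks like an OpenRouter name but provider is %r; "
            "refusing to start until config is fixed",
            model_name,
            provider,
        )
        return False
    return True
```

The gateway would call this after resolving the config for a message and skip starting the agent when it returns False.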

User Report

Reported by a Discord user (Hunter) running Hermes on Home Assistant. His session log showed model: anthropic/claude-opus-4.6 with base_url: https://api.minimax.io/v1 — the agent appeared to "switch providers mid-conversation" from his perspective.
