Skip to content

feat: centralized provider router, call_llm API, unified /model command#1003

Merged
teknium1 merged 12 commits intomainfrom
hermes/hermes-cf9f7d54
Mar 12, 2026
Merged

feat: centralized provider router, call_llm API, unified /model command#1003
teknium1 merged 12 commits intomainfrom
hermes/hermes-cf9f7d54

Conversation

@teknium1
Copy link
Copy Markdown
Contributor

@teknium1 teknium1 commented Mar 12, 2026

Summary

Complete provider routing infrastructure overhaul. Every LLM call now flows through a centralized router that handles auth, request formatting, and provider-specific quirks in one place.

Core Architecture

resolve_provider_client(provider, model, async_mode, raw_codex)

Central entry point for creating LLM clients. Handles:

  • Auth lookup (env vars for API-key providers, OAuth tokens for Nous/Codex)
  • Base URL resolution with env var overrides
  • API format (Chat Completions vs Responses API adapter for Codex)
  • Provider-specific headers (OpenRouter attribution, Kimi User-Agent)
  • raw_codex mode for main agent (direct responses.stream access)

call_llm(task, provider, model, messages, ...) / async_call_llm(...)

Full request lifecycle:

  1. Resolve provider+model from task config or explicit args
  2. Get/create cached client via router
  3. Build request kwargs (max_tokens handling, provider extra_body)
  4. Make API call with max_tokens/max_completion_tokens retry
  5. Return response

Config slots (config.yaml v7)

Every auxiliary task has a provider:model config slot:

auxiliary:
  compression: {provider: auto, model: ""}
  vision: {provider: auto, model: ""}
  web_extract: {provider: auto, model: ""}
  session_search: {provider: auto, model: ""}
  skills_hub: {provider: auto, model: ""}
  mcp: {provider: auto, model: ""}
  flush_memories: {provider: auto, model: ""}

Changes by Area

Auxiliary Consumers Migrated (all use call_llm/async_call_llm)

  • context_compressor.py — compression summaries
  • vision_tools.py — image analysis (fixed Codex bypass)
  • web_tools.py — page summarization
  • session_search_tool.py — session summarization
  • browser_tool.py — snapshot summarization + browser vision
  • mcp_tool.py — MCP sampling
  • skills_guard.py — skill audit
  • run_agent.py flush_memories — memory flush
  • trajectory_compressor.py — trajectory summarization
  • mini_swe_runner.py — SWE benchmark runner
  • openrouter_client.py — shared OpenRouter client

Main Agent (run_agent.py)

  • __init__: Uses router when no explicit creds provided
  • _try_activate_fallback: Replaced duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS dicts with single resolve_provider_client() call
  • Removed _resolve_fallback_credentials method

Provider Fixes

  • Nous Portal: Don't send OpenRouter provider preferences (only, ignore, etc.) — caused 404. Don't send reasoning: {enabled: false} — Nous requires reasoning enabled
  • Codex vision bypass: vision_tools.py was constructing raw AsyncOpenAI bypassing Codex adapter → crash. Fixed with get_async_vision_auxiliary_client()
  • Vision errors: Clear message when model doesn't support vision

Config & Auth

  • Removed LLM_MODEL env var — config.yaml is sole source of truth. Avoids conflicts in multi-agent setups. Removed from cli.py, auth.py, setup.py, gateway, cron
  • Removed nous-api provider — Nous Portal is OAuth only

UX

  • Unified /model and /provider — both show same view with all authenticated providers, their models, current selection, and switch examples
  • Added curated model lists for Nous Portal and OpenAI Codex

Test Results

3251 passed, 2 pre-existing unrelated failures

Live Testing

  • Tested hermes model switching between OpenRouter, Nous Portal, OpenAI Codex
  • Tested /model provider:model mid-conversation switching (history preserved)
  • Tested tool calls, vision, web search across all providers
  • Tested auxiliary routing (vision/compression/web_extract) with each primary provider
…error handling

Three interconnected fixes for auxiliary client infrastructure:

1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py)
   Add resolve_provider_client(provider, model, async_mode) — a single
   entry point for creating properly configured clients. Given a provider
   name and optional model, it handles auth lookup (env vars, OAuth
   tokens, auth.json), base URL resolution, provider-specific headers,
   and API format differences (Chat Completions vs Responses API for
   Codex). All auxiliary consumers should route through this instead of
   ad-hoc env var lookups.

   Refactored get_text_auxiliary_client, get_async_text_auxiliary_client,
   and get_vision_auxiliary_client to use the router internally.

2. FIX CODEX VISION BYPASS (vision_tools.py)
   vision_tools.py was constructing a raw AsyncOpenAI client from the
   sync vision client's api_key/base_url, completely bypassing the Codex
   Responses API adapter. When the vision provider resolved to Codex,
   the raw client would hit chatgpt.com/backend-api/codex with
   chat.completions.create() which only supports the Responses API.

   Fix: Added get_async_vision_auxiliary_client() which properly wraps
   Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this
   instead of manual client construction.

3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING
   - context_compressor.py: Removed _get_fallback_client() which blindly
     looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth,
     API-key providers, users without OPENAI_BASE_URL set). Replaced
     with fallback loop through resolve_provider_client() for each
     known provider, with same-provider dedup.

   - vision_tools.py: Added error detection for vision capability
     failures. Returns clear message to the model when the configured
     model doesn't support vision, instead of a generic error.

Addresses #886
Route all remaining ad-hoc auxiliary LLM call sites through
resolve_provider_client() so auth, headers, and API format (Chat
Completions vs Responses API) are handled consistently in one place.

Files changed:

- tools/openrouter_client.py: Replace manual AsyncOpenAI construction
  with resolve_provider_client('openrouter', async_mode=True). The
  shared client module now delegates entirely to the router.

- tools/skills_guard.py: Replace inline OpenAI client construction
  (hardcoded OpenRouter base_url, manual api_key lookup, manual
  headers) with resolve_provider_client('openrouter'). Remove unused
  OPENROUTER_BASE_URL import.

- trajectory_compressor.py: Add _detect_provider() to map config
  base_url to a provider name, then route through
  resolve_provider_client. Falls back to raw construction for
  unrecognized custom endpoints.

- mini_swe_runner.py: Route default case (no explicit api_key/base_url)
  through resolve_provider_client('openrouter') with auto-detection
  fallback. Preserves direct construction when explicit creds are
  passed via CLI args.

- agent/auxiliary_client.py: Fix stale module docstring — vision auto
  mode now correctly documents that Codex and custom endpoints are
  tried (not skipped).
@teknium1 teknium1 changed the title feat: centralized provider router + fix Codex vision bypass + vision error handling Mar 12, 2026
Nous Portal only supports OAuth authentication. Remove the 'nous-api'
provider which allowed direct API key access via NOUS_API_KEY env var.

Removed from:
- hermes_cli/auth.py: PROVIDER_REGISTRY entry + aliases
- hermes_cli/config.py: OPTIONAL_ENV_VARS entry
- hermes_cli/setup.py: setup wizard option + model selection handler
  (reindexed remaining provider choices)
- agent/auxiliary_client.py: docstring references
- tests/test_runtime_provider_resolution.py: nous-api test
- tests/integration/test_web_tools.py: renamed dict key
Add centralized call_llm() and async_call_llm() functions that own the
full LLM request lifecycle:
  1. Resolve provider + model from task config or explicit args
  2. Get or create a cached client for that provider
  3. Format request args (max_tokens handling, provider extra_body)
  4. Make the API call with max_tokens/max_completion_tokens retry
  5. Return the response

Config: expanded auxiliary section with provider:model slots for all
tasks (compression, vision, web_extract, session_search, skills_hub,
mcp, flush_memories). Config version bumped to 7.

Migrated all auxiliary consumers:
- context_compressor.py: uses call_llm(task='compression')
- vision_tools.py: uses async_call_llm(task='vision')
- web_tools.py: uses async_call_llm(task='web_extract')
- session_search_tool.py: uses async_call_llm(task='session_search')
- browser_tool.py: uses call_llm(task='vision'/'web_extract')
- mcp_tool.py: uses call_llm(task='mcp')
- skills_guard.py: uses call_llm(provider='openrouter')
- run_agent.py flush_memories: uses call_llm(task='flush_memories')

Tests updated for context_compressor and MCP tool. Some test mocks
still need updating (15 remaining failures from mock pattern changes,
2 pre-existing).
Update 14 test files to use the new call_llm/async_call_llm mock
patterns instead of the old get_text_auxiliary_client/
get_vision_auxiliary_client tuple returns.

- vision_tools tests: mock async_call_llm instead of _aux_async_client
- browser tests: mock call_llm instead of _aux_vision_client
- flush_memories tests: mock call_llm instead of get_text_auxiliary_client
- session_search tests: mock async_call_llm with RuntimeError
- mcp_tool tests: fix whitelist model config, use side_effect for
  multi-response tests
- auxiliary_config_bridge: update for model=None (resolved in router)

3251 passed, 2 pre-existing unrelated failures.
Phase 2 of the provider router migration — route the main agent's
client construction and fallback activation through
resolve_provider_client() instead of duplicated ad-hoc logic.

run_agent.py:
- __init__: When no explicit api_key/base_url, use
  resolve_provider_client(provider, raw_codex=True) for client
  construction. Explicit creds (from CLI/gateway runtime provider)
  still construct directly.
- _try_activate_fallback: Replace _resolve_fallback_credentials and
  its duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS
  dicts with a single resolve_provider_client() call. The router
  handles all provider types (API-key, OAuth, Codex) centrally.
- Remove _resolve_fallback_credentials method and both fallback dicts.

agent/auxiliary_client.py:
- Add raw_codex parameter to resolve_provider_client(). When True,
  returns the raw OpenAI client for Codex providers instead of wrapping
  in CodexAuxiliaryClient. The main agent needs this for direct
  responses.stream() access.

3251 passed, 2 pre-existing unrelated failures.
…ource of truth

Model selection now comes exclusively from config.yaml (set via
'hermes model' or 'hermes setup'). The LLM_MODEL env var is no longer
read or written anywhere in production code.

Why: env vars are per-process/per-user and would conflict in
multi-agent or multi-tenant setups. Config.yaml is file-based and
can be scoped per-user or eventually per-session.

Changes:
- cli.py: Read model from CLI_CONFIG only, not LLM_MODEL/OPENAI_MODEL
- hermes_cli/auth.py: _save_model_choice() no longer writes LLM_MODEL
  to .env
- hermes_cli/setup.py: Remove 12 save_env_value('LLM_MODEL', ...)
  calls from all provider setup flows
- gateway/run.py: Remove LLM_MODEL fallback (HERMES_MODEL still works
  for gateway process runtime)
- cron/scheduler.py: Same
- agent/auxiliary_client.py: Remove LLM_MODEL from custom endpoint
  model detection
Two bugs in _build_api_kwargs that broke Nous Portal:

1. Provider preferences (only, ignore, order, sort) are OpenRouter-
   specific routing features. They were being sent in extra_body to ALL
   providers, including Nous Portal. When the config had
   providers_only=['google-vertex'], Nous Portal returned 404 'Inference
   host not found' because it doesn't have a google-vertex backend.

   Fix: Only include provider preferences when _is_openrouter is True.

2. Reasoning config with enabled=false was being sent to Nous Portal,
   which requires reasoning and returns 400 'Reasoning is mandatory for
   this endpoint and cannot be disabled.'

   Fix: Omit the reasoning parameter for Nous when enabled=false.

Root cause found via HERMES_DUMP_REQUESTS=1 which showed the exact
request payload being sent to Nous Portal's inference API.
Nous Portal backend will become a transparent proxy for OpenRouter-
specific parameters (provider preferences, etc.), so keep sending them
to all providers. The reasoning disabled fix is kept (that's a real
constraint of the Nous endpoint).
Both /model and /provider now show the same unified display:

  Current: anthropic/claude-opus-4.6 via OpenRouter

  Authenticated providers & models:
    [openrouter] ← active
      anthropic/claude-opus-4.6 ← current
      anthropic/claude-sonnet-4.5
      ...
    [nous]
      claude-opus-4-6
      gemini-3-flash
      ...
    [openai-codex]
      gpt-5.2-codex
      gpt-5.1-codex-mini
      ...

  Not configured: Z.AI / GLM, Kimi / Moonshot, ...

  Switch model:    /model <model-name>
  Switch provider: /model <provider>:<model-name>
  Example: /model nous:claude-opus-4-6

Users can see all authenticated providers and their models at a glance,
making it easy to switch mid-conversation.

Also added curated model lists for Nous Portal and OpenAI Codex to
hermes_cli/models.py.
@teknium1 teknium1 changed the title feat: centralized provider router + route all auxiliary LLM calls through it Mar 12, 2026
- Custom endpoints can serve any model, so skip validation for
  provider='custom' in validate_requested_model(). Previously it
  would reject any model name since there's no static catalog or
  live API to check against.
- Show clear setup instructions when switching to custom endpoint
  without OPENAI_BASE_URL/OPENAI_API_KEY configured.
- Added curated model lists for Nous Portal and OpenAI Codex to
  _PROVIDER_MODELS so /model shows their available models.
- gateway/run.py: Take main's _resolve_gateway_model() helper
- hermes_cli/setup.py: Re-apply nous-api removal after merge brought
  it back. Fix provider_idx offset (Custom is now index 3, not 4).
- tests/hermes_cli/test_setup.py: Fix custom setup test index (3→4)
@teknium1 teknium1 merged commit 9cb9d1a into main Mar 12, 2026
1 check failed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

1 participant