feat: centralized provider router, call_llm API, unified /model command by teknium1 · Pull Request #1003 · NousResearch/hermes-agent

teknium1 · 2026-03-12T02:47:07Z

Summary

Complete provider routing infrastructure overhaul. Every LLM call now flows through a centralized router that handles auth, request formatting, and provider-specific quirks in one place.

Core Architecture

`resolve_provider_client(provider, model, async_mode, raw_codex)`

Central entry point for creating LLM clients. Handles:

Auth lookup (env vars for API-key providers, OAuth tokens for Nous/Codex)
Base URL resolution with env var overrides
API format (Chat Completions vs Responses API adapter for Codex)
Provider-specific headers (OpenRouter attribution, Kimi User-Agent)
raw_codex mode for main agent (direct responses.stream access)

`call_llm(task, provider, model, messages, ...)` / `async_call_llm(...)`

Full request lifecycle:

Resolve provider+model from task config or explicit args
Get/create cached client via router
Build request kwargs (max_tokens handling, provider extra_body)
Make API call with max_tokens/max_completion_tokens retry
Return response

Config slots (`config.yaml` v7)

Every auxiliary task has a provider:model config slot:

auxiliary:
  compression: {provider: auto, model: ""}
  vision: {provider: auto, model: ""}
  web_extract: {provider: auto, model: ""}
  session_search: {provider: auto, model: ""}
  skills_hub: {provider: auto, model: ""}
  mcp: {provider: auto, model: ""}
  flush_memories: {provider: auto, model: ""}

Changes by Area

Auxiliary Consumers Migrated (all use `call_llm`/`async_call_llm`)

context_compressor.py — compression summaries
vision_tools.py — image analysis (fixed Codex bypass)
web_tools.py — page summarization
session_search_tool.py — session summarization
browser_tool.py — snapshot summarization + browser vision
mcp_tool.py — MCP sampling
skills_guard.py — skill audit
run_agent.py flush_memories — memory flush
trajectory_compressor.py — trajectory summarization
mini_swe_runner.py — SWE benchmark runner
openrouter_client.py — shared OpenRouter client

Main Agent (`run_agent.py`)

__init__: Uses router when no explicit creds provided
_try_activate_fallback: Replaced duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS dicts with single resolve_provider_client() call
Removed _resolve_fallback_credentials method

Provider Fixes

Nous Portal: Don't send OpenRouter provider preferences (only, ignore, etc.) — caused 404. Don't send reasoning: {enabled: false} — Nous requires reasoning enabled
Codex vision bypass: vision_tools.py was constructing raw AsyncOpenAI bypassing Codex adapter → crash. Fixed with get_async_vision_auxiliary_client()
Vision errors: Clear message when model doesn't support vision

Config & Auth

Removed LLM_MODEL env var — config.yaml is sole source of truth. Avoids conflicts in multi-agent setups. Removed from cli.py, auth.py, setup.py, gateway, cron
Removed nous-api provider — Nous Portal is OAuth only

UX

Unified /model and /provider — both show same view with all authenticated providers, their models, current selection, and switch examples
Added curated model lists for Nous Portal and OpenAI Codex

Test Results

3251 passed, 2 pre-existing unrelated failures

Live Testing

Tested hermes model switching between OpenRouter, Nous Portal, OpenAI Codex
Tested /model provider:model mid-conversation switching (history preserved)
Tested tool calls, vision, web search across all providers
Tested auxiliary routing (vision/compression/web_extract) with each primary provider

…error handling Three interconnected fixes for auxiliary client infrastructure: 1. CENTRALIZED PROVIDER ROUTER (auxiliary_client.py) Add resolve_provider_client(provider, model, async_mode) — a single entry point for creating properly configured clients. Given a provider name and optional model, it handles auth lookup (env vars, OAuth tokens, auth.json), base URL resolution, provider-specific headers, and API format differences (Chat Completions vs Responses API for Codex). All auxiliary consumers should route through this instead of ad-hoc env var lookups. Refactored get_text_auxiliary_client, get_async_text_auxiliary_client, and get_vision_auxiliary_client to use the router internally. 2. FIX CODEX VISION BYPASS (vision_tools.py) vision_tools.py was constructing a raw AsyncOpenAI client from the sync vision client's api_key/base_url, completely bypassing the Codex Responses API adapter. When the vision provider resolved to Codex, the raw client would hit chatgpt.com/backend-api/codex with chat.completions.create() which only supports the Responses API. Fix: Added get_async_vision_auxiliary_client() which properly wraps Codex into AsyncCodexAuxiliaryClient. vision_tools.py now uses this instead of manual client construction. 3. FIX COMPRESSION FALLBACK + VISION ERROR HANDLING - context_compressor.py: Removed _get_fallback_client() which blindly looked for OPENAI_API_KEY + OPENAI_BASE_URL (fails for Codex OAuth, API-key providers, users without OPENAI_BASE_URL set). Replaced with fallback loop through resolve_provider_client() for each known provider, with same-provider dedup. - vision_tools.py: Added error detection for vision capability failures. Returns clear message to the model when the configured model doesn't support vision, instead of a generic error. Addresses #886

Route all remaining ad-hoc auxiliary LLM call sites through resolve_provider_client() so auth, headers, and API format (Chat Completions vs Responses API) are handled consistently in one place. Files changed: - tools/openrouter_client.py: Replace manual AsyncOpenAI construction with resolve_provider_client('openrouter', async_mode=True). The shared client module now delegates entirely to the router. - tools/skills_guard.py: Replace inline OpenAI client construction (hardcoded OpenRouter base_url, manual api_key lookup, manual headers) with resolve_provider_client('openrouter'). Remove unused OPENROUTER_BASE_URL import. - trajectory_compressor.py: Add _detect_provider() to map config base_url to a provider name, then route through resolve_provider_client. Falls back to raw construction for unrecognized custom endpoints. - mini_swe_runner.py: Route default case (no explicit api_key/base_url) through resolve_provider_client('openrouter') with auto-detection fallback. Preserves direct construction when explicit creds are passed via CLI args. - agent/auxiliary_client.py: Fix stale module docstring — vision auto mode now correctly documents that Codex and custom endpoints are tried (not skipped).

Nous Portal only supports OAuth authentication. Remove the 'nous-api' provider which allowed direct API key access via NOUS_API_KEY env var. Removed from: - hermes_cli/auth.py: PROVIDER_REGISTRY entry + aliases - hermes_cli/config.py: OPTIONAL_ENV_VARS entry - hermes_cli/setup.py: setup wizard option + model selection handler (reindexed remaining provider choices) - agent/auxiliary_client.py: docstring references - tests/test_runtime_provider_resolution.py: nous-api test - tests/integration/test_web_tools.py: renamed dict key

Add centralized call_llm() and async_call_llm() functions that own the full LLM request lifecycle: 1. Resolve provider + model from task config or explicit args 2. Get or create a cached client for that provider 3. Format request args (max_tokens handling, provider extra_body) 4. Make the API call with max_tokens/max_completion_tokens retry 5. Return the response Config: expanded auxiliary section with provider:model slots for all tasks (compression, vision, web_extract, session_search, skills_hub, mcp, flush_memories). Config version bumped to 7. Migrated all auxiliary consumers: - context_compressor.py: uses call_llm(task='compression') - vision_tools.py: uses async_call_llm(task='vision') - web_tools.py: uses async_call_llm(task='web_extract') - session_search_tool.py: uses async_call_llm(task='session_search') - browser_tool.py: uses call_llm(task='vision'/'web_extract') - mcp_tool.py: uses call_llm(task='mcp') - skills_guard.py: uses call_llm(provider='openrouter') - run_agent.py flush_memories: uses call_llm(task='flush_memories') Tests updated for context_compressor and MCP tool. Some test mocks still need updating (15 remaining failures from mock pattern changes, 2 pre-existing).

Update 14 test files to use the new call_llm/async_call_llm mock patterns instead of the old get_text_auxiliary_client/ get_vision_auxiliary_client tuple returns. - vision_tools tests: mock async_call_llm instead of _aux_async_client - browser tests: mock call_llm instead of _aux_vision_client - flush_memories tests: mock call_llm instead of get_text_auxiliary_client - session_search tests: mock async_call_llm with RuntimeError - mcp_tool tests: fix whitelist model config, use side_effect for multi-response tests - auxiliary_config_bridge: update for model=None (resolved in router) 3251 passed, 2 pre-existing unrelated failures.

Phase 2 of the provider router migration — route the main agent's client construction and fallback activation through resolve_provider_client() instead of duplicated ad-hoc logic. run_agent.py: - __init__: When no explicit api_key/base_url, use resolve_provider_client(provider, raw_codex=True) for client construction. Explicit creds (from CLI/gateway runtime provider) still construct directly. - _try_activate_fallback: Replace _resolve_fallback_credentials and its duplicated _FALLBACK_API_KEY_PROVIDERS / _FALLBACK_OAUTH_PROVIDERS dicts with a single resolve_provider_client() call. The router handles all provider types (API-key, OAuth, Codex) centrally. - Remove _resolve_fallback_credentials method and both fallback dicts. agent/auxiliary_client.py: - Add raw_codex parameter to resolve_provider_client(). When True, returns the raw OpenAI client for Codex providers instead of wrapping in CodexAuxiliaryClient. The main agent needs this for direct responses.stream() access. 3251 passed, 2 pre-existing unrelated failures.

…ource of truth Model selection now comes exclusively from config.yaml (set via 'hermes model' or 'hermes setup'). The LLM_MODEL env var is no longer read or written anywhere in production code. Why: env vars are per-process/per-user and would conflict in multi-agent or multi-tenant setups. Config.yaml is file-based and can be scoped per-user or eventually per-session. Changes: - cli.py: Read model from CLI_CONFIG only, not LLM_MODEL/OPENAI_MODEL - hermes_cli/auth.py: _save_model_choice() no longer writes LLM_MODEL to .env - hermes_cli/setup.py: Remove 12 save_env_value('LLM_MODEL', ...) calls from all provider setup flows - gateway/run.py: Remove LLM_MODEL fallback (HERMES_MODEL still works for gateway process runtime) - cron/scheduler.py: Same - agent/auxiliary_client.py: Remove LLM_MODEL from custom endpoint model detection

Two bugs in _build_api_kwargs that broke Nous Portal: 1. Provider preferences (only, ignore, order, sort) are OpenRouter- specific routing features. They were being sent in extra_body to ALL providers, including Nous Portal. When the config had providers_only=['google-vertex'], Nous Portal returned 404 'Inference host not found' because it doesn't have a google-vertex backend. Fix: Only include provider preferences when _is_openrouter is True. 2. Reasoning config with enabled=false was being sent to Nous Portal, which requires reasoning and returns 400 'Reasoning is mandatory for this endpoint and cannot be disabled.' Fix: Omit the reasoning parameter for Nous when enabled=false. Root cause found via HERMES_DUMP_REQUESTS=1 which showed the exact request payload being sent to Nous Portal's inference API.

Nous Portal backend will become a transparent proxy for OpenRouter- specific parameters (provider preferences, etc.), so keep sending them to all providers. The reasoning disabled fix is kept (that's a real constraint of the Nous endpoint).

Both /model and /provider now show the same unified display: Current: anthropic/claude-opus-4.6 via OpenRouter Authenticated providers & models: [openrouter] ← active anthropic/claude-opus-4.6 ← current anthropic/claude-sonnet-4.5 ... [nous] claude-opus-4-6 gemini-3-flash ... [openai-codex] gpt-5.2-codex gpt-5.1-codex-mini ... Not configured: Z.AI / GLM, Kimi / Moonshot, ... Switch model: /model <model-name> Switch provider: /model <provider>:<model-name> Example: /model nous:claude-opus-4-6 Users can see all authenticated providers and their models at a glance, making it easy to switch mid-conversation. Also added curated model lists for Nous Portal and OpenAI Codex to hermes_cli/models.py.

- Custom endpoints can serve any model, so skip validation for provider='custom' in validate_requested_model(). Previously it would reject any model name since there's no static catalog or live API to check against. - Show clear setup instructions when switching to custom endpoint without OPENAI_BASE_URL/OPENAI_API_KEY configured. - Added curated model lists for Nous Portal and OpenAI Codex to _PROVIDER_MODELS so /model shows their available models.

- gateway/run.py: Take main's _resolve_gateway_model() helper - hermes_cli/setup.py: Re-apply nous-api removal after merge brought it back. Fix provider_idx offset (Custom is now index 3, not 4). - tests/hermes_cli/test_setup.py: Fix custom setup test index (3→4)

teknium1 added 2 commits March 11, 2026 19:46

teknium1 changed the title ~~feat: centralized provider router + fix Codex vision bypass + vision error handling~~ Mar 12, 2026

teknium1 added 8 commits March 11, 2026 20:14

teknium1 changed the title ~~feat: centralized provider router + route all auxiliary LLM calls through it~~ Mar 12, 2026

teknium1 added 2 commits March 11, 2026 23:29

teknium1 merged commit 9cb9d1a into main Mar 12, 2026
1 check failed

teknium1 mentioned this pull request Mar 12, 2026

fix: strip call_id/response_item_id from tool_calls for Mistral compatibility #1058

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: centralized provider router, call_llm API, unified /model command#1003

feat: centralized provider router, call_llm API, unified /model command#1003
teknium1 merged 12 commits intomainfrom
hermes/hermes-cf9f7d54

teknium1 commented Mar 12, 2026 •

edited

Loading

Uh oh!

Labels

1 participant

Conversation

teknium1 commented Mar 12, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Core Architecture

resolve_provider_client(provider, model, async_mode, raw_codex)

call_llm(task, provider, model, messages, ...) / async_call_llm(...)

Config slots (config.yaml v7)

Changes by Area

Auxiliary Consumers Migrated (all use call_llm/async_call_llm)

Main Agent (run_agent.py)

Provider Fixes

Config & Auth

UX

Test Results

Live Testing

Uh oh!

Labels

1 participant

teknium1 commented Mar 12, 2026 •

edited

Loading

`resolve_provider_client(provider, model, async_mode, raw_codex)`

`call_llm(task, provider, model, messages, ...)` / `async_call_llm(...)`

Config slots (`config.yaml` v7)

Auxiliary Consumers Migrated (all use `call_llm`/`async_call_llm`)

Main Agent (`run_agent.py`)