
fix: send max_tokens for Claude/OpenRouter + retry SSE connection errors #3497

Merged
teknium1 merged 1 commit into main from hermes/hermes-8a2f15b3 on Mar 28, 2026

Conversation

@teknium1
Contributor

Summary

Fixes persistent "Network connection lost" failures when Claude Opus generates large tool call responses (e.g. write_file with a plan file) through OpenRouter.

Root Cause Investigation

OpenRouter has an undocumented ~125s inactivity timeout on their upstream proxy to Anthropic. When Opus thinks for >125s before generating tool call output, the proxy kills the connection.

Proved with controlled experiments:

| Path / Tools | Result | Time |
|---|---|---|
| OpenRouter, no max_tokens | ❌ Network connection lost | 128s |
| OpenRouter, max_tokens=128K | ❌ Network connection lost | 129s |
| OpenRouter, no tools (text) | ✅ 12K tokens | 155s |
| OpenRouter, small tool call | ✅ 2.8K chars | 18s |
| OpenRouter, medium tool call | ✅ 19K chars | 75s |
| Direct Anthropic API | ✅ 47K chars | 173s |

Chunk timing trace confirmed: all 26 chunks arrive in a 3-second burst at stream start, then complete silence for 125s while Opus thinks, then the connection dies. The direct Anthropic API handles the same silence and returns successfully at 173s.
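A chunk timing trace like the one described above can be approximated with a small helper. This is a minimal sketch, not the script used for the experiments: `StreamTrace` and `trace_chunk_gaps` are hypothetical names, and the stream is assumed to be any iterable of chunks.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Iterable, List, Optional


@dataclass
class StreamTrace:
    """Timing summary for one streamed response (hypothetical helper type)."""
    offsets: List[float] = field(default_factory=list)  # arrival time of each chunk, seconds from start
    max_gap: float = 0.0                                # longest silence observed, in seconds
    error: Optional[Exception] = None                   # exception that ended the stream, if any


def trace_chunk_gaps(
    chunks: Iterable[object],
    clock: Callable[[], float] = time.monotonic,
) -> StreamTrace:
    """Consume a stream and record when each chunk arrived and the longest gap."""
    trace = StreamTrace()
    start = last = clock()
    try:
        for _chunk in chunks:
            now = clock()
            trace.offsets.append(now - start)
            trace.max_gap = max(trace.max_gap, now - last)
            last = now
    except Exception as exc:
        # A dropped connection surfaces here; keep it so the caller can inspect it.
        trace.error = exc
    # Silence between the last chunk and the end (or death) of the stream counts too.
    trace.max_gap = max(trace.max_gap, clock() - last)
    return trace
```

A burst of chunks followed by a long `max_gap` and a non-empty `error` would match the failure pattern in the table: early output, a long silence, then a dropped connection.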

Additionally, echo_upstream_body debug revealed OpenRouter defaults to max_tokens: 65536 when we don't send it — only half of Opus 4.6's 128K output capacity.

Changes

1. Send explicit max_tokens for Claude through OpenRouter

When self.max_tokens is not set by the user, we now send the model's actual output limit (from _get_anthropic_max_output()) for Claude models on OpenRouter. This ensures full output capacity instead of OpenRouter's 65K default, and it extends the practical success window for medium-length responses.
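The selection rule can be sketched roughly as follows. This is an illustration under assumptions, not the code in this PR: `resolve_max_tokens` is a hypothetical name, the model-limit lookup is passed in as a callable standing in for `_get_anthropic_max_output()` (whose real signature is not shown here), and the substring checks for "Claude on OpenRouter" are guesses.

```python
from typing import Callable, Optional


def resolve_max_tokens(
    user_max_tokens: Optional[int],
    model: str,
    base_url: str,
    get_model_output_limit: Callable[[str], int],
) -> Optional[int]:
    """Pick the max_tokens value to send, or None to omit the parameter."""
    # An explicit user setting always wins.
    if user_max_tokens is not None:
        return user_max_tokens
    # Assumed detection logic: substring checks on the base URL and model name.
    is_openrouter = "openrouter.ai" in base_url.lower()
    is_claude = "claude" in model.lower()
    if is_openrouter and is_claude:
        # Send the model's own output limit rather than relying on the proxy default.
        return get_model_output_limit(model)
    # Other providers: leave max_tokens unset and keep their default behavior.
    return None
```

Passing the lookup as a parameter keeps the sketch independent of how the real limit table is stored.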

2. Classify SSE connection errors as retryable

OpenRouter sends {"error":{"message":"Network connection lost."}} as SSE events when the upstream drops. The OpenAI SDK raises APIError from these — but our streaming retry logic only recognized httpx-level errors (ReadTimeout, RemoteProtocolError). Now SSE errors with connection-related messages (no HTTP status code) are retried with fresh connections, same as httpx errors.
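The classification described here might look something like the sketch below. It is a minimal illustration, not the actual implementation: `SseApiError` is a stand-in for the SDK exception so the example runs without the OpenAI SDK installed, `is_retryable_sse_error` is a hypothetical name, and every phrase except "network connection lost" is an assumed addition.

```python
from typing import Optional

# Only "network connection lost" comes from the observed error; the rest are assumptions.
CONNECTION_PHRASES = (
    "network connection lost",
    "connection reset",
    "connection closed",
)


class SseApiError(Exception):
    """Stand-in for an SDK error raised from an SSE error event (hypothetical)."""

    def __init__(self, message: str, status_code: Optional[int] = None):
        super().__init__(message)
        self.status_code = status_code


def is_retryable_sse_error(exc: Exception) -> bool:
    """Return True for connection-style errors that carry no HTTP status code."""
    # Errors with an HTTP status (429, 500, ...) are handled by other logic.
    if getattr(exc, "status_code", None) is not None:
        return False
    message = str(exc).lower()
    return any(phrase in message for phrase in CONNECTION_PHRASES)
```

Requiring the absence of a status code keeps ordinary API errors, such as a rejected request, on the immediate-fallback path.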

3. Actionable error guidance

When stream-drop retries are exhausted, the error message now explains the issue and suggests alternatives (execute_code with Python open(), write in smaller sections).
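One way to attach that guidance is to raise a dedicated error once the retry budget is spent. The following is a sketch under assumptions: `collect_with_retries`, `StreamDropExhausted`, and the wording of `GUIDANCE` are all hypothetical, and the transient-error check is supplied by the caller.

```python
from typing import Callable, Iterable, List, Optional

# Illustrative wording only; the real message text is not shown in this PR description.
GUIDANCE = (
    "The stream dropped repeatedly while the model was generating a large tool call. "
    "Consider writing the file via execute_code with Python open(), "
    "or writing it in smaller sections."
)


class StreamDropExhausted(RuntimeError):
    """Raised when every retry attempt ended in a transient stream drop."""


def collect_with_retries(
    open_stream: Callable[[], Iterable[object]],
    is_transient: Callable[[Exception], bool],
    max_attempts: int = 3,
) -> List[object]:
    """Read a whole stream, reopening it from scratch after transient failures."""
    last_error: Optional[Exception] = None
    for _attempt in range(max_attempts):
        try:
            # Each attempt opens a fresh stream; partial output from a dropped attempt is discarded.
            return list(open_stream())
        except Exception as exc:
            if not is_transient(exc):
                raise  # non-transient errors propagate immediately
            last_error = exc
    raise StreamDropExhausted(f"{GUIDANCE} (last error: {last_error})") from last_error
```

Chaining with `from last_error` preserves the original exception for debugging while the user sees the actionable message first.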

Tests

  • 2 new tests: test_sse_connection_lost_retried_as_transient, test_sse_non_connection_error_falls_back_immediately
  • All 220 streaming + run_agent tests pass
  • Full suite: 6532 passed

Note for OpenRouter

This is ultimately an OpenRouter-side limitation: their upstream proxy timeout doesn't account for Opus's long thinking phase on complex tool calls. It should be reported to them.

@teknium1 force-pushed the hermes/hermes-8a2f15b3 branch from a7840ee to 394394e on March 28, 2026 14:35
… SSE errors

Root cause: Anthropic buffers entire tool call arguments and goes silent
for minutes while thinking (verified: 167s gap with zero SSE events on
direct API).  OpenRouter's upstream proxy times out after ~125s of
inactivity and drops the connection with 'Network connection lost'.

Fix: Send the x-anthropic-beta: fine-grained-tool-streaming-2025-05-14
header for Claude models on OpenRouter.  This makes Anthropic stream
tool call arguments token-by-token instead of buffering them, keeping
the connection alive through OpenRouter's proxy.
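A minimal sketch of how the header could be attached per request. The header name and value are the ones quoted above; `anthropic_beta_headers` and its substring-based detection of Claude models on OpenRouter are assumptions made for illustration.

```python
from typing import Dict

# Beta flag value quoted in the commit message above.
FINE_GRAINED_TOOL_STREAMING = "fine-grained-tool-streaming-2025-05-14"


def anthropic_beta_headers(model: str, base_url: str) -> Dict[str, str]:
    """Return extra request headers for Claude models routed through OpenRouter."""
    # Assumed detection logic: substring checks on the base URL and model name.
    if "openrouter.ai" in base_url.lower() and "claude" in model.lower():
        return {"x-anthropic-beta": FINE_GRAINED_TOOL_STREAMING}
    # Every other provider or model gets no extra headers.
    return {}
```

With the OpenAI Python SDK, a dict like this can be passed through the `extra_headers` argument of a request; whether this PR wires it that way is not shown here.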

Live-tested: the exact prompt that consistently failed at ~128s now
completes successfully — 2,972 lines written, 49K tokens, 8 minutes.

Additional improvements:

1. Send explicit max_tokens for Claude through OpenRouter.  Without it,
   OpenRouter defaults to 65,536 (confirmed via echo_upstream_body) —
   only half of Opus 4.6's 128K limit.

2. Classify SSE 'Network connection lost' as retryable in the streaming
   inner retry loop.  The OpenAI SDK raises APIError from SSE error
   events, which was bypassing our transient error retry logic.

3. Actionable diagnostic guidance when stream-drop retries exhaust.
@teknium1 force-pushed the hermes/hermes-8a2f15b3 branch from 394394e to 38eef67 on March 28, 2026 14:54
@teknium1 merged commit 80a899a into main on Mar 28, 2026
4 checks passed