fix: send max_tokens for Claude/OpenRouter + retry SSE connection errors#3497
Merged
… SSE errors

Root cause: Anthropic buffers entire tool call arguments and goes silent for minutes while thinking (verified: 167s gap with zero SSE events on direct API). OpenRouter's upstream proxy times out after ~125s of inactivity and drops the connection with 'Network connection lost'.

Fix: Send the x-anthropic-beta: fine-grained-tool-streaming-2025-05-14 header for Claude models on OpenRouter. This makes Anthropic stream tool call arguments token-by-token instead of buffering them, keeping the connection alive through OpenRouter's proxy.

Live-tested: the exact prompt that consistently failed at ~128s now completes successfully: 2,972 lines written, 49K tokens, 8 minutes.

Additional improvements:
1. Send explicit max_tokens for Claude through OpenRouter. Without it, OpenRouter defaults to 65,536 (confirmed via echo_upstream_body), only half of Opus 4.6's 128K limit.
2. Classify SSE 'Network connection lost' as retryable in the streaming inner retry loop. The OpenAI SDK raises APIError from SSE error events, which was bypassing our transient error retry logic.
3. Actionable diagnostic guidance when stream-drop retries exhaust.
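A minimal sketch of the request-side part of this fix, assuming a helper that computes per-request overrides. The header name and value come from the commit message; `build_request_overrides`, `is_claude_on_openrouter`, the model id, and the `ASSUMED_OUTPUT_LIMITS` table are hypothetical stand-ins (the PR resolves the limit through `_get_anthropic_max_output()`, which is not shown here).

```python
from typing import Dict, Optional

# Header value quoted from the commit message.
FINE_GRAINED_BETA = "fine-grained-tool-streaming-2025-05-14"

# Hypothetical table standing in for _get_anthropic_max_output().
# The model id and the reading of "128K" as 128_000 are assumptions.
ASSUMED_OUTPUT_LIMITS: Dict[str, int] = {"anthropic/claude-opus-4.6": 128_000}


def is_claude_on_openrouter(model: str, base_url: str) -> bool:
    """Heuristic (an assumption): OpenRouter host plus a Claude model id."""
    return "openrouter.ai" in base_url.lower() and "claude" in model.lower()


def build_request_overrides(
    model: str, base_url: str, user_max_tokens: Optional[int] = None
) -> dict:
    """Return extra keyword arguments for one chat completion request."""
    overrides: dict = {}
    if not is_claude_on_openrouter(model, base_url):
        # Other providers: only forward what the user explicitly set.
        if user_max_tokens is not None:
            overrides["max_tokens"] = user_max_tokens
        return overrides

    # Ask Anthropic to stream tool call arguments instead of buffering them.
    overrides["extra_headers"] = {"x-anthropic-beta": FINE_GRAINED_BETA}

    # Prefer the user's value; otherwise send the model's own output limit
    # so the proxy default is not applied.
    limit = user_max_tokens
    if limit is None:
        limit = ASSUMED_OUTPUT_LIMITS.get(model)
    if limit is not None:
        overrides["max_tokens"] = limit
    return overrides
```

With the OpenAI Python SDK, a dictionary like this could be unpacked into `client.chat.completions.create(...)`, since that call accepts per-request `extra_headers` and `max_tokens`; how the PR actually wires these in is not shown on this page.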
Summary
Fixes persistent "Network connection lost" failures when Claude Opus generates large tool call responses (e.g. `write_file` with a plan file) through OpenRouter.
Root Cause Investigation
OpenRouter has an undocumented ~125s inactivity timeout on their upstream proxy to Anthropic. When Opus thinks for >125s before generating tool call output, the proxy kills the connection.
Proved with controlled experiments:
Chunk timing trace confirmed: all 26 chunks arrive in a 3-second burst at stream start, then complete silence for 125s while Opus thinks, then the connection dies. The direct Anthropic API handles the same silence and returns successfully at 173s.
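A chunk timing trace of this kind can be approximated with a small helper like the one below. This is a sketch only, not the tooling used for the investigation: `trace_chunk_gaps` and `longest_silence` are hypothetical names, and the injectable `clock` exists purely to make the helper testable.

```python
import time
from typing import Callable, Iterable, List, Optional, Tuple


def trace_chunk_gaps(
    chunks: Iterable[object],
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[List[float], Optional[BaseException]]:
    """Consume a stream and record the silence (in seconds) before each event.

    Returns (gaps, error). If the stream raises, the final gap is the
    silence that preceded the failure and `error` is the exception.
    """
    gaps: List[float] = []
    last = clock()
    try:
        for _ in chunks:
            now = clock()
            gaps.append(now - last)
            last = now
    except Exception as exc:  # diagnostic helper: keep the timings, report the error
        gaps.append(clock() - last)
        return gaps, exc
    return gaps, None


def longest_silence(gaps: List[float]) -> float:
    """Largest gap observed, or 0.0 for an empty trace."""
    return max(gaps, default=0.0)
```

Wrapping the streaming response iterator in `trace_chunk_gaps` would show a pattern like the one described above as many small gaps followed by one very large final gap alongside a non-`None` error.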
Additionally, `echo_upstream_body` debug revealed OpenRouter defaults to `max_tokens: 65536` when we don't send it, only half of Opus 4.6's 128K output capacity.
Changes
1. Send explicit `max_tokens` for Claude through OpenRouter
When `self.max_tokens` is not set by the user, we now send the model's actual output limit (from `_get_anthropic_max_output()`) for Claude models on OpenRouter. This ensures full output capacity instead of OpenRouter's 65K default. It also extends the practical success window for medium-length responses.
2. Classify SSE connection errors as retryable
OpenRouter sends `{"error":{"message":"Network connection lost."}}` as SSE events when the upstream drops. The OpenAI SDK raises `APIError` from these, but our streaming retry logic only recognized httpx-level errors (`ReadTimeout`, `RemoteProtocolError`). Now SSE errors with connection-related messages (no HTTP status code) are retried with fresh connections, same as httpx errors.
3. Actionable error guidance
When stream-drop retries are exhausted, the error message now explains the issue and suggests alternatives (execute_code with Python open(), write in smaller sections).
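Changes 2 and 3 can be sketched together as follows. This is an illustration under stated assumptions, not the PR's code: the classifier duck-types on a `status_code` attribute and the message text rather than importing the OpenAI SDK, and `CONNECTION_MARKERS`, `StreamDropError`, `stream_with_retries`, and the guidance wording are all hypothetical. Collecting the stream into a list is a simplification; a real streaming loop would handle partial output.

```python
from typing import Callable, Iterable, List

# Only this phrase is confirmed by the PR; a real list may contain more.
CONNECTION_MARKERS = ("network connection lost",)

# Hypothetical wording; the PR only says the message explains the issue
# and suggests these two alternatives.
GUIDANCE = (
    "The stream dropped repeatedly while the model was generating a large "
    "tool call. Consider using execute_code with Python open(), or writing "
    "the file in smaller sections."
)


class StreamDropError(RuntimeError):
    """Raised (in this sketch) once stream-drop retries are exhausted."""


def is_retryable_sse_error(exc: BaseException) -> bool:
    """Connection-related message and no HTTP status code: treat as transient."""
    if getattr(exc, "status_code", None) is not None:
        return False  # a real HTTP error, not a dropped stream
    message = str(exc).lower()
    return any(marker in message for marker in CONNECTION_MARKERS)


def stream_with_retries(
    open_stream: Callable[[], Iterable[str]], max_attempts: int = 3
) -> List[str]:
    """Re-open the stream on transient SSE errors; other errors propagate."""
    last_exc = None
    for _ in range(max_attempts):
        try:
            # Each attempt calls open_stream() again, i.e. a fresh connection.
            return list(open_stream())
        except Exception as exc:
            if not is_retryable_sse_error(exc):
                raise  # non-connection errors fall back immediately
            last_exc = exc
    raise StreamDropError(GUIDANCE) from last_exc
```

Checking for the absence of a status code keeps ordinary API failures (for example rate limits or invalid requests) out of the retry path, which matches the behavior the two tests listed below are named after.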
Tests
`test_sse_connection_lost_retried_as_transient`, `test_sse_non_connection_error_falls_back_immediately`
Note for OpenRouter
This is ultimately an OpenRouter-side limitation: their upstream proxy timeout doesn't account for Opus's long thinking phase on complex tool calls. It should be reported to them.