Anthropic "prompt is too long" 400 error not detected as context length error — aborts instead of compressing #813
Description
Bug Description
When the Anthropic API returns a 400 error with the message "prompt is too long: 233153 tokens > 200000 maximum", the agent treats it as a non-retryable client error and immediately aborts — instead of triggering context compression.
This happens because "prompt is too long" is not in the is_context_length_error phrase list in run_agent.py (line ~3685). The error falls through to the generic 400 handler which gives up immediately.
Steps to Reproduce
- Use an Anthropic model (via OpenRouter or direct API) with a 200k context limit
- Have a long-running gateway session (Discord, Telegram, etc.) that accumulates large conversation history
- Send a message when the context exceeds 200k tokens
- Actual: Agent returns "Non-retryable client error detected. Aborting immediately."
- Expected: Agent triggers context compression, summarizes middle turns, and retries
Root Cause
The is_context_length_error check in run_agent.py (~line 3685) checks for these phrases:
```
'context length', 'context size', 'maximum context',
'token limit', 'too many tokens', 'reduce the length',
'exceeds the limit', 'context window',
'request entity too large',
```

Anthropic's error format "prompt is too long: N tokens > M maximum" doesn't match any of these. It falls through to the generic 4xx handler at line ~3755, which treats all unrecognized 400 errors as non-retryable and aborts.
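The gap is easy to see in a minimal sketch of the phrase-list check described above (the real `is_context_length_error` in run_agent.py may differ in detail; this only reproduces the matching behavior):

```python
# Phrase list as described in the root cause above.
CONTEXT_LENGTH_PHRASES = [
    'context length', 'context size', 'maximum context',
    'token limit', 'too many tokens', 'reduce the length',
    'exceeds the limit', 'context window',
    'request entity too large',
]

def is_context_length_error(message: str) -> bool:
    """Substring match against known context-overflow phrasings."""
    msg = message.lower()
    return any(phrase in msg for phrase in CONTEXT_LENGTH_PHRASES)

anthropic_error = "prompt is too long: 233153 tokens > 200000 maximum"
print(is_context_length_error(anthropic_error))  # False -> generic 400 handler aborts
```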
Contributing Factors
Token estimation undercounts JSON-heavy tool messages
The preflight compression uses estimate_messages_tokens_rough() which divides total chars by 4. Tool call messages with JSON payloads tokenize at closer to 2-3 chars/token, so the rough estimate can significantly undercount — causing preflight compression to not trigger when it should.
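Back-of-envelope arithmetic shows how large the undercount can get. Assuming JSON actually tokenizes at ~2.5 chars/token (a mid-range figure for the 2-3 chars/token observation above, not a measured value):

```python
# Why chars/4 undercounts JSON-heavy content (illustrative numbers only).
chars = 100_000                 # chars of accumulated tool-call JSON
rough_estimate = chars / 4      # 25,000 tokens per the rough estimator
plausible_actual = chars / 2.5  # 40,000 tokens at an assumed 2.5 chars/token
undercount = 1 - rough_estimate / plausible_actual
print(f"undercount: {undercount:.0%}")
```

At that ratio the rough estimate misses nearly 40% of the real token count, which is enough to keep preflight compression from firing.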
Gateway session hygiene only estimates simple messages
The gateway's session hygiene auto-compression (~line 1002 in gateway/run.py) filters to only user/assistant messages for its token estimate, but the actual API call includes full tool_calls and tool results. A session can pass the hygiene check while still being over-limit with the full message payload.
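A toy session illustrates the mismatch (message shapes here are illustrative, not the gateway's actual schema):

```python
import json

def rough_tokens(text: str) -> int:
    # Same chars/4 heuristic the hygiene check is described as using.
    return len(text) // 4

session = [
    {"role": "user", "content": "short question"},
    {"role": "assistant", "content": "short answer",
     "tool_calls": [{"name": "fetch", "arguments": "x" * 8000}]},
    {"role": "tool", "content": "y" * 12000},  # tool result: skipped by hygiene
]

# Hygiene estimate: user/assistant content strings only.
hygiene = sum(rough_tokens(m.get("content", ""))
              for m in session if m["role"] in ("user", "assistant"))

# What the API call actually carries: the full serialized payload.
full = rough_tokens(json.dumps(session))

print(hygiene, full)  # hygiene estimate is far below the real payload size
```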
Proposed Fix
Primary — run_agent.py
Add 'prompt is too long' to the is_context_length_error detection list.
Secondary — agent/model_metadata.py
Make estimate_messages_tokens_rough() more conservative (e.g. 3.2 chars/token + per-message overhead) so preflight compression triggers earlier for JSON-heavy conversations.
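A minimal sketch of the proposed estimator, using the issue's suggested 3.2 chars/token plus an assumed fixed per-message overhead (the constant is illustrative, not the current implementation):

```python
import json

PER_MESSAGE_OVERHEAD = 8  # assumed: rough cost of role/formatting tokens

def estimate_messages_tokens_rough(messages: list[dict]) -> int:
    """Conservative estimate: serialize each message (so tool_calls and
    tool results are counted) and divide by 3.2 instead of 4."""
    total = 0.0
    for msg in messages:
        total += len(json.dumps(msg)) / 3.2 + PER_MESSAGE_OVERHEAD
    return int(total)
```

Serializing the whole message before counting also closes the gateway-hygiene gap above, since tool payloads are no longer invisible to the estimate.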
Additional Notes
- Other providers may have their own unique error messages for context overflow that aren't currently detected — might want a more robust regex-based approach rather than exact phrase matching
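One possible shape for that regex-based approach (a sketch, not the project's code; the patterns cover the phrasings listed in this issue plus Anthropic's format):

```python
import re

CONTEXT_ERROR_RE = re.compile(
    r"prompt is too long"
    r"|context\s+(length|size|window)"
    r"|maximum\s+context"
    r"|too\s+many\s+tokens"
    r"|token\s+limit"
    r"|\d+\s+tokens?\s*>\s*\d+\s+maximum",  # "N tokens > M maximum"
    re.IGNORECASE,
)

def is_context_length_error(message: str) -> bool:
    return bool(CONTEXT_ERROR_RE.search(message))
```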
- The `parse_context_limit_from_error()` function already correctly parses the `200000` limit from this error format, so once detection works the step-down logic is fine
Environment
- Hermes Agent main branch
- Model: anthropic/claude-opus-4.6 via OpenRouter
- Platform: Discord gateway
- Error:

```
Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'prompt is too long: 233153 tokens > 200000 maximum'}}
```