Anthropic "prompt is too long" 400 error not detected as context length error — aborts instead of compressing #813
Description
Bug Description
When the Anthropic API returns a 400 error with the message "prompt is too long: 233153 tokens > 200000 maximum", the agent treats it as a non-retryable client error and immediately aborts — instead of triggering context compression.
This happens because "prompt is too long" is not in the is_context_length_error phrase list in run_agent.py (line ~3685). The error falls through to the generic 400 handler which gives up immediately.
Steps to Reproduce
- Use an Anthropic model (via OpenRouter or direct API) with a 200k context limit
- Have a long-running gateway session (Discord, Telegram, etc.) that accumulates large conversation history
- Send a message when the context exceeds 200k tokens
- Actual: Agent returns "Non-retryable client error detected. Aborting immediately."
- Expected: Agent triggers context compression, summarizes middle turns, and retries
Root Cause
The is_context_length_error check in run_agent.py (~line 3685) checks for these phrases:
```
'context length', 'context size', 'maximum context',
'token limit', 'too many tokens', 'reduce the length',
'exceeds the limit', 'context window',
'request entity too large',
```

Anthropic's error format "prompt is too long: N tokens > M maximum" doesn't match any of these. It falls through to the generic 4xx handler at line ~3755, which treats all unrecognized 400 errors as non-retryable and aborts.
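The gap is easy to see in a minimal sketch of the phrase-list check described above (the real `is_context_length_error` in run_agent.py may differ in detail; this only reproduces the matching behavior):

```python
# Phrase list as described in the root cause above.
CONTEXT_LENGTH_PHRASES = [
    'context length', 'context size', 'maximum context',
    'token limit', 'too many tokens', 'reduce the length',
    'exceeds the limit', 'context window',
    'request entity too large',
]

def is_context_length_error(message: str) -> bool:
    """Substring match against known context-overflow phrasings."""
    msg = message.lower()
    return any(phrase in msg for phrase in CONTEXT_LENGTH_PHRASES)

anthropic_error = "prompt is too long: 233153 tokens > 200000 maximum"
print(is_context_length_error(anthropic_error))  # False -> generic 400 handler aborts
```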
Contributing Factors
Token estimation undercounts JSON-heavy tool messages
The preflight compression uses estimate_messages_tokens_rough() which divides total chars by 4. Tool call messages with JSON payloads tokenize at closer to 2-3 chars/token, so the rough estimate can significantly undercount — causing preflight compression to not trigger when it should.
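Back-of-envelope arithmetic shows how large the undercount can get. Assuming JSON actually tokenizes at ~2.5 chars/token (a mid-range figure for the 2-3 chars/token observation above, not a measured value):

```python
# Why chars/4 undercounts JSON-heavy content (illustrative numbers only).
chars = 100_000                 # chars of accumulated tool-call JSON
rough_estimate = chars / 4      # 25,000 tokens per the rough estimator
plausible_actual = chars / 2.5  # 40,000 tokens at an assumed 2.5 chars/token
undercount = 1 - rough_estimate / plausible_actual
print(f"undercount: {undercount:.0%}")
```

At that ratio the rough estimate misses nearly 40% of the real token count, which is enough to keep preflight compression from firing.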
Gateway session hygiene only estimates simple messages
The gateway's session hygiene auto-compression (~line 1002 in gateway/run.py) filters to only user/assistant messages for its token estimate, but the actual API call includes full tool_calls and tool results. A session can pass the hygiene check while still being over-limit with the full message payload.
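A toy session illustrates the mismatch (message shapes here are illustrative, not the gateway's actual schema):

```python
import json

def rough_tokens(text: str) -> int:
    # Same chars/4 heuristic the hygiene check is described as using.
    return len(text) // 4

session = [
    {"role": "user", "content": "short question"},
    {"role": "assistant", "content": "short answer",
     "tool_calls": [{"name": "fetch", "arguments": "x" * 8000}]},
    {"role": "tool", "content": "y" * 12000},  # tool result: skipped by hygiene
]

# Hygiene estimate: user/assistant content strings only.
hygiene = sum(rough_tokens(m.get("content", ""))
              for m in session if m["role"] in ("user", "assistant"))

# What the API call actually carries: the full serialized payload.
full = rough_tokens(json.dumps(session))

print(hygiene, full)  # hygiene estimate is far below the real payload size
```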
Proposed Fix
Primary — run_agent.py
Add 'prompt is too long' to the is_context_length_error detection list.
Secondary — agent/model_metadata.py
Make estimate_messages_tokens_rough() more conservative (e.g. 3.2 chars/token + per-message overhead) so preflight compression triggers earlier for JSON-heavy conversations.
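A minimal sketch of the proposed estimator, using the issue's suggested 3.2 chars/token plus an assumed fixed per-message overhead (the constant is illustrative, not the current implementation):

```python
import json

PER_MESSAGE_OVERHEAD = 8  # assumed: rough cost of role/formatting tokens

def estimate_messages_tokens_rough(messages: list[dict]) -> int:
    """Conservative estimate: serialize each message (so tool_calls and
    tool results are counted) and divide by 3.2 instead of 4."""
    total = 0.0
    for msg in messages:
        total += len(json.dumps(msg)) / 3.2 + PER_MESSAGE_OVERHEAD
    return int(total)
```

Serializing the whole message before counting also closes the gateway-hygiene gap above, since tool payloads are no longer invisible to the estimate.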
Additional Notes
- Other providers may have their own unique error messages for context overflow that aren't currently detected — might want a more robust regex-based approach rather than exact phrase matching
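One possible shape for that regex-based approach (a sketch, not the project's code; the patterns cover the phrasings listed in this issue plus Anthropic's format):

```python
import re

CONTEXT_ERROR_RE = re.compile(
    r"prompt is too long"
    r"|context\s+(length|size|window)"
    r"|maximum\s+context"
    r"|too\s+many\s+tokens"
    r"|token\s+limit"
    r"|\d+\s+tokens?\s*>\s*\d+\s+maximum",  # "N tokens > M maximum"
    re.IGNORECASE,
)

def is_context_length_error(message: str) -> bool:
    return bool(CONTEXT_ERROR_RE.search(message))
```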
- The `parse_context_limit_from_error()` function already correctly parses the `200000` limit from this error format, so once detection works the step-down logic is fine
Environment
- Hermes Agent main branch
- Model: anthropic/claude-opus-4.6 via OpenRouter
- Platform: Discord gateway
- Error:

```
Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'prompt is too long: 233153 tokens > 200000 maximum'}}
```