Anthropic "prompt is too long" 400 error not detected as context length error — aborts instead of compressing #813

@DaAwesomeRazor

Description

Bug Description

When the Anthropic API returns a 400 error with the message "prompt is too long: 233153 tokens > 200000 maximum", the agent treats it as a non-retryable client error and immediately aborts — instead of triggering context compression.

This happens because "prompt is too long" is not in the is_context_length_error phrase list in run_agent.py (line ~3685). The error falls through to the generic 400 handler, which gives up immediately.

Steps to Reproduce

  1. Use an Anthropic model (via OpenRouter or direct API) with a 200k context limit
  2. Have a long-running gateway session (Discord, Telegram, etc.) that accumulates large conversation history
  3. Send a message when the context exceeds 200k tokens
  4. Actual: Agent returns "Non-retryable client error detected. Aborting immediately."
  5. Expected: Agent triggers context compression, summarizes middle turns, and retries

Root Cause

The is_context_length_error check in run_agent.py (~line 3685) matches against these phrases:

```python
'context length', 'context size', 'maximum context',
'token limit', 'too many tokens', 'reduce the length',
'exceeds the limit', 'context window',
'request entity too large',
```
Anthropic's error format "prompt is too long: N tokens > M maximum" doesn't match any of these. It falls through to the generic 4xx handler at line ~3755, which treats all unrecognized 400 errors as non-retryable and aborts.

Contributing Factors

Token estimation undercounts JSON-heavy tool messages

The preflight compression uses estimate_messages_tokens_rough() which divides total chars by 4. Tool call messages with JSON payloads tokenize at closer to 2-3 chars/token, so the rough estimate can significantly undercount — causing preflight compression to not trigger when it should.
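As a rough illustration of the undercount (the real estimate_messages_tokens_rough() signature is assumed here, not quoted), a chars/4 heuristic applied to a JSON-heavy tool payload:

```python
def estimate_tokens_rough(text: str) -> int:
    # Assumed behavior: total characters divided by 4.
    return len(text) // 4

# A typical tool-call payload: dense JSON punctuation and short keys tend
# to tokenize at roughly 2-3 chars/token, so the true token count can be
# 1.5-2x higher than this estimate reports.
tool_payload = '{"tool_call_id": "call_abc", "name": "search", "arguments": {"q": "weather", "limit": 10}}'
rough = estimate_tokens_rough(tool_payload)
```

A conversation dominated by messages like this can sit well under the estimated limit while already being over the provider's real one.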

Gateway session hygiene only estimates simple messages

The gateway's session hygiene auto-compression (~line 1002 in gateway/run.py) filters to only user/assistant messages for its token estimate, but the actual API call includes full tool_calls and tool results. A session can pass the hygiene check while still being over-limit with the full message payload.
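A minimal sketch of the mismatch, with hypothetical helper names (hygiene_estimate / full_payload_estimate) standing in for the actual gateway code:

```python
import json

def hygiene_estimate(messages):
    # Assumed: hygiene counts only the text of user/assistant turns (chars / 4).
    return sum(len(m.get("content") or "") // 4
               for m in messages if m.get("role") in ("user", "assistant"))

def full_payload_estimate(messages):
    # The actual API request serializes every message, including tool_calls
    # and tool results, so this is closer to what the provider counts.
    return len(json.dumps(messages)) // 4

messages = [
    {"role": "user", "content": "look this up"},
    {"role": "assistant", "content": None,
     "tool_calls": [{"id": "t1", "type": "function",
                     "function": {"name": "web_search",
                                  "arguments": '{"query": "..."}'}}]},
    {"role": "tool", "tool_call_id": "t1",
     "content": '{"results": [{"title": "...", "snippet": "..."}]}'},
]
# hygiene_estimate sees only "look this up"; full_payload_estimate sees everything.
```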

Proposed Fix

Primary — run_agent.py

Add 'prompt is too long' to the is_context_length_error detection list.
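A sketch of the change, reproducing the phrase list from above with the new entry appended (the surrounding function body is assumed, not quoted from run_agent.py):

```python
CONTEXT_LENGTH_PHRASES = [
    'context length', 'context size', 'maximum context',
    'token limit', 'too many tokens', 'reduce the length',
    'exceeds the limit', 'context window',
    'request entity too large',
    'prompt is too long',  # new: matches Anthropic's 400 error message
]

def is_context_length_error(message: str) -> bool:
    msg = message.lower()
    return any(phrase in msg for phrase in CONTEXT_LENGTH_PHRASES)

err = "prompt is too long: 233153 tokens > 200000 maximum"
```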

Secondary — agent/model_metadata.py

Make estimate_messages_tokens_rough() more conservative (e.g. 3.2 chars/token + per-message overhead) so preflight compression triggers earlier for JSON-heavy conversations.
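One possible shape for the more conservative estimator (the signature and defaults here are illustrative, not the actual model_metadata.py API):

```python
import json

def estimate_messages_tokens_rough(messages, chars_per_token=3.2, per_msg_overhead=4):
    # Serialize each whole message so tool_calls and tool results are counted,
    # use a conservative chars/token ratio, and add fixed per-message overhead
    # for role/formatting tokens. All three knobs are assumptions.
    total = 0
    for m in messages:
        total += int(len(json.dumps(m)) / chars_per_token) + per_msg_overhead
    return total
```

Overestimating slightly is the safer failure mode: preflight compression fires a bit early rather than letting the request hit the provider's hard limit.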

Additional Notes

  • Other providers may have their own unique error messages for context overflow that aren't currently detected; a more robust regex-based approach might be preferable to exact phrase matching
  • The parse_context_limit_from_error() function already correctly parses the 200000 limit from this error format, so once detection works the step-down logic is fine
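A hedged sketch of what a regex-based detector plus the limit parser could look like (the patterns and the parse_context_limit_from_error() body are illustrative assumptions, not the shipped code):

```python
import re

# Illustrative patterns covering the phrase list plus Anthropic's format;
# a real list would be built from each supported provider's error docs.
CONTEXT_ERROR_PATTERNS = [
    re.compile(r'prompt is too long', re.I),
    re.compile(r'context (length|window|size)', re.I),
    re.compile(r'maximum context', re.I),
    re.compile(r'token limit|too many tokens', re.I),
    re.compile(r'\d+\s*tokens?\s*>\s*\d+\s*maximum', re.I),
]

def is_context_length_error(message: str) -> bool:
    return any(p.search(message) for p in CONTEXT_ERROR_PATTERNS)

def parse_context_limit_from_error(message: str):
    # Assumed behavior: extract the "M maximum" limit from the error text.
    m = re.search(r'>\s*(\d+)\s*maximum', message)
    return int(m.group(1)) if m else None

err = "prompt is too long: 233153 tokens > 200000 maximum"
```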

Environment

  • Hermes Agent main branch
  • Model: anthropic/claude-opus-4.6 via OpenRouter
  • Platform: Discord gateway
  • Error: Error code: 400 - {'type': 'error', 'error': {'type': 'invalid_request_error', 'message': 'prompt is too long: 233153 tokens > 200000 maximum'}}
