
Fix context overrun crash with local LLM backends#403

Merged
teknium1 merged 1 commit into NousResearch:main from ch3ronsa:fix/context-size-error-phrase
Mar 5, 2026

Conversation

Contributor

ch3ronsa commented Mar 4, 2026

Fixes #348

Problem

Local inference backends (LM Studio, Ollama, llama.cpp) return HTTP 400 with error messages like "Context size has been exceeded" when the context window is full. The context-length error phrase list did not include "context size" or "context window", so these errors fell through to the generic 4xx abort handler — crashing the session instead of triggering compression.

Error flow before this fix:

LM Studio returns 400: "Context size has been exceeded"
  → 413 check: no match
  → 4xx abort check: status 400 matches (400 >= 400 and < 500)
  → ❌ "Non-retryable client error. Aborting immediately."
  → Session crashes, context never compressed
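
The detection is a substring scan over the error body, so the fix comes down to which phrases are in the list. A minimal sketch of that check (the function and constant names here are illustrative, not the project's actual identifiers; the phrases marked NEW are the ones this PR adds):

```python
# Hypothetical sketch of phrase-based context-overrun detection.
CONTEXT_LENGTH_PHRASES = [
    "maximum context length",   # OpenAI / vLLM
    "context length exceeded",
    "context size",             # NEW: LM Studio
    "context window",           # NEW: Ollama
]

def is_context_length_error(message: str) -> bool:
    """Return True if the error body looks like a context-overrun error."""
    lowered = message.lower()
    return any(phrase in lowered for phrase in CONTEXT_LENGTH_PHRASES)
```

Before this PR, "Context size has been exceeded" matched none of the phrases, so the check returned False and the generic 4xx abort fired.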

Fix

  1. Moved context-length check above the generic 4xx handler (same pattern as the existing 413 check) so context errors are caught before they reach the abort path
  2. Added missing phrases to the detection list: "context size" (LM Studio), "context window" (Ollama)
  3. Guarded the 4xx handler with not is_context_length_error so context-related 400s are never treated as non-retryable
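
Putting the three steps together, the handler ordering looks roughly like this (a self-contained sketch with illustrative names, not the project's actual code):

```python
# Sketch of the reordered error handling described above.
CONTEXT_PHRASES = ("maximum context length", "context size", "context window")

def is_context_length_error(message: str) -> bool:
    return any(p in message.lower() for p in CONTEXT_PHRASES)

def handle_http_error(status: int, message: str) -> str:
    if status == 413:
        return "compress"   # existing 413 path, unchanged
    if is_context_length_error(message):
        return "compress"   # step 1: context check now runs before the abort path
    if 400 <= status < 500 and not is_context_length_error(message):
        return "abort"      # steps 2-3: guard is defensive; the early return
                            # above already catches context-related 400s
    return "retry"          # 5xx and other transient errors
```

The `not is_context_length_error` guard is redundant once the check runs first, but it makes the abort branch safe even if the ordering regresses later.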

Error flow after this fix:

LM Studio returns 400: "Context size has been exceeded"
  → 413 check: no match
  → Context-length check: "context size" matches!
  → ⚠️ "Context length exceeded - attempting compression..."
  → 🗜️ Compressed 42 → 18 messages, retrying...
  → ✅ Session continues

Tested error messages

Backend      HTTP  Error message                        Result
LM Studio    400   "Context size has been exceeded"     COMPRESS
Ollama       400   "context window exceeded"            COMPRESS
llama.cpp    400   "the context size is too small"      COMPRESS
vLLM         400   "maximum context length is 8192"     COMPRESS
OpenAI       400   "maximum context length is 128000"   COMPRESS
Auth error   401   "invalid api key"                    ABORT
Model error  404   "model not found"                    ABORT
Generic 400  400   "invalid json in request body"       ABORT
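
The matrix above can be checked mechanically; a self-contained sketch where `classify` is a stand-in for the real handler (names hypothetical):

```python
# Stand-in classifier mirroring the fixed behavior.
PHRASES = ("context size", "context window", "maximum context length")

def classify(status: int, message: str) -> str:
    if status == 413 or any(p in message.lower() for p in PHRASES):
        return "COMPRESS"
    if 400 <= status < 500:
        return "ABORT"
    return "RETRY"

MATRIX = [
    (400, "Context size has been exceeded", "COMPRESS"),    # LM Studio
    (400, "context window exceeded", "COMPRESS"),           # Ollama
    (400, "the context size is too small", "COMPRESS"),     # llama.cpp
    (400, "maximum context length is 8192", "COMPRESS"),    # vLLM
    (400, "maximum context length is 128000", "COMPRESS"),  # OpenAI
    (401, "invalid api key", "ABORT"),
    (404, "model not found", "ABORT"),
    (400, "invalid json in request body", "ABORT"),
]

for status, message, expected in MATRIX:
    assert classify(status, message) == expected, message
```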

Test plan

  • Verified LM Studio's exact error message from #348 now triggers compression
  • Verified Ollama and llama.cpp error patterns also match
  • Verified non-context 4xx errors (auth, model, generic) still abort correctly
  • 413 handling unchanged
…#348)

Local backends (LM Studio, Ollama, llama.cpp) return HTTP 400
with messages like "Context size has been exceeded" when the
context window is full. The error phrase list did not include
"context size" or "context window", so these errors fell through
to the generic 4xx abort handler instead of triggering compression.

Changes:
- Move context-length check above generic 4xx handler so it runs
  first (same pattern as the existing 413 check)
- Add "context size" and "context window" to the phrase list
- Guard 4xx handler with `not is_context_length_error` to prevent
  context-related 400s from being treated as non-retryable
teknium1 merged commit 3220bb8 into NousResearch:main Mar 5, 2026
Contributor

teknium1 commented Mar 5, 2026

Merged in commit 3220bb8. Your PR was based on an older main where the error handler ordering was different, so it had merge conflicts, but the fix was applied with your changes preserved (added the "context size" and "context window" phrases, removed error code 400 from the non-retryable list). Thanks for the thorough analysis and test matrix! 🙏

teknium1 mentioned this pull request Mar 5, 2026
