fix(agent): detect thinking-budget exhaustion on truncation, skip useless retries #3444

Merged
teknium1 merged 1 commit into main from hermes/hermes-9420d6a3
Mar 27, 2026
Conversation

@teknium1
Contributor

Summary

When finish_reason='length' and the response contains only reasoning content (think blocks or empty text), the model exhausted its entire output token budget on thinking with nothing left for the actual response.

Before this fix, two things happened depending on the API mode:

  • chat_completions: 3 useless continuation retry attempts (the model hits the same token limit every time, wasting ~30s and 3 API calls)
  • anthropic/codex: generic "Response truncated due to output length limit" error with rollback — gives no indication that reasoning was the cause

After this fix, the think-only + length condition is detected immediately and a targeted error is returned:

Model used all output tokens on reasoning with none left for the response. Try lowering reasoning effort or increasing max_tokens.

This saves 2 wasted API calls on the chat_completions path and gives users actionable guidance.
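To make the failure mode concrete, here are illustrative response shapes that would trigger the new error path in each API mode. These payloads are assumptions based on the PR description, not captures from the actual APIs:

```python
# chat_completions mode: the model hit its token limit and message.content
# holds only a (truncated, unclosed) think block — no real answer.
chat_response = {
    "choices": [{
        "finish_reason": "length",
        "message": {"content": "<think>Let me work through this step by st"},
    }]
}

# anthropic mode: the response contains only thinking-type content blocks
# and no text-type blocks when the output budget runs out.
anthropic_response = {
    "stop_reason": "max_tokens",
    "content": [{"type": "thinking", "thinking": "Let me work through..."}],
}
```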

What's NOT changed

The existing think-only retry logic for finish_reason='stop' is untouched — that path handles genuine model glitches where the model stopped intentionally but only produced reasoning. Retrying there is correct and useful.

How it works

After extracting finish_reason and before entering the continuation/rollback paths, the code now:

  1. Extracts the text content from the response (mode-aware: chat_completions reads message.content, anthropic reads text-type content blocks)
  2. Checks if content is think-only using the existing _has_content_after_think_block() helper
  3. If think-only AND finish_reason='length' → return immediately with the targeted error
  4. If there's real content → fall through to the existing continuation/rollback logic unchanged
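The four steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: `extract_text_content`, `check_thinking_exhaustion`, and the inline stand-in for `_has_content_after_think_block()` are all hypothetical names and logic inferred from the description.

```python
import re

THINKING_EXHAUSTED_ERROR = (
    "Model used all output tokens on reasoning with none left for the "
    "response. Try lowering reasoning effort or increasing max_tokens."
)

def _think_only(text: str) -> bool:
    # Stand-in for the existing _has_content_after_think_block() helper:
    # True when nothing but <think> blocks (possibly truncated) remains.
    leftover = re.sub(r"<think>.*?(</think>|$)", "", text, flags=re.DOTALL)
    return not leftover.strip()

def extract_text_content(response: dict, mode: str) -> str:
    # Step 1: mode-aware extraction (hypothetical function name).
    if mode == "chat_completions":
        return response["choices"][0]["message"].get("content") or ""
    # anthropic/codex: concatenate text-type content blocks only.
    return "".join(
        block.get("text", "")
        for block in response.get("content", [])
        if block.get("type") == "text"
    )

def check_thinking_exhaustion(response: dict, mode: str, finish_reason: str):
    # Steps 2-3: return the targeted error on think-only + length;
    # step 4: return None so callers fall through to the existing
    # continuation/rollback logic unchanged.
    text = extract_text_content(response, mode)
    if finish_reason == "length" and _think_only(text):
        return THINKING_EXHAUSTED_ERROR
    return None
```

The check runs before the continuation/rollback paths, so the chat_completions retry loop never starts when the budget was already exhausted by reasoning.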

Companion to PR #3426

PR #3426 increased the Anthropic adapter's max_tokens from 16K to the model's native limit (64-128K), which dramatically reduces how often thinking-budget exhaustion occurs. This PR handles the remaining edge cases where it still can happen (user-configured low max_tokens, very complex reasoning with high effort, etc.).

Test plan

  • 3 new tests: think-only + length skips continuation, empty content + length detected, normal truncation still continues
  • Updated existing parametrized test (removed the think-only + continuation case which is now handled differently)
  • Full suite: 6497 passed, 1 pre-existing unrelated failure (test_auto_does_not_select_copilot_from_github_token — env pollution from local HuggingFace token)
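The three new scenarios can be sketched like this, using a minimal stand-in for the detection predicate (the real tests exercise the agent's internals; everything here, including `is_thinking_exhausted`, is illustrative):

```python
import re

def is_thinking_exhausted(text: str, finish_reason: str) -> bool:
    # Stand-in predicate: truncated output whose body is only <think> content.
    leftover = re.sub(r"<think>.*?(</think>|$)", "", text, flags=re.DOTALL)
    return finish_reason == "length" and not leftover.strip()

# 1. think-only + length: continuation is skipped, targeted error surfaces
assert is_thinking_exhausted("<think>ran out mid-reason", "length")

# 2. empty content + length: also detected
assert is_thinking_exhausted("", "length")

# 3. normal truncation (real content present): continuation still runs
assert not is_thinking_exhausted("<think>done</think>Partial answ", "length")
```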
…less retries

When finish_reason='length' and the response contains only reasoning
(think blocks or empty content), the model exhausted its output token
budget on thinking with nothing left for the actual response.

Previously, this fell into either:
- chat_completions: 3 useless continuation retries (model hits same limit)
- anthropic/codex: generic 'Response truncated' error with rollback

Now: detect the think-only + length condition early and return immediately
with a targeted error message: 'Model used all output tokens on reasoning
with none left for the response. Try lowering reasoning effort or
increasing max_tokens.'

This saves 2 wasted API calls on the chat_completions path and gives
users actionable guidance instead of a cryptic error.

The existing think-only retry logic (finish_reason='stop') is unchanged —
that's a genuine model glitch where retrying can help.
@teknium1 force-pushed the hermes/hermes-9420d6a3 branch from bc4ab38 to 4da939a on March 27, 2026 at 20:31
@teknium1 merged commit 8fdfc4b into main on Mar 27, 2026
4 checks passed
