fix(agent): include tool tokens in preflight estimate, guard context probe persistence by teknium1 · Pull Request #3164 · NousResearch/hermes-agent

teknium1 · 2026-03-26T09:00:43Z

Salvaged from PR #2600 by @paraddox (the useful pieces, without the silent reverts).

Changes

Tool-inclusive preflight estimation — Preflight compression now counts tool schema tokens. With 50+ tools, schemas add 20-30K tokens that were invisible to the old sys+msg estimate, delaying compression until the API rejected the request.
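A minimal sketch of what a tool-inclusive estimate could look like. This is not the code from the PR: the helper names (`rough_tokens`, `estimate_request_tokens`, `needs_preflight_compression`) and the 4-characters-per-token heuristic are assumptions made for illustration.

```python
import json

CHARS_PER_TOKEN = 4  # rough heuristic; an assumption for this sketch


def rough_tokens(text: str) -> int:
    """Approximate a token count from character length (rounded up)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def estimate_request_tokens(system_prompt, messages, tools=None):
    """Estimate request size as system prompt + messages + tool schemas."""
    total = rough_tokens(system_prompt)
    for message in messages:
        total += rough_tokens(str(message.get("content") or ""))
    if tools:
        # Tool schemas travel with every request, so they count too.
        total += rough_tokens(json.dumps(tools))
    return total


def needs_preflight_compression(system_prompt, messages, tools, threshold_tokens):
    """Return True when the estimated request reaches the threshold."""
    return estimate_request_tokens(system_prompt, messages, tools) >= threshold_tokens
```

With many tools enabled, the `tools` term can dominate the estimate, which is why leaving it out lets a request grow past the limit before compression is triggered.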

Context probe persistence guard — When stepping down context tiers after overflow errors, only provider-confirmed limits (parsed from the error message) are cached to disk. Guessed tiers from get_next_probe_tier() stay in-memory only.
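The guard can be sketched roughly as follows. Everything here is hypothetical: the tier values, the error-message pattern, the `next_lower_tier` stand-in for `get_next_probe_tier()` (whose real signature is not shown in this PR), and the plain dict standing in for the on-disk cache.

```python
import re

PROBE_TIERS = [200_000, 128_000, 64_000, 32_000]  # hypothetical tiers

# Assumed error wording; real providers phrase this differently.
_LIMIT_RE = re.compile(r"maximum context length (?:is|of) (\d[\d,]*)", re.IGNORECASE)


def parse_confirmed_limit(error_message):
    """Return the numeric limit stated in the error, or None if absent."""
    match = _LIMIT_RE.search(error_message)
    return int(match.group(1).replace(",", "")) if match else None


def next_lower_tier(current):
    """Stand-in for get_next_probe_tier(): first tier below `current`."""
    for tier in PROBE_TIERS:
        if tier < current:
            return tier
    return None


def step_down(current, error_message, disk_cache, model):
    """Pick a new context limit after an overflow error.

    Only a provider-confirmed limit is written to `disk_cache`;
    a guessed tier is returned for in-memory use only.
    """
    confirmed = parse_confirmed_limit(error_message)
    if confirmed is not None:
        disk_cache[model] = confirmed
        return confirmed
    guessed = next_lower_tier(current)
    return guessed if guessed is not None else current
```

Keeping the guessed tier out of the cache means a wrong guess costs at most one session, while a limit the provider reported can be reused safely across sessions.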

Full suite: 6183 passed, 1 pre-existing failure (unrelated test_429_exhausts_all_retries_before_raising).

Co-authored-by: paraddox <paraddox@users.noreply.github.com>

…probe persistence

Two improvements salvaged from PR #2600 (paraddox):

1. Preflight compression now counts tool schema tokens alongside system prompt and messages. With 50+ tools enabled, schemas can add 20-30K tokens that were previously invisible to the estimator, delaying compression until the API rejected the request.

2. Context probe persistence guard: when the agent steps down context tiers after a context-length error, only provider-confirmed numeric limits (parsed from the error message) are cached to disk. Guessed fallback tiers from get_next_probe_tier() stay in-memory only, preventing wrong values from polluting the persistent cache.

Co-authored-by: paraddox <paraddox@users.noreply.github.com>

teknium1 merged commit 43af094 into main Mar 26, 2026
3 of 4 checks passed

teknium1 mentioned this pull request Mar 26, 2026

fix(agent): handle GLM context overflow and compaction correctly #2600

Closed

19 tasks

paraddox mentioned this pull request Mar 26, 2026

[Bug]: GLM gateway sessions can undercount request size, overflow late, and persist guessed fallback context limits #2599

Closed

1 task

teknium1 merged 1 commit into main from hermes/hermes-52a54135
