
fix(agent): include tool tokens in preflight estimate, guard context probe persistence#3164

Merged: teknium1 merged 1 commit into main from hermes/hermes-52a54135 on Mar 26, 2026



@teknium1
Contributor

Salvaged from PR #2600 by @paraddox (the useful pieces, without the silent reverts).

Changes

Tool-inclusive preflight estimation: preflight compression now counts tool schema tokens. With 50+ tools, schemas can add 20-30K tokens that were invisible to the old estimate (system prompt plus messages only), which delayed compression until the API rejected the request.
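
For illustration, a tool-inclusive estimate could be sketched roughly as below. This is not the code from this PR: `rough_tokens`, `estimate_request_tokens`, `should_compress`, the 4-characters-per-token heuristic, and the 0.85 threshold are all assumptions made up for the example.

```python
import json

CHARS_PER_TOKEN = 4  # assumed rough heuristic, not a real tokenizer


def rough_tokens(text: str) -> int:
    """Approximate token count as ceil(len(text) / CHARS_PER_TOKEN)."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def estimate_request_tokens(system_prompt, messages, tools=None):
    """Estimate the request size, including tool schemas when present."""
    total = rough_tokens(system_prompt)
    total += sum(rough_tokens(json.dumps(m)) for m in messages)
    if tools:
        # Tool schemas are serialized into the request too, so count them.
        total += rough_tokens(json.dumps(tools))
    return total


def should_compress(system_prompt, messages, tools, context_limit, threshold=0.85):
    """Return True once the estimate reaches threshold * context_limit."""
    estimate = estimate_request_tokens(system_prompt, messages, tools)
    return estimate >= context_limit * threshold
```

With a small prompt and one large tool schema, `should_compress(prompt, [], tools, 1000)` can return `True` where the same call with `tools=None` returns `False`, which is the gap the old estimate missed.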

Context probe persistence guard: when stepping down context tiers after a context-length error, only provider-confirmed limits (parsed from the error message) are cached to disk. Guessed tiers from get_next_probe_tier() stay in-memory only, so a wrong guess cannot pollute the persistent cache.
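
The guard can be pictured with a minimal sketch like the one below. It is not the actual implementation: `PROBE_TIERS`, `parse_limit_from_error`, `next_lower_tier` (a stand-in for `get_next_probe_tier()`, whose real signature is not shown here), `step_down`, the dict used in place of the on-disk cache, and the error-message format are all hypothetical.

```python
import re
from typing import Optional

# Hypothetical fallback tiers, largest first.
PROBE_TIERS = [128_000, 64_000, 32_000, 16_000, 8_000]


def parse_limit_from_error(error_message: str) -> Optional[int]:
    """Extract a numeric limit if the provider states one (assumed format)."""
    match = re.search(r"maximum context length is (\d+)", error_message)
    return int(match.group(1)) if match else None


def next_lower_tier(current: int) -> Optional[int]:
    """Stand-in for get_next_probe_tier(): first tier below the current limit."""
    for tier in PROBE_TIERS:
        if tier < current:
            return tier
    return None


def step_down(current_limit: int, error_message: str, disk_cache: dict, model: str) -> int:
    """Pick a new limit; persist it only when the provider confirmed it."""
    confirmed = parse_limit_from_error(error_message)
    if confirmed is not None:
        disk_cache[model] = confirmed  # provider-confirmed: safe to persist
        return confirmed
    guessed = next_lower_tier(current_limit)
    # Guessed tier: used for this session only, never written to the cache.
    return guessed if guessed is not None else current_limit
```

In this sketch an error without a parseable number steps down one tier and leaves `disk_cache` untouched, while an error that states a limit both returns and persists that limit.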

Full suite: 6183 passed, 1 pre-existing failure (unrelated test_429_exhausts_all_retries_before_raising).

Co-authored-by: paraddox <paraddox@users.noreply.github.com>

…probe persistence

Two improvements salvaged from PR #2600 (paraddox):

1. Preflight compression now counts tool schema tokens alongside system
   prompt and messages.  With 50+ tools enabled, schemas can add 20-30K
   tokens that were previously invisible to the estimator, delaying
   compression until the API rejected the request.

2. Context probe persistence guard: when the agent steps down context
   tiers after a context-length error, only provider-confirmed numeric
   limits (parsed from the error message) are cached to disk.  Guessed
   fallback tiers from get_next_probe_tier() stay in-memory only,
   preventing wrong values from polluting the persistent cache.

Co-authored-by: paraddox <paraddox@users.noreply.github.com>
@teknium1 teknium1 merged commit 43af094 into main Mar 26, 2026
3 of 4 checks passed
outsourc-e pushed a commit to outsourc-e/hermes-agent that referenced this pull request Mar 26, 2026
…probe persistence (NousResearch#3164)

StreamOfRon pushed a commit to StreamOfRon/hermes-agent that referenced this pull request Mar 29, 2026
…probe persistence (NousResearch#3164)

