fix(agent): include tool tokens in preflight estimate, guard context probe persistence#3164
Merged
fix(agent): include tool tokens in preflight estimate, guard context probe persistence#3164
Conversation
…probe persistence Two improvements salvaged from PR #2600 (paraddox): 1. Preflight compression now counts tool schema tokens alongside system prompt and messages. With 50+ tools enabled, schemas can add 20-30K tokens that were previously invisible to the estimator, delaying compression until the API rejected the request. 2. Context probe persistence guard: when the agent steps down context tiers after a context-length error, only provider-confirmed numeric limits (parsed from the error message) are cached to disk. Guessed fallback tiers from get_next_probe_tier() stay in-memory only, preventing wrong values from polluting the persistent cache. Co-authored-by: paraddox <paraddox@users.noreply.github.com>
19 tasks
1 task
outsourc-e
pushed a commit
to outsourc-e/hermes-agent
that referenced
this pull request
Mar 26, 2026
…probe persistence (NousResearch#3164) Two improvements salvaged from PR NousResearch#2600 (paraddox): 1. Preflight compression now counts tool schema tokens alongside system prompt and messages. With 50+ tools enabled, schemas can add 20-30K tokens that were previously invisible to the estimator, delaying compression until the API rejected the request. 2. Context probe persistence guard: when the agent steps down context tiers after a context-length error, only provider-confirmed numeric limits (parsed from the error message) are cached to disk. Guessed fallback tiers from get_next_probe_tier() stay in-memory only, preventing wrong values from polluting the persistent cache. Co-authored-by: paraddox <paraddox@users.noreply.github.com>
StreamOfRon
pushed a commit
to StreamOfRon/hermes-agent
that referenced
this pull request
Mar 29, 2026
…probe persistence (NousResearch#3164) Two improvements salvaged from PR NousResearch#2600 (paraddox): 1. Preflight compression now counts tool schema tokens alongside system prompt and messages. With 50+ tools enabled, schemas can add 20-30K tokens that were previously invisible to the estimator, delaying compression until the API rejected the request. 2. Context probe persistence guard: when the agent steps down context tiers after a context-length error, only provider-confirmed numeric limits (parsed from the error message) are cached to disk. Guessed fallback tiers from get_next_probe_tier() stay in-memory only, preventing wrong values from polluting the persistent cache. Co-authored-by: paraddox <paraddox@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Salvaged from PR #2600 by @paraddox (the useful pieces, without the silent reverts).
Changes
Tool-inclusive preflight estimation — Preflight compression now counts tool schema tokens. With 50+ tools, schemas add 20-30K tokens that were invisible to the old sys+msg estimate, delaying compression until the API rejected the request.
Context probe persistence guard — When stepping down context tiers after overflow errors, only provider-confirmed limits (parsed from the error message) are cached to disk. Guessed tiers from
get_next_probe_tier()stay in-memory only.Full suite: 6183 passed, 1 pre-existing failure (unrelated
test_429_exhausts_all_retries_before_raising).Co-authored-by: paraddox paraddox@users.noreply.github.com