feat(skills): smart ranking, usage tracking, and lifecycle management… #4406
Open
fathah wants to merge 3 commits into NousResearch:main
… Skills in the system prompt are now ranked by a combination of usage frequency and keyword relevance to the user's message, replacing the previous alphabetical dump. Adds a skill_usage table (schema v7) that tracks views, invocations, and management actions, feeding a normalized scoring system that surfaces the right skill for the task.

New capabilities:
- Token budget and max_prompt_skills caps (opt-in, defaults unchanged)
- Pinned skills that survive budget cuts
- Suffix stemming and domain synonym expansion for keyword matching
- Auto-archival of stale skills (background thread, opt-in)
- CLI: hermes skills stats/archive/restore/prune
- Deduplication warnings on skill creation
- Archived skills discoverable via skills_list(include_archived=True)

Benchmark on 98 real skills: correct skill in top 20 improved from 29% to 93%. Verified end-to-end with LLM picking the right skill 6/6 vs 4/6 on alphabetical ordering.
…s-agent into skills-overflow-fix
What does this PR do?
Skills in the system prompt are now ranked by usage frequency + keyword relevance to the user's message, replacing the alphabetical dump that buried the right skills.
Also adds usage tracking, opt-in token budgets, auto-archival of stale skills, and CLI commands to manage skill health.
Problem
Every skill is injected into the system prompt alphabetically with no limits. With 98 skills, `ml-paper-writing` sits at position 86 and `systematic-debugging` at 95. The LLM scans through dozens of irrelevant skills before finding the one that matches, or gives up and improvises.
The system prompt is immune to context compression, so this gets worse over time as skills accumulate.
How it works
- `skill_usage` table (schema v7) records every view, invoke, and slash command. Usage is scored with recency-weighted frequency in a single SQL query.
- Keyword matching uses suffix stemming and domain synonym expansion (`tweet` -> `twitter`, `bug` -> `debug`).
Related Issue
#4356 #4379 #4319 #4391 #4404
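For reviewers, the recency-weighted frequency scoring mentioned under "How it works" could look roughly like the sketch below. It is an illustration only: the column names, the hyperbolic decay formula, and the `half_life_days` parameter are assumptions and are not taken from the PR's actual `skill_usage` schema or query.

```python
import sqlite3
import time

# Hypothetical schema: the real skill_usage columns are not shown in this PR text.
SCHEMA = """
CREATE TABLE skill_usage (
    skill_name TEXT NOT NULL,
    action     TEXT NOT NULL,   -- e.g. 'view', 'invoke', 'slash_command'
    used_at    REAL NOT NULL    -- Unix timestamp in seconds
)
"""

# Assumed decay: each event contributes 1 / (1 + age_days / half_life_days),
# so an event that is exactly half_life_days old counts as 0.5.
RANK_QUERY = """
SELECT skill_name,
       SUM(1.0 / (1.0 + (:now - used_at) / 86400.0 / :half_life)) AS score
FROM skill_usage
GROUP BY skill_name
ORDER BY score DESC
"""


def usage_scores(conn, now=None, half_life_days=30.0):
    """Return [(skill_name, score), ...] ordered by descending score."""
    now = time.time() if now is None else now
    rows = conn.execute(RANK_QUERY, {"now": now, "half_life": half_life_days})
    return [(name, score) for name, score in rows]
```

With this weighting, one use today outranks three uses from 300 days ago at a 30-day half-life, which is the behavior "recency-weighted" suggests.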
Type of Change
Changes Made
- `agent/prompt_builder.py`: keyword relevance scoring, suffix stemmer, synonym map, token budget, normalized merge, flat ranked output
- `hermes_state.py`: schema v7 migration with `skill_usage` table, ranking/stats/last-used queries, self-cleaning purge
- `tools/skill_manager_tool.py`: archive/restore, bundled skill detection, dedup check on create, `find_archivable_skills()`
- `tools/skills_tool.py`: usage tracking on skill_view, `.archive` exclusion, `include_archived` param, archive fallback with restore hint
- `agent/skill_commands.py`: usage tracking on slash command invocations
- `agent/skill_utils.py`: `.archive` added to `EXCLUDED_SKILL_DIRS`
- `hermes_cli/config.py`: `skills` config block (token_budget, max_prompt_skills, pinned_skills, auto_archive_days)
- `hermes_cli/main.py`: argparse for stats/archive/restore/prune subcommands
- `hermes_cli/skills_config.py`: CLI implementations for stats, archive, restore, prune
- `run_agent.py`: loads skills config, computes usage scores, passes user_message to prompt builder, background auto-archive
- `tests/test_skills_overflow.py`: 47 tests covering all new features

All config defaults preserve existing behavior (0 = unlimited/disabled). No breaking changes.
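To make the ranking pipeline easier to review, here is a minimal sketch of how keyword relevance, the normalized merge, and the token budget could fit together. It is not the code in `agent/prompt_builder.py`: every function name, the suffix list, the `usage_weight` default, and the greedy budget strategy are assumptions. Only the two synonym pairs come from the PR text.

```python
import re

# Only these two synonym pairs appear in the PR text; the map is otherwise hypothetical.
SYNONYMS = {"tweet": "twitter", "bug": "debug"}
SUFFIXES = ("ging", "ing", "ed", "es", "s")  # crude illustrative stemmer


def stem(word):
    """Strip the first matching suffix, keeping at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def terms(text):
    """Lowercase, tokenize, stem, then map each stem through the synonym table."""
    stems = (stem(w) for w in re.findall(r"[a-z0-9]+", text.lower()))
    return {SYNONYMS.get(s, s) for s in stems}


def keyword_score(message, skill_name, description=""):
    """Fraction of message terms that also occur in the skill name/description."""
    query = terms(message)
    doc = terms(skill_name + " " + description)
    return len(query & doc) / len(query) if query else 0.0


def normalize(scores):
    """Scale scores into [0, 1] by dividing by the maximum (all zeros stay zero)."""
    top = max(scores.values(), default=0.0)
    return {k: (v / top if top > 0 else 0.0) for k, v in scores.items()}


def rank_skills(message, skills, usage, usage_weight=0.4):
    """skills: {name: description}; usage: {name: raw usage score}."""
    relevance = normalize({n: keyword_score(message, n, d) for n, d in skills.items()})
    usage_n = normalize({n: usage.get(n, 0.0) for n in skills})
    merged = {
        n: usage_weight * usage_n[n] + (1 - usage_weight) * relevance[n]
        for n in skills
    }
    # Highest merged score first; ties fall back to alphabetical order.
    return sorted(skills, key=lambda n: (-merged[n], n))


def apply_budget(ranked, token_cost, token_budget=0, pinned=()):
    """Return (kept skills in ranked order, omitted count); 0 means unlimited."""
    if token_budget <= 0:
        return list(ranked), 0
    keep = {n for n in ranked if n in pinned}  # pinned skills always survive
    used = sum(token_cost[n] for n in keep)
    for name in ranked:
        if name not in keep and used + token_cost[name] <= token_budget:
            keep.add(name)
            used += token_cost[name]
    kept = [n for n in ranked if n in keep]
    return kept, len(ranked) - len(kept)
```

In this sketch a message such as "help me fix this bug" matches `systematic-debugging` because `bug` maps to `debug` and `debugging` stems to `debug`. The omitted count returned by `apply_budget` is the kind of number a footer could report.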
How to Test
- `pytest tests/test_skills_overflow.py -v`: 47 tests, all pass
- `pytest tests/ -k skill -q`: full skill test suite, 0 new regressions
- `skills.token_budget: 4000`: skills section capped, footer shows omitted count
- `hermes skills stats`: shows usage data after interacting with skills
- `hermes skills archive <name>` then `hermes skills restore <name>`
- `hermes skills prune --days 90`: lists unused skills, prompts for confirmation

Benchmark (98 real skills)
Right skill in top 20: 29% -> 93%
End-to-end with gemma-3-4b: LLM picked the correct skill 6/6 vs 4/6 on alphabetical ordering.
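A "right skill in top 20" figure like the one above can be computed with a simple top-k hit rate. The helper below is a hypothetical sketch of that metric, not the benchmark script used for this PR, and it includes no benchmark data.

```python
def top_k_hit_rate(cases, k=20):
    """cases: list of (ranked_skill_names, expected_skill_name) pairs.

    Returns the fraction of cases whose expected skill appears in the
    first k entries of the ranked list.
    """
    if not cases:
        return 0.0
    hits = sum(1 for ranked, expected in cases if expected in ranked[:k])
    return hits / len(cases)
```

Running the same test messages through both the alphabetical and the ranked orderings, then comparing the two rates, gives a before/after comparison of this kind.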