feat(skills): smart ranking, usage tracking, and lifecycle management…#4406

Open
fathah wants to merge 3 commits into NousResearch:main from fathah:skills-overflow-fix

Conversation


@fathah fathah commented Apr 1, 2026

What does this PR do?

Skills in the system prompt are now ranked by usage frequency + keyword relevance to the user's message, replacing the alphabetical dump that buried the right skills.

Also adds usage tracking, opt-in token budgets, auto-archival of stale skills, and CLI commands to manage skill health.

Problem

Every skill is injected into the system prompt alphabetically with no limits. With 98 skills, ml-paper-writing sits at position 86 and systematic-debugging at 95. The LLM scans through dozens of irrelevant skills before finding the one that matches — or gives up and improvises.

The system prompt is immune to context compression, so this gets worse over time as skills accumulate.

How it works

  1. Usage tracking — skill_usage table (schema v7) records every view, invoke, and slash command. Scored with recency-weighted frequency in a single SQL query.
  2. Keyword relevance — Jaccard similarity between user message and skill metadata (name, description, tags), expanded with suffix stemming and a domain synonym map (tweet -> twitter, bug -> debug).
  3. Normalized merge — both signals normalized to 0-1 before combining. Relevance weighted 3x so query-relevant skills beat daily-driver habits.
  4. Flat output — when scores are active, skills listed in score order instead of grouped by category.
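Steps 2 and 3 above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the function names, the stemmer's suffix list, and the synonym entries (taken from the two examples in the description) are all assumptions.

```python
import re

# Illustrative stand-ins for the PR's synonym map and suffix stemmer.
SYNONYMS = {"tweet": "twitter", "bug": "debug"}
SUFFIXES = ("ing", "ed", "es", "s")

def normalize(token: str) -> str:
    """Strip a common suffix, then apply the domain synonym map."""
    for suf in SUFFIXES:
        if token.endswith(suf) and len(token) > len(suf) + 2:
            token = token[: -len(suf)]
            break
    return SYNONYMS.get(token, token)

def tokens(text: str) -> set[str]:
    return {normalize(t) for t in re.findall(r"[a-z0-9]+", text.lower())}

def jaccard(a: set[str], b: set[str]) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

def rank(skills: dict[str, str], usage_scores: dict[str, float],
         user_message: str, relevance_weight: float = 3.0) -> list[str]:
    """Merge usage and relevance signals, each normalized to 0-1,
    with relevance weighted 3x as described above."""
    msg = tokens(user_message)
    max_usage = max(usage_scores.values(), default=0) or 1
    scored = []
    for name, metadata in skills.items():
        usage = usage_scores.get(name, 0) / max_usage
        relevance = jaccard(msg, tokens(metadata))
        scored.append((usage + relevance_weight * relevance, name))
    return [name for _, name in sorted(scored, reverse=True)]
```

With this weighting, a skill that matches the query can outrank a heavily used but irrelevant one: a "post a tweet" message ranks a hypothetical twitter skill above a daily-driver skill with all the usage history.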

Related Issue

#4356 #4379 #4319 #4391 #4404

Type of Change

  • ✨ New feature (non-breaking change that adds functionality)
  • ✅ Tests (adding or improving test coverage)

Changes Made

  • agent/prompt_builder.py — keyword relevance scoring, suffix stemmer, synonym map, token budget, normalized merge, flat ranked output
  • hermes_state.py — schema v7 migration with skill_usage table, ranking/stats/last-used queries, self-cleaning purge
  • tools/skill_manager_tool.py — archive/restore, bundled skill detection, dedup check on create, find_archivable_skills()
  • tools/skills_tool.py — usage tracking on skill_view, .archive exclusion, include_archived param, archive fallback with restore hint
  • agent/skill_commands.py — usage tracking on slash command invocations
  • agent/skill_utils.py — .archive added to EXCLUDED_SKILL_DIRS
  • hermes_cli/config.py — skills config block (token_budget, max_prompt_skills, pinned_skills, auto_archive_days)
  • hermes_cli/main.py — argparse for stats/archive/restore/prune subcommands
  • hermes_cli/skills_config.py — CLI implementations for stats, archive, restore, prune
  • run_agent.py — loads skills config, computes usage scores, passes user_message to prompt builder, background auto-archive
  • tests/test_skills_overflow.py — 47 tests covering all new features

All config defaults preserve existing behavior (0 = unlimited/disabled). No breaking changes.
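The "recency-weighted frequency in a single SQL query" idea from the hermes_state.py change can be sketched like this. The skill_usage columns, the 30-day half-life, and the function names are assumptions for illustration, not the actual schema v7 migration.

```python
import sqlite3
import time

HALF_LIFE_DAYS = 30.0  # assumed decay constant, not from the PR

SCHEMA = """
CREATE TABLE IF NOT EXISTS skill_usage (
    skill_name TEXT NOT NULL,
    event      TEXT NOT NULL,   -- e.g. 'view' | 'invoke' | 'slash'
    ts         REAL NOT NULL    -- unix timestamp
)
"""

def record(conn, skill_name, event, ts=None):
    conn.execute("INSERT INTO skill_usage VALUES (?, ?, ?)",
                 (skill_name, event, ts or time.time()))

def usage_scores(conn, now=None):
    """Sum exponentially decayed event weights per skill in one query:
    an event from today counts ~1.0, one from 30 days ago ~0.5."""
    now = now or time.time()
    # SQLite's pow() builtin is not guaranteed to be compiled in,
    # so register the decay curve as a user-defined function.
    conn.create_function(
        "decay", 1,
        lambda age_s: 0.5 ** (age_s / 86400.0 / HALF_LIFE_DAYS))
    rows = conn.execute(
        "SELECT skill_name, SUM(decay(? - ts)) FROM skill_usage "
        "GROUP BY skill_name", (now,)).fetchall()
    return dict(rows)
```

Decaying per event (rather than keeping a raw count) means a skill used twice yesterday outscores one used many times three months ago, which is what lets prune and auto-archival share the same signal.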

How to Test

  1. pytest tests/test_skills_overflow.py -v — 47 tests, all pass
  2. pytest tests/ -k skill -q — full skill test suite, 0 new regressions
  3. Start hermes with default config — all skills appear as before
  4. Set skills.token_budget: 4000 — skills section capped, footer shows omitted count
  5. hermes skills stats — shows usage data after interacting with skills
  6. hermes skills archive <name> then hermes skills restore <name>
  7. hermes skills prune --days 90 — lists unused skills, prompts for confirmation
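Step 4 assumes a skills block in the hermes config. A hypothetical sketch of the full block, assuming a YAML config file (the key names come from the hermes_cli/config.py change above; the values and comments are illustrative):

```yaml
skills:
  token_budget: 4000     # cap on skills-section tokens; 0 = unlimited (default)
  max_prompt_skills: 0   # max skills injected into the prompt; 0 = no cap (default)
  pinned_skills: []      # skills that survive budget cuts
  auto_archive_days: 0   # archive skills unused this long; 0 = disabled (default)
```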

Benchmark (98 real skills)

| Query | Before | After |
| --- | --- | --- |
| "write a research paper for NeurIPS" | ml-paper-writing 86, arxiv 82 | 2, 3 |
| "set up a vector database for RAG" | qdrant 73, pinecone 72, chroma 70 | 5, 7, 8 |
| "post a tweet about my project" | xitter 90 | 2 |
| "debug my python code that crashes" | systematic-debugging 95 | 9 |
| "find a restaurant nearby" | find-nearby 27 | 1 |

Right skill in top 20: 29% -> 93%

End-to-end with gemma-3-4b: LLM picked the correct skill 6/6 vs 4/6 on alphabetical ordering.

fathah added 3 commits April 1, 2026 10:05
… Skills in the system prompt are now ranked by a combination of usage frequency and keyword relevance to the user's message, replacing the previous alphabetical dump. Adds a skill_usage table (schema v7) that tracks views, invocations, and management actions — feeding a normalized scoring system that surfaces the right skill for the task.

New capabilities:
- Token budget and max_prompt_skills caps (opt-in, defaults unchanged)
- Pinned skills that survive budget cuts
- Suffix stemming and domain synonym expansion for keyword matching
- Auto-archival of stale skills (background thread, opt-in)
- CLI: hermes skills stats/archive/restore/prune
- Deduplication warnings on skill creation
- Archived skills discoverable via skills_list(include_archived=True)

Benchmark on 98 real skills: correct skill in top 20 improved from 29% to 93%. Verified end-to-end with LLM picking the right skill 6/6 vs 4/6 on alphabetical ordering.