Feature: Structured Tool Result Hints (CTAs) — Next-Action Guidance to Reduce Agent Iterations (inspired by incur) #722
Description
Overview
wevm/incur is a TypeScript CLI framework designed for both humans and AI agents. Its most transferable concept for Hermes Agent is Call-to-Actions (CTAs) — structured suggestions appended to tool results that tell the agent what to do next. This reduces wasted iterations where the LLM has to figure out the obvious next step, saving significant tokens and time.
Hermes already has ad-hoc hints in some tools (e.g., read_file suggests "Use offset=X to continue reading", file-not-found suggests similar filenames). But there is no standardized mechanism, and most tools return raw results with zero guidance. This issue proposes standardizing and expanding tool result hints across the codebase.
The key insight: every wasted iteration costs a full context window re-send (often 50K-200K tokens). A 10-token hint that saves even one iteration pays for itself roughly 5,000x to 20,000x over.
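The arithmetic behind that claim can be sketched in a few lines. The token figures are the ones quoted above; the function names are illustrative only, and the model is deliberately simplified (it ignores that the hint itself stays in context and is re-sent on later iterations).

```python
def hint_roi(hint_tokens: int, iteration_tokens: int) -> float:
    """Tokens saved per token spent when a hint avoids exactly one iteration."""
    return iteration_tokens / hint_tokens


def break_even_rate(hint_tokens: int, iteration_tokens: int) -> float:
    """Fraction of hinted tool calls that must avoid an iteration to break even."""
    return hint_tokens / iteration_tokens


low = hint_roi(10, 50_000)           # 50K-token context  -> 5000.0
high = hint_roi(10, 200_000)         # 200K-token context -> 20000.0
rate = break_even_rate(10, 50_000)   # 0.0002, i.e. one saved iteration per 5,000 hints
```

Read the other way round: under these assumptions a 10-token hint only has to prevent a wasted iteration once in several thousand tool calls to be worth emitting.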
Research Findings
How incur's CTAs Work
In incur, every command result includes a structured meta.cta block:
    {
      "ok": true,
      "data": { "items": [...] },
      "meta": {
        "command": "list",
        "duration": "12ms",
        "cta": {
          "description": "Suggested commands:",
          "commands": [
            { "command": "get 1", "description": "View the first item" },
            { "command": "list --offset 10", "description": "Next page" }
          ]
        }
      }
    }

Key design properties:
- Structured, not free-text: Commands are objects with `command`, `description`, and optional `args`/`options`
- Context-aware: CTAs change based on the result (error CTAs differ from success CTAs)
- Automatic for errors: `COMMAND_NOT_FOUND` automatically suggests `--help`
- Minimal token cost: A CTA block is ~30-50 tokens; saving one iteration saves 50K-200K tokens
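To make the shape concrete on the Python side, here is a minimal sketch of the same `cta` structure and one way to flatten it into text for a string-only tool result. The `CtaCommand` and `Cta` classes and `render_cta` are hypothetical names for illustration; they are not part of incur or Hermes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CtaCommand:
    command: str       # e.g. "get 1"
    description: str   # e.g. "View the first item"


@dataclass
class Cta:
    description: str
    commands: List[CtaCommand] = field(default_factory=list)


def render_cta(cta: Cta) -> str:
    """Flatten a structured CTA into plain text, one suggested command per line."""
    lines = [cta.description]
    lines += [f"  {c.command}  # {c.description}" for c in cta.commands]
    return "\n".join(lines)
```

Keeping the structured form internally and rendering it at the last moment would leave room for a machine-readable variant later, at the cost of a little more code than plain string concatenation.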
What Hermes Has Today (Ad-Hoc Hints)
Scattered across tools, no standard pattern:
| Tool | Hint Pattern | Location |
|---|---|---|
| `read_file` | "Use offset=X to continue reading" | `file_operations.py:509` |
| `read_file` (not found) | Suggests similar filenames | `file_operations.py:586` |
| `read_file` (binary/image) | "Use vision_analyze to inspect" | `file_operations.py:520` |
| `search_files` | None (no hints) | n/a |
| `terminal` | None (no hints) | n/a |
| `web_extract` | None (no hints) | n/a |
| `browser_snapshot` | None (no hints) | n/a |
| `web_search` | None (no hints) | n/a |
Most tools return raw results. The agent must infer next steps entirely from the data, which often costs an extra iteration.
Current State in Hermes Agent
Tool result flow (`run_agent.py:2695-2707`):
- `handle_function_call()` returns a string result
- Truncated if > `MAX_TOOL_RESULT_CHARS` (100K)
- Inserted as `{"role": "tool", "content": result_string, "tool_call_id": id}`
- Appended to messages
There is no post-processing step that could inject hints. Each tool handler is responsible for including any guidance in its return string. This means hints must be added per-tool in their handlers — there is no centralized injection point (and adding one would be a larger refactor).
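The flow described above can be sketched roughly as follows. This is a simplified stand-in, not the code in `run_agent.py`: `append_tool_result` is a hypothetical name, the `handle_function_call` body is a placeholder, and head-only truncation is an assumption.

```python
MAX_TOOL_RESULT_CHARS = 100_000  # the 100K limit quoted above


def handle_function_call(name: str, args: dict) -> str:
    """Placeholder dispatcher: each tool handler returns a plain string."""
    return f"{name} called with {args}"


def append_tool_result(messages: list, tool_call_id: str, name: str, args: dict) -> None:
    """Run one tool call and append its result to the conversation."""
    result = handle_function_call(name, args)
    # Any hint must already be inside `result`; nothing below adds one.
    if len(result) > MAX_TOOL_RESULT_CHARS:
        result = result[:MAX_TOOL_RESULT_CHARS]  # assumed: keep the head only
    messages.append({"role": "tool", "content": result, "tool_call_id": tool_call_id})
```

One consequence worth checking against the real code: if truncation keeps only the head of the string, a hint appended at the very end of an oversized result would be cut off, so handlers with large outputs may need to trim their payload before appending the hint.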
Related existing issues:
- Feature: Insertion-Time Tool Result Trimming — Cache-Friendly Context Management #415 — Insertion-time tool result trimming (overlaps on output processing, but focuses on truncation not hints)
- Feature: Granular Improvements from Roo Code Deep-Dive — Tool Output, Patch Refinements, Anti-Hallucination, Prompt Methodology #507 — Head/tail split truncation improvements (same overlap)
- Feature: Two-Phase Context Management — Prune Tool Outputs Before Full Compaction (inspired by Kilocode) #513 — Two-phase context management (prunes old tool outputs)
- Feature: Iteration Budget Pressure — Warn the LLM Before Max Iterations Hit #414 — Iteration budget pressure (warns LLM when approaching max iterations — complementary)
- Feature: Anti-Sycophancy System Prompt Rules — Eliminate Filler, Force Action (inspired by Kilocode) #511 — Anti-sycophancy rules (reduces wasted tokens in LLM output — complementary)
None of these cover structured next-action hints in tool RESULTS.
Implementation Plan
Skill vs. Tool Classification
This is a core codebase change, not a skill or tool. It modifies existing tool handlers to include contextual hints in their return strings.
What We'd Need
- A lightweight convention for hint formatting (not a schema change — just a text pattern appended to results)
- Per-tool hint logic in each handler
- No changes to the tool registry, message format, or API contract
Phased Rollout
Phase 1: Standardize Existing + Add High-Value Hints (~2-3 hours)
Establish a simple text convention and add hints to the tools where they save the most iterations:
Convention (appended to result string):
[Hint: <actionable suggestion>]
Priority hints to add:
- `search_files` (content mode): `[Hint: Use read_file("<path>", offset=N) to see more context around a match]`
- `search_files` (zero results): `[Hint: Try a broader pattern, different file_glob, or check the path]`
- `terminal` (non-zero exit): `[Hint: Exit code N. Check the error output above. Common fix: ...]` (for known patterns like permission denied, command not found)
- `terminal` (truncated output): `[Hint: Output was truncated. Use read_file on the output file, or pipe to head/tail for specific sections]`
- `web_search`: `[Hint: Use web_extract(["<top_url>"]) to read the full content of a promising result]`
- `web_extract` (summarized): `[Hint: Content was summarized. Use browser_navigate for the full page, or web_extract specific sub-pages]`
- `browser_snapshot` (truncated): `[Hint: Page content truncated. Use browser_scroll("down") to reveal more, or browser_vision for visual analysis]`
- `patch` (no match): `[Hint: old_string not found. Use read_file to verify the current content, or try search_files to locate the text]`
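A minimal sketch of how the convention and one of the priority hints might look in a handler. `format_hint`, `with_hint`, and `search_files_hint` are hypothetical helpers, and the parameters passed to `search_files_hint` are assumptions about what the handler has available.

```python
from typing import Optional


def format_hint(suggestion: str) -> str:
    """Wrap a suggestion in the proposed [Hint: ...] convention."""
    return f"[Hint: {suggestion}]"


def with_hint(result: str, suggestion: Optional[str]) -> str:
    """Append a hint on its own line, or return the result untouched."""
    if not suggestion:
        return result
    return f"{result}\n{format_hint(suggestion)}"


def search_files_hint(match_count: int,
                      first_path: Optional[str] = None,
                      first_line: Optional[int] = None) -> Optional[str]:
    """Pick a hint for a search_files result, or None when nothing useful applies."""
    if match_count == 0:
        return "Try a broader pattern, different file_glob, or check the path"
    if first_path and first_line is not None:
        return (f'Use read_file("{first_path}", offset={first_line}) '
                "to see more context around a match")
    return None
```

Returning `None` when there is nothing useful to say keeps hints conditional, which matters for the token-cost concern raised under Cons / Risks.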
Phase 2: Context-Aware Hints (~3-4 hours)
Make hints adaptive:
- `read_file` on a file with 1000+ lines: hint about search_files to find what you need vs reading sequentially
- `search_files` with 50 results (limit hit): hint about narrowing the search with file_glob or more specific pattern
- `terminal` git commands: hint about next git workflow steps
- Pagination-aware hints: if a paginated result has more pages, include the exact next call
- Error-aware hints: map common error patterns to corrective actions
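Two of these adaptive hints, sketched under stated assumptions. The `ERROR_HINTS` table, both function names, and the corrective wording are hypothetical; the pagination helper assumes 1-based line offsets and that the handler knows the total line count.

```python
import re
from typing import Optional

# Hypothetical pattern table: regex on terminal output -> corrective suggestion.
ERROR_HINTS = [
    (re.compile(r"permission denied", re.I),
     "Check file permissions or ownership before retrying"),
    (re.compile(r"command not found", re.I),
     "Verify the tool is installed and on PATH"),
]


def terminal_error_hint(exit_code: int, output: str) -> Optional[str]:
    """Return a hint only for failures that match a known error pattern."""
    if exit_code == 0:
        return None
    for pattern, suggestion in ERROR_HINTS:
        if pattern.search(output):
            return f"[Hint: Exit code {exit_code}. {suggestion}]"
    return None  # unknown failure: stay silent rather than guess


def pagination_hint(path: str, next_offset: int, total_lines: int) -> Optional[str]:
    """Spell out the exact next read_file call when more lines remain."""
    if next_offset > total_lines:
        return None
    return f'[Hint: Use read_file("{path}", offset={next_offset}) to continue reading]'
```

Staying silent on unrecognized failures is a design choice in this sketch: it follows the principle that a wrong hint is worse than no hint.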
Phase 3: Hint Budget Awareness (optional, ~2-3 hours)
When context is >70% full, hints could become more directive ("You are running low on context. Consider completing the task now or summarizing progress."). This connects to #414 (iteration budget pressure) — hints could incorporate context pressure signals.
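A sketch of the threshold check, assuming the handler (or whatever calls it) can see the current token usage and the model's context limit. How those two numbers would reach a tool handler is left open; `context_pressure_hint` and the constant name are hypothetical.

```python
from typing import Optional

CONTEXT_PRESSURE_THRESHOLD = 0.70  # the ">70% full" figure from this phase


def context_pressure_hint(used_tokens: int, context_limit: int) -> Optional[str]:
    """Return a directive hint once context usage exceeds the threshold."""
    if context_limit <= 0:
        raise ValueError("context_limit must be positive")
    if used_tokens / context_limit <= CONTEXT_PRESSURE_THRESHOLD:
        return None
    return ("[Hint: You are running low on context. "
            "Consider completing the task now or summarizing progress.]")
```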
Pros & Cons
Pros
- Massive token savings: Each hint that saves one iteration saves 50K-200K tokens (full context resend). A 10-token hint then returns between 5,000:1 and 20,000:1
- Faster task completion: Agent spends fewer iterations on obvious next steps
- Better for weaker models: Stronger models often infer next steps; weaker models benefit significantly from explicit guidance
- Low implementation cost: Just appending strings to existing tool results. No schema changes, no architectural changes, no cache invalidation
- Already partially proven: read_file's pagination hint and similar-files suggestion already work well
Cons / Risks
- Hint noise: Bad hints waste tokens and could mislead the agent. Each hint must be genuinely useful
- Maintenance burden: Hints need to stay accurate as tools evolve. A wrong hint is worse than no hint
- Model interference: Strong models may treat hints as commands rather than suggestions, potentially overriding their own better judgment
- Token cost of hints themselves: Each hint is ~10-30 tokens. If added to every tool call, this adds up. Should be conditional (only when the hint is actually useful)
Open Questions
- Should hints be suppressible via a config flag (e.g., `tool_hints: off`) for users who want minimal output?
- Should hints use a specific prefix (e.g., `[Hint:]`, `[Suggested:]`, `[Next:]`) for consistency and potential future parsing?
- Should the system prompt mention that tools may include hints, and instruct the model to consider but not blindly follow them?
- For Phase 3: should hint verbosity adapt based on model capability (e.g., more hints for smaller models, fewer for frontier models)?
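To illustrate why a fixed prefix helps with the first two questions, here is a sketch that separates a trailing hint from a result and drops it when hints are disabled. It assumes the hint is always the last line of the result; `split_hint`, `apply_hint_setting`, and the boolean `tool_hints` argument are hypothetical.

```python
from typing import Optional, Tuple

HINT_PREFIX = "[Hint: "


def split_hint(result: str) -> Tuple[str, Optional[str]]:
    """Separate a trailing [Hint: ...] line from the rest of a tool result."""
    body, _, last_line = result.rstrip().rpartition("\n")
    if last_line.startswith(HINT_PREFIX) and last_line.endswith("]"):
        return body, last_line[len(HINT_PREFIX):-1]
    return result, None


def apply_hint_setting(result: str, tool_hints: bool) -> str:
    """Strip the trailing hint when the hypothetical tool_hints flag is off."""
    if tool_hints:
        return result
    body, _ = split_hint(result)
    return body
```

Stripping hints in one place like this would need a post-processing step that does not exist today; the alternative is for each handler to check the flag before appending its hint.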
Additional incur Concepts Considered But Not Proposed
Several other incur features were evaluated and found to be either already covered or not worth the integration cost:
| Concept | Verdict | Reasoning |
|---|---|---|
| TOON format (token-efficient output encoding) | Pass | Tool results already use compact text formats (`LINE_NUM\|CONTENT`, plain markdown). JSON overhead is <5% of typical results. Porting a TypeScript library to Python for marginal gains not justified. |
| Output filtering (`--filter-output <keys>`) | Partially exists | `search_files` already has `output_mode` (content/files_only/count). Standardizing across all tools would be a large refactor. Agent already controls what to request. |
| Token pagination (`--token-limit`/`offset`) | Covered by #415 | `read_file` has line-based pagination. Token-based pagination would require runtime tokenizer (overhead). Char-based truncation with //4 heuristic is sufficient. |
| Structured error envelope (`{retryable: bool}`) | Minor value | Could be useful but small impact. Agent already retries on most errors. |
| Skills auto-generation from CLIs | Different paradigm | Hermes has its own skills system. incur generates skills FROM CLIs; Hermes skills ARE instructions for the agent. |
References
- wevm/incur repository — Source of the CTA concept
- incur source: `src/Cli.ts` (CTA types + formatting), `src/Formatter.ts` (TOON integration), `src/Filter.ts` (output filtering)
- Hermes tool result flow: `run_agent.py:2695-2707`
- Existing ad-hoc hints: `tools/file_operations.py:509` (pagination), `:586` (similar files), `:520` (vision suggestion)