fix(gateway): recover from hung agents — /stop hard-kills session lock by teknium1 · Pull Request #3104 · NousResearch/hermes-agent

teknium1 · 2026-03-26T01:23:58Z

Summary

Salvage of PR #2498 by @Mibayy onto current main.

When an agent thread hangs (truly blocked, never checks _interrupt_requested), /stop now force-cleans _running_agents to unlock the session immediately. Previously, /stop called agent.interrupt() which sets a flag the hung agent never reads — the session stayed locked forever, showing "writing..." with no output.

Changes

Early /stop intercept — New block in the running-agent guard (following the existing /new intercept pattern) that catches /stop, calls interrupt() on the agent, then force-deletes the entry from _running_agents and clears pending messages. Returns immediately with a confirmation.

Sentinel /stop force-clean — /stop during agent startup now force-cleans the sentinel instead of returning "nothing to stop yet", so the session actually unlocks.

10-minute hard timeout — Wraps loop.run_in_executor() in asyncio.wait_for(timeout=600). On timeout, interrupts the agent and constructs a synthetic response. The thread keeps running (Python can't kill threads) but the session lock is released.
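The timeout wrapper described above can be sketched roughly as follows. This is an illustration under assumptions, not the gateway's code: `blocking_agent_run` and `run_with_timeout` are hypothetical names, and the timeout is shortened from 600 seconds to 0.1 so the example finishes quickly. Note that the commit message further down states this timeout was removed from the final commit.

```python
import asyncio
import threading


def blocking_agent_run(release: threading.Event) -> str:
    """Hypothetical agent body that blocks until the event is set."""
    release.wait()
    return "agent finished"


async def run_with_timeout(release: threading.Event, timeout: float) -> str:
    loop = asyncio.get_running_loop()
    # Run the blocking call on the default thread pool executor.
    future = loop.run_in_executor(None, blocking_agent_run, release)
    try:
        return await asyncio.wait_for(future, timeout=timeout)
    except asyncio.TimeoutError:
        # The awaiting side gives up here, but the worker thread keeps
        # running: Python cannot forcibly kill a thread.
        return "timed out"


async def main() -> str:
    release = threading.Event()
    result = await run_with_timeout(release, timeout=0.1)
    # Let the leftover thread finish so the executor can shut down cleanly.
    release.set()
    return result


outcome = asyncio.run(main())
```

The `release.set()` call exists only so the demo exits; with a genuinely hung agent the thread would linger as a zombie, which is the trade-off the PR description calls out.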

Follow-up improvements over original PR

Consolidated duplicate resolve_command imports — single early resolution shared by /stop and /new intercepts
Updated _handle_stop_command() to also force-clean for consistency (both paths now behave identically)
Added zombie thread documentation on the timeout handler
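One way to keep the early intercept, the sentinel case, and `_handle_stop_command()` behaving identically is to route them through a single helper. The sketch below is hypothetical: `_AGENT_PENDING`, `FakeAgent`, `force_stop`, and the reply strings are assumptions made for illustration, and the PR does not show how the sentinel is actually represented.

```python
# Hypothetical marker stored while an agent is still starting up.
_AGENT_PENDING = object()


class FakeAgent:
    """Hypothetical agent exposing only the interrupt() hook."""

    def __init__(self):
        self.interrupted = False

    def interrupt(self):
        self.interrupted = True


def force_stop(running_agents, pending_messages, session_key):
    """Unlock a session whether it holds a live agent or a startup sentinel."""
    entry = running_agents.get(session_key)
    if entry is None:
        return "No active task to stop."
    if entry is not _AGENT_PENDING:
        entry.interrupt()                     # best effort; a hung agent ignores it
    running_agents.pop(session_key, None)     # release the session lock
    pending_messages.pop(session_key, None)   # drop queued follow-up messages
    return "Stopped. Session unlocked."


running = {"starting": _AGENT_PENDING, "busy": FakeAgent()}
pending = {"busy": ["queued message"]}
busy_agent = running["busy"]

replies = [force_stop(running, pending, key) for key in ("starting", "busy", "idle")]
```

With a shared helper like this, the sentinel path and the running-agent path cannot drift apart, which is the consistency goal stated in the follow-up list.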

Tests

Updated test 6 (sentinel /stop) to verify force-cleanup
Added test 6b: /stop hard-kills a running agent
Added test 6c: /stop clears pending messages

All 6178 tests pass.

Closes #2491. Cherry-picked from #2498 by @Mibayy.

When an agent thread hangs (truly blocked, never checks _interrupt_requested), /stop now force-cleans _running_agents to unlock the session immediately.

Two changes:

- Early /stop intercept in the running-agent guard: bypasses normal command dispatch to force-interrupt and unlock the session. Follows the same pattern as the existing /new intercept.
- Sentinel /stop: force-cleans the sentinel instead of returning 'nothing to stop yet', so /stop during slow startup actually unlocks the session.

Follow-up improvements over original PR:

- Consolidated duplicate resolve_command imports into single early resolution
- Updated _handle_stop_command to also force-clean for consistency
- Removed 10-minute hard timeout on the executor (would kill legitimate long-running agent tasks; the /stop force-clean handles recovery)

Cherry-picked from Mibayy's PR #2498.

github-actions · 2026-03-26T01:38:32Z

⚠️ Supply Chain Risk Detected

This PR contains patterns commonly associated with supply chain attacks. This does not mean the PR is malicious — but these patterns require careful human review before merging.

⚠️ WARNING: Outbound network calls (POST/PUT)

Outbound POST/PUT requests in new code could be data exfiltration. Verify the destination URLs are legitimate.

Matches (first 10):

144:+        with urllib.request.urlopen(req, timeout=15) as resp:
225:+        with urllib.request.urlopen(req, timeout=10) as resp:

Automated scan triggered by supply-chain-audit. If this is a false positive, a maintainer can approve after manual review.

…ousResearch#3104)

When an agent thread hangs (truly blocked, never checks _interrupt_requested), /stop now force-cleans _running_agents to unlock the session immediately.

Two changes:

- Early /stop intercept in the running-agent guard: bypasses normal command dispatch to force-interrupt and unlock the session. Follows the same pattern as the existing /new intercept.
- Sentinel /stop: force-cleans the sentinel instead of returning 'nothing to stop yet', so /stop during slow startup actually unlocks the session.

Follow-up improvements over original PR:

- Consolidated duplicate resolve_command imports into single early resolution
- Updated _handle_stop_command to also force-clean for consistency
- Removed 10-minute hard timeout on the executor (would kill legitimate long-running agent tasks; the /stop force-clean handles recovery)

Cherry-picked from Mibayy's PR NousResearch#2498.

Co-authored-by: Mibayy <Mibayy@users.noreply.github.com>

teknium1 force-pushed the hermes/hermes-9f3f51e2 branch from 19742e2 to ebfbfa5 Compare March 26, 2026 01:38

teknium1 merged commit 59575d6 into main Mar 26, 2026
1 of 2 checks passed

teknium1 mentioned this pull request Mar 26, 2026

fix(gateway): recover from hung agents — /stop hard-kills session lock #2498

Closed

teknium1 merged 1 commit into main from hermes/hermes-9f3f51e2
