Gateway systemd service fails to auto-restart when browser processes orphaned #1617
Closed as not planned
Description
Problem
When the gateway spawns browser automation (Chrome via remote-debugging-port), those Chrome child processes join the systemd service cgroup. On service stop/restart:
- ExecStop triggers graceful Python shutdown
- Python's async Telegram disconnect throws errors ('NoneType' object has no attribute 'shutdown', dictionary changed size during iteration)
- systemd waits TimeoutStopSec (15s), then tries to kill the cgroup
- Chrome orphans prevent clean cgroup teardown: Failed to kill control group: Invalid argument
- Service enters Failed with result 'timeout' state
- Restart=on-failure doesn't always trigger recovery after this state
The gateway stays down until manually restarted.
Root Causes
- KillMode=mixed only SIGKILLs the main process, leaving Chrome children alive in the cgroup
- Restart=on-failure doesn't cover all exit scenarios (e.g., SIGKILL after timeout)
- No cleanup of leaked browser processes after stop
- No crash loop protection: if something is broken, it could restart infinitely
Fix
Update generate_systemd_unit() in hermes_cli/gateway.py:
- Restart=always: self-heal on any exit
- KillMode=control-group: kill entire cgroup including orphan Chrome processes
- ExecStopPost: force-kill leaked browser processes
- StartLimitIntervalSec=120 / StartLimitBurst=5: crash loop protection
- TimeoutStopSec=20: slightly more time for graceful disconnect
- SendSIGKILL=yes: ensure cleanup after timeout
- RestartSec=15: prevent rapid restart churn
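To make the proposal concrete, here is a rough sketch of what a generator emitting the directives listed above could look like. The function signature, the Description text, the `WantedBy=default.target` target, and the `pkill` command used for ExecStopPost are all assumptions for illustration; the actual generate_systemd_unit() in hermes_cli/gateway.py may differ on every one of these.

```python
# Hypothetical sketch: the real generate_systemd_unit() may take different
# arguments and emit more directives than shown here.
def generate_systemd_unit(exec_start: str, description: str = "Hermes gateway") -> str:
    """Return unit-file text containing the hardening directives from this issue."""
    lines = [
        "[Unit]",
        f"Description={description}",
        # Crash loop protection: at most 5 starts within a 120 s window.
        "StartLimitIntervalSec=120",
        "StartLimitBurst=5",
        "",
        "[Service]",
        f"ExecStart={exec_start}",
        "Restart=always",
        "RestartSec=15",
        "KillMode=control-group",
        "TimeoutStopSec=20",
        "SendSIGKILL=yes",
        # Assumed cleanup command (not from the issue). The leading "-" tells
        # systemd to ignore a non-zero exit, which pkill returns when nothing matches.
        "ExecStopPost=-/usr/bin/pkill -KILL -f remote-debugging-port",
        "",
        "[Install]",
        # Assumption: the gateway is installed as a user service.
        "WantedBy=default.target",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(generate_systemd_unit("/usr/bin/env hermes gateway run"))
```

The start-limit settings are placed in the [Unit] section, which is where current systemd versions expect them. The `pkill` pattern shown would match any process whose command line contains `remote-debugging-port`, so a real implementation would probably want a narrower match.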
Users who already installed the service need to run hermes gateway install --force to regenerate the unit file.
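One way to tell whether an installed unit predates this change is to compare its directives against the proposed values. The helper below is a minimal sketch under that assumption; the function names are hypothetical, and the caller supplies the unit file text, since the install path is not stated in this issue.

```python
# Proposed values from this issue. ExecStopPost is omitted because the
# exact command is not specified here.
EXPECTED = {
    "Restart": "always",
    "KillMode": "control-group",
    "StartLimitIntervalSec": "120",
    "StartLimitBurst": "5",
    "TimeoutStopSec": "20",
    "SendSIGKILL": "yes",
    "RestartSec": "15",
}


def parse_unit(text: str) -> dict[str, list[str]]:
    """Collect Key=Value settings from unit-file text, ignoring section headers."""
    settings: dict[str, list[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and [Section] headers.
        if not line or line.startswith(("#", ";", "[")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings.setdefault(key.strip(), []).append(value.strip())
    return settings


def stale_directives(unit_text: str, expected: dict[str, str] = EXPECTED) -> list[str]:
    """Return the expected 'Key=Value' pairs that the unit does not currently set."""
    settings = parse_unit(unit_text)
    # When a key is repeated, the last assignment is the one compared.
    return [
        f"{key}={value}"
        for key, value in expected.items()
        if settings.get(key, [None])[-1] != value
    ]
```

A non-empty result suggests the unit was generated before the change and that hermes gateway install --force is needed.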