Background processes are lost when hermes-gateway restarts #1144

Summary

When hermes-gateway restarts, background processes started from a messaging session can be lost.

This causes Hermes to later report that the process ID no longer exists, even though the job was launched successfully and the user expects Hermes to keep tracking it.

Why this is a bug

A gateway restart should not make Hermes forget user-started background jobs.

At minimum, Hermes should persist enough metadata to recover tracking after restart:

- process/session id
- pid
- originating platform/chat/thread
- command
- output/log paths
- current status
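
A minimal sketch of what such a persisted record could look like. The JobRecord class, its field names, and the save_records/load_records helpers are all hypothetical and only mirror the list above; they are not taken from the Hermes codebase. The write goes through a temp file plus os.replace so a restart in the middle of a write does not leave a truncated file.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass
class JobRecord:
    # Hypothetical field names, one per item in the list above.
    session_id: str   # e.g. a proc_* id
    pid: int
    origin: dict      # platform / chat / thread identifiers
    command: str
    stdout_path: str
    stderr_path: str
    status: str = "running"


def save_records(path: Path, records: dict[str, JobRecord]) -> None:
    """Write all records atomically: temp file, fsync, then rename."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=str(path.parent), suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump({k: asdict(v) for k, v in records.items()}, f, indent=2)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # replaces the old file in one step
    except BaseException:
        # Do not leave a stray temp file behind if the write failed.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_records(path: Path) -> dict[str, JobRecord]:
    """Read records back; a missing file means no known jobs."""
    if not path.exists():
        return {}
    with path.open() as f:
        raw = json.load(f)
    return {k: JobRecord(**v) for k, v in raw.items()}
```

Saving after every status change keeps the on-disk copy close to the in-memory one, at the cost of one small write per change.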

If the child process is still alive, Hermes should reattach.
If the child process died because of the restart, Hermes should still report that clearly with preserved stderr/exit state.
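
The reattach-or-report decision could be sketched roughly as follows. This assumes a POSIX host (os.kill with signal 0 only checks for existence there; on Windows it behaves differently) and assumes the launcher records the child's exit code in a file. The names pid_alive, classify, and exit_code_path are hypothetical.

```python
import os


def pid_alive(pid: int) -> bool:
    """Return True if a process with this pid currently exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check, nothing is sent
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def classify(pid: int, exit_code_path: str) -> str:
    """Decide what to tell the user about a job found in persisted state."""
    # Check the exit record first: if it exists the job has finished, even if
    # the pid has since been reused by an unrelated process.
    if os.path.exists(exit_code_path):
        return "exited"
    if pid_alive(pid):
        return "running"
    # No exit record and no live process: the job was killed before it could
    # record an exit code, for example during the restart.
    return "terminated_without_exit_record"
```

A pid alone is a weak identity because pids are recycled, so a fuller version would probably also compare the process start time or command line before reattaching.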

Right now the behavior is effectively:

1. background job starts
2. gateway restarts
3. process registry is reset
4. a later poll returns "No process with ID ..."
5. user loses the job state and useful failure information

Repro

1. Start hermes-gateway
2. Launch a long-running background command from Telegram
3. Restart hermes-gateway
4. Ask Hermes to poll/check the background process

Observed

Hermes reports the background process is gone / not found.

Example:

- process launched successfully with a proc_* session id
- after gateway restart, Hermes responds with:
  - No process with ID ...
  - or says the handle is gone
- ~/.hermes/processes.json is reset on restart, so the tracking state it held before the restart is lost

Expected

One of:

- Hermes reattaches to the existing background process and continues tracking it
- Hermes restores durable job state and can report final outcome/log path after restart
- If the restart terminates the child, Hermes reports that explicitly as a restart-induced termination, not “process not found”

Likely root cause

Background process tracking appears to depend on in-memory state and on ~/.hermes/processes.json, both of which are reset when the gateway restarts.

The gateway should persist background job metadata durably and recover it on boot.
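
As a rough illustration of "recover on boot", a registry could load whatever is already on disk when it is constructed instead of starting empty. ProcessRegistry and its methods are hypothetical names for this sketch and do not describe the current Hermes implementation.

```python
import json
from pathlib import Path


class ProcessRegistry:
    """Hypothetical registry that reloads persisted jobs instead of resetting."""

    def __init__(self, path: Path):
        self.path = path
        self.jobs: dict[str, dict] = {}
        if path.exists():
            try:
                self.jobs = json.loads(path.read_text())
            except json.JSONDecodeError:
                # Keep the unreadable file for inspection rather than
                # silently overwriting it with an empty registry.
                path.replace(path.with_suffix(".corrupt"))

    def register(self, session_id: str, meta: dict) -> None:
        """Track a new job and persist the registry immediately."""
        self.jobs[session_id] = meta
        self._flush()

    def get(self, session_id: str):
        """Return the stored metadata, or None if the id is unknown."""
        return self.jobs.get(session_id)

    def _flush(self) -> None:
        self.path.write_text(json.dumps(self.jobs, indent=2))
```

Constructing a second ProcessRegistry on the same path stands in for a gateway restart: jobs registered by the first instance are still visible to the second.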

Impact

High for messaging users:

- long backtests/research jobs become unreliable
- users lose output and failure traces
- restart during an active job silently breaks trust in background execution

Suggested fix direction

- persist background watcher/job metadata durably
- recover watchers on gateway startup
- attempt reattach by pid/session id
- preserve stderr/exit info across restart
- distinguish “process exited” from “registry forgot process”
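
One possible way to preserve stderr and exit info across a restart is to have the child write them to files that do not depend on the gateway staying alive. The sketch below assumes a POSIX host with /bin/sh; launch_detached, the job directory layout, and the file names are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path


def launch_detached(command: str, job_dir: Path) -> int:
    """Start `command` in its own session, logging output and exit code to files."""
    job_dir.mkdir(parents=True, exist_ok=True)
    exit_file = job_dir / "exit_code"
    # Run the command in a subshell so that even an explicit `exit N` is
    # followed by the line that records the exit status.
    wrapped = f"( {command} ); echo $? > {shlex.quote(str(exit_file))}"
    with open(job_dir / "stdout.log", "ab") as out, \
            open(job_dir / "stderr.log", "ab") as err:
        proc = subprocess.Popen(
            ["/bin/sh", "-c", wrapped],
            stdin=subprocess.DEVNULL,
            stdout=out,
            stderr=err,
            start_new_session=True,  # detach from the gateway's session
        )
    return proc.pid
```

If the wrapper shell itself is killed, no exit_code file is written. That absence is still informative: it lets the gateway report "terminated without an exit record" rather than “process not found”. Whether a detached child actually survives a restart also depends on how the gateway is supervised, which this sketch does not address.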
