Description
The Hermes gateway (running as PID 1 in Docker) accumulates zombie processes over time from:
- MCP server processes (gbrain, bun)
- Git operations
- Browser automation subprocesses
- Shell pipe commands (head, etc.)
Evidence
$ ps aux | awk '$8 ~ /Z/ {print}'
hermes 1689 0.0 0.0 0 0 ? Zs 14:40 0:00 [agent-browser-l] <defunct>
hermes 1902 0.0 0.0 0 0 ? Zs 14:50 0:00 [git] <defunct>
hermes 1984 0.0 0.0 0 0 ? Zs 14:52 0:00 [git] <defunct>
hermes 1988 0.0 0.0 0 0 ? Zs 14:52 0:00 [git] <defunct>
hermes 2861 0.0 0.0 0 0 ? Z 14:59 0:00 [gbrain] <defunct>
hermes 2862 0.0 0.0 0 0 ? Z 14:59 0:00 [head] <defunct>
hermes 2863 0.0 0.0 0 0 ? Z 14:59 0:01 [bun] <defunct>
...
All zombies have PPID=1 (the gateway process), indicating the gateway is not reaping child processes.
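To confirm the PPID claim without relying on ps column positions, the same information can be read from /proc. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper names parse_stat and list_zombies are illustrative and not part of Hermes.

```python
import os


def parse_stat(text):
    """Parse the contents of /proc/<pid>/stat into (pid, comm, state, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the first '(' and the last ')'.
    left = text.index("(")
    right = text.rindex(")")
    pid = int(text[:left])
    comm = text[left + 1:right]
    fields = text[right + 1:].split()
    return pid, comm, fields[0], int(fields[1])


def list_zombies(proc_root="/proc"):
    """Return (pid, ppid, comm) for every process currently in state Z."""
    zombies = []
    for entry in sorted(os.listdir(proc_root)):
        if not entry.isdigit():
            continue  # skip non-process entries such as "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state, ppid = parse_stat(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished or the entry was unreadable
        if state == "Z":
            zombies.append((pid, ppid, comm))
    return zombies
```

Calling list_zombies() inside the container and checking that every returned ppid is 1 would match the ps output above.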
Root Cause
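When running as PID 1, the gateway does not reap terminated child processes: it neither handles SIGCHLD nor calls wait() on them. On Unix, orphaned processes are reparented to PID 1, which is expected to call wait() on them, so both the gateway's own children and any reparented orphans stay in the process table as zombies.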
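The mechanism can be reproduced outside the gateway. The sketch below assumes Linux with /proc mounted, and the function names are hypothetical. It forks a child that exits immediately, observes it in state Z while the parent has not yet called wait(), and then shows that the entry disappears once it is reaped.

```python
import os
import time


def proc_state(pid):
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as fh:
            text = fh.read()
    except OSError:
        return None
    # The state is the first field after the parenthesised command name.
    return text[text.rindex(")") + 1:].split()[0]


def demo_zombie(timeout=5.0):
    """Fork a child that exits at once; return its state before and after wait()."""
    pid = os.fork()
    if pid == 0:
        os._exit(0)  # child: terminate immediately
    # Parent: poll until the child has exited and shows up as a zombie.
    deadline = time.monotonic() + timeout
    before = proc_state(pid)
    while before != "Z" and time.monotonic() < deadline:
        time.sleep(0.01)
        before = proc_state(pid)
    os.waitpid(pid, 0)  # reap: the kernel can now release the process-table entry
    after = proc_state(pid)
    return before, after
```

A process that never reaches the os.waitpid call leaves its child in state Z, which is the behaviour reported above.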
Proposed Solutions
- In gateway code: Add signal.signal(signal.SIGCHLD, signal.SIG_IGN) so the kernel auto-reaps children, or implement a proper SIGCHLD handler that calls waitpid(-1, WNOHANG) in a loop. Note that SIG_IGN discards child exit statuses, which can affect code that checks subprocess return codes.
- In Docker: Use the --init flag or add tini as PID 1 to handle signal forwarding and zombie reaping.
- In Dockerfile: Consider using tini as the entrypoint:
ENTRYPOINT ["/sbin/tini", "--"]
CMD ["hermes", "gateway", "run"]
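For the in-gateway option, a handler along the following lines might work. This is a sketch only: reap_children and install_sigchld_reaper are hypothetical names, and how the gateway actually spawns and waits on its subprocesses is not known from this report.

```python
import os
import signal


def reap_children():
    """Reap every child that has already exited, without blocking."""
    reaped = []
    while True:
        try:
            pid, status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # this process has no children at all
        if pid == 0:
            break  # children exist, but none have exited yet
        reaped.append((pid, status))
    return reaped


def install_sigchld_reaper():
    """Install a SIGCHLD handler; must be called from the main thread."""
    # Several exits can be coalesced into one SIGCHLD, hence the loop above.
    signal.signal(signal.SIGCHLD, lambda signum, frame: reap_children())
```

One caveat: waitpid(-1, ...) also collects children that other code is tracking (for example subprocess.Popen objects), so those callers may no longer see the real exit status. Running tini or --init as PID 1 avoids that interaction for reparented orphans, since the gateway then only has to wait on the children it spawned itself.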
Environment
- Hermes version: v0.11.0
- Deployment: Docker
- OS: Linux (Debian-based)
Impact
Zombie processes don't consume CPU or memory, but each one occupies a PID slot until it is reaped. Long-running containers may eventually exhaust their available PIDs, causing fork failures such as "fork: cannot allocate memory".
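How close a container is to its limit can be estimated from the pids cgroup controller. The sketch below assumes the controller is enabled and that its files are visible at a directory such as /sys/fs/cgroup (cgroup v2) or /sys/fs/cgroup/pids (cgroup v1); the exact path depends on the host setup, and pids_usage is an illustrative name.

```python
import os


def read_value(path):
    """Return the integer stored at path, or None if it is 'max' or unreadable."""
    try:
        with open(path) as fh:
            raw = fh.read().strip()
    except OSError:
        return None
    if raw == "max":
        return None  # the controller reports no limit
    try:
        return int(raw)
    except ValueError:
        return None


def pids_usage(cgroup_dir="/sys/fs/cgroup"):
    """Return (current, limit) from pids.current and pids.max in cgroup_dir."""
    current = read_value(os.path.join(cgroup_dir, "pids.current"))
    limit = read_value(os.path.join(cgroup_dir, "pids.max"))
    return current, limit
```

A current value that climbs steadily toward the limit while the workload stays constant would be consistent with unreaped zombies.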