Skip to content

fix(docker): reap orphaned subprocesses via tini as PID 1#15116

Merged
teknium1 merged 1 commit into
mainfrom
hermes/hermes-37dfb080
Apr 24, 2026
Merged

fix(docker): reap orphaned subprocesses via tini as PID 1#15116
teknium1 merged 1 commit into
mainfrom
hermes/hermes-37dfb080

Conversation

@teknium1

Copy link
Copy Markdown
Contributor

Closes #15012.

What this PR does

Installs tini in the container image and routes ENTRYPOINT through /usr/bin/tini -g -- /opt/hermes/docker/entrypoint.sh.

Without a PID-1 init, orphans reparented to hermes (MCP stdio servers, git, bun, browser daemons) never get wait()-ed on and accumulate as zombies under PID 1. Long-running gateway containers eventually exhaust the PID table ("fork: cannot allocate memory").

Why tini specifically

  • Standard container init — same pattern Docker's --init flag and Kubernetes's pause container use
  • Shipped in Debian stable (tini 0.19.0-1)
  • Transparent to hermes: handles SIGCHLD, reaps orphans, forwards SIGTERM/SIGINT to the entrypoint so existing graceful-shutdown handlers still run
  • -g sends signals to the whole process group so docker stop cleanly terminates hermes AND its descendants

Validation

E2E with a minimal reproducer image:

Image Spawn 5 orphans reparenting to PID 1 ZOMBIE_COUNT after 1.5s
Without tini (current main) 5 orphans created 5 zombies
With tini -g -- (this PR) 5 orphans created 0 zombies

Unit tests (tests/tools/test_dockerfile_pid1_reaping.py):
Two contract tests that assert the Dockerfile (a) installs some init (tini, dumb-init, or catatonit), and (b) the last ENTRYPOINT line routes through it. Accepts any standard init so it's not a change-detector on tini specifically. Verified both tests fail on pre-fix Dockerfile, pass on post-fix.

What I considered and rejected

  • signal.signal(SIGCHLD, SIG_IGN) in Python — race-prone with Python's own subprocess wait(), doesn't cover shell entrypoint stage
  • Telling users to pass --init on docker run — requires user-side discipline, doesn't cover compose/k8s

Baking tini into the image is the right layer — applies to every deployment path automatically.

Install tini in the container image and route ENTRYPOINT through
`/usr/bin/tini -g -- /opt/hermes/docker/entrypoint.sh`.

Without a PID-1 init, orphans reparented to hermes (MCP stdio servers,
git, bun, browser daemons) never get waited() on and accumulate as
zombies. Long-running gateway containers eventually exhaust the PID
table and hit "fork: cannot allocate memory".

tini is the standard container init (same pattern Docker's --init flag
and Kubernetes pause container use). It handles SIGCHLD, reaps orphans,
and forwards SIGTERM/SIGINT to the entrypoint so hermes's existing
graceful-shutdown handlers still run. The -g flag sends signals to the
whole process group so `docker stop` cleanly terminates hermes and its
descendants, not just direct children.

Closes #15012.

E2E-verified with a minimal reproducer image: spawning 5 orphans that
reparent to PID 1 leaves 5 zombies without tini and 0 with tini.
@teknium1 teknium1 merged commit acd78a4 into main Apr 24, 2026
11 of 12 checks passed
@teknium1 teknium1 deleted the hermes/hermes-37dfb080 branch April 24, 2026 12:22
@alt-glitch alt-glitch added type/bug Something isn't working P2 Medium — degraded but workaround exists area/docker Docker image, Compose, packaging comp/gateway Gateway runner, session dispatch, delivery labels Apr 24, 2026
nekorytaylor666 pushed a commit to nekorytaylor666/hermes-agent that referenced this pull request Apr 24, 2026
…ch#15116)

Install tini in the container image and route ENTRYPOINT through
`/usr/bin/tini -g -- /opt/hermes/docker/entrypoint.sh`.

Without a PID-1 init, orphans reparented to hermes (MCP stdio servers,
git, bun, browser daemons) never get waited() on and accumulate as
zombies. Long-running gateway containers eventually exhaust the PID
table and hit "fork: cannot allocate memory".

tini is the standard container init (same pattern Docker's --init flag
and Kubernetes pause container use). It handles SIGCHLD, reaps orphans,
and forwards SIGTERM/SIGINT to the entrypoint so hermes's existing
graceful-shutdown handlers still run. The -g flag sends signals to the
whole process group so `docker stop` cleanly terminates hermes and its
descendants, not just direct children.

Closes NousResearch#15012.

E2E-verified with a minimal reproducer image: spawning 5 orphans that
reparent to PID 1 leaves 5 zombies without tini and 0 with tini.
justrhoto pushed a commit to justrhoto/hermes-agent that referenced this pull request Apr 24, 2026
…ch#15116)

Install tini in the container image and route ENTRYPOINT through
`/usr/bin/tini -g -- /opt/hermes/docker/entrypoint.sh`.

Without a PID-1 init, orphans reparented to hermes (MCP stdio servers,
git, bun, browser daemons) never get waited() on and accumulate as
zombies. Long-running gateway containers eventually exhaust the PID
table and hit "fork: cannot allocate memory".

tini is the standard container init (same pattern Docker's --init flag
and Kubernetes pause container use). It handles SIGCHLD, reaps orphans,
and forwards SIGTERM/SIGINT to the entrypoint so hermes's existing
graceful-shutdown handlers still run. The -g flag sends signals to the
whole process group so `docker stop` cleanly terminates hermes and its
descendants, not just direct children.

Closes NousResearch#15012.

E2E-verified with a minimal reproducer image: spawning 5 orphans that
reparent to PID 1 leaves 5 zombies without tini and 0 with tini.
ulasbilgen pushed a commit to ulasbilgen/hermes-adhd-agent that referenced this pull request May 1, 2026
…ch#15116)

Install tini in the container image and route ENTRYPOINT through
`/usr/bin/tini -g -- /opt/hermes/docker/entrypoint.sh`.

Without a PID-1 init, orphans reparented to hermes (MCP stdio servers,
git, bun, browser daemons) never get waited() on and accumulate as
zombies. Long-running gateway containers eventually exhaust the PID
table and hit "fork: cannot allocate memory".

tini is the standard container init (same pattern Docker's --init flag
and Kubernetes pause container use). It handles SIGCHLD, reaps orphans,
and forwards SIGTERM/SIGINT to the entrypoint so hermes's existing
graceful-shutdown handlers still run. The -g flag sends signals to the
whole process group so `docker stop` cleanly terminates hermes and its
descendants, not just direct children.

Closes NousResearch#15012.

E2E-verified with a minimal reproducer image: spawning 5 orphans that
reparent to PID 1 leaves 5 zombies without tini and 0 with tini.
aj-nt pushed a commit to aj-nt/hermes-agent that referenced this pull request May 1, 2026
…ch#15116)

Install tini in the container image and route ENTRYPOINT through
`/usr/bin/tini -g -- /opt/hermes/docker/entrypoint.sh`.

Without a PID-1 init, orphans reparented to hermes (MCP stdio servers,
git, bun, browser daemons) never get waited() on and accumulate as
zombies. Long-running gateway containers eventually exhaust the PID
table and hit "fork: cannot allocate memory".

tini is the standard container init (same pattern Docker's --init flag
and Kubernetes pause container use). It handles SIGCHLD, reaps orphans,
and forwards SIGTERM/SIGINT to the entrypoint so hermes's existing
graceful-shutdown handlers still run. The -g flag sends signals to the
whole process group so `docker stop` cleanly terminates hermes and its
descendants, not just direct children.

Closes NousResearch#15012.

E2E-verified with a minimal reproducer image: spawning 5 orphans that
reparent to PID 1 leaves 5 zombies without tini and 0 with tini.
donald131 pushed a commit to donald131/hermes-agent that referenced this pull request May 2, 2026
…ch#15116)

Install tini in the container image and route ENTRYPOINT through
`/usr/bin/tini -g -- /opt/hermes/docker/entrypoint.sh`.

Without a PID-1 init, orphans reparented to hermes (MCP stdio servers,
git, bun, browser daemons) never get waited() on and accumulate as
zombies. Long-running gateway containers eventually exhaust the PID
table and hit "fork: cannot allocate memory".

tini is the standard container init (same pattern Docker's --init flag
and Kubernetes pause container use). It handles SIGCHLD, reaps orphans,
and forwards SIGTERM/SIGINT to the entrypoint so hermes's existing
graceful-shutdown handlers still run. The -g flag sends signals to the
whole process group so `docker stop` cleanly terminates hermes and its
descendants, not just direct children.

Closes NousResearch#15012.

E2E-verified with a minimal reproducer image: spawning 5 orphans that
reparent to PID 1 leaves 5 zombies without tini and 0 with tini.
02356abc pushed a commit to 02356abc/hermes-agent that referenced this pull request May 14, 2026
…ch#15116)

Install tini in the container image and route ENTRYPOINT through
`/usr/bin/tini -g -- /opt/hermes/docker/entrypoint.sh`.

Without a PID-1 init, orphans reparented to hermes (MCP stdio servers,
git, bun, browser daemons) never get waited() on and accumulate as
zombies. Long-running gateway containers eventually exhaust the PID
table and hit "fork: cannot allocate memory".

tini is the standard container init (same pattern Docker's --init flag
and Kubernetes pause container use). It handles SIGCHLD, reaps orphans,
and forwards SIGTERM/SIGINT to the entrypoint so hermes's existing
graceful-shutdown handlers still run. The -g flag sends signals to the
whole process group so `docker stop` cleanly terminates hermes and its
descendants, not just direct children.

Closes NousResearch#15012.

E2E-verified with a minimal reproducer image: spawning 5 orphans that
reparent to PID 1 leaves 5 zombies without tini and 0 with tini.
gweeteve pushed a commit to gweeteve/hermes-agent that referenced this pull request Jun 2, 2026
…ch#15116)

Install tini in the container image and route ENTRYPOINT through
`/usr/bin/tini -g -- /opt/hermes/docker/entrypoint.sh`.

Without a PID-1 init, orphans reparented to hermes (MCP stdio servers,
git, bun, browser daemons) never get waited() on and accumulate as
zombies. Long-running gateway containers eventually exhaust the PID
table and hit "fork: cannot allocate memory".

tini is the standard container init (same pattern Docker's --init flag
and Kubernetes pause container use). It handles SIGCHLD, reaps orphans,
and forwards SIGTERM/SIGINT to the entrypoint so hermes's existing
graceful-shutdown handlers still run. The -g flag sends signals to the
whole process group so `docker stop` cleanly terminates hermes and its
descendants, not just direct children.

Closes NousResearch#15012.

E2E-verified with a minimal reproducer image: spawning 5 orphans that
reparent to PID 1 leaves 5 zombies without tini and 0 with tini.
Egavasyug pushed a commit to Egavasyug/hermes-agent that referenced this pull request Jun 10, 2026
…ch#15116)

Install tini in the container image and route ENTRYPOINT through
`/usr/bin/tini -g -- /opt/hermes/docker/entrypoint.sh`.

Without a PID-1 init, orphans reparented to hermes (MCP stdio servers,
git, bun, browser daemons) never get waited() on and accumulate as
zombies. Long-running gateway containers eventually exhaust the PID
table and hit "fork: cannot allocate memory".

tini is the standard container init (same pattern Docker's --init flag
and Kubernetes pause container use). It handles SIGCHLD, reaps orphans,
and forwards SIGTERM/SIGINT to the entrypoint so hermes's existing
graceful-shutdown handlers still run. The -g flag sends signals to the
whole process group so `docker stop` cleanly terminates hermes and its
descendants, not just direct children.

Closes NousResearch#15012.

E2E-verified with a minimal reproducer image: spawning 5 orphans that
reparent to PID 1 leaves 5 zombies without tini and 0 with tini.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/docker Docker image, Compose, packaging comp/gateway Gateway runner, session dispatch, delivery P2 Medium — degraded but workaround exists type/bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] Gateway accumulates zombie processes from MCP servers and subprocess calls

2 participants