
Docker image PID 1 (entrypoint.sh) doesn't reap reparented zombies #1287

@Aaronontheweb

Description


Problem

In the official Docker image, docker/entrypoint.sh runs as PID 1 and supervises netclawd:

while true; do
    /usr/local/bin/netclawd "$@" &
    PID=$!
    wait $PID
    ...
done

It only waits on its own direct child ($PID). It does not reap reparented processes: when a descendant of netclawd (e.g. a shell_execute tool subprocess) is orphaned, it reparents to PID 1, and on exit becomes a <defunct> zombie that entrypoint.sh never wait()s for. Reaping arbitrary reparented children is exactly the job a real init (or tini) does and a supervisor shell loop does not.
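The mechanism is easy to reproduce in isolation. Below is a hypothetical standalone demo (not part of the image), assuming Linux with /proc mounted. A parent that never calls wait() (here a `sleep` started via exec, much like the sleep infinity case under Evidence) leaves its exited child stuck in state Z.

```shell
#!/usr/bin/env bash
# Zombie reproduction sketch (Linux; assumes /proc is mounted).
# The subshell backgrounds a short-lived child, then execs a long `sleep`.
# exec keeps the subshell's PID, so the child's parent is now `sleep`,
# which never calls wait(). An orphan under a PID 1 that never reaps
# (such as `sleep infinity`) ends up in the same state.
( sleep 1 & exec sleep 5 ) &
parent=$!

sleep 2   # by now the short-lived child has exited

# Print "<pid> <state>" for every process whose PPid is $parent.
# Each status file starts with "Name:", which marks a new process.
read -r child state < <(
  { cat /proc/[0-9]*/status 2>/dev/null || true; } | awk -v p="$parent" '
    /^Name:/  { st = ""; pid = "" }
    /^State:/ { st = $2 }
    /^Pid:/   { pid = $2 }
    /^PPid:/ && $2 == p { print pid, st }
  '
)
echo "child $child of parent $parent is in state $state"   # Z means zombie

# Clean up the demo parent; its zombie child is handed to PID 1
# (or the nearest subreaper).
kill "$parent" 2>/dev/null || true
wait "$parent" 2>/dev/null || true
```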

Over the life of a long-lived container these defunct entries accumulate. Each one holds a PID slot and its exit status (negligible CPU/memory individually), but together they clutter ps output and, in pathological cases (many orphaned tool subprocesses), can exhaust the PID table.
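To check whether a running container has accumulated such entries, one option is to scan /proc for processes in state Z whose parent is PID 1. This is a sketch assuming a Linux /proc filesystem; list_zombies is a hypothetical helper name, not something the image ships.

```shell
#!/usr/bin/env bash
# Sketch: list zombie processes (state Z) parented to a given PID,
# defaulting to PID 1. Reads /proc directly, so it also works in slim
# images without procps. Assumes Linux with /proc mounted.
list_zombies() {
  local parent="${1:-1}"
  # Each /proc/<pid>/status begins with "Name:", so that line resets
  # the per-process fields before State/Pid/PPid are read.
  { cat /proc/[0-9]*/status 2>/dev/null || true; } | awk -v p="$parent" '
    /^Name:/  { name = $2; st = ""; pid = "" }
    /^State:/ { st = $2 }
    /^Pid:/   { pid = $2 }
    /^PPid:/ && $2 == p && st == "Z" { print pid, name }
  '
}

# Usage: count (and list) zombies currently parented to PID 1.
zombies=$(list_zombies 1)
count=$(printf '%s' "$zombies" | awk 'NF { n++ } END { print n + 0 }')
echo "zombies under PID 1: $count"
if [ "$count" -gt 0 ]; then
  printf '%s\n' "$zombies"
fi
```

Run it via docker exec inside the affected container; a count that keeps growing over time points at the reaping gap described above.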

Evidence

This surfaced while a user was debugging the daemon lifecycle. In their case PID 1 was sleep infinity (a dev-sandbox keep-alive that also never reaps), so netclaw daemon stop left <defunct> netclawd entries reparented to PID 1, unkillable until the container was restarted ("zombies … i only can restart it").

The official image fares better: entrypoint.sh reaps its own netclawd child via wait $PID, so the supervised daemon itself won't become a zombie. It shares the same gap, however, for reparented grandchildren of netclawd.

Impact

  • Zombies hold PID slots and clutter process listings; pathologically, PID-table exhaustion.
  • Operator confusion ("unkillable / defunct processes" inside the container).

Proposed fix (options)

  1. Bundle a real init as PID 1 (recommended, self-contained). Make tini the image entrypoint and run entrypoint.sh under it, e.g.
    ENTRYPOINT ["/usr/bin/tini", "-g", "--", "/opt/netclaw/entrypoint.sh"]
    (tini -g forwards signals to the whole process group). tini reaps orphans and forwards SIGTERM, so entrypoint.sh keeps its supervision role while PID 1 handles reaping/signals. tini is in Ubuntu's repos (apt-get install -y tini).
  2. Document/require docker run --init. Docker then injects its own tini as PID 1. This is cheaper, but it relies on operators remembering the flag (and, on Kubernetes, on pod specs using shareProcessNamespace or an equivalent init setup), so it is not self-contained.
  3. Reap inside entrypoint.sh. This is harder in pure bash: the loop blocks on wait $PID, and a trap '…' CHLD plus a periodic wait is fragile. Bash makes a poor init; prefer option 1.
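Whichever option is chosen, one way to confirm the result from inside a running container is to check what PID 1 actually is. This sketch assumes Linux /proc and matches on assumed default binary names (tini or tini-static for option 1, docker-init for docker run --init); the helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: report what is running as PID 1 inside the container.
# The matched names are assumptions based on the binaries' default names.
pid1_name() {
  cat /proc/1/comm
}

pid1_is_reaping_init() {
  case "$(pid1_name)" in
    tini|tini-static|docker-init) return 0 ;;
    *) return 1 ;;
  esac
}

name=$(pid1_name)
if pid1_is_reaping_init; then
  echo "PID 1 is $name: reparented orphans should be reaped"
else
  echo "PID 1 is $name: verify that it reaps reparented children"
fi
```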

Recommend option 1.

Scope

Orthogonal to #1279 / #1282 (daemon split-brain). That fix prevents spawning duplicate daemons; this is about reaping orphaned descendants. Tracking separately, as agreed in the #1282 review.

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
