Problem
In the official Docker image, docker/entrypoint.sh runs as PID 1 and supervises netclawd:
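```sh
while true; do
  # Start the daemon in the background and remember its PID.
  /usr/local/bin/netclawd "$@" &
  PID=$!
  # Block until that specific child exits.
  wait $PID
  ...
done
```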
It only waits on its own direct child ($PID). It does not reap reparented processes: when a descendant of netclawd (e.g. a shell_execute tool subprocess) is orphaned, it reparents to PID 1, and on exit becomes a <defunct> zombie that entrypoint.sh never wait()s for. Reaping arbitrary reparented children is exactly the job a real init (or tini) does and a supervisor shell loop does not.
Over the life of a long-running container these defunct entries accumulate. Each one holds a PID slot and its exit status (negligible CPU and memory individually), but together they clutter ps output and, in pathological cases (many orphaned tool subprocesses), can exhaust the PID table.
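To check whether a container is accumulating these entries, here is a minimal POSIX sh sketch. It is not part of the netclaw image, and the function name list_orphan_zombies is hypothetical. It filters ps output for processes whose state starts with Z (zombie) and whose parent PID is 1, assuming a procps-style ps that supports -eo pid=,ppid=,stat=,comm=.

```sh
#!/bin/sh
# Hypothetical helper: print "PID COMMAND" for zombies reparented to PID 1.
# Expects rows of "pid ppid stat comm" on stdin, e.g. from:
#   ps -eo pid=,ppid=,stat=,comm=
list_orphan_zombies() {
  # STAT begins with "Z" for zombies (e.g. "Z", "Z+");
  # PPID 1 means the process was reparented to PID 1.
  # Only the first word of COMMAND is printed.
  awk '$2 == 1 && $3 ~ /^Z/ { print $1, $4 }'
}

# Usage inside the container (assumes procps ps is installed):
# ps -eo pid=,ppid=,stat=,comm= | list_orphan_zombies
```

An empty result means nothing is currently stuck in the defunct state under PID 1.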
Evidence
This surfaced while a user was debugging the daemon lifecycle. In their case PID 1 was sleep infinity (a dev-sandbox keep-alive that also never reaps), so netclaw daemon stop left <defunct> netclawd entries reparented to PID 1, which could not be cleared until the container was restarted ("zombies … i only can restart it").
The official image fares better: entrypoint.sh reaps its own netclawd child via wait $PID, so the supervised daemon itself won't become a zombie. It has the same gap, however, for reparented grandchildren of netclawd.
Impact
- Zombies hold PID slots and clutter process listings; in pathological cases they can exhaust the PID table.
- Operator confusion ("unkillable / defunct processes" inside the container).
Proposed fix (options)
- Option 1: Bundle a real init as PID 1 (recommended, self-contained). Make tini the image entrypoint and run entrypoint.sh under it, e.g.:

  ```dockerfile
  ENTRYPOINT ["/usr/bin/tini", "-g", "--", "/opt/netclaw/entrypoint.sh"]
  ```

  tini -g forwards signals to the child's whole process group rather than only to the direct child. tini reaps orphans and forwards SIGTERM, so entrypoint.sh keeps its supervision role while PID 1 handles reaping and signals. tini is in Ubuntu's repos (apt-get install -y tini).
- Option 2: Document or require docker run --init. Docker then injects its own tini as PID 1. This is cheaper, but it relies on operators remembering it (and, on Kubernetes, which has no --init equivalent, on pod specs enabling shareProcessNamespace so the pause container runs as PID 1), so it is not self-contained.
- Option 3: Reap inside entrypoint.sh. This is harder in pure bash: the loop blocks on wait $PID, and a trap '…' CHLD plus a periodic wait is fragile. Bash isn't a good init, so option 1 is preferable.
Recommend option 1.
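After applying option 1 (or option 2), one quick way to confirm that PID 1 is a reaping init is to read /proc/1/comm inside the container. This is a hypothetical smoke-test sketch: pid1_is_reaper is an illustrative name, and the expected comm values (tini, tini-static for Ubuntu's static build, docker-init for docker run --init) are assumptions to adjust for the actual image.

```sh
#!/bin/sh
# Hypothetical smoke test: does PID 1 look like a known reaping init?
# Assumption: tini appears as "tini" (or "tini-static") in /proc/1/comm,
# and Docker's --init binary appears as "docker-init".
pid1_is_reaper() {
  # Use the argument if given (handy for testing), else read PID 1's name.
  comm=${1:-$(cat /proc/1/comm)}
  case "$comm" in
    tini|tini-static|docker-init) return 0 ;;
    *) return 1 ;;
  esac
}

if pid1_is_reaper; then
  echo "PID 1 is a reaping init"
else
  echo "PID 1 is '$(cat /proc/1/comm)', which may not reap orphans" >&2
fi
```

This only checks the process name; it does not prove reaping works, so pairing it with a zombie count after a few tool runs is a more convincing check.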
Scope
Orthogonal to #1279 / #1282 (daemon split-brain). That fix prevents spawning duplicate daemons; this is about reaping orphaned descendants. Tracking separately, as agreed in the #1282 review.