Skip to content

fix(bash): reap the process group after a command completes (#3702)#3748

Merged
esengine merged 2 commits into
main-v2from
fix/3702-reap-bash-process-group
Jun 10, 2026
Merged

fix(bash): reap the process group after a command completes (#3702)#3748
esengine merged 2 commits into
main-v2from
fix/3702-reap-bash-process-group

Conversation

@esengine

@esengine esengine commented Jun 10, 2026

Copy link
Copy Markdown
Owner

Problem (#3702)

A foreground bash command that forks a lingering child — e.g. bazel run, which starts a persistent build server — left that process alive after the command returned. Setpgid puts the child in the command's process group, but cmd.Wait() only reaps the shell leader, and setKillTree's Cancel only kills the group on cancel/timeout. On a normal exit nothing cleaned the group, so each such invocation leaked a process → memory climbed until OOM (reporter saw it on ubuntu with bazel run).

Fix

Kill the process group after cmd.Run() returns, on both the foreground and background-job paths. The group id is the shell leader's pid (Setpgid), so kill(-pgid, SIGKILL) clears any strays; an empty group is a harmless ESRCH. Cancel/timeout already did this via setKillTree; this covers normal completion.

POSIX-only. On Windows it's a documented no-op: once the leader exits there's no live parent for taskkill /T to walk, and spawning taskkill on every bash call would tax the hot path — proper after-exit tree cleanup there needs a Job Object (out of scope).

Note: a foreground command therefore can't leave a daemon running past its turn — intended for an agent tool; long-lived processes should use run_in_background.

Tests

TestReapTreeKillsGroupStragglers (POSIX) runs a shell that backgrounds a sleep 60 and exits, then asserts reapTree kills the stray. Runs in the ubuntu/macos CI test jobs. Build-verified on windows (no-op path) and linux (vet).

Closes #3702

A foreground bash command that forked a lingering child (e.g. bazel run, which starts a persistent server) left it alive: Setpgid put it in the command process group, but cmd.Wait only reaps the shell leader and setKillTree only kills the group on cancel/timeout. Normal completion left the strays running, accumulating into an OOM (#3702). Kill the group after Run() on both the foreground and background paths. POSIX-only; Windows is a documented no-op (after-exit tree cleanup there needs a Job Object).
@esengine esengine requested a review from SivanCola as a code owner June 10, 2026 01:14
@github-actions github-actions Bot added v2 Go rewrite (1.x) — main-v2 branch, active development skills Skill system (internal/skill, internal/tool) labels Jun 10, 2026
The first cut used exec.Command with setKillTree, but Go rejects a non-nil Cancel without CommandContext; it also read the bg pid from the inherited stdout, which the backgrounded child holds open. Create with context.Background(), redirect the child fds, and pass the pid through a temp file.
@esengine esengine merged commit ceceff2 into main-v2 Jun 10, 2026
13 checks passed
@esengine esengine deleted the fix/3702-reap-bash-process-group branch June 10, 2026 01:24
SuMuxi66 pushed a commit to SuMuxi66/DeepSeek-Reasonix that referenced this pull request Jun 10, 2026
…#3702) (esengine#3748)

* fix(bash): reap the process group after a command completes

A foreground bash command that forked a lingering child (e.g. bazel run, which starts a persistent server) left it alive: Setpgid put it in the command process group, but cmd.Wait only reaps the shell leader and setKillTree only kills the group on cancel/timeout. Normal completion left the strays running, accumulating into an OOM (esengine#3702). Kill the group after Run() on both the foreground and background paths. POSIX-only; Windows is a documented no-op (after-exit tree cleanup there needs a Job Object).

* test(bash): fix reap test — use CommandContext and pass pid via file

The first cut used exec.Command with setKillTree, but Go rejects a non-nil Cancel without CommandContext; it also read the bg pid from the inherited stdout, which the backgrounded child holds open. Create with context.Background(), redirect the child fds, and pass the pid through a temp file.

---------

Co-authored-by: reasonix <reasonix@deepseek.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

skills Skill system (internal/skill, internal/tool) v2 Go rewrite (1.x) — main-v2 branch, active development

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: 执行 bash 途中内存泄漏

1 participant