fix(bash): reap the process group after a command completes (#3702)#3748
Merged
Conversation
A foreground bash command that forked a lingering child (e.g. bazel run, which starts a persistent server) left it alive: Setpgid put it in the command process group, but cmd.Wait only reaps the shell leader and setKillTree only kills the group on cancel/timeout. Normal completion left the strays running, accumulating into an OOM (#3702). Kill the group after Run() on both the foreground and background paths. POSIX-only; Windows is a documented no-op (after-exit tree cleanup there needs a Job Object).
The first cut used exec.Command with setKillTree, but Go rejects a non-nil Cancel without CommandContext; it also read the bg pid from the inherited stdout, which the backgrounded child holds open. Create with context.Background(), redirect the child fds, and pass the pid through a temp file.
SuMuxi66
pushed a commit
to SuMuxi66/DeepSeek-Reasonix
that referenced
this pull request
Jun 10, 2026
…#3702) (esengine#3748) * fix(bash): reap the process group after a command completes A foreground bash command that forked a lingering child (e.g. bazel run, which starts a persistent server) left it alive: Setpgid put it in the command process group, but cmd.Wait only reaps the shell leader and setKillTree only kills the group on cancel/timeout. Normal completion left the strays running, accumulating into an OOM (esengine#3702). Kill the group after Run() on both the foreground and background paths. POSIX-only; Windows is a documented no-op (after-exit tree cleanup there needs a Job Object). * test(bash): fix reap test — use CommandContext and pass pid via file The first cut used exec.Command with setKillTree, but Go rejects a non-nil Cancel without CommandContext; it also read the bg pid from the inherited stdout, which the backgrounded child holds open. Create with context.Background(), redirect the child fds, and pass the pid through a temp file. --------- Co-authored-by: reasonix <reasonix@deepseek.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem (#3702)
A foreground bash command that forks a lingering child — e.g.
bazel run, which starts a persistent build server — left that process alive after the command returned.Setpgidputs the child in the command's process group, butcmd.Wait()only reaps the shell leader, andsetKillTree'sCancelonly kills the group on cancel/timeout. On a normal exit nothing cleaned the group, so each such invocation leaked a process → memory climbed until OOM (reporter saw it on ubuntu withbazel run).Fix
Kill the process group after
cmd.Run()returns, on both the foreground and background-job paths. The group id is the shell leader's pid (Setpgid), sokill(-pgid, SIGKILL)clears any strays; an empty group is a harmless ESRCH. Cancel/timeout already did this viasetKillTree; this covers normal completion.POSIX-only. On Windows it's a documented no-op: once the leader exits there's no live parent for
taskkill /Tto walk, and spawningtaskkillon every bash call would tax the hot path — proper after-exit tree cleanup there needs a Job Object (out of scope).Note: a foreground command therefore can't leave a daemon running past its turn — intended for an agent tool; long-lived processes should use
run_in_background.Tests
TestReapTreeKillsGroupStragglers(POSIX) runs a shell that backgrounds asleep 60and exits, then assertsreapTreekills the stray. Runs in the ubuntu/macos CI test jobs. Build-verified on windows (no-op path) and linux (vet).Closes #3702