Skip to content

fix: clean command process groups on parent death#2125

Merged
yohamta0 merged 6 commits into
mainfrom
codex/fix-command-parent-death-cleanup
May 8, 2026
Merged

fix: clean command process groups on parent death#2125
yohamta0 merged 6 commits into
mainfrom
codex/fix-command-parent-death-cleanup

Conversation

@yohamta0

@yohamta0 yohamta0 commented May 8, 2026

Copy link
Copy Markdown
Collaborator

Summary

  • add a Unix parent-exit watcher for command executor child process groups
  • stop the watcher on normal command completion and keep Windows on existing process-tree cleanup
  • add a regression test that hard-kills a parent test process and verifies the script process group exits

Testing

  • go test ./internal/runtime/builtin/command -count=1
  • go test ./internal/cmn/cmdutil -count=1
  • go test ./internal/intg/distr -run TestDistributedRun_WorkerCrash_MarkedFailed -count=1

Summary by CodeRabbit

  • New Features
    • Parent-exit watcher added on Unix (Windows remains a no-op).
  • Bug Fixes
    • Command execution now starts the parent-exit watcher and ensures child process groups are terminated if watcher startup fails or when needed.
  • Tests
    • Added Unix-only test validating process-group cleanup when a parent dies.
    • Made dispatch worker test deterministic to reliably verify pull-and-stop behavior.

@coderabbitai

coderabbitai Bot commented May 8, 2026

Copy link
Copy Markdown

Review Change Stack

Caution

Review failed

The pull request is closed.

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review
ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2fdadb1c-b208-4662-bfb9-44c8acf98e6c

📥 Commits

Reviewing files that changed from the base of the PR and between 5dab183 and b8e1349.

📒 Files selected for processing (5)
  • internal/cmn/cmdutil/parent_exit_watcher_unix.go
  • internal/cmn/cmdutil/parent_exit_watcher_windows.go
  • internal/runtime/builtin/command/command.go
  • internal/runtime/builtin/command/command_unix_test.go
  • internal/service/scheduler/github_dispatch_test.go

📝 Walkthrough

Walkthrough

Adds parent-process-exit monitoring (Unix watchdog, Windows no-op), integrates watcher into commandExecutor.Run after cmd.Start(), defers watcher stop, and adds a Unix integration test verifying process-group cleanup. Also makes a GitHub-dispatch worker cancellation test deterministic by signaling via a channel.

Changes

Parent Exit Watcher

Layer / File(s) Summary
Platform Implementations
internal/cmn/cmdutil/parent_exit_watcher_unix.go, internal/cmn/cmdutil/parent_exit_watcher_windows.go
Unix spawns a /bin/sh watchdog with an os.Pipe token handshake that exits on receiving the OK token; Windows returns a no-op cleanup function.
Command Executor Integration
internal/runtime/builtin/command/command.go
commandExecutor.Run calls StartParentExitWatcher after cmd.Start(), defers the returned stop function, and on watcher startup failure kills the process group and returns an error.
Integration Tests & Helpers
internal/runtime/builtin/command/command_unix_test.go
Unix-only test spawns a helper that writes a script PID and signals readiness; the test kills the helper process group, asserts helper exit, and polls until the script process group is gone; includes helper runner and liveness probe.

GitHub Dispatch Worker Test

Layer / File(s) Summary
Test orchestration
internal/service/scheduler/github_dispatch_test.go
Rewrites TestGitHubDispatchWorker_StartRunsLoopsUntilContextCanceled to run the worker in a goroutine, wait for a pulled channel signal from the stub client, cancel the context, and assert the worker goroutine exits; the stub client gains a pulled channel and sync.Once to close it exactly once.

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • dagucloud/dagu#2125: Implements the same StartParentExitWatcher additions (Unix watchdog, Windows no-op), integration into commandExecutor.Run, and accompanying Unix parent-death test.
🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 18.18% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title 'fix: clean command process groups on parent death' clearly and concisely summarizes the main objective of the changeset: ensuring command process groups are cleaned up when the parent process terminates.
Description check ✅ Passed The description covers the Summary and Testing sections from the template with specific details, but omits the Changes list, Related Issues, and Checklist sections that are part of the standard template.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch codex/fix-command-parent-death-cleanup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/runtime/builtin/command/command_test.go`:
- Around line 421-438: The t.Cleanup handlers unconditionally send SIGKILL which
can hit a reused PID/group; modify the cleanup logic around cmd and scriptPID to
avoid killing already-terminated processes: after you explicitly stop/wait the
helper, clear or nil out cmd (or set cmd.Process = nil) and set scriptPID = 0,
and update the t.Cleanup closures for the cmd block (using cmd and
cmdutil.KillProcessGroup) and the script block (using syscall.Kill and
pidFile/scriptPID) to first check liveness (e.g., cmd != nil && cmd.Process !=
nil for the command, and scriptPID > 0 plus a syscall/OS check that the process
group exists) before sending signals, so cleanup only signals live processes and
avoids affecting reused PIDs.
- Around line 400-492: The parent-death test uses Unix-only process-group
semantics (syscall.Kill(-pid,...)) so move the
TestCommandExecutor_CleansProcessGroupWhenParentDies,
runCommandParentDeathHelper, and processGroupAlive helpers into a new file named
command_unix_test.go and add a build constraint line //go:build !windows (and
the matching +build comment if you want older toolchains) at the top; ensure the
test and helper functions (TestCommandExecutor_CleansProcessGroupWhenParentDies,
runCommandParentDeathHelper, processGroupAlive) and any required imports are
copied over and any Windows-specific skips removed since the file itself is
Unix-only.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 513184bb-d8d0-45ee-a097-098a7e5472de

📥 Commits

Reviewing files that changed from the base of the PR and between befa9cb and 03beb17.

📒 Files selected for processing (4)
  • internal/cmn/cmdutil/parent_exit_watcher_unix.go
  • internal/cmn/cmdutil/parent_exit_watcher_windows.go
  • internal/runtime/builtin/command/command.go
  • internal/runtime/builtin/command/command_test.go

Comment thread internal/runtime/builtin/command/command_test.go Outdated
Comment thread internal/runtime/builtin/command/command_test.go Outdated
@yohamta0

yohamta0 commented May 8, 2026

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 8, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
internal/runtime/builtin/command/command_unix_test.go (1)

34-34: 💤 Low value

Optional: anchor the -test.run regex to prevent unintended matches.

-test.run is an unanchored regular expression — because it is unanchored, an empty expression matches any string. Without anchors any future test whose name contains this substring would also be launched inside the helper subprocess, exercising unexpected code paths.

🔧 Proposed fix
-	cmd := exec.Command(os.Args[0], "-test.run=TestCommandExecutor_CleansProcessGroupWhenParentDies")
+	cmd := exec.Command(os.Args[0], "-test.run=^TestCommandExecutor_CleansProcessGroupWhenParentDies$")
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/runtime/builtin/command/command_unix_test.go` at line 34, The test
helper invocation builds a subprocess with
"-test.run=TestCommandExecutor_CleansProcessGroupWhenParentDies" which is an
unanchored regex and may match unintended tests; update the argument passed to
exec.Command (where the string is constructed in the test setup for
TestCommandExecutor_CleansProcessGroupWhenParentDies) to anchor the regex (e.g.
wrap with ^ and $) so it only matches the exact test name.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@internal/runtime/builtin/command/command_unix_test.go`:
- Line 34: The test helper invocation builds a subprocess with
"-test.run=TestCommandExecutor_CleansProcessGroupWhenParentDies" which is an
unanchored regex and may match unintended tests; update the argument passed to
exec.Command (where the string is constructed in the test setup for
TestCommandExecutor_CleansProcessGroupWhenParentDies) to anchor the regex (e.g.
wrap with ^ and $) so it only matches the exact test name.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: a6770c4e-629d-4013-b9ab-9d8ffe281f26

📥 Commits

Reviewing files that changed from the base of the PR and between 03beb17 and 83a3bbd.

📒 Files selected for processing (1)
  • internal/runtime/builtin/command/command_unix_test.go

@yohamta0

yohamta0 commented May 8, 2026

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 8, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@internal/runtime/builtin/command/command_unix_test.go`:
- Around line 88-127: The test runCommandParentDeathHelper currently uses
require.NoError(t, err) on the value received from errCh but Run(ctx) may
legitimately return a non-nil "killed" error when the parent-exit watcher kills
the script; relax this assertion to accept either a nil error or an error whose
message indicates the process was killed (e.g., check err == nil ||
strings.Contains(err.Error(), "killed") or inspect os.IsProcessDone/exit
status), replacing the require.NoError(t, err) in the goroutine receiver block
to tolerate the killed-exit path while still failing on unexpected errors.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1ea49d87-1c66-4997-a14b-80905545909f

📥 Commits

Reviewing files that changed from the base of the PR and between 83a3bbd and 930b21f.

📒 Files selected for processing (1)
  • internal/runtime/builtin/command/command_unix_test.go

Comment thread internal/runtime/builtin/command/command_unix_test.go
@yohamta0

yohamta0 commented May 8, 2026

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 8, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (1)
internal/runtime/builtin/command/command_unix_test.go (1)

121-128: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

require.Contains(t, err.Error(), "killed") still stricter than the past suggestion to just log the error.

This select block is only reachable in the narrow race window where the watcher kills the script before the parent kills the helper. In that window err.Error() will be "signal: killed" (SIGKILL from the watcher), so the check is usually safe — but if commandExec.Run wraps the error with a non-"killed" prefix, or the context is cancelled first (producing "context canceled"), the assertion fails spuriously. The past review recommended dropping the assertion to t.Logf.

💡 Proposed fix to match the past comment's suggestion
 	select {
 	case err := <-errCh:
 		if err != nil {
-			require.Contains(t, err.Error(), "killed")
+			t.Logf("commandExec.Run returned: %v (expected if killed by watcher)", err)
 		}
 	case <-time.After(30 * time.Second):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@internal/runtime/builtin/command/command_unix_test.go` around lines 121 -
128, The test's select branch currently asserts that err.Error() contains
"killed", which is brittle; replace the strict assertion in the case receiving
from errCh with a non-failing logger (e.g. call t.Logf or t.Log) that records
the error value instead of require.Contains so the test won't fail if
commandExec.Run wraps the error or the context is canceled; locate the select
that reads from errCh in the test (the block containing require.Contains(t,
err.Error(), "killed")) and change it to log the error when non-nil.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@internal/runtime/builtin/command/command_unix_test.go`:
- Around line 121-128: The test's select branch currently asserts that
err.Error() contains "killed", which is brittle; replace the strict assertion in
the case receiving from errCh with a non-failing logger (e.g. call t.Logf or
t.Log) that records the error value instead of require.Contains so the test
won't fail if commandExec.Run wraps the error or the context is canceled; locate
the select that reads from errCh in the test (the block containing
require.Contains(t, err.Error(), "killed")) and change it to log the error when
non-nil.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: df694e2f-57ae-43d8-a55c-b207ea96335d

📥 Commits

Reviewing files that changed from the base of the PR and between 930b21f and 5dab183.

📒 Files selected for processing (2)
  • internal/runtime/builtin/command/command_unix_test.go
  • internal/service/scheduler/github_dispatch_test.go

@yohamta0 yohamta0 force-pushed the codex/fix-command-parent-death-cleanup branch from 0d7ea15 to e3e9f70 Compare May 8, 2026 04:40
@yohamta0

yohamta0 commented May 8, 2026

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented May 8, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@yohamta0

yohamta0 commented May 8, 2026

Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@yohamta0 yohamta0 merged commit 9b28d5e into main May 8, 2026
10 checks passed
@yohamta0 yohamta0 deleted the codex/fix-command-parent-death-cleanup branch May 8, 2026 05:23
@coderabbitai

coderabbitai Bot commented May 8, 2026

Copy link
Copy Markdown
✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant