Skip to content

fix(exec): kill child process tree on timeout to prevent orphaned tasks#378

Merged
mengzhuo merged 1 commit intosipeed:mainfrom
lunareed720:fix/exec-timeout-process-tree
Feb 20, 2026
Merged

fix(exec): kill child process tree on timeout to prevent orphaned tasks#378
mengzhuo merged 1 commit intosipeed:mainfrom
lunareed720:fix/exec-timeout-process-tree

Conversation

@lunareed720
Copy link

@lunareed720 lunareed720 commented Feb 17, 2026

Summary

This fixes orphaned background processes when exec commands time out.

What changed

  • Run shell commands in a dedicated process group on Unix (Setpgid: true)
  • On timeout/cancel, terminate the full process tree instead of only the parent shell
    • Unix: kill process group (kill -SIGKILL -pgid)
    • Windows: taskkill /T /F /PID fallback
  • Switch ExecTool execution flow to Start + Wait with explicit timeout path and cleanup
  • Add regression test (Unix): verifies timed-out command with spawned child (sleep) does not leave child process alive

Why

Issue #311 reports that failed/timed-out shell commands keep running in background. This can leak resources and degrade long-running agent stability.

Testing

  • go test ./pkg/tools -run TestShellTool -count=1

Closes #311

Copy link

@nikolasdehor nikolasdehor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Solid PR — the approach (process groups on Unix, taskkill /T on Windows) is the canonical solution for orphaned child processes.

What I liked:

  • Setpgid: true + syscall.Kill(-pid, SIGKILL) is the textbook Go approach for process group cleanup ✓
  • Build tags (!windows / windows) for platform separation are idiomatic ✓
  • The regression test spawns sleep 60 and verifies it gets killed — well-structured ✓
  • Already-exited process handling is correct (nil checks, discarded ESRCH errors) ✓
  • Race between normal completion and timeout is handled correctly by the select
  • errors.Is improvement over == for context.DeadlineExceeded

Suggestions (non-blocking):

  1. Double-kill interaction: The command is created with exec.CommandContext, so Go's internal goroutine will also try to kill the direct child when the context expires. This is benign (killing an already-killed process is harmless), but slightly confusing. On Go 1.20+, you could use cmd.Cancel to hook terminateProcessTree directly into the CommandContext machinery, eliminating the hybrid approach:

    cmd.Cancel = func() error {
        return terminateProcessTree(cmd)
    }
    cmd.WaitDelay = 2 * time.Second
  2. terminateProcessTree always returns nil — either return meaningful errors or change to no return value, since callers always discard it with _ =.

  3. Consider adding a normal-completion test alongside the timeout test, to verify the Start + Wait refactor doesn't break the happy path.

  4. SIGTERM before SIGKILL: Currently goes straight to SIGKILL. For an exec tool where the command has already timed out this is defensible, but a brief SIGTERM grace period (e.g., 500ms) before SIGKILL would allow children to flush buffers and clean up temp files.

Approving — these are all improvements, not blockers.

@mengzhuo mengzhuo merged commit 5b525f6 into sipeed:main Feb 20, 2026
3 checks passed
@Orgmar
Copy link
Contributor

Orgmar commented Feb 21, 2026

@lunareed720 Solid fix! Killing the entire process group instead of just the parent shell is the right call to avoid orphaned tasks. The cross-platform handling with Setpgid on Unix and taskkill on Windows is thoughtful too.

We're building the PicoClaw Dev Group on Discord for contributors to connect. If you're interested, drop an email to support@sipeed.com with the subject [Join PicoClaw Dev Group] lunareed720 and we'll send you an invite!

@lxowalle lxowalle mentioned this pull request Mar 3, 2026
10 tasks
hyperwd pushed a commit to hyperwd/picoclaw that referenced this pull request Mar 5, 2026
…ss-tree

fix(exec): kill child process tree on timeout to prevent orphaned tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] Shell process need kill

4 participants