feat(supervisor): use process groups for atomic daemon shutdown by jdx · Pull Request #239 · jdx/pitchfork

jdx · 2026-02-16T22:59:02Z

Summary

Spawn each daemon in its own session via setsid() so the daemon PID becomes the process group leader (PID == PGID)
Replace the sequential /proc-walking child kill logic with a single killpg() call that atomically signals the entire process group
Eliminates race conditions where a child could fork between the proc scan and the kill, and is faster for multi-child daemons

Test plan

cargo build — clean, no warnings
cargo nextest run — all 154 tests pass (including test_stop_kills_child_processes)
cargo clippy and cargo fmt --check — clean
Manual: start a daemon that spawns children (e.g. npm run dev), stop it, verify no orphans
Manual: start a daemon that ignores SIGTERM, verify SIGKILL escalation kills the whole group

🤖 Generated with Claude Code

Note

Medium Risk
Changes core daemon lifecycle behavior and low-level signal/session handling, which could affect shutdown reliability or accidentally signal the wrong processes if PGID assumptions break.

Overview
Daemon shutdown is now process-group based. On Unix, daemon spawn now calls setsid() so each daemon becomes its own process-group leader, enabling atomic signaling of the whole tree.

Stopping a daemon switches from /proc-walking and sequentially killing children + parent to a single killpg()-based graceful shutdown path (SIGTERM with ~3s polling, then SIGKILL fallback), with a non-Unix fallback to the existing single-process kill.

Minor test cleanups: simplify kill_port’s lsof handling and drop an unused chrono import.

^{Written by Cursor Bugbot for commit 5c0dfb6. This will update automatically on new commits. Configure here.}

Spawn each daemon in its own session via setsid() so that stopping a daemon can kill the entire process tree with a single killpg() call instead of walking /proc to find children sequentially. This eliminates race conditions where a child could fork between the scan and the kill, and is faster for multi-child daemons since the whole group receives the signal atomically. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

gemini-code-assist · 2026-02-16T22:59:17Z

Summary of Changes

Hello @jdx, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the daemon termination process to be more reliable and efficient. By leveraging Unix process groups, the system can now ensure that when a daemon is stopped, all its associated child processes are also terminated atomically. This change addresses potential race conditions and improves the overall stability of daemon management.

Highlights

Atomic Daemon Shutdown: Implemented a robust mechanism for atomically shutting down daemons and all their descendant processes using Unix process groups.
Process Group Creation: Daemons are now spawned in their own session via setsid(), making the daemon's PID the process group leader (PGID).
Efficient Termination: Replaced the previous sequential /proc-walking child kill logic with a single killpg() call, which signals the entire process group, eliminating race conditions and improving speed.

Changelog

src/procs.rs
- Added kill_process_group_async and kill_process_group functions for graceful and atomic termination of process groups, including SIGTERM and SIGKILL escalation.
- Annotated the all_children function with #[allow(dead_code)] and updated its documentation to reflect that it is no longer used in the kill path.
src/supervisor/lifecycle.rs
- Modified the daemon spawning logic to use libc::setsid() on Unix systems, ensuring each daemon runs in its own session and process group.
- Updated the daemon stopping mechanism to call the new kill_process_group_async function, replacing the previous iterative child process killing.
tests/common/mod.rs
- Refactored a conditional check for command output success using the && operator for improved readability.
tests/test_e2e_logs.rs
- Removed an unused chrono import statement.

Activity

No specific activity (comments, reviews, progress) has been recorded for this pull request yet.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature	Command	Description
Code Review	`/gemini review`	Performs a code review for the current pull request in its current state.
Pull Request Summary	`/gemini summary`	Provides a summary of the current pull request in its current state.
Comment	@gemini-code-assist	Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help	`/gemini help`	Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩

gemini-code-assist

Code Review

This pull request introduces a more robust and atomic way to shut down daemons by using process groups. The change from iterating through child processes to using killpg is a significant improvement, eliminating potential race conditions. The implementation correctly uses setsid when spawning processes and provides a graceful shutdown sequence with SIGTERM followed by SIGKILL. The code is well-structured, and the changes are also reflected in the tests. I have one suggestion to refactor some duplicated code for better maintainability, but overall this is a solid improvement.

gemini-code-assist · 2026-02-16T23:00:19Z

src/procs.rs

+        // Fast check: 10ms intervals for first 100ms
+        for i in 0..10 {
+            std::thread::sleep(std::time::Duration::from_millis(10));
+            self.refresh_pids(&[pid]);
+            if self.is_terminated_or_zombie(sysinfo::Pid::from_u32(pid)) {
+                debug!(
+                    "process group {pgid} terminated after SIGTERM ({} ms)",
+                    (i + 1) * 10
+                );
+                return true;
+            }
+        }
+
+        // Slower check: 50ms intervals for up to ~3 more seconds
+        for i in 0..58 {
+            std::thread::sleep(std::time::Duration::from_millis(50));
+            self.refresh_pids(&[pid]);
+            if self.is_terminated_or_zombie(sysinfo::Pid::from_u32(pid)) {
+                debug!(
+                    "process group {pgid} terminated after SIGTERM ({} ms)",
+                    100 + (i + 1) * 50
+                );
+                return true;
+            }
+        }


The waiting logic with polling after sending SIGTERM is implemented in two separate loops, which leads to some code duplication. This can be refactored into a single loop using an iterator chain to make the code more concise and easier to maintain.

// Wait for graceful shutdown, with a fast initial check. let fast_checks = std::iter::repeat(std::time::Duration::from_millis(10)).take(10); let slow_checks = std::iter::repeat(std::time::Duration::from_millis(50)).take(58); let mut elapsed_ms = 0; for sleep_duration in fast_checks.chain(slow_checks) { std::thread::sleep(sleep_duration); self.refresh_pids(&[pid]); elapsed_ms += sleep_duration.as_millis() as u64; if self.is_terminated_or_zombie(sysinfo::Pid::from_u32(pid)) { debug!( "process group {pgid} terminated after SIGTERM ({} ms)", elapsed_ms ); return true; } }

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{Bugbot Autofix is ON, but a Cloud Agent failed to start.}

src/procs.rs

When the group leader is a zombie, child processes may still be running. The early return on zombie detection would skip killpg(), leaving those children as orphans. Remove the zombie check before killpg() and let ESRCH from killpg() handle the "no such process group" case instead. Also refactors the two polling loops into a single iterator chain per reviewer feedback. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Copilot

Pull request overview

This PR updates the supervisor’s daemon shutdown strategy to use Unix process groups, enabling an atomic “kill the whole daemon tree” operation and avoiding races inherent in /proc-walking.

Changes:

Spawn daemons in their own session/process group via setsid() (Unix-only) to make the daemon PID the process group ID.
Replace per-child termination logic with a single process-group kill path (killpg) with SIGTERM→wait→SIGKILL escalation.
Minor test cleanup/refactor (remove unused import; simplify/adjust port cleanup loop).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File	Description
`src/supervisor/lifecycle.rs`	Adds `setsid()` at spawn time and switches stop logic to process-group kill.
`src/procs.rs`	Introduces process-group kill implementation and keeps old `/proc` traversal helper for diagnostics.
`tests/common/mod.rs`	Refactors `kill_port` to avoid sleeping per PID and uses a let-chain style condition.
`tests/test_e2e_logs.rs`	Removes an unused `chrono` import.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-02-16T23:07:13Z

src/procs.rs

+        debug!("killing process group {pgid}");
+
+        // Send SIGTERM to the entire process group.
+        // killpg sends to all processes in the group atomically.


kill_process_group() uses the leader PID’s status (is_terminated_or_zombie(pid)) as the stopping condition. That can return early/false while other processes in the process group are still running (e.g., the group leader exits quickly but children keep running/ignore SIGTERM), which can leave orphans even though the API reports success. Consider checking process-group existence instead (e.g., poll killpg(pgid, 0) until it returns ESRCH, and don’t treat leader exit/zombie as sufficient).

Copilot · 2026-02-16T23:07:14Z

src/procs.rs

+            if err.raw_os_error() == Some(libc::ESRCH) {
+                debug!("process group {pgid} no longer exists");
+                return false;
+            }
+            warn!("failed to send SIGTERM to process group {pgid}: {err}");
+        }
+
+        // Wait for graceful shutdown: fast initial check then slower polling.
+        let fast_checks = std::iter::repeat_n(std::time::Duration::from_millis(10), 10);


When libc::killpg() fails with an error other than ESRCH, the function only logs a warning and continues, ultimately returning true. That can cause stop() to mark a daemon as Stopped even though no signals were delivered (e.g., EPERM). Consider propagating this as an error/false return, or at least validating the process group is actually gone (again via killpg(pgid, 0) -> ESRCH) before returning success.

Copilot · 2026-02-16T23:07:14Z

src/supervisor/lifecycle.rs

+                    // Kill the entire process group atomically (daemon PID == PGID
+                    // because we called setsid() at spawn time)
+                    if let Err(e) = PROCS.kill_process_group_async(pid).await {


stop() only attempts the new process-group kill inside the if PROCS.is_running(pid) branch. With process groups, the group may still contain running children even if the original leader PID has already exited, so this guard can skip the killpg path and leave orphan processes. Consider attempting the process-group kill based on the stored PGID regardless of leader liveness (and let killpg(..., 0)/ESRCH determine whether anything remains).

## 🤖 New release * `pitchfork-cli`: 1.4.3 -> 1.5.0 <details><summary>Changelog</summary> <blockquote> ## [1.5.0](v1.4.3...v1.5.0) - 2026-02-16 ### Added - *(supervisor)* use process groups for atomic daemon shutdown ([#239](#239)) - SIGKILL after SIGTERM ([#238](#238)) </blockquote> </details> --- This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).  --- > [!NOTE] > **Low Risk** > Release metadata only (version/changelog updates) with no functional code changes in this PR. > > **Overview** > Prepares the `v1.5.0` release by bumping the `pitchfork-cli` version from `1.4.3` to `1.5.0` in `Cargo.toml`/`Cargo.lock`. > > Updates `CHANGELOG.md` with a new `1.5.0` entry noting supervisor shutdown improvements (process groups for atomic shutdown) and enforcing `SIGKILL` after `SIGTERM`. > > Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit a3def3b. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).

Copilot AI review requested due to automatic review settings February 16, 2026 22:59

gemini-code-assist bot reviewed Feb 16, 2026

View reviewed changes

cursor bot reviewed Feb 16, 2026

View reviewed changes

src/procs.rs Outdated Show resolved Hide resolved

Copilot started reviewing on behalf of jdx February 16, 2026 23:01 View session

Copilot AI reviewed Feb 16, 2026

View reviewed changes

jdx merged commit 1e8f64d into main Feb 16, 2026
5 checks passed

jdx deleted the feat/process-group-isolation branch February 16, 2026 23:09

jdx mentioned this pull request Feb 16, 2026

chore: release v1.5.0 #234

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(supervisor): use process groups for atomic daemon shutdown#239

feat(supervisor): use process groups for atomic daemon shutdown#239
jdx merged 2 commits intomainfrom
feat/process-group-isolation

jdx commented Feb 16, 2026 •

edited by cursor bot

Loading

Uh oh!

gemini-code-assist bot commented Feb 16, 2026

Uh oh!

gemini-code-assist bot left a comment

Uh oh!

gemini-code-assist bot Feb 16, 2026

Uh oh!

cursor bot left a comment

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Feb 16, 2026

Uh oh!

Copilot AI Feb 16, 2026

Uh oh!

Copilot AI Feb 16, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

jdx commented Feb 16, 2026 • edited by cursor bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

gemini-code-assist bot commented Feb 16, 2026

Summary of Changes

Highlights

Footnotes

Uh oh!

gemini-code-assist bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist bot Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 16, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

jdx commented Feb 16, 2026 •

edited by cursor bot

Loading