Conversation
Summary of Changes

Hello @gaojunran, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the daemon termination process by introducing a progressive shutdown strategy. Instead of a single SIGTERM, the system now attempts a graceful shutdown with two SIGTERM signals, followed by a forceful SIGKILL if the process remains unresponsive. This change aims to improve the reliability of stopping daemons, ensuring that processes are terminated effectively while allowing well-behaved applications sufficient time to clean up resources.
Code Review
This pull request introduces a more robust graceful shutdown strategy, escalating from SIGTERM to SIGKILL, which is a great improvement for process management. The implementation looks solid, and the accompanying documentation updates across various files are clear and consistent. I've made one suggestion in src/procs.rs to refactor the new termination logic for better readability and maintainability by extracting the polling logic into a helper function and using constants for timing values. Overall, this is a well-executed feature enhancement.
```rust
#[cfg(unix)]
{
    // Phase 1: First SIGTERM with quick check intervals
    if let Some(process) = self.lock_system().process(sysinfo_pid) {
        debug!("sending SIGTERM to process {pid}");
        process.kill_with(Signal::Term);
    }

    // Fast check: 10ms intervals for first 100ms (for processes that exit immediately)
    for i in 0..10 {
        std::thread::sleep(std::time::Duration::from_millis(10));
        self.refresh_pids(&[pid]);
        if self.lock_system().process(sysinfo_pid).is_none() {
            debug!(
                "process {pid} terminated after first SIGTERM ({} ms)",
                (i + 1) * 10
            );
            return true;
        }
    }

    // Slower check: 50ms intervals for up to 3 more seconds (100ms + 2900ms = 3000ms total)
    for i in 0..58 {
        std::thread::sleep(std::time::Duration::from_millis(50));
        self.refresh_pids(&[pid]);
        if self.lock_system().process(sysinfo_pid).is_none() {
            debug!(
                "process {pid} terminated after first SIGTERM ({} ms)",
                100 + (i + 1) * 50
            );
            return true;
        }
    }

    // Phase 2: Second SIGTERM for processes that need more time
    if let Some(process) = self.lock_system().process(sysinfo_pid) {
        debug!("process {pid} still running after ~3s, sending second SIGTERM");
        process.kill_with(Signal::Term);
    }

    // Wait up to 2 more seconds
    for i in 0..40 {
        std::thread::sleep(std::time::Duration::from_millis(50));
        self.refresh_pids(&[pid]);
        if self.lock_system().process(sysinfo_pid).is_none() {
            debug!(
                "process {pid} terminated after second SIGTERM ({} ms)",
                3000 + (i + 1) * 50
            );
            return true;
        }
    }

    // Phase 3: SIGKILL as last resort
    if let Some(process) = self.lock_system().process(sysinfo_pid) {
        warn!("process {pid} did not respond to SIGTERM after 5s, sending SIGKILL");
        process.kill_with(Signal::Kill);
        process.wait();
    }

    true
} else {
    false
}
```
The graceful shutdown logic is well-implemented, but it contains some duplicated code and magic numbers for timings and loop counts. This can make the code harder to read and maintain.
I suggest refactoring this block by:

- Introducing constants for the timing values to make their purpose explicit.
- Extracting the polling logic into a `poll_for_exit` helper function to reduce code duplication.

This will make the graceful shutdown strategy clearer and easier to modify in the future.
```rust
{
    fn poll_for_exit(
        procs: &Procs,
        pid: u32,
        sysinfo_pid: sysinfo::Pid,
        checks: u32,
        interval: std::time::Duration,
        base_time_ms: u64,
        signal_name: &str,
    ) -> bool {
        for i in 0..checks {
            std::thread::sleep(interval);
            procs.refresh_pids(&[pid]);
            if procs.lock_system().process(sysinfo_pid).is_none() {
                let elapsed = base_time_ms + (i + 1) as u64 * interval.as_millis() as u64;
                debug!(
                    "process {pid} terminated after {signal_name} ({} ms)",
                    elapsed
                );
                return true;
            }
        }
        false
    }

    const FAST_CHECK_INTERVAL: std::time::Duration = std::time::Duration::from_millis(10);
    const FAST_CHECKS: u32 = 10;
    const SLOW_CHECK_INTERVAL: std::time::Duration = std::time::Duration::from_millis(50);
    const PHASE1_SLOW_CHECKS: u32 = 58; // (3000ms - 100ms) / 50ms
    const PHASE2_CHECKS: u32 = 40; // 2000ms / 50ms

    // Phase 1: First SIGTERM
    if let Some(process) = self.lock_system().process(sysinfo_pid) {
        debug!("sending SIGTERM to process {pid}");
        process.kill_with(Signal::Term);
    }
    if poll_for_exit(self, pid, sysinfo_pid, FAST_CHECKS, FAST_CHECK_INTERVAL, 0, "first SIGTERM") {
        return true;
    }
    if poll_for_exit(self, pid, sysinfo_pid, PHASE1_SLOW_CHECKS, SLOW_CHECK_INTERVAL, 100, "first SIGTERM") {
        return true;
    }

    // Phase 2: Second SIGTERM
    if let Some(process) = self.lock_system().process(sysinfo_pid) {
        debug!("process {pid} still running after ~3s, sending second SIGTERM");
        process.kill_with(Signal::Term);
    }
    if poll_for_exit(self, pid, sysinfo_pid, PHASE2_CHECKS, SLOW_CHECK_INTERVAL, 3000, "second SIGTERM") {
        return true;
    }

    // Phase 3: SIGKILL
    if let Some(process) = self.lock_system().process(sysinfo_pid) {
        warn!("process {pid} did not respond to SIGTERM after 5s, sending SIGKILL");
        process.kill_with(Signal::Kill);
        process.wait();
    }
    true
}
```
Pull request overview
This PR implements a progressive shutdown strategy for daemon processes, adding automatic SIGKILL escalation after two SIGTERM attempts. The implementation addresses issue #100 by ensuring stubborn processes that don't respond to SIGTERM are eventually force-terminated after ~5 seconds.
Changes:

- Implements SIGTERM → SIGTERM → SIGKILL escalation strategy in `src/procs.rs` with intelligent timing (10ms checks initially, then 50ms intervals)
- Removes redundant process termination verification from `src/supervisor/lifecycle.rs` since the new kill strategy is more robust
- Updates all documentation to reflect the new shutdown behavior
- Fixes unrelated macOS test failures by canonicalizing paths to resolve symlinks
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/procs.rs | Implements progressive kill strategy with 3-phase escalation (SIGTERM at 0s, SIGTERM at 3s, SIGKILL at 5s) on Unix; Windows uses simple kill() |
| src/supervisor/lifecycle.rs | Removes old termination verification loop, relying on new robust kill implementation |
| src/cli/stop.rs | Updates CLI help text to document the new shutdown strategy |
| pitchfork.usage.kdl | Updates usage specification with new shutdown documentation |
| docs/troubleshooting.md | Adds comprehensive troubleshooting section for daemon stop issues |
| docs/concepts/architecture.md | Documents the progressive termination strategy and adds "Stopping" state |
| docs/cli/stop.md | Updates stop command documentation to reflect new behavior |
| docs/cli/commands.json | Updates generated command documentation |
| tests/test_e2e.rs | Fixes macOS test failures by canonicalizing paths to resolve symlinks like /var → /private/var |
```rust
#[cfg(windows)]
{
    if let Some(process) = self.lock_system().process(sysinfo_pid) {
        process.kill();
        process.wait();
    }
    return true;
}
```
The Windows implementation does not use the same progressive SIGTERM -> SIGTERM -> SIGKILL strategy that's used on Unix. On Windows, it immediately kills the process with no retry logic. This discrepancy should be documented, or the Windows implementation should be updated to provide similar graceful shutdown behavior (though Windows doesn't have SIGTERM/SIGKILL, it could implement retries with TerminateProcess or similar). At minimum, the documentation (cli/stop.rs, docs) should note that the progressive shutdown only applies to Unix systems.
```
|-------|---------|
| Running | Process is alive |
| Waiting | Waiting for ready check |
| Stopping | Being terminated (SIGTERM sent) |
```
The description "Being terminated (SIGTERM sent)" is not fully accurate since the Stopping state covers the entire termination process, including when SIGKILL is sent after 5 seconds. Consider updating to something like "Being terminated (shutdown in progress)" to be more accurate about what this state represents.
```diff
-| Stopping | Being terminated (SIGTERM sent) |
+| Stopping | Being terminated (shutdown in progress) |
```
src/procs.rs (Outdated)
```rust
// Phase 3: SIGKILL as last resort
if let Some(process) = self.lock_system().process(sysinfo_pid) {
    warn!("process {pid} did not respond to SIGTERM after 5s, sending SIGKILL");
    process.kill_with(Signal::Kill);
    process.wait();
}
```
After sending SIGKILL, there's no verification that the process actually terminated. While SIGKILL is generally unignorable, there's a brief window where the process could be in an uninterruptible state (e.g., kernel I/O wait). Consider adding a final check after wait() to ensure the process is gone, and log a warning if it's still running. This would make the behavior consistent with the earlier phases and help identify edge cases where SIGKILL might not immediately succeed.
```diff
 // Phase 3: SIGKILL as last resort
-if let Some(process) = self.lock_system().process(sysinfo_pid) {
-    warn!("process {pid} did not respond to SIGTERM after 5s, sending SIGKILL");
-    process.kill_with(Signal::Kill);
-    process.wait();
-}
+let mut sigkill_sent = false;
+if let Some(process) = self.lock_system().process(sysinfo_pid) {
+    warn!("process {pid} did not respond to SIGTERM after 5s, sending SIGKILL");
+    process.kill_with(Signal::Kill);
+    process.wait();
+    sigkill_sent = true;
+}
+// Final verification after SIGKILL and wait()
+if sigkill_sent {
+    self.refresh_pids(&[pid]);
+    if self.lock_system().process(sysinfo_pid).is_some() {
+        warn!("process {pid} appears to still be running after SIGKILL and wait()");
+    }
+}
```
```diff
+// Canonicalize to resolve symlinks (e.g., /var -> /private/var on macOS)
+let expected = subdir.canonicalize().unwrap();
 assert_eq!(
     pwd,
-    subdir.to_str().unwrap(),
+    expected.to_str().unwrap(),
     "Daemon should run in the subdirectory"
 );
```
The test changes that add canonicalize() to resolve symlinks on macOS appear to be unrelated to the main purpose of this PR (adding SIGKILL escalation). While these are valid fixes for macOS test failures, they would be better suited for a separate PR or at least mentioned in the PR description. This makes it clearer what changes are part of the SIGKILL feature vs. what are test infrastructure improvements.
jdx left a comment
AI-generated review (Claude Code)
Overall
Solid approach — progressive shutdown is a real need for a daemon supervisor. The docs and help text updates are thorough. A few concerns below, roughly ordered by severity.
Issues
Zombie process detection
The polling loop uses self.lock_system().process(sysinfo_pid).is_none() to detect exit. On Linux, zombie processes still have /proc/[pid] entries, so sysinfo returns Some(process) for them. If the tokio Child monitoring task doesn't reap quickly enough, a process that exited cleanly on the first SIGTERM will appear "still running" and get unnecessarily escalated through all phases to SIGKILL.
Consider also checking for zombie status:
```rust
if self.lock_system().process(sysinfo_pid)
    .map(|p| matches!(p.status(), sysinfo::ProcessStatus::Zombie))
    .unwrap_or(true)
{
    return true;
}
```

No process.wait() for SIGTERM exits

The old code always called process.wait() to reap the child. The new code only calls wait() in the SIGKILL path. For processes that exit from SIGTERM (detected via polling), there's no explicit reap: this relies on the supervisor's monitoring task. Probably fine in practice, but it's a subtle behavioral change.
Second SIGTERM adds delay without clear benefit
SIGTERM → SIGTERM → SIGKILL is unusual. If a process ignores the first SIGTERM, it will almost certainly ignore the second. The standard pattern is SIGTERM → wait → SIGKILL. The second SIGTERM adds 2s of delay before the inevitable SIGKILL with no practical benefit for most cases. Simpler approach: SIGTERM → wait 3-5s → SIGKILL.
Lifecycle verification removal
The old post-kill verification loop in lifecycle.rs is removed with a comment that kill_async now guarantees termination. This is correct if the zombie concern above is addressed. Otherwise kill_async could return true for a zombie that still holds resources (ports, files), and the daemon state gets set to Stopped prematurely.
Nits
- The architecture doc additions (Port/Command readiness types, Stopping status) and the test canonicalize fixes seem unrelated to the SIGKILL feature — could be separate commits/PRs.
- The fast 10ms initial polling is a nice touch for the common case.
98b3cc8 to 70fe1cc
## 🤖 New release

* `pitchfork-cli`: 1.4.3 -> 1.5.0

**Changelog**

> ## [1.5.0](v1.4.3...v1.5.0) - 2026-02-16
>
> ### Added
>
> - *(supervisor)* use process groups for atomic daemon shutdown ([#239](#239))
> - SIGKILL after SIGTERM ([#238](#238))

This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).

> [!NOTE]
> **Low Risk**
> Release metadata only (version/changelog updates) with no functional code changes in this PR.
>
> **Overview**
> Prepares the `v1.5.0` release by bumping the `pitchfork-cli` version from `1.4.3` to `1.5.0` in `Cargo.toml`/`Cargo.lock`.
>
> Updates `CHANGELOG.md` with a new `1.5.0` entry noting supervisor shutdown improvements (process groups for atomic shutdown) and enforcing `SIGKILL` after `SIGTERM`.
>
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit a3def3b. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
fix #100.