Skip to content

feat: SIGKILL after SIGTERM#238

Merged
jdx merged 1 commit intojdx:mainfrom
gaojunran:feat-sigkill
Feb 16, 2026
Merged

feat: SIGKILL after SIGTERM#238
jdx merged 1 commit intojdx:mainfrom
gaojunran:feat-sigkill

Conversation

@gaojunran
Copy link
Contributor

@gaojunran gaojunran commented Feb 15, 2026

fix #100.

Copilot AI review requested due to automatic review settings February 15, 2026 12:21
@gemini-code-assist
Copy link

Summary of Changes

Hello @gaojunran, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the daemon termination process by introducing a progressive shutdown strategy. Instead of a single SIGTERM, the system now attempts a graceful shutdown with two SIGTERM signals, followed by a forceful SIGKILL if the process remains unresponsive. This change aims to improve the reliability of stopping daemons, ensuring that processes are terminated effectively while allowing well-behaved applications sufficient time to clean up resources.

Highlights

  • Progressive Daemon Shutdown Strategy: Implemented a robust, progressive daemon termination strategy that escalates from two SIGTERM signals to a final SIGKILL if a process remains unresponsive, ensuring reliable shutdown.
  • Comprehensive Documentation Updates: Updated all relevant documentation, including CLI help, markdown guides, and the architecture overview, to accurately describe the new, more reliable process termination logic.
  • Refactored Process Killing Logic: Refactored the kill function in src/procs.rs to orchestrate the signal escalation and timed waits, with platform-specific handling for Unix and Windows.
  • Streamlined Supervisor Lifecycle: Removed redundant process termination checks from the supervisor's lifecycle, as the new kill function now internally handles the verification of process termination.
  • Improved Test Reliability: Enhanced end-to-end test reliability by adding path canonicalization to assertions, preventing potential issues with symbolic links.
Changelog
  • docs/cli/commands.json
    • Updated the long help description for the stop command to detail the new graceful shutdown strategy.
  • docs/cli/stop.md
    • Revised the documentation for the stop command to explain the progressive signal escalation (SIGTERM, SIGTERM, SIGKILL).
  • docs/concepts/architecture.md
    • Added a new 'Stopping' daemon state.
    • Introduced a dedicated 'Process Termination' section outlining the new shutdown strategy.
    • Expanded 'Readiness Detection' with 'Port' and 'Command' methods.
  • docs/troubleshooting.md
    • Included a new troubleshooting section 'Daemon Won't Stop' which explains the progressive shutdown and provides steps to diagnose issues.
  • pitchfork.usage.kdl
    • Modified the long_help text for the stop command to reflect the updated daemon termination sequence.
  • src/cli/stop.rs
    • Updated the long_about description for the stop command to align with the new progressive shutdown mechanism.
  • src/procs.rs
    • Implemented a new kill function that orchestrates the progressive shutdown, sending SIGTERM, waiting, sending another SIGTERM, waiting, and finally sending SIGKILL if necessary, with platform-specific handling for Windows and Unix.
  • src/supervisor/lifecycle.rs
    • Removed the previous explicit process termination verification loop, relying instead on the new kill function's internal termination checks.
  • tests/test_e2e.rs
    • Added canonicalize() calls to path comparisons in end-to-end tests to ensure robustness against symbolic links.
Activity
  • No specific activity (comments, reviews, or progress updates) has been recorded for this pull request yet.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a more robust graceful shutdown strategy, escalating from SIGTERM to SIGKILL, which is a great improvement for process management. The implementation looks solid, and the accompanying documentation updates across various files are clear and consistent. I've made one suggestion in src/procs.rs to refactor the new termination logic for better readability and maintainability by extracting the polling logic into a helper function and using constants for timing values. Overall, this is a well-executed feature enhancement.

Comment on lines +102 to 163
#[cfg(unix)]
{
// Phase 1: First SIGTERM with quick check intervals
if let Some(process) = self.lock_system().process(sysinfo_pid) {
debug!("sending SIGTERM to process {pid}");
process.kill_with(Signal::Term);
}

// Fast check: 10ms intervals for first 100ms (for processes that exit immediately)
for i in 0..10 {
std::thread::sleep(std::time::Duration::from_millis(10));
self.refresh_pids(&[pid]);
if self.lock_system().process(sysinfo_pid).is_none() {
debug!(
"process {pid} terminated after first SIGTERM ({} ms)",
(i + 1) * 10
);
return true;
}
}

// Slower check: 50ms intervals for up to 3 more seconds (100ms + 2900ms = 3000ms total)
for i in 0..58 {
std::thread::sleep(std::time::Duration::from_millis(50));
self.refresh_pids(&[pid]);
if self.lock_system().process(sysinfo_pid).is_none() {
debug!(
"process {pid} terminated after first SIGTERM ({} ms)",
100 + (i + 1) * 50
);
return true;
}
}

// Phase 2: Second SIGTERM for processes that need more time
if let Some(process) = self.lock_system().process(sysinfo_pid) {
debug!("process {pid} still running after ~3s, sending second SIGTERM");
process.kill_with(Signal::Term);
}

// Wait up to 2 more seconds
for i in 0..40 {
std::thread::sleep(std::time::Duration::from_millis(50));
self.refresh_pids(&[pid]);
if self.lock_system().process(sysinfo_pid).is_none() {
debug!(
"process {pid} terminated after second SIGTERM ({} ms)",
3000 + (i + 1) * 50
);
return true;
}
}

// Phase 3: SIGKILL as last resort
if let Some(process) = self.lock_system().process(sysinfo_pid) {
warn!("process {pid} did not respond to SIGTERM after 5s, sending SIGKILL");
process.kill_with(Signal::Kill);
process.wait();
}

true
} else {
false
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The graceful shutdown logic is well-implemented, but it contains some duplicated code and magic numbers for timings and loop counts. This can make the code harder to read and maintain.

I suggest refactoring this block by:

  1. Introducing constants for the timing values to make their purpose explicit.
  2. Extracting the polling logic into a poll_for_exit helper function to reduce code duplication.

This will make the graceful shutdown strategy clearer and easier to modify in the future.

        {
            fn poll_for_exit(
                procs: &Procs,
                pid: u32,
                sysinfo_pid: sysinfo::Pid,
                checks: u32,
                interval: std::time::Duration,
                base_time_ms: u64,
                signal_name: &str,
            ) -> bool {
                for i in 0..checks {
                    std::thread::sleep(interval);
                    procs.refresh_pids(&[pid]);
                    if procs.lock_system().process(sysinfo_pid).is_none() {
                        let elapsed = base_time_ms + (i + 1) as u64 * interval.as_millis() as u64;
                        debug!(
                            "process {pid} terminated after {signal_name} ({} ms)",
                            elapsed
                        );
                        return true;
                    }
                }
                false
            }

            const FAST_CHECK_INTERVAL: std::time::Duration = std::time::Duration::from_millis(10);
            const FAST_CHECKS: u32 = 10;
            const SLOW_CHECK_INTERVAL: std::time::Duration = std::time::Duration::from_millis(50);
            const PHASE1_SLOW_CHECKS: u32 = 58; // (3000ms - 100ms) / 50ms
            const PHASE2_CHECKS: u32 = 40; // 2000ms / 50ms

            // Phase 1: First SIGTERM
            if let Some(process) = self.lock_system().process(sysinfo_pid) {
                debug!("sending SIGTERM to process {pid}");
                process.kill_with(Signal::Term);
            }

            if poll_for_exit(self, pid, sysinfo_pid, FAST_CHECKS, FAST_CHECK_INTERVAL, 0, "first SIGTERM") {
                return true;
            }
            if poll_for_exit(self, pid, sysinfo_pid, PHASE1_SLOW_CHECKS, SLOW_CHECK_INTERVAL, 100, "first SIGTERM") {
                return true;
            }

            // Phase 2: Second SIGTERM
            if let Some(process) = self.lock_system().process(sysinfo_pid) {
                debug!("process {pid} still running after ~3s, sending second SIGTERM");
                process.kill_with(Signal::Term);
            }

            if poll_for_exit(self, pid, sysinfo_pid, PHASE2_CHECKS, SLOW_CHECK_INTERVAL, 3000, "second SIGTERM") {
                return true;
            }

            // Phase 3: SIGKILL
            if let Some(process) = self.lock_system().process(sysinfo_pid) {
                warn!("process {pid} did not respond to SIGTERM after 5s, sending SIGKILL");
                process.kill_with(Signal::Kill);
                process.wait();
            }

            true
        }

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements a progressive shutdown strategy for daemon processes, adding automatic SIGKILL escalation after two SIGTERM attempts. The implementation addresses issue #100 by ensuring stubborn processes that don't respond to SIGTERM are eventually force-terminated after ~5 seconds.

Changes:

  • Implements SIGTERM → SIGTERM → SIGKILL escalation strategy in src/procs.rs with intelligent timing (10ms checks initially, then 50ms intervals)
  • Removes redundant process termination verification from src/supervisor/lifecycle.rs since the new kill strategy is more robust
  • Updates all documentation to reflect the new shutdown behavior
  • Fixes unrelated macOS test failures by canonicalizing paths to resolve symlinks

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
src/procs.rs Implements progressive kill strategy with 3-phase escalation (SIGTERM at 0s, SIGTERM at 3s, SIGKILL at 5s) on Unix; Windows uses simple kill()
src/supervisor/lifecycle.rs Removes old termination verification loop, relying on new robust kill implementation
src/cli/stop.rs Updates CLI help text to document the new shutdown strategy
pitchfork.usage.kdl Updates usage specification with new shutdown documentation
docs/troubleshooting.md Adds comprehensive troubleshooting section for daemon stop issues
docs/concepts/architecture.md Documents the progressive termination strategy and adds "Stopping" state
docs/cli/stop.md Updates stop command documentation to reflect new behavior
docs/cli/commands.json Updates generated command documentation
tests/test_e2e.rs Fixes macOS test failures by canonicalizing paths to resolve symlinks like /var → /private/var

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +93 to +100
#[cfg(windows)]
{
if let Some(process) = self.lock_system().process(sysinfo_pid) {
process.kill();
process.wait();
}
return true;
}
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Windows implementation does not use the same progressive SIGTERM -> SIGTERM -> SIGKILL strategy that's used on Unix. On Windows, it immediately kills the process with no retry logic. This discrepancy should be documented, or the Windows implementation should be updated to provide similar graceful shutdown behavior (though Windows doesn't have SIGTERM/SIGKILL, it could implement retries with TerminateProcess or similar). At minimum, the documentation (cli/stop.rs, docs) should note that the progressive shutdown only applies to Unix systems.

Copilot uses AI. Check for mistakes.
|-------|---------|
| Running | Process is alive |
| Waiting | Waiting for ready check |
| Stopping | Being terminated (SIGTERM sent) |
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The description "Being terminated (SIGTERM sent)" is not fully accurate since the Stopping state covers the entire termination process, including when SIGKILL is sent after 5 seconds. Consider updating to something like "Being terminated (shutdown in progress)" to be more accurate about what this state represents.

Suggested change
| Stopping | Being terminated (SIGTERM sent) |
| Stopping | Being terminated (shutdown in progress) |

Copilot uses AI. Check for mistakes.
src/procs.rs Outdated
Comment on lines +155 to +161
// Phase 3: SIGKILL as last resort
if let Some(process) = self.lock_system().process(sysinfo_pid) {
warn!("process {pid} did not respond to SIGTERM after 5s, sending SIGKILL");
process.kill_with(Signal::Kill);
process.wait();
}

Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After sending SIGKILL, there's no verification that the process actually terminated. While SIGKILL is generally unignorable, there's a brief window where the process could be in an uninterruptible state (e.g., kernel I/O wait). Consider adding a final check after wait() to ensure the process is gone, and log a warning if it's still running. This would make the behavior consistent with the earlier phases and help identify edge cases where SIGKILL might not immediately succeed.

Suggested change
// Phase 3: SIGKILL as last resort
if let Some(process) = self.lock_system().process(sysinfo_pid) {
warn!("process {pid} did not respond to SIGTERM after 5s, sending SIGKILL");
process.kill_with(Signal::Kill);
process.wait();
}
// Phase 3: SIGKILL as last resort
let mut sigkill_sent = false;
if let Some(process) = self.lock_system().process(sysinfo_pid) {
warn!("process {pid} did not respond to SIGTERM after 5s, sending SIGKILL");
process.kill_with(Signal::Kill);
process.wait();
sigkill_sent = true;
}
// Final verification after SIGKILL and wait()
if sigkill_sent {
self.refresh_pids(&[pid]);
if self.lock_system().process(sysinfo_pid).is_some() {
warn!("process {pid} appears to still be running after SIGKILL and wait()");
}
}

Copilot uses AI. Check for mistakes.
Comment on lines +989 to 995
// Canonicalize to resolve symlinks (e.g., /var -> /private/var on macOS)
let expected = subdir.canonicalize().unwrap();
assert_eq!(
pwd,
subdir.to_str().unwrap(),
expected.to_str().unwrap(),
"Daemon should run in the subdirectory"
);
Copy link

Copilot AI Feb 15, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test changes that add canonicalize() to resolve symlinks on macOS appear to be unrelated to the main purpose of this PR (adding SIGKILL escalation). While these are valid fixes for macOS test failures, they would be better suited for a separate PR or at least mentioned in the PR description. This makes it clearer what changes are part of the SIGKILL feature vs. what are test infrastructure improvements.

Copilot uses AI. Check for mistakes.
Copy link
Owner

@jdx jdx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AI-generated review (Claude Code)

Overall

Solid approach — progressive shutdown is a real need for a daemon supervisor. The docs and help text updates are thorough. A few concerns below, roughly ordered by severity.

Issues

Zombie process detection

The polling loop uses self.lock_system().process(sysinfo_pid).is_none() to detect exit. On Linux, zombie processes still have /proc/[pid] entries, so sysinfo returns Some(process) for them. If the tokio Child monitoring task doesn't reap quickly enough, a process that exited cleanly on the first SIGTERM will appear "still running" and get unnecessarily escalated through all phases to SIGKILL.

Consider also checking for zombie status:

if self.lock_system().process(sysinfo_pid)
    .map(|p| matches!(p.status(), sysinfo::ProcessStatus::Zombie))
    .unwrap_or(true)
{
    return true;
}

No process.wait() for SIGTERM exits

The old code always called process.wait() to reap the child. The new code only calls wait() in the SIGKILL path. For processes that exit from SIGTERM (detected via polling), there's no explicit reap — this relies on the supervisor's monitoring task. Probably fine in practice but it's a subtle behavioral change.

Second SIGTERM adds delay without clear benefit

SIGTERM → SIGTERM → SIGKILL is unusual. If a process ignores the first SIGTERM, it will almost certainly ignore the second. The standard pattern is SIGTERM → wait → SIGKILL. The second SIGTERM adds 2s of delay before the inevitable SIGKILL with no practical benefit for most cases. Simpler approach: SIGTERM → wait 3-5s → SIGKILL.

Lifecycle verification removal

The old post-kill verification loop in lifecycle.rs is removed with a comment that kill_async now guarantees termination. This is correct if the zombie concern above is addressed. Otherwise kill_async could return true for a zombie that still holds resources (ports, files), and the daemon state gets set to Stopped prematurely.

Nits

  • The architecture doc additions (Port/Command readiness types, Stopping status) and the test canonicalize fixes seem unrelated to the SIGKILL feature — could be separate commits/PRs.
  • The fast 10ms initial polling is a nice touch for the common case.

@gaojunran gaojunran changed the title feat: SIGKILL after two SIGTERMs feat: SIGKILL after SIGTERM Feb 16, 2026
@jdx jdx merged commit 98b1e4c into jdx:main Feb 16, 2026
4 checks passed
@jdx jdx mentioned this pull request Feb 16, 2026
jdx added a commit that referenced this pull request Feb 16, 2026
## 🤖 New release

* `pitchfork-cli`: 1.4.3 -> 1.5.0

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [1.5.0](v1.4.3...v1.5.0) -
2026-02-16

### Added

- *(supervisor)* use process groups for atomic daemon shutdown
([#239](#239))
- SIGKILL after SIGTERM
([#238](#238))
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Release metadata only (version/changelog updates) with no functional
code changes in this PR.
> 
> **Overview**
> Prepares the `v1.5.0` release by bumping the `pitchfork-cli` version
from `1.4.3` to `1.5.0` in `Cargo.toml`/`Cargo.lock`.
> 
> Updates `CHANGELOG.md` with a new `1.5.0` entry noting supervisor
shutdown improvements (process groups for atomic shutdown) and enforcing
`SIGKILL` after `SIGTERM`.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
a3def3b. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants