fix: refactor the logic of stopping a daemon and add tests#192
Conversation
There was a problem hiding this comment.
Pull request overview
This PR refactors the daemon stop logic to avoid a race between process monitoring and killing, introduces richer IPC responses for stop outcomes, and adds e2e tests to validate correct stop behavior (including child processes and already-stopped daemons).
Changes:
- Adjusted supervisor
stoplifecycle logic to kill child processes first, update daemon status explicitly, and return more specificIpcResponsevariants for different stop outcomes. - Expanded the IPC protocol and client handling to distinguish between “not running”, “not found”, “was not running”, and “stop failed” cases, moving user-facing messaging into the client.
- Added e2e tests for HTTP/port/cmd readiness log flushing and for various
stopscenarios (normal stop, with children, and already-stopped daemons).
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
tests/test_e2e.rs |
Adds small sleeps before reading logs to reduce flakiness and introduces new stop-related e2e tests validating that daemons transition to stopped quickly, handle child processes, and behave correctly when already stopped. |
src/supervisor/lifecycle.rs |
Refines the Supervisor::stop implementation to kill child PIDs first, handle kill errors by inspecting actual process state, update daemon status to Stopped or revert to Running if kill fails, and return new detailed IPC responses. |
src/ipc/mod.rs |
Extends the IpcResponse enum with new stop-related variants (DaemonWasNotRunning, DaemonStopFailed, DaemonNotRunning, DaemonNotFound) and reformats existing variants. |
src/ipc/client.rs |
Updates the IPC client’s stop method to handle the new response variants and log appropriate messages, while preserving a simple Result<()> API to callers. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| IpcResponse::DaemonStopFailed { error } => { | ||
| error!("failed to stop daemon {}: {}", id, error); | ||
| Ok(()) | ||
| } |
There was a problem hiding this comment.
IpcResponse::DaemonStopFailed is treated as a successful outcome here (logging an error but still returning Ok(())), so callers like the CLI stop command and TUI will consider the stop operation successful even when the supervisor explicitly reports that the process is still running. To avoid confusing users and allow callers to surface a non-zero exit code when a stop fails, this variant should be translated into an error Result rather than being handled as success.
There was a problem hiding this comment.
Here we need the error message and do not care about the Result.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| // Process successfully stopped | ||
| self.upsert_daemon(UpsertDaemonOpts { | ||
| id: id.to_string(), | ||
| pid: None, | ||
| status: DaemonStatus::Stopped, | ||
| ..Default::default() | ||
| }) | ||
| .await?; |
There was a problem hiding this comment.
When marking a daemon as stopped after a manual stop call, last_exit_success is no longer updated, because this upsert_daemon call does not set it and the monitor task now bails out when it sees pid: None. Previously, the monitor updated last_exit_success (often to false for a SIGTERM), which is used by the cron retrigger logic (see src/supervisor/watchers.rs:126-131) to decide whether to rerun on Success or Fail. With the new flow, a daemon stopped via stop() ends up with last_exit_success unchanged/None, so CronRetrigger::Success and ::Fail will both treat the last run as neither success nor failure and will never trigger. To preserve existing behavior, this path should explicitly set an appropriate last_exit_success value (likely Some(false) for a manual/forced stop), or the monitor logic should be adjusted so that it still records the exit result for stopped daemons.
| /// Test stop command with a daemon that has child processes | ||
| #[test] | ||
| fn test_stop_kills_child_processes() { | ||
| let env = TestEnv::new(); | ||
| env.ensure_binary_exists().unwrap(); | ||
|
|
||
| // Use bash to spawn a child process | ||
| let toml_content = r#" | ||
| [daemons.stop_children_test] | ||
| run = "bash -c 'sleep 60 & sleep 60 & wait'" | ||
| ready_delay = 1 | ||
| "#; | ||
| env.create_toml(toml_content); | ||
|
|
||
| // Start the daemon | ||
| let output = env.run_command(&["start", "stop_children_test"]); | ||
| assert!( | ||
| output.status.success(), | ||
| "Start command should succeed: {}", | ||
| String::from_utf8_lossy(&output.stderr) | ||
| ); | ||
|
|
||
| // Verify daemon is running | ||
| let status = env.get_daemon_status("stop_children_test"); | ||
| assert_eq!( | ||
| status, | ||
| Some("running".to_string()), | ||
| "Daemon should be running" | ||
| ); | ||
|
|
||
| // Stop the daemon | ||
| let stop_start = std::time::Instant::now(); | ||
| let output = env.run_command(&["stop", "stop_children_test"]); | ||
| let stop_elapsed = stop_start.elapsed(); | ||
|
|
||
| println!("Stop took: {:?}", stop_elapsed); | ||
|
|
||
| assert!(output.status.success(), "Stop command should succeed"); | ||
|
|
||
| // Stop should complete in reasonable time even with child processes | ||
| assert!( | ||
| stop_elapsed < Duration::from_secs(10), | ||
| "Stop should complete in reasonable time, took {:?}", | ||
| stop_elapsed | ||
| ); | ||
|
|
||
| // Daemon should be stopped | ||
| let status = env.get_daemon_status("stop_children_test"); | ||
| assert_eq!( | ||
| status, | ||
| Some("stopped".to_string()), | ||
| "Daemon should be stopped" | ||
| ); | ||
| } |
There was a problem hiding this comment.
The test_stop_kills_child_processes test name and comment suggest validating that child processes are actually terminated before the parent, but the assertions only check that stop returns quickly and that the daemon status becomes stopped. This means the new child-killing behavior in Supervisor::stop (calling PROCS.all_children and kill_async on each child) is not directly covered by tests; the test would still pass if the child-kill loop were removed and only the parent were killed, potentially leaving orphaned sleep processes. Consider extending this test to assert that no child PIDs remain (e.g., by inspecting the process list) so the behavior "kill children first, then parent" is explicitly exercised.
|
bugbot run |
jdx
left a comment
There was a problem hiding this comment.
Thanks for the refactor! The core changes look good - killing children first and fixing the race condition are solid improvements.
One change needed before merge:
DaemonStopFailed should return Err, not Ok(())
Currently in src/ipc/client.rs:
IpcResponse::DaemonStopFailed { error } => {
error!("failed to stop daemon {}: {}", id, error);
Ok(()) // This should be Err
}If a user runs pitchfork stop foo and the daemon is still running afterward, that's a failure - the CLI should exit non-zero so scripts and users know something went wrong.
Optional but worth considering:
- Set
last_exit_successwhen manually stopping a daemon (currently unset, which may affectCronRetrigger::Success/Faillogic)
|
bugbot run |
7e7d8f6 to
9a4c64d
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
9a4c64d to
5fb6bb3
Compare
|
bugbot run |
|
btw, |
## 🤖 New release * `pitchfork-cli`: 1.2.0 -> 1.3.0 <details><summary><i><b>Changelog</b></i></summary><p> <blockquote> ## [1.3.0](v1.2.0...v1.3.0) - 2026-02-01 ### Added - *(list)* show available daemons and align logics with TUI ([#206](#206)) - *(logs)* support `--since <humantime>`, use pager by default ([#204](#204)) - support `pitchfork.local.toml` ([#198](#198)) - impl `stop --all` ([#195](#195)) - beautify web ui ([#191](#191)) - add ready_cmd option ([#187](#187)) ### Fixed - refactor the logic of stopping a daemon and add tests ([#192](#192)) ### Other - re-order code to suit for multi-frontend structure ([#197](#197)) - *(deps)* update rust crate xx to v2.3.1 ([#203](#203)) - *(deps)* update rust crate clap to v4.5.56 ([#202](#202)) - *(ci)* run linting on all files in CI ([#196](#196)) - Update README.md logo ([#184](#184)) </blockquote> </p></details> --- This PR was generated with [release-plz](https://github.com/release-plz/release-plz/). <!-- CURSOR_SUMMARY --> --- > [!NOTE] > **Low Risk** > Low risk: this PR only updates version numbers and release notes/docs, with no runtime code changes. > > **Overview** > Releases **v1.3.0** by bumping the crate version from `1.2.0` to `1.3.0` (including `Cargo.lock`) and updating the `pitchfork.usage.kdl`/generated CLI docs to match. > > Updates `CHANGELOG.md` with the `1.3.0` release notes and links to the included feature/fix PRs. > > <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit d278b90. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup> <!-- /CURSOR_SUMMARY --> --------- Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
StoppingtoStoppedrelies onwait()in the monitor task. However, killing the process itself should alsowait()the process. There exists a race condition.warn!()) and should return different kinds ofIPCResponseand they should be handled by client. (Later a new PR will refactor more, this PR only includes stopping a daemon)Note
Improves reliability of daemon stopping and clarifies IPC semantics.
stop()to kill child processes before parent, verify termination, and set status toStoppingthenStopped; reverts toRunningand returnsDaemonStopFailedif kill failsIpcResponsewithDaemonWasNotRunning,DaemonStopFailed { error },DaemonNotRunning, andDaemonNotFound; removesDaemonAlreadyStoppedstop()to handle new responses and surfaceDaemonError::StopFailedon failureDaemonError::StopFaileddiagnostic typeWritten by Cursor Bugbot for commit 5fb6bb3. This will update automatically on new commits. Configure here.