Skip to content

fix: refactor the logic of stopping a daemon and add tests#192

Merged
jdx merged 3 commits intojdx:mainfrom
gaojunran:push-sysnoknsrxrt
Jan 27, 2026
Merged

fix: refactor the logic of stopping a daemon and add tests#192
jdx merged 3 commits intojdx:mainfrom
gaojunran:push-sysnoknsrxrt

Conversation

@gaojunran
Copy link
Contributor

@gaojunran gaojunran commented Jan 26, 2026

  1. In the past, the change from Stopping to Stopped relies on wait() in the monitor task. However, killing the process itself should also wait() the process. There exists a race condition.
  2. Child processes should be killed first, then the parent process.
  3. Refactor: in the IPC server, there should be less (no if possible) user interactive (such as warn!()) and should return different kinds of IPCResponse and they should be handled by client. (Later a new PR will refactor more, this PR only includes stopping a daemon)
  4. Add e2e tests for stopping a daemon. Note that tests are written by LLM.

Note

Improves reliability of daemon stopping and clarifies IPC semantics.

  • Refactors supervisor stop() to kill child processes before parent, verify termination, and set status to Stopping then Stopped; reverts to Running and returns DaemonStopFailed if kill fails
  • Expands IpcResponse with DaemonWasNotRunning, DaemonStopFailed { error }, DaemonNotRunning, and DaemonNotFound; removes DaemonAlreadyStopped
  • Updates IPC client stop() to handle new responses and surface DaemonError::StopFailed on failure
  • Adds DaemonError::StopFailed diagnostic type
  • Enhances e2e tests: new stop scenarios (transition to stopped, killing child processes, idempotent stop) and minor log flush waits in readiness tests

Written by Cursor Bugbot for commit 5fb6bb3. This will update automatically on new commits. Configure here.

Copilot AI review requested due to automatic review settings January 26, 2026 13:28
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR refactors the daemon stop logic to avoid a race between process monitoring and killing, introduces richer IPC responses for stop outcomes, and adds e2e tests to validate correct stop behavior (including child processes and already-stopped daemons).

Changes:

  • Adjusted supervisor stop lifecycle logic to kill child processes first, update daemon status explicitly, and return more specific IpcResponse variants for different stop outcomes.
  • Expanded the IPC protocol and client handling to distinguish between “not running”, “not found”, “was not running”, and “stop failed” cases, moving user-facing messaging into the client.
  • Added e2e tests for HTTP/port/cmd readiness log flushing and for various stop scenarios (normal stop, with children, and already-stopped daemons).

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
tests/test_e2e.rs Adds small sleeps before reading logs to reduce flakiness and introduces new stop-related e2e tests validating that daemons transition to stopped quickly, handle child processes, and behave correctly when already stopped.
src/supervisor/lifecycle.rs Refines the Supervisor::stop implementation to kill child PIDs first, handle kill errors by inspecting actual process state, update daemon status to Stopped or revert to Running if kill fails, and return new detailed IPC responses.
src/ipc/mod.rs Extends the IpcResponse enum with new stop-related variants (DaemonWasNotRunning, DaemonStopFailed, DaemonNotRunning, DaemonNotFound) and reformats existing variants.
src/ipc/client.rs Updates the IPC client’s stop method to handle the new response variants and log appropriate messages, while preserving a simple Result<()> API to callers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +300 to +303
IpcResponse::DaemonStopFailed { error } => {
error!("failed to stop daemon {}: {}", id, error);
Ok(())
}
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IpcResponse::DaemonStopFailed is treated as a successful outcome here (logging an error but still returning Ok(())), so callers like the CLI stop command and TUI will consider the stop operation successful even when the supervisor explicitly reports that the process is still running. To avoid confusing users and allow callers to surface a non-zero exit code when a stop fails, this variant should be translated into an error Result rather than being handled as success.

Copilot uses AI. Check for mistakes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here we need the error message and do not care about the Result.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings January 26, 2026 13:33
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +623 to +630
// Process successfully stopped
self.upsert_daemon(UpsertDaemonOpts {
id: id.to_string(),
pid: None,
status: DaemonStatus::Stopped,
..Default::default()
})
.await?;
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When marking a daemon as stopped after a manual stop call, last_exit_success is no longer updated, because this upsert_daemon call does not set it and the monitor task now bails out when it sees pid: None. Previously, the monitor updated last_exit_success (often to false for a SIGTERM), which is used by the cron retrigger logic (see src/supervisor/watchers.rs:126-131) to decide whether to rerun on Success or Fail. With the new flow, a daemon stopped via stop() ends up with last_exit_success unchanged/None, so CronRetrigger::Success and ::Fail will both treat the last run as neither success nor failure and will never trigger. To preserve existing behavior, this path should explicitly set an appropriate last_exit_success value (likely Some(false) for a manual/forced stop), or the monitor logic should be adjusted so that it still records the exit result for stopped daemons.

Copilot uses AI. Check for mistakes.
Comment on lines +1092 to +1145
/// Test stop command with a daemon that has child processes
#[test]
fn test_stop_kills_child_processes() {
let env = TestEnv::new();
env.ensure_binary_exists().unwrap();

// Use bash to spawn a child process
let toml_content = r#"
[daemons.stop_children_test]
run = "bash -c 'sleep 60 & sleep 60 & wait'"
ready_delay = 1
"#;
env.create_toml(toml_content);

// Start the daemon
let output = env.run_command(&["start", "stop_children_test"]);
assert!(
output.status.success(),
"Start command should succeed: {}",
String::from_utf8_lossy(&output.stderr)
);

// Verify daemon is running
let status = env.get_daemon_status("stop_children_test");
assert_eq!(
status,
Some("running".to_string()),
"Daemon should be running"
);

// Stop the daemon
let stop_start = std::time::Instant::now();
let output = env.run_command(&["stop", "stop_children_test"]);
let stop_elapsed = stop_start.elapsed();

println!("Stop took: {:?}", stop_elapsed);

assert!(output.status.success(), "Stop command should succeed");

// Stop should complete in reasonable time even with child processes
assert!(
stop_elapsed < Duration::from_secs(10),
"Stop should complete in reasonable time, took {:?}",
stop_elapsed
);

// Daemon should be stopped
let status = env.get_daemon_status("stop_children_test");
assert_eq!(
status,
Some("stopped".to_string()),
"Daemon should be stopped"
);
}
Copy link

Copilot AI Jan 26, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test_stop_kills_child_processes test name and comment suggest validating that child processes are actually terminated before the parent, but the assertions only check that stop returns quickly and that the daemon status becomes stopped. This means the new child-killing behavior in Supervisor::stop (calling PROCS.all_children and kill_async on each child) is not directly covered by tests; the test would still pass if the child-kill loop were removed and only the parent were killed, potentially leaving orphaned sleep processes. Consider extending this test to assert that no child PIDs remain (e.g., by inspecting the process list) so the behavior "kill children first, then parent" is explicitly exercised.

Copilot uses AI. Check for mistakes.
@jdx
Copy link
Owner

jdx commented Jan 26, 2026

bugbot run

Copy link
Owner

@jdx jdx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the refactor! The core changes look good - killing children first and fixing the race condition are solid improvements.

One change needed before merge:

DaemonStopFailed should return Err, not Ok(())

Currently in src/ipc/client.rs:

IpcResponse::DaemonStopFailed { error } => {
    error!("failed to stop daemon {}: {}", id, error);
    Ok(())  // This should be Err
}

If a user runs pitchfork stop foo and the daemon is still running afterward, that's a failure - the CLI should exit non-zero so scripts and users know something went wrong.

Optional but worth considering:

  • Set last_exit_success when manually stopping a daemon (currently unset, which may affect CronRetrigger::Success/Fail logic)

@gaojunran gaojunran requested a review from jdx January 27, 2026 04:56
@jdx
Copy link
Owner

jdx commented Jan 27, 2026

bugbot run

Copilot AI review requested due to automatic review settings January 27, 2026 12:52
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@jdx
Copy link
Owner

jdx commented Jan 27, 2026

bugbot run

Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

✅ Bugbot reviewed your changes and found no new issues!

Comment @cursor review or bugbot run to trigger another review on this PR

@gaojunran
Copy link
Contributor Author

gaojunran commented Jan 27, 2026

btw, pf stop --all will be implemented in the next PR, considering the dependency order

@jdx jdx merged commit d8540a6 into jdx:main Jan 27, 2026
4 checks passed
@jdx jdx mentioned this pull request Jan 27, 2026
jdx added a commit that referenced this pull request Feb 1, 2026
## 🤖 New release

* `pitchfork-cli`: 1.2.0 -> 1.3.0

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [1.3.0](v1.2.0...v1.3.0) -
2026-02-01

### Added

- *(list)* show available daemons and align logics with TUI
([#206](#206))
- *(logs)* support `--since <humantime>`, use pager by default
([#204](#204))
- support `pitchfork.local.toml`
([#198](#198))
- impl `stop --all` ([#195](#195))
- beautify web ui ([#191](#191))
- add ready_cmd option
([#187](#187))

### Fixed

- refactor the logic of stopping a daemon and add tests
([#192](#192))

### Other

- re-order code to suit for multi-frontend structure
([#197](#197))
- *(deps)* update rust crate xx to v2.3.1
([#203](#203))
- *(deps)* update rust crate clap to v4.5.56
([#202](#202))
- *(ci)* run linting on all files in CI
([#196](#196))
- Update README.md logo
([#184](#184))
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Low risk: this PR only updates version numbers and release notes/docs,
with no runtime code changes.
> 
> **Overview**
> Releases **v1.3.0** by bumping the crate version from `1.2.0` to
`1.3.0` (including `Cargo.lock`) and updating the
`pitchfork.usage.kdl`/generated CLI docs to match.
> 
> Updates `CHANGELOG.md` with the `1.3.0` release notes and links to the
included feature/fix PRs.
> 
> <sup>Written by [Cursor
Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit
d278b90. This will update automatically
on new commits. Configure
[here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants