feat: add daemon lifecycle hooks and retry count env vars #245
Conversation
Add configurable hooks (on_ready, on_fail, on_retry) that fire shell commands in response to daemon lifecycle events, and inject PITCHFORK_DAEMON_ID and PITCHFORK_RETRY_COUNT environment variables into every daemon process. Closes #242, closes #215 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary of Changes

Hello @jdx, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the daemon management capabilities by introducing lifecycle hooks and providing more contextual information to daemon processes via environment variables. These additions allow users to define custom actions for critical daemon events like readiness, failure, and retry attempts, improving automation, monitoring, and debugging workflows without interfering with the core daemon operations.

Highlights
Changelog
Activity
Code Review
This pull request introduces a valuable feature: daemon lifecycle hooks and additional environment variables. The implementation is solid and well-tested. I've identified a few areas for improvement, primarily related to code duplication in the supervisor's lifecycle management, which could be refactored to improve maintainability. There's also a potential performance consideration in how hook configurations are loaded, and a minor point of confusion in one of the new tests. Overall, this is a great addition to the project.
let pt = PitchforkToml::all_merged();
let hook_cmd = pt
    .daemons
    .get(&daemon_id)
    .and_then(|d| get_hook_cmd(&d.hooks, &hook_type));
Calling PitchforkToml::all_merged() inside tokio::spawn for every hook execution means that all configuration files are read from disk and parsed every time a hook is fired. While this ensures the hook uses the absolute latest configuration, it can be inefficient, especially for frequently triggered hooks like on_retry in a scenario where a service is flapping. This could lead to performance issues due to repeated disk I/O and parsing overhead.
Since using the latest config is an explicit design goal mentioned in the PR, this is a trade-off. However, it's worth considering if there's a middle ground, perhaps involving a cached configuration that's refreshed periodically, to mitigate potential performance bottlenecks.
if let Some(tx) = ready_tx.take() {
    let _ = tx.send(Ok(()));
}
fire_hook(HookType::OnReady, id.clone(), daemon_dir.clone(), hook_retry_count, hook_daemon_env.clone(), vec![]);
There is significant code duplication for handling the 'ready' state. The logic to notify readiness (including flushing logs, sending on the ready_tx channel, and firing the on_ready hook) is repeated in 6 different places within the select! loop. This makes the code harder to maintain and prone to errors if changes are not applied consistently to all copies.
Consider refactoring this logic into a private helper function within the tokio::spawn block. This function could encapsulate setting ready_notified, sending on ready_tx, and firing the hook.
For example:
async fn notify_ready(
id: &str,
ready_notified: &mut bool,
ready_tx: &mut Option<oneshot::Sender<Result<(), Option<i32>>>>,
log_appender: &mut BufWriter<tokio::fs::File>,
// ... other params for fire_hook
) {
if *ready_notified { return; }
info!("daemon {} ready", id);
*ready_notified = true;
let _ = log_appender.flush().await;
if let Some(tx) = ready_tx.take() {
let _ = tx.send(Ok(()));
}
fire_hook(HookType::OnReady, /* ... */);
}

You could then call `notify_ready(...).await` in each of the readiness detection branches.
// Fire on_fail hook if retries are exhausted
if hook_retry_count >= hook_retry {
    fire_hook(
        HookType::OnFail,
        id.clone(),
        daemon_dir.clone(),
        hook_retry_count,
        hook_daemon_env.clone(),
        vec![("PITCHFORK_EXIT_CODE".to_string(), exit_code.to_string())],
    );
}
The logic for handling a daemon's failure is duplicated. This block for a successful wait() but failed exit code is very similar to the block in the final else clause (lines 605-629) which handles a wait() error. Both blocks update the daemon status to Errored and fire the on_fail hook if retries are exhausted.
This duplication can be consolidated into a helper function to improve maintainability. The function could take the exit_code as an argument and perform the necessary state updates and hook firing.
Example refactoring:
async fn handle_failure(id: &str, exit_code: i32, hook_retry_count: u32, hook_retry: Retry, /*...other params*/) {
let err_status = DaemonStatus::Errored(exit_code);
if let Err(e) = SUPERVISOR
.upsert_daemon(UpsertDaemonOpts {
id: id.to_string(),
pid: None,
status: err_status,
last_exit_success: Some(false),
..Default::default()
})
.await
{
error!("Failed to update daemon state for {id}: {e}");
}
// Fire on_fail hook if retries are exhausted
if hook_retry_count >= hook_retry {
fire_hook(
HookType::OnFail,
id.to_string(),
// ... params
vec![("PITCHFORK_EXIT_CODE".to_string(), exit_code.to_string())],
);
}
}

You could then call this from both failure paths with the appropriate `exit_code`.
let output = env.run_command_with_env(
    &["start", "fail_retry_hook"],
    &[("PITCHFORK_INTERVAL_SECS", "1")],
);
println!("start stdout: {}", String::from_utf8_lossy(&output.stdout));
This test logic seems a bit confusing. It appears to start the fail_retry_hook daemon twice. The second run_command_with_env call, which sets PITCHFORK_INTERVAL_SECS=1, is likely ineffective. The supervisor process is started by the first run_command call and will not pick up the environment variable from the second call.
The test probably passes because the first start command uses --delay 1, which triggers the wait_ready retry logic in supervisor/lifecycle.rs. This logic has its own backoff schedule and doesn't depend on PITCHFORK_INTERVAL_SECS.
To make this test clearer and more robust, consider either:
- Relying only on the `wait_ready` retry logic and removing the second `run_command` call.
- If testing the background retry mechanism is intended, ensure the supervisor is started with the `PITCHFORK_INTERVAL_SECS=1` environment variable from the very beginning.
Pull request overview
Adds configurable daemon lifecycle hooks and injects daemon metadata env vars into both daemon processes and hook commands, enabling external automation around daemon readiness/failure/retries.
Changes:
- Add `[daemons.<name>.hooks]` config (`on_ready`/`on_fail`/`on_retry`) and expose it in the generated JSON schema.
- Inject `PITCHFORK_DAEMON_ID` + `PITCHFORK_RETRY_COUNT` into daemon processes; pass these (plus `PITCHFORK_EXIT_CODE` on failure) into hook commands.
- Fire hooks from supervisor lifecycle + retry paths, and add unit/E2E tests covering parsing and hook/env behavior.
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| tests/test_pitchfork_toml.rs | Adds unit tests validating hooks config parsing (none/partial/full). |
| tests/test_hooks.rs | New E2E coverage for on_ready/on_retry/on_fail hooks and injected env vars. |
| src/tui/app.rs | Updates TUI config editing defaults to include hooks: None. |
| src/supervisor/retry.rs | Fires on_retry hook before retry restarts; reuses computed daemon dir. |
| src/supervisor/mod.rs | Registers new hooks supervisor module. |
| src/supervisor/lifecycle.rs | Injects new env vars into daemons; fires hooks on ready/retry/final fail. |
| src/supervisor/hooks.rs | Implements fire-and-forget hook execution with env injection. |
| src/pitchfork_toml.rs | Adds PitchforkTomlHooks + hooks field on daemon config. |
| src/deps.rs | Updates test helper daemon config struct to include hooks: None. |
| src/cli/config/add.rs | Ensures new daemon entries default hooks: None. |
| docs/public/schema.json | Publishes hooks in JSON schema output. |
let output = env.run_command(&["start", "fail_retry_hook", "--delay", "1"]);
println!("start stdout: {}", String::from_utf8_lossy(&output.stdout));

// Background retries happen on 10s interval, so we need to wait
// Use PITCHFORK_INTERVAL_SECS=1 to speed this up
let output = env.run_command_with_env(
    &["start", "fail_retry_hook"],
test_hook_on_fail_after_retries starts the same daemon twice (first with --delay 1, then again with PITCHFORK_INTERVAL_SECS=1). The second invocation likely won’t affect the already-running supervisor/daemon, and may just return DaemonAlreadyRunning, making the test flaky or ineffective. Make this a single start call: either (a) remove the first start and run once with run_command_with_env (setting PITCHFORK_INTERVAL_SECS up front), or (b) keep the --delay 1 path and delete the second start entirely.
Suggested change — replace the two start invocations:

let output = env.run_command(&["start", "fail_retry_hook", "--delay", "1"]);
println!("start stdout: {}", String::from_utf8_lossy(&output.stdout));
// Background retries happen on 10s interval, so we need to wait
// Use PITCHFORK_INTERVAL_SECS=1 to speed this up
let output = env.run_command_with_env(
    &["start", "fail_retry_hook"],

with a single one:

// Background retries happen on 10s interval, so we need to wait
// Use PITCHFORK_INTERVAL_SECS=1 to speed this up
let output = env.run_command_with_env(
    &["start", "fail_retry_hook", "--delay", "1"],
tokio::spawn(async move {
    let pt = PitchforkToml::all_merged();
    let hook_cmd = pt
fire_hook calls PitchforkToml::all_merged() inside a tokio::spawn task, and all_merged() does blocking filesystem I/O. This can block Tokio worker threads under load (especially if hooks fire frequently on retries). Consider loading the merged config via tokio::task::spawn_blocking and then continuing with the async hook execution.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ites PITCHFORK_DAEMON_ID and PITCHFORK_RETRY_COUNT are now injected after user-configured env vars in both daemon spawning and hook execution, preventing user config from silently overwriting them. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (1)
src/supervisor/retry.rs:113
- When a daemon's command fails to parse during a retry attempt, or when no run command is found, the daemon is marked as exhausted (`retry_count` set to the `retry` value) but the `on_fail` hook is not fired. This means users won't be notified when their daemon permanently fails due to configuration errors during retry. Consider firing the `on_fail` hook in these error paths to maintain consistent behavior with other failure scenarios.
error!("failed to parse command for daemon {id}: {e}");
// Mark as exhausted to prevent infinite retry loop, preserving error status
self.upsert_daemon(UpsertDaemonOpts {
id,
status: daemon.status.clone(),
retry_count: Some(daemon.retry),
..Default::default()
})
.await?;
continue;
}
};
let dir = daemon.dir.unwrap_or_else(|| env::CWD.clone());
fire_hook(
HookType::OnRetry,
id.clone(),
dir.clone(),
daemon.retry_count + 1,
daemon.env.clone(),
vec![],
);
let retry_opts = RunOptions {
id: id.clone(),
cmd,
force: false,
shell_pid: daemon.shell_pid,
dir,
autostop: daemon.autostop,
cron_schedule: daemon.cron_schedule,
cron_retrigger: daemon.cron_retrigger,
retry: daemon.retry,
retry_count: daemon.retry_count + 1,
ready_delay: daemon.ready_delay,
ready_output: daemon.ready_output.clone(),
ready_http: daemon.ready_http.clone(),
ready_port: daemon.ready_port,
ready_cmd: daemon.ready_cmd.clone(),
wait_ready: false,
depends: daemon.depends.clone(),
env: daemon.env.clone(),
};
if let Err(e) = self.run(retry_opts).await {
error!("failed to retry daemon {id}: {e}");
}
} else {
warn!("no run command found for daemon {id}, cannot retry");
// Mark as exhausted
self.upsert_daemon(UpsertDaemonOpts {
id,
retry_count: Some(daemon.retry),
..Default::default()
})
.await?;
}
// Start with retry=2 and ready_delay so it uses wait_ready path
let output = env.run_command(&["start", "fail_retry_hook", "--delay", "1"]);
println!("start stdout: {}", String::from_utf8_lossy(&output.stdout));

// Background retries happen on 10s interval, so we need to wait
// Use PITCHFORK_INTERVAL_SECS=1 to speed this up
let output = env.run_command_with_env(
    &["start", "fail_retry_hook"],
    &[("PITCHFORK_INTERVAL_SECS", "1")],
);
println!("start stdout: {}", String::from_utf8_lossy(&output.stdout));
This test has confusing logic that makes it difficult to understand what behavior is being tested. The test starts the same daemon twice (lines 117 and 122-125), which seems incorrect.
The comment on line 120-121 suggests testing "background retries" but the first start command uses --delay 1 which triggers foreground retries (via the wait_ready path in lifecycle.rs lines 80-121). The purpose of the second start command is unclear - if the first command exhausted all retries, the second start would reset retry_count to 0.
Consider either:
- Removing the first start command and only using the second one (to test background retries), OR
- Removing the second start command if the first one is sufficient, OR
- Adding clarifying comments explaining why both start commands are needed
## 🤖 New release

* `pitchfork-cli`: 1.5.0 -> 1.6.0

**Changelog**

> ## [1.6.0](v1.5.0...v1.6.0) - 2026-02-21
>
> ### Added
> - *(web)* add PITCHFORK_WEB_PATH support for reverse proxy path prefixes ([#244](#244))
> - add daemon lifecycle hooks and retry count env vars ([#245](#245))
>
> ### Fixed
> - pass cwd to ready_cmd spawning ([#243](#243))

This PR was generated with [release-plz](https://github.com/release-plz/release-plz/).

> [!NOTE]
> **Low Risk**
> Release-only version/documentation updates; no functional code changes are included in this diff.
>
> **Overview**
> Prepares the `pitchfork-cli` **v1.6.0** release by bumping the crate/version metadata from `1.5.0` to `1.6.0` (including `Cargo.toml`, `Cargo.lock`, and generated CLI docs/spec files).
> Updates `CHANGELOG.md` with the new `1.6.0` entry summarizing the release's added features and the `ready_cmd` working-directory fix.

Written by Cursor Bugbot for commit e00649e.

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Summary
- Add configurable hooks (`on_ready`, `on_fail`, `on_retry`) under `[daemons.<name>.hooks]` in pitchfork.toml that fire shell commands in response to daemon events
- Inject `PITCHFORK_DAEMON_ID` and `PITCHFORK_RETRY_COUNT` environment variables into every daemon process
- Hook commands additionally receive `PITCHFORK_EXIT_CODE` (for `on_fail`)
- Hooks are resolved from the latest merged config via `PitchforkToml::all_merged()`

Config example
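The config example itself did not survive extraction; a plausible reconstruction based on the keys this PR names (`[daemons.<name>.hooks]`, `on_ready`, `on_retry`, `on_fail`) might look like the following — the daemon name, `run` command, and hook command strings are illustrative assumptions:

```toml
[daemons.api]
run = "node server.js"
retry = 3

[daemons.api.hooks]
# Fired once the daemon is detected as ready
on_ready = "notify-send 'api is ready'"
# Fired before each retry attempt; PITCHFORK_RETRY_COUNT is injected
on_retry = "echo \"retry $PITCHFORK_RETRY_COUNT for $PITCHFORK_DAEMON_ID\" >> hooks.log"
# Fired when retries are exhausted; PITCHFORK_EXIT_CODE is additionally set
on_fail = "notify-send \"$PITCHFORK_DAEMON_ID failed with exit code $PITCHFORK_EXIT_CODE\""
```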
Closes #242, closes #215
Test plan
- `cargo build` compiles
- `cargo fmt --check` passes
- `cargo clippy --all-targets --all-features -- -D warnings` passes
- `cargo nextest run` (all existing + 10 new)

🤖 Generated with Claude Code
Note
Medium Risk
Touches core supervisor lifecycle/retry paths and spawns additional background shell commands, which could affect process behavior and timing if misconfigured, but is gated by optional config and covered by new tests.
Overview
Adds per-daemon lifecycle hooks under `[daemons.<id>.hooks]` (`on_ready`, `on_retry`, `on_fail`) and wires the supervisor to fire them on readiness, before retry attempts (both the startup retry loop and background retries), and when failures exhaust retries.

Daemon processes now always receive `PITCHFORK_DAEMON_ID` and `PITCHFORK_RETRY_COUNT`, while hook commands run fire-and-forget in the daemon working directory with the same env plus `PITCHFORK_EXIT_CODE` for `on_fail`; configuration/schema/docs are updated accordingly, and new tests cover parsing and end-to-end hook/env behavior.

Written by Cursor Bugbot for commit d9da016.