
feat: add daemon lifecycle hooks and retry count env vars#245

Merged
jdx merged 3 commits into main from feat/daemon-hooks-env-vars on Feb 21, 2026

Conversation

@jdx (Owner) commented Feb 21, 2026

Summary

  • Add configurable lifecycle hooks (on_ready, on_fail, on_retry) under [daemons.<name>.hooks] in pitchfork.toml that fire shell commands in response to daemon events
  • Inject PITCHFORK_DAEMON_ID and PITCHFORK_RETRY_COUNT environment variables into every daemon process
  • Hook commands also receive these env vars plus PITCHFORK_EXIT_CODE (for on_fail)
  • Hooks are fire-and-forget (spawned as background tokio tasks, errors logged but don't block)
  • Hooks read config fresh at execution time via PitchforkToml::all_merged()
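The fire-and-forget pattern described above can be sketched with std threads (the actual implementation spawns background tokio tasks; this `fire_hook` signature and the `sh -c` invocation are illustrative assumptions — only the `PITCHFORK_*` variable names come from this PR):

```rust
use std::collections::HashMap;
use std::process::Command;
use std::thread;

// Hypothetical sketch: spawn the hook command in the background, inject the
// documented env vars, log failures, and never block the caller.
// (The real pitchfork code uses tokio tasks, not std threads.)
fn fire_hook(hook_cmd: String, daemon_id: String, retry_count: u32, extra_env: HashMap<String, String>) {
    thread::spawn(move || {
        let mut cmd = Command::new("sh");
        cmd.arg("-c").arg(&hook_cmd);
        // Hook commands receive the same context vars as the daemon process.
        cmd.env("PITCHFORK_DAEMON_ID", &daemon_id);
        cmd.env("PITCHFORK_RETRY_COUNT", retry_count.to_string());
        for (k, v) in &extra_env {
            cmd.env(k, v); // e.g. PITCHFORK_EXIT_CODE for on_fail
        }
        // Errors are logged but never propagated: hooks must not block the lifecycle.
        match cmd.status() {
            Ok(status) if !status.success() => {
                eprintln!("hook for {daemon_id} exited with {status}")
            }
            Err(e) => eprintln!("failed to spawn hook for {daemon_id}: {e}"),
            _ => {}
        }
    });
}
```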

Config example

[daemons.api]
run = "node server.js"
retry = 3

[daemons.api.hooks]
on_ready = "curl -X POST https://alerts.example.com/ready"
on_fail = "./scripts/cleanup.sh"
on_retry = "echo 'retrying...'"

Closes #242, closes #215

Test plan

  • cargo build compiles
  • cargo fmt --check passes
  • cargo clippy --all-targets --all-features -- -D warnings passes
  • 164 tests pass via cargo nextest run (all existing + 10 new)
  • New unit tests: hooks config parsing (with/without/partial hooks)
  • New E2E tests: on_ready hook, on_fail (no retry), on_fail (after retries exhausted), on_retry hook, PITCHFORK_DAEMON_ID env var, PITCHFORK_RETRY_COUNT env var, hook env vars

🤖 Generated with Claude Code


Note

Medium Risk
Touches core supervisor lifecycle/retry paths and spawns additional background shell commands, which could affect process behavior and timing if misconfigured, but is gated by optional config and covered by new tests.

Overview
Adds per-daemon lifecycle hooks under [daemons.<id>.hooks] (on_ready, on_retry, on_fail) and wires the supervisor to fire them on readiness, before retry attempts (both startup retry loop and background retries), and when failures exhaust retries.

Daemon processes now always receive PITCHFORK_DAEMON_ID and PITCHFORK_RETRY_COUNT. Hook commands run fire-and-forget in the daemon working directory with the same env plus PITCHFORK_EXIT_CODE for on_fail. Configuration, schema, and docs are updated accordingly, and new tests cover parsing and end-to-end hook/env behavior.

Written by Cursor Bugbot for commit d9da016. This will update automatically on new commits.

Add configurable hooks (on_ready, on_fail, on_retry) that fire shell
commands in response to daemon lifecycle events, and inject
PITCHFORK_DAEMON_ID and PITCHFORK_RETRY_COUNT environment variables
into every daemon process.

Closes #242, closes #215

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings February 21, 2026 15:55
@gemini-code-assist

Summary of Changes

Hello @jdx, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers get up to speed quickly!

This pull request significantly enhances the daemon management capabilities by introducing lifecycle hooks and providing more contextual information to daemon processes via environment variables. These additions allow users to define custom actions for critical daemon events like readiness, failure, and retry attempts, improving automation, monitoring, and debugging workflows without interfering with the core daemon operations.

Highlights

  • Daemon Lifecycle Hooks: Introduced configurable lifecycle hooks (on_ready, on_fail, on_retry) that execute shell commands in response to daemon events. These hooks are defined under [daemons.<name>.hooks] in pitchfork.toml.
  • Environment Variable Injection: Injected PITCHFORK_DAEMON_ID and PITCHFORK_RETRY_COUNT environment variables into every daemon process, providing context about the running daemon and its current retry attempt. The on_fail hook also receives PITCHFORK_EXIT_CODE.
  • Fire-and-Forget Hook Execution: Implemented hooks as fire-and-forget background Tokio tasks, ensuring that hook execution does not block the main daemon lifecycle. Errors during hook execution are logged but do not halt daemon operations.
  • Comprehensive Testing: Added extensive unit and end-to-end tests to cover hook configuration parsing, on_ready, on_fail, on_retry hook firing, and the correct injection and usage of the new environment variables.
Changelog
  • docs/public/schema.json
    • Added hooks property to daemon configuration schema.
    • Defined PitchforkTomlHooks schema for on_ready, on_fail, and on_retry commands.
  • src/cli/config/add.rs
    • Initialized hooks field to None when adding a new daemon configuration.
  • src/deps.rs
    • Initialized hooks field to None in test daemon configuration.
  • src/pitchfork_toml.rs
    • Defined PitchforkTomlHooks struct to represent daemon lifecycle hooks.
    • Added hooks field of type Option<PitchforkTomlHooks> to PitchforkTomlDaemon struct.
  • src/supervisor/hooks.rs
    • Added new module to define HookType enum and fire_hook function for executing lifecycle commands.
  • src/supervisor/lifecycle.rs
    • Imported HookType and fire_hook from the new hooks module.
    • Injected PITCHFORK_DAEMON_ID and PITCHFORK_RETRY_COUNT environment variables into daemon processes.
    • Called fire_hook for OnRetry events during daemon retry logic.
    • Called fire_hook for OnReady events upon daemon readiness detection (via stdout, stderr, HTTP, port, or command checks).
    • Called fire_hook for OnFail events when a daemon exits with an error and retries are exhausted, including PITCHFORK_EXIT_CODE.
  • src/supervisor/mod.rs
    • Declared hooks as a new module within the supervisor.
  • src/supervisor/retry.rs
    • Imported HookType and fire_hook from the hooks module.
    • Called fire_hook for OnRetry events before initiating a daemon retry.
  • src/tui/app.rs
    • Initialized hooks field to None in the EditorState's daemon configuration.
  • tests/test_hooks.rs
    • Added new test file containing end-to-end tests for on_ready, on_fail, and on_retry hooks.
    • Included tests to verify PITCHFORK_DAEMON_ID, PITCHFORK_RETRY_COUNT, and PITCHFORK_EXIT_CODE environment variables are correctly passed to daemons and hooks.
  • tests/test_pitchfork_toml.rs
    • Added new tests to verify parsing of daemon configurations with full, partial, and no hooks defined.
Activity
  • The pull request author provided a detailed summary of the changes, including a config example and a comprehensive test plan.
  • The author confirmed that cargo build, cargo fmt --check, and cargo clippy all pass.
  • The author reported that 164 tests pass, including all existing tests and 10 new ones.
  • New unit tests were added for hooks config parsing (with/without/partial hooks).
  • New E2E tests were added for on_ready hook, on_fail (no retry), on_fail (after retries exhausted), on_retry hook, PITCHFORK_DAEMON_ID env var, PITCHFORK_RETRY_COUNT env var, and hook env vars.

@gemini-code-assist bot left a comment

Code Review

This pull request introduces a valuable feature: daemon lifecycle hooks and additional environment variables. The implementation is solid and well-tested. I've identified a few areas for improvement, primarily related to code duplication in the supervisor's lifecycle management, which could be refactored to improve maintainability. There's also a potential performance consideration in how hook configurations are loaded, and a minor point of confusion in one of the new tests. Overall, this is a great addition to the project.

Comment on lines +55 to +59
let pt = PitchforkToml::all_merged();
let hook_cmd = pt
    .daemons
    .get(&daemon_id)
    .and_then(|d| get_hook_cmd(&d.hooks, &hook_type));


medium

Calling PitchforkToml::all_merged() inside tokio::spawn for every hook execution means that all configuration files are read from disk and parsed every time a hook is fired. While this ensures the hook uses the absolute latest configuration, it can be inefficient, especially for frequently triggered hooks like on_retry in a scenario where a service is flapping. This could lead to performance issues due to repeated disk I/O and parsing overhead.

Since using the latest config is an explicit design goal mentioned in the PR, this is a trade-off. However, it's worth considering if there's a middle ground, perhaps involving a cached configuration that's refreshed periodically, to mitigate potential performance bottlenecks.
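One shape such a cached configuration could take (a std-only sketch; the `CachedConfig` type, the TTL, and the `load` closure are assumptions, not code from this PR):

```rust
use std::sync::Mutex;
use std::time::{Duration, Instant};

// Hypothetical TTL cache: re-run the (expensive, disk-reading) loader at most
// once per `ttl` window, so a flapping daemon firing on_retry hooks does not
// re-parse every pitchfork.toml on each event.
struct CachedConfig<T> {
    ttl: Duration,
    slot: Mutex<Option<(Instant, T)>>,
}

impl<T: Clone> CachedConfig<T> {
    fn new(ttl: Duration) -> Self {
        Self { ttl, slot: Mutex::new(None) }
    }

    // `load` stands in for PitchforkToml::all_merged().
    fn get(&self, load: impl FnOnce() -> T) -> T {
        let mut slot = self.slot.lock().unwrap();
        if let Some((at, v)) = slot.as_ref() {
            if at.elapsed() < self.ttl {
                return v.clone(); // fresh enough: no disk I/O
            }
        }
        let v = load(); // disk I/O happens here, at most once per TTL window
        *slot = Some((Instant::now(), v.clone()));
        v
    }
}
```

The trade-off noted above remains: a hook fired within the TTL window can observe a config up to `ttl` stale.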

if let Some(tx) = ready_tx.take() {
    let _ = tx.send(Ok(()));
}
fire_hook(
    HookType::OnReady,
    id.clone(),
    daemon_dir.clone(),
    hook_retry_count,
    hook_daemon_env.clone(),
    vec![],
);


medium

There is significant code duplication for handling the 'ready' state. The logic to notify readiness (including flushing logs, sending on the ready_tx channel, and firing the on_ready hook) is repeated in 6 different places within the select! loop. This makes the code harder to maintain and prone to errors if changes are not applied consistently to all copies.

Consider refactoring this logic into a private helper function within the tokio::spawn block. This function could encapsulate setting ready_notified, sending on ready_tx, and firing the hook.

For example:

async fn notify_ready(
    id: &str,
    ready_notified: &mut bool,
    ready_tx: &mut Option<oneshot::Sender<Result<(), Option<i32>>>>,
    log_appender: &mut BufWriter<tokio::fs::File>,
    // ... other params for fire_hook
) {
    if *ready_notified { return; }
    info!("daemon {} ready", id);
    *ready_notified = true;
    let _ = log_appender.flush().await;
    if let Some(tx) = ready_tx.take() {
        let _ = tx.send(Ok(()));
    }
    fire_hook(HookType::OnReady, /* ... */);
}

You could then call notify_ready(...).await in each of the readiness detection branches.

Comment on lines +578 to +588
// Fire on_fail hook if retries are exhausted
if hook_retry_count >= hook_retry {
    fire_hook(
        HookType::OnFail,
        id.clone(),
        daemon_dir.clone(),
        hook_retry_count,
        hook_daemon_env.clone(),
        vec![("PITCHFORK_EXIT_CODE".to_string(), exit_code.to_string())],
    );
}


medium

The logic for handling a daemon's failure is duplicated. This block for a successful wait() but failed exit code is very similar to the block in the final else clause (lines 605-629) which handles a wait() error. Both blocks update the daemon status to Errored and fire the on_fail hook if retries are exhausted.

This duplication can be consolidated into a helper function to improve maintainability. The function could take the exit_code as an argument and perform the necessary state updates and hook firing.

Example refactoring:

async fn handle_failure(id: &str, exit_code: i32, hook_retry_count: u32, hook_retry: Retry, /*...other params*/) {
    let err_status = DaemonStatus::Errored(exit_code);
    if let Err(e) = SUPERVISOR
        .upsert_daemon(UpsertDaemonOpts {
            id: id.to_string(),
            pid: None,
            status: err_status,
            last_exit_success: Some(false),
            ..Default::default()
        })
        .await
    {
        error!("Failed to update daemon state for {id}: {e}");
    }
    // Fire on_fail hook if retries are exhausted
    if hook_retry_count >= hook_retry {
        fire_hook(
            HookType::OnFail,
            id.to_string(),
            // ... params
            vec![("PITCHFORK_EXIT_CODE".to_string(), exit_code.to_string())],
        );
    }
}

You could then call this from both failure paths with the appropriate exit_code.

Comment on lines +122 to +126
let output = env.run_command_with_env(
    &["start", "fail_retry_hook"],
    &[("PITCHFORK_INTERVAL_SECS", "1")],
);
println!("start stdout: {}", String::from_utf8_lossy(&output.stdout));


medium

This test logic seems a bit confusing. It appears to start the fail_retry_hook daemon twice. The second run_command_with_env call, which sets PITCHFORK_INTERVAL_SECS=1, is likely ineffective. The supervisor process is started by the first run_command call and will not pick up the environment variable from the second call.

The test probably passes because the first start command uses --delay 1, which triggers the wait_ready retry logic in supervisor/lifecycle.rs. This logic has its own backoff schedule and doesn't depend on PITCHFORK_INTERVAL_SECS.

To make this test clearer and more robust, consider either:

  1. Relying only on the wait_ready retry logic and removing the second run_command call.
  2. If testing the background retry mechanism is intended, ensure the supervisor is started with the PITCHFORK_INTERVAL_SECS=1 environment variable from the very beginning.

Copilot AI (Contributor) left a comment


Pull request overview

Adds configurable daemon lifecycle hooks and injects daemon metadata env vars into both daemon processes and hook commands, enabling external automation around daemon readiness/failure/retries.

Changes:

  • Add [daemons.<name>.hooks] config (on_ready/on_fail/on_retry) and expose it in generated JSON schema.
  • Inject PITCHFORK_DAEMON_ID + PITCHFORK_RETRY_COUNT into daemon processes; pass these (+ PITCHFORK_EXIT_CODE on failure) into hook commands.
  • Fire hooks from supervisor lifecycle + retry paths, and add unit/E2E tests covering parsing and hook/env behavior.

Reviewed changes

Copilot reviewed 11 out of 11 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
tests/test_pitchfork_toml.rs Adds unit tests validating hooks config parsing (none/partial/full).
tests/test_hooks.rs New E2E coverage for on_ready/on_retry/on_fail hooks and injected env vars.
src/tui/app.rs Updates TUI config editing defaults to include hooks: None.
src/supervisor/retry.rs Fires on_retry hook before retry restarts; reuses computed daemon dir.
src/supervisor/mod.rs Registers new hooks supervisor module.
src/supervisor/lifecycle.rs Injects new env vars into daemons; fires hooks on ready/retry/final fail.
src/supervisor/hooks.rs Implements fire-and-forget hook execution with env injection.
src/pitchfork_toml.rs Adds PitchforkTomlHooks + hooks field on daemon config.
src/deps.rs Updates test helper daemon config struct to include hooks: None.
src/cli/config/add.rs Ensures new daemon entries default hooks: None.
docs/public/schema.json Publishes hooks in JSON schema output.


Comment on lines +117 to +123
let output = env.run_command(&["start", "fail_retry_hook", "--delay", "1"]);
println!("start stdout: {}", String::from_utf8_lossy(&output.stdout));

// Background retries happen on 10s interval, so we need to wait
// Use PITCHFORK_INTERVAL_SECS=1 to speed this up
let output = env.run_command_with_env(
    &["start", "fail_retry_hook"],

Copilot AI Feb 21, 2026


test_hook_on_fail_after_retries starts the same daemon twice (first with --delay 1, then again with PITCHFORK_INTERVAL_SECS=1). The second invocation likely won’t affect the already-running supervisor/daemon, and may just return DaemonAlreadyRunning, making the test flaky or ineffective. Make this a single start call: either (a) remove the first start and run once with run_command_with_env (setting PITCHFORK_INTERVAL_SECS up front), or (b) keep the --delay 1 path and delete the second start entirely.

Suggested change

-  let output = env.run_command(&["start", "fail_retry_hook", "--delay", "1"]);
-  println!("start stdout: {}", String::from_utf8_lossy(&output.stdout));
-  // Background retries happen on 10s interval, so we need to wait
-  // Use PITCHFORK_INTERVAL_SECS=1 to speed this up
-  let output = env.run_command_with_env(
-      &["start", "fail_retry_hook"],
+  // Background retries happen on 10s interval, so we need to wait
+  // Use PITCHFORK_INTERVAL_SECS=1 to speed this up
+  let output = env.run_command_with_env(
+      &["start", "fail_retry_hook", "--delay", "1"],

Comment on lines +54 to +56
tokio::spawn(async move {
    let pt = PitchforkToml::all_merged();
    let hook_cmd = pt

Copilot AI Feb 21, 2026


fire_hook calls PitchforkToml::all_merged() inside a tokio::spawn task, and all_merged() does blocking filesystem I/O. This can block Tokio worker threads under load (especially if hooks fire frequently on retries). Consider loading the merged config via tokio::task::spawn_blocking and then continuing with the async hook execution.
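The suggestion amounts to moving the blocking read off the async worker threads. Sketched here with a std thread and channel in place of tokio's `spawn_blocking` (the `load_off_thread` helper and its generic loader are illustrative assumptions):

```rust
use std::sync::mpsc;
use std::thread;

// Hypothetical helper: run a blocking loader (disk reads + TOML parsing) on a
// dedicated thread and hand the result back over a channel, so worker threads
// are never tied up in filesystem I/O. In tokio this role is played by
// tokio::task::spawn_blocking.
fn load_off_thread<T, F>(load_config: F) -> mpsc::Receiver<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // Blocking work happens here, off the caller's thread.
        let _ = tx.send(load_config());
    });
    rx
}
```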

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is ON, but a Cloud Agent failed to start.

…ites

PITCHFORK_DAEMON_ID and PITCHFORK_RETRY_COUNT are now injected after
user-configured env vars in both daemon spawning and hook execution,
preventing user config from silently overwriting them.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
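The ordering fix in this commit relies on "last insert wins": a minimal sketch (the `build_env` helper is hypothetical; only the variable names and the inject-after-user-config rule come from the commit message above):

```rust
use std::collections::BTreeMap;

// Hypothetical sketch of the env-assembly order: user-configured vars first,
// supervisor-injected PITCHFORK_* vars last, so user config cannot silently
// shadow them.
fn build_env(
    user_env: &BTreeMap<String, String>,
    daemon_id: &str,
    retry_count: u32,
) -> BTreeMap<String, String> {
    let mut env = user_env.clone();
    // Inserted after user config: later inserts overwrite earlier ones.
    env.insert("PITCHFORK_DAEMON_ID".to_string(), daemon_id.to_string());
    env.insert("PITCHFORK_RETRY_COUNT".to_string(), retry_count.to_string());
    env
}
```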
Copilot AI review requested due to automatic review settings February 21, 2026 16:15
Copilot AI (Contributor) left a comment

Pull request overview

Copilot reviewed 16 out of 16 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (1)

src/supervisor/retry.rs:113

  • When a daemon's command fails to parse during a retry attempt or when no run command is found, the daemon is marked as exhausted (retry_count set to retry value) but the on_fail hook is not fired. This means users won't be notified when their daemon permanently fails due to configuration errors during retry.

Consider firing the on_fail hook in these error paths to maintain consistent behavior with other failure scenarios.

                        error!("failed to parse command for daemon {id}: {e}");
                        // Mark as exhausted to prevent infinite retry loop, preserving error status
                        self.upsert_daemon(UpsertDaemonOpts {
                            id,
                            status: daemon.status.clone(),
                            retry_count: Some(daemon.retry),
                            ..Default::default()
                        })
                        .await?;
                        continue;
                    }
                };
                let dir = daemon.dir.unwrap_or_else(|| env::CWD.clone());
                fire_hook(
                    HookType::OnRetry,
                    id.clone(),
                    dir.clone(),
                    daemon.retry_count + 1,
                    daemon.env.clone(),
                    vec![],
                );
                let retry_opts = RunOptions {
                    id: id.clone(),
                    cmd,
                    force: false,
                    shell_pid: daemon.shell_pid,
                    dir,
                    autostop: daemon.autostop,
                    cron_schedule: daemon.cron_schedule,
                    cron_retrigger: daemon.cron_retrigger,
                    retry: daemon.retry,
                    retry_count: daemon.retry_count + 1,
                    ready_delay: daemon.ready_delay,
                    ready_output: daemon.ready_output.clone(),
                    ready_http: daemon.ready_http.clone(),
                    ready_port: daemon.ready_port,
                    ready_cmd: daemon.ready_cmd.clone(),
                    wait_ready: false,
                    depends: daemon.depends.clone(),
                    env: daemon.env.clone(),
                };
                if let Err(e) = self.run(retry_opts).await {
                    error!("failed to retry daemon {id}: {e}");
                }
            } else {
                warn!("no run command found for daemon {id}, cannot retry");
                // Mark as exhausted
                self.upsert_daemon(UpsertDaemonOpts {
                    id,
                    retry_count: Some(daemon.retry),
                    ..Default::default()
                })
                .await?;
            }


Comment on lines +116 to +126
// Start with retry=2 and ready_delay so it uses wait_ready path
let output = env.run_command(&["start", "fail_retry_hook", "--delay", "1"]);
println!("start stdout: {}", String::from_utf8_lossy(&output.stdout));

// Background retries happen on 10s interval, so we need to wait
// Use PITCHFORK_INTERVAL_SECS=1 to speed this up
let output = env.run_command_with_env(
    &["start", "fail_retry_hook"],
    &[("PITCHFORK_INTERVAL_SECS", "1")],
);
println!("start stdout: {}", String::from_utf8_lossy(&output.stdout));

Copilot AI Feb 21, 2026


This test has confusing logic that makes it difficult to understand what behavior is being tested. The test starts the same daemon twice (lines 117 and 122-125), which seems incorrect.

The comment on lines 120-121 suggests testing "background retries", but the first start command uses --delay 1, which triggers foreground retries (via the wait_ready path in lifecycle.rs, lines 80-121). The purpose of the second start command is unclear: if the first command exhausted all retries, the second start would reset retry_count to 0.

Consider either:

  1. Removing the first start command and only using the second one (to test background retries), OR
  2. Removing the second start command if the first one is sufficient, OR
  3. Adding clarifying comments explaining why both start commands are needed

@jdx jdx merged commit ec96144 into main Feb 21, 2026
9 checks passed
@jdx jdx deleted the feat/daemon-hooks-env-vars branch February 21, 2026 16:30
@jdx jdx mentioned this pull request Feb 21, 2026
jdx added a commit that referenced this pull request Feb 21, 2026
## 🤖 New release

* `pitchfork-cli`: 1.5.0 -> 1.6.0

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [1.6.0](v1.5.0...v1.6.0) - 2026-02-21

### Added

- *(web)* add PITCHFORK_WEB_PATH support for reverse proxy path prefixes ([#244](#244))
- add daemon lifecycle hooks and retry count env vars ([#245](#245))

### Fixed

- pass cwd to ready_cmd spawning ([#243](#243))
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Release-only version/documentation updates; no functional code changes are included in this diff.
> 
> **Overview**
> Prepares the `pitchfork-cli` **v1.6.0** release by bumping the crate/version metadata from `1.5.0` to `1.6.0` (including `Cargo.toml`, `Cargo.lock`, and generated CLI docs/spec files).
> 
> Updates `CHANGELOG.md` with the new `1.6.0` entry summarizing the release's added features and the `ready_cmd` working-directory fix.
> 
> <sup>Written by [Cursor Bugbot](https://cursor.com/dashboard?tab=bugbot) for commit e00649e. This will update automatically on new commits. Configure [here](https://cursor.com/dashboard?tab=bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>