fix: consider checkpoint jobs during update_needrun by tedil · Pull Request #4092 · snakemake/snakemake

tedil · 2026-03-13T14:43:16Z

Resolves #3879

Not sure this is the correct solution, though (might have unforeseeable side consequences, though I believe this should be fine, may just cause some more I/O overhead for checking the checkpoint's dependencies).

QC

The PR contains a test case for the changes or the changes are already covered by an existing test case.
The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

Bug Fixes
- Improved checkpoint job handling to ensure they are properly considered alongside target jobs during workflow execution evaluation, affecting how missing outputs and updated inputs are handled.
Tests
- Added test to verify that missing checkpoint outputs are correctly regenerated when workflows are re-run.

coderabbitai · 2026-03-13T14:43:32Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 41c7da43-36ea-4bac-a5e8-e8416942e441

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

📝 Walkthrough

Walkthrough

A single conditional in update_needrun is expanded to include checkpoint jobs alongside target jobs when determining which jobs need to run. A corresponding test verifies that re-running a workflow properly regenerates deleted checkpoint outputs.

Changes

Cohort / File(s)	Summary
DAG needrun logic `src/snakemake/dag.py`	Modified the condition in `update_needrun` to treat checkpoint jobs as target jobs (`elif job in self.targetjobs or job.is_checkpoint`), ensuring missing or modified checkpoint outputs trigger re-execution.
Test coverage `tests/tests.py`	Added `test_checkpoint_missing_output()` to verify that deleting a checkpoint output file causes the checkpoint to re-run and regenerate the missing file on subsequent workflow execution.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 40.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The PR title accurately describes the main change: adding checkpoint job consideration during update_needrun function execution.
Description check	✅ Passed	The description includes linked issue reference and completes all QC checklist items (test case included and documentation status addressed).
Linked Issues check	✅ Passed	The PR directly addresses issue `#3879` by expanding update_needrun to consider checkpoint jobs, which resolves the reported problem of undetected missing checkpoint outputs.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to fixing the checkpoint detection issue: one-line conditional change in dag.py and a corresponding test case.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch fix/checkpoint-rerun

📝 Coding Plan

Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Tip

You can customize the high-level summary generated by CodeRabbit.

Configure the reviews.high_level_summary_instructions setting to provide custom instructions for generating the high-level summary.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/tests.py`:
- Around line 1632-1648: The test failure is due to a missing test fixture
directory referenced by test_checkpoint_missing_output via
dpath("test_checkpoint_missing_output"); add a new
tests/test_checkpoint_missing_output/ directory to the repo containing the
Snakefile and any input files and expected output files that the test relies on
(ensure the expected output file "output/test_4.txt" is produced by the
Snakefile so the test can remove and verify regeneration), and include any
helper files used by run(...) so the test can execute deterministically.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8e01e412-40bd-4966-8a8e-52f503addfe0

📥 Commits

Reviewing files that changed from the base of the PR and between 034a9e7 and 0f54a61.

📒 Files selected for processing (2)

src/snakemake/dag.py
tests/tests.py

coderabbitai · 2026-03-13T14:48:28Z

+def test_checkpoint_missing_output():
+    """test for issue 3879"""
+    test_dir = dpath("test_checkpoint_missing_output")
+    tmpdir = run(test_dir, cleanup=False, check_results=False)
+
+    try:
+        missing_file = Path(tmpdir) / "output" / "test_4.txt"
+        os.remove(missing_file)
+
+        run(test_dir, tmpdir=tmpdir, cleanup=False, check_results=False)
+
+        assert (
+            missing_file.exists()
+        ), "The missing checkpoint output was not regenerated."
+    finally:
+        shutil.rmtree(tmpdir, ignore_errors=ON_WINDOWS)
+


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash # Verify the test directory exists with required files echo "=== Checking test directory ===" fd -t d "test_checkpoint_missing_output" tests/ echo "=== Listing contents if directory exists ===" if [ -d "tests/test_checkpoint_missing_output" ]; then tree tests/test_checkpoint_missing_output/ 2>/dev/null || ls -laR tests/test_checkpoint_missing_output/ else echo "ERROR: Test directory tests/test_checkpoint_missing_output/ not found" fi

Repository: snakemake/snakemake

Length of output: 209

Add the test directory tests/test_checkpoint_missing_output/ to the repository.

The test calls dpath("test_checkpoint_missing_output"), which requires tests/test_checkpoint_missing_output/ to exist with a Snakefile and expected outputs. This directory is missing from the PR and must be included for the test to run.

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@tests/tests.py` around lines 1632 - 1648, The test failure is due to a missing test fixture directory referenced by test_checkpoint_missing_output via dpath("test_checkpoint_missing_output"); add a new tests/test_checkpoint_missing_output/ directory to the repo containing the Snakefile and any input files and expected output files that the test relies on (ensure the expected output file "output/test_4.txt" is produced by the Snakefile so the test can remove and verify regeneration), and include any helper files used by run(...) so the test can execute deterministically.

cademirch

~~looks like a reasonable solution to me.~~

edit: i re-read the referenced issue. i am unsure about the solution here now. how can we know if (and which) outputs of a checkpoint are missing? i guess we assume that checkpoints are determinstic each run? in any case im confused

actually upon re-reading the issue i'm unsure if a solution is even possible?

tedil · 2026-03-13T14:58:15Z

After discussion w/ @johanneskoester this may turn out to be problematic when a checkpoint generates temporary files. While this fix would be the only way to notice this, this would also force re-runs for checkpoints with temp outputs, even though technically no changes that'd change the outcome have occurred.
Since this may be unexpected behaviour for most users, perhaps we should instead just document that behaviour and add an example that demonstrates this?

johanneskoester

First decision after live discussion: we think that it rather should be documented that intermediate jobs between the checkpoint and the input function can not be triggered for reruns if the checkpoint output files are missing. Avoiding that would mean to cause reruns of the checkpoint and everything downstream if the checkpoint output is missing (e.g. because of being marked as temp). Workaround: use -R <checkpoint_name> in such cases.

Hocnonsense · 2026-03-20T07:59:51Z

            if is_forced(job):
                reason.forced = True
-            elif job in self.targetjobs:
+            elif job in self.targetjobs or job.is_checkpoint:


I don't think this is the accurate place to fix this.
Marking every checkpoint as needrun unconditionally here has a broader effect than intended: it forces re-evaluation of checkpoints whose outputs already exist and are up to date, which can cause unnecessary re-runs in incremental workflows.

This is a similar case to #3009, and the fix should be in update_checkpoint_dependencies.

…sts (#4124) will fix #3879. See also the discussion in #4092 ### QC  * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [x] This is not necessary to update the documentation (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). * [x] I, as a human being, have checked each line of code in this pull request ### AI-assistance disclosure  I used AI assistance for: * [x] Code generation (e.g., when writing an implementation or fixing a bug) * [ ] Test/benchmark generation * [ ] Documentation (including examples) * [ ] Research and understanding  ## Summary by CodeRabbit * **Bug Fixes** * Checkpoint handling now skips instances whose outputs are missing, preventing failures when previously created checkpoint outputs are deleted and allowing workflows to continue. * **Tests** * Added an integration test verifying reruns succeed when checkpoint outputs are removed after creation. * Removed an obsolete checkpoint-related test.  [![Review Change Stack](https://storage.googleapis.com/coderabbit_public_assets/review-stack-in-coderabbit-ui.svg)](https://app.coderabbit.ai/change-stack/snakemake/snakemake/pull/4124?utm_source=github_walkthrough&utm_medium=github&utm_campaign=change_stack)   --------- Co-authored-by: Till Hartmann <till.hartmann@bih-charite.de> Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de>

johanneskoester · 2026-06-01T12:36:52Z

Closing in favor of the merged #4124. Nevertheless, thanks for bringing this up and helping to find the correct solution!

fix: consider checkpoint jobs during update_needrun

0f54a61

tedil requested a review from johanneskoester as a code owner March 13, 2026 14:43

tedil added this to Snakemake Hackathon 2026 Mar 13, 2026

tedil moved this to In review in Snakemake Hackathon 2026 Mar 13, 2026

tedil requested a review from cademirch March 13, 2026 14:43

coderabbitai Bot reviewed Mar 13, 2026

View reviewed changes

cademirch previously approved these changes Mar 13, 2026

View reviewed changes

cademirch self-requested a review March 13, 2026 14:55

tedil marked this pull request as draft March 13, 2026 14:58

johanneskoester requested changes Mar 13, 2026

View reviewed changes

add missing Snakefile

3e7ba50

Hocnonsense requested review from Hocnonsense and removed request for Hocnonsense March 18, 2026 15:32

Hocnonsense reviewed Mar 20, 2026

View reviewed changes

This was referenced Mar 20, 2026

fix: do not rerun when checkpoint job missing but downstream file exists #4124

Merged

fix: correctly propagate checkpoint dependencies across DAG expansions #3864

Open

johanneskoester closed this Jun 1, 2026

github-project-automation Bot moved this from In review to Done in Snakemake Hackathon 2026 Jun 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: consider checkpoint jobs during update_needrun#4092

fix: consider checkpoint jobs during update_needrun#4092
tedil wants to merge 2 commits into
mainfrom
fix/checkpoint-rerun

tedil commented Mar 13, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 13, 2026 •

edited

Loading

Review skipped

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Mar 13, 2026

Uh oh!

cademirch left a comment •

edited

Loading

Uh oh!

tedil commented Mar 13, 2026

Uh oh!

johanneskoester left a comment

Uh oh!

Hocnonsense Mar 20, 2026

Uh oh!

johanneskoester commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

tedil commented Mar 13, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

QC

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Walkthrough

Changes

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 13, 2026

Choose a reason for hiding this comment

Uh oh!

cademirch left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tedil commented Mar 13, 2026

Uh oh!

johanneskoester left a comment

Choose a reason for hiding this comment

Uh oh!

Hocnonsense Mar 20, 2026

Choose a reason for hiding this comment

Uh oh!

johanneskoester commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

tedil commented Mar 13, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 13, 2026 •

edited

Loading

cademirch left a comment •

edited

Loading