fix: do not rerun when checkpoint job missing but downstream file exists by Hocnonsense · Pull Request #4124 · snakemake/snakemake

Hocnonsense · 2026-03-20T08:37:45Z

will fix #3879. See also the discussion in #4092

QC

The PR contains a test case for the changes or the changes are already covered by an existing test case.
This is not necessary to update the documentation (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).
I, as a human being, have checked each line of code in this pull request

AI-assistance disclosure

I used AI assistance for:

Code generation (e.g., when writing an implementation or fixing a bug)
Test/benchmark generation
Documentation (including examples)
Research and understanding

Summary by CodeRabbit

Bug Fixes
- Checkpoint handling now skips instances whose outputs are missing, preventing failures when checkpoint outputs are deleted and allowing workflows to continue.
Tests
- Added an integration test that verifies reruns succeed when checkpoint outputs are removed after creation.
- Removed an obsolete checkpoint-related test.

coderabbitai · 2026-03-20T08:38:05Z

Caution

Review failed

Failed to post review comments

📝 Walkthrough

Walkthrough

Require checkpoint outputs to both not need rerun and be present before enqueuing instances; add an integration test that deletes checkpoint outputs between runs to verify they are regenerated; remove an outdated checkpoint test.

Changes

DAG checkpoint logic

Layer / File(s)	Summary
Checkpoint enqueue decision `src/snakemake/dag.py`	Added explicit `job_queue: Dict[Job, Set[Job]]` annotation and changed filtering in `update_checkpoint_dependencies` to add checkpoint instances only when `not self.needrun(checkpoint)` AND `await is_output_present(checkpoint)`.

Tests

Layer / File(s)	Summary
New checkpoint integration test `tests/test_checkpoint_missing_output/Snakefile`, `tests/test_checkpoint_missing_output/expected-results/output/selection.txt`	Add a checkpoint-producing workflow and expected `selection.txt` result; aggregation selects the max-valued checkpoint output.
Test harness update `tests/tests.py`	Add `test_checkpoint_missing_output` which runs the workflow, deletes checkpoint outputs between runs, and verifies reruns succeed and regenerate missing outputs.
Removed legacy test `tests/test_checkpoint_missout/Snakefile`	Delete an obsolete checkpoint test workflow that previously covered related behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Runner as Test/Runner
    participant DAG as DAG.update_checkpoint_dependencies
    participant Checkpoint as CheckpointJob
    participant FS as Filesystem

    Runner->>DAG: request candidate checkpoint jobs
    DAG->>Checkpoint: for each candidate, evaluate needrun(checkpoint)
    alt needrun == True
        Checkpoint-->>DAG: flagged for rerun (skip enqueue)
    else needrun == False
        DAG->>FS: await is_output_present(checkpoint)
        alt output exists
            FS-->>DAG: present
            DAG-->>Checkpoint: add to job_queue
        else output missing
            FS-->>DAG: missing
            DAG-->>Checkpoint: do not add (will trigger rerun)
        end
    end
    DAG-->>Runner: updated job_queue

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: checkpoint jobs are not rerun when their outputs are missing but downstream files exist, which aligns with the core modification in DAG.update_checkpoint_dependencies.
Linked Issues check	✅ Passed	The code changes directly address issue `#3879` by adding a check for `await is_output_present(checkpoint)` in the checkpoint dependency update logic, ensuring missing checkpoint outputs trigger reruns as required.
Out of Scope Changes check	✅ Passed	All changes are directly scoped to fixing the checkpoint missing output detection issue: DAG logic modification, new test case, expected results, and old test replacement are all aligned with the stated objectives.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

📝 Generate docstrings

Create stacked PR
Commit on current branch

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix/checkpoint-rerun2

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

🧹 Nitpick comments (1)

tests/tests.py (1)

1628-1639: Test validates the fix, but comments may be misleading.

The test correctly verifies that the workflow doesn't crash when checkpoint outputs are deleted between runs, which addresses issue #3879. However, the comments "should not fail (but nothing to do)" are potentially misleading — when checkpoint outputs are deleted, the checkpoint should be re-executed, so there is actually work to do.

Consider verifying that deleted outputs are actually regenerated after rerun to strengthen the test:

💡 Optional: Verify outputs are regenerated

 def test_checkpoint_missing_output():
     """test for issue 3879, also covers 3009"""
     # normal run to create the checkpoint output and final output
     tmpdir = run(dpath("test_checkpoint_missing_output"), cleanup=False)
     assert tmpdir
-    # should not fail (but nothing to do)
+    # Delete checkpoint output; rerun should regenerate it
     (tmpdir / "output" / "test_1.txt").unlink()
     run(dpath("test_checkpoint_missing_output"), cleanup=False, tmpdir=tmpdir)
-    # should not fail (but nothing to do)
+    assert (tmpdir / "output" / "test_1.txt").exists(), "test_1.txt should be regenerated"
+    # Delete another checkpoint output; rerun should regenerate it
     (tmpdir / "output" / "test_0.txt").unlink()
     run(dpath("test_checkpoint_missing_output"), cleanup=False, tmpdir=tmpdir)
+    assert (tmpdir / "output" / "test_0.txt").exists(), "test_0.txt should be regenerated"

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/tests.py` around lines 1628 - 1639, The test_comment is misleading and
the test should assert that deleted checkpoint outputs are regenerated: in
test_checkpoint_missing_output update the comments to reflect that deleting
checkpoint outputs should trigger re-execution, and after each
run(dpath("test_checkpoint_missing_output"), cleanup=False, tmpdir=tmpdir) add
assertions that (tmpdir / "output" / "test_1.txt") and (tmpdir / "output" /
"test_0.txt") respectively exist (or contain expected contents) to verify the
checkpoint was rerun; reference the test function name
test_checkpoint_missing_output, the run(...) helper, and the tmpdir / "output" /
"<file>" paths when adding the checks.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/tests.py`:
- Around line 1628-1639: The test_comment is misleading and the test should
assert that deleted checkpoint outputs are regenerated: in
test_checkpoint_missing_output update the comments to reflect that deleting
checkpoint outputs should trigger re-execution, and after each
run(dpath("test_checkpoint_missing_output"), cleanup=False, tmpdir=tmpdir) add
assertions that (tmpdir / "output" / "test_1.txt") and (tmpdir / "output" /
"test_0.txt") respectively exist (or contain expected contents) to verify the
checkpoint was rerun; reference the test function name
test_checkpoint_missing_output, the run(...) helper, and the tmpdir / "output" /
"<file>" paths when adding the checks.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 651adaa0-e9b3-444e-b3a3-2039c86d9178

📥 Commits

Reviewing files that changed from the base of the PR and between 2432a6c and c502139.

📒 Files selected for processing (7)

src/snakemake/dag.py
tests/test_checkpoint_missing_output/Snakefile
tests/test_checkpoint_missing_output/expected-results/output/selection.txt
tests/test_checkpoint_missout/Snakefile
tests/test_checkpoint_missout/expected-results/final.txt
tests/test_checkpoint_missout/problem.txt
tests/tests.py

💤 Files with no reviewable changes (1)

tests/test_checkpoint_missout/Snakefile

github-actions · 2026-03-20T08:43:14Z

Please format your code with pixi run format

coderabbitai

🧹 Nitpick comments (1)

tests/tests.py (1)

1628-1639: Test lacks assertions verifying checkpoint outputs are regenerated.

The test validates that the workflow doesn't crash when checkpoint outputs are missing, but doesn't assert that the deleted files (test_1.txt, test_0.txt) were actually regenerated after each rerun. Given the PR objective is to ensure "checkpoint outputs are properly detected and regenerated when deleted," consider adding assertions to verify the files exist after each run.

Additionally, the comments "target file exists so nothing to do" are misleading since the files are being deleted and the checkpoint should be re-run to regenerate them.

💡 Suggested improvement

 def test_checkpoint_missing_output():
     """test for issue 3879, also covers 3009"""
     # normal run to create the checkpoint output and final output
     tmpdir = run(dpath("test_checkpoint_missing_output"), cleanup=False)
     assert tmpdir
-    # should not fail (target file exists so nothing to do)
+    # delete checkpoint output and verify it gets regenerated
     (tmpdir / "output" / "test_1.txt").unlink()
     run(dpath("test_checkpoint_missing_output"), cleanup=False, tmpdir=tmpdir)
-    # should not fail (target file exists so nothing to do)
+    assert (tmpdir / "output" / "test_1.txt").exists(), "test_1.txt should be regenerated"
+    # delete another checkpoint output and verify regeneration
     (tmpdir / "output" / "test_0.txt").unlink()
     run(dpath("test_checkpoint_missing_output"), cleanup=False, tmpdir=tmpdir)
+    assert (tmpdir / "output" / "test_0.txt").exists(), "test_0.txt should be regenerated"
+    shutil.rmtree(tmpdir, ignore_errors=ON_WINDOWS)

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@tests/tests.py` around lines 1628 - 1639, The test function
test_checkpoint_missing_output currently deletes checkpoint output files but
doesn't assert they are regenerated; update test_checkpoint_missing_output to
assert existence of (tmpdir / "output" / "test_1.txt") after the first rerun and
assert existence of (tmpdir / "output" / "test_0.txt") after the second rerun
(using the existing run(...) helper and dpath(...) to locate the files), and
revise the misleading comments to say that files are deleted and should be
regenerated rather than "target file exists so nothing to do".

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/tests.py`:
- Around line 1628-1639: The test function test_checkpoint_missing_output
currently deletes checkpoint output files but doesn't assert they are
regenerated; update test_checkpoint_missing_output to assert existence of
(tmpdir / "output" / "test_1.txt") after the first rerun and assert existence of
(tmpdir / "output" / "test_0.txt") after the second rerun (using the existing
run(...) helper and dpath(...) to locate the files), and revise the misleading
comments to say that files are deleted and should be regenerated rather than
"target file exists so nothing to do".

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 41e6491f-054c-4f95-a328-06b7a604543d

📥 Commits

Reviewing files that changed from the base of the PR and between c502139 and 49d3451.

📒 Files selected for processing (1)

tests/tests.py

themavik

Gating checkpoint enqueue on is_output_present fixes the stale-node case. nit: confirm dropping test_checkpoint_missout does not leave 3009/3879 uncovered.

🤖 I have created a release *beep* *boop* --- ## [9.21.1](v9.21.0...v9.21.1) (2026-05-29) ### Bug Fixes * add default json function to benchmarks ([#4128](#4128)) ([41fab22](41fab22)) * do not rerun when checkpoint job missing but downstream file exists ([#4124](#4124)) ([a060b93](a060b93)) * ensure that error logs contain all available details ([#4183](#4183)) ([74a86e9](74a86e9)) * handle missing pss attribute in benchmark on Windows ([#4160](#4160)) ([da52080](da52080)) * implement Resources.setdefault ([#3968](#3968)) ([2413e99](2413e99)) * reporting remote nodes number ([#3978](#3978)) ([8c534f0](8c534f0)) * resolve pathvars before constructing storage queries ([#3969](#3969)) ([bd15237](bd15237)) * storage temp() file cleanup with RemoteProvider ([#4189](#4189)) ([898bad1](898bad1)) * tolerate FileNotFoundError in drop_iocache ([#4153](#4153)) ([#4191](#4191)) ([ce26b28](ce26b28)) ### Documentation * Added guide on debugging workflows ([#4029](#4029)) ([3d052ae](3d052ae)) * **cli:** Remove broken ref bold markup ([#4204](#4204)) ([1200ebf](1200ebf)) * remove duplicated resources attribute in rules.rst ([#4190](#4190)) ([6c8ecdd](6c8ecdd)) * **rules:** Update script type hint advice ([#4193](#4193)) ([6108712](6108712)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

tedil and others added 4 commits March 13, 2026 15:41

fix: consider checkpoint jobs during update_needrun

0f54a61

add missing Snakefile

3e7ba50

Merge branch 'main' into fix/checkpoint-rerun

cde894b

fix: avoid icorrect mark checkpoint as 'noneedrun'

c502139

Hocnonsense requested a review from johanneskoester as a code owner March 20, 2026 08:37

Hocnonsense changed the title ~~fix:~~ fix: do not rerun when checkpoint job missing but downstream file exists Mar 20, 2026

coderabbitai Bot reviewed Mar 20, 2026

View reviewed changes

test: declare nothing to do

49d3451

coderabbitai Bot reviewed Mar 20, 2026

View reviewed changes

Hocnonsense self-assigned this Mar 21, 2026

Hocnonsense added this to Snakemake Hackathon 2026 Mar 21, 2026

Hocnonsense moved this to In review in Snakemake Hackathon 2026 Mar 21, 2026

Hocnonsense mentioned this pull request Mar 21, 2026

fix: correctly propagate checkpoint dependencies across DAG expansions #3864

Open

2 tasks

themavik reviewed Mar 23, 2026

View reviewed changes

Hocnonsense added the bug Something isn't working label Mar 27, 2026

Merge branch 'main' into fix/checkpoint-rerun2

42fe3be

johanneskoester reviewed May 29, 2026

View reviewed changes

Comment thread src/snakemake/dag.py

Apply suggestion from @johanneskoester

98de60b

johanneskoester approved these changes May 29, 2026

View reviewed changes

johanneskoester enabled auto-merge (squash) May 29, 2026 13:39

johanneskoester merged commit a060b93 into main May 29, 2026
60 checks passed

johanneskoester deleted the fix/checkpoint-rerun2 branch May 29, 2026 13:55

github-project-automation Bot moved this from In review to Done in Snakemake Hackathon 2026 May 29, 2026

snakemake-bot mentioned this pull request May 29, 2026

chore(main): release 9.21.1 #4200

Merged

johanneskoester mentioned this pull request Jun 1, 2026

fix: consider checkpoint jobs during update_needrun #4092

Closed

2 tasks

coderabbitai Bot mentioned this pull request Jun 1, 2026

perf: avoiding useless input expand during checkpoint update #3982

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: do not rerun when checkpoint job missing but downstream file exists#4124

fix: do not rerun when checkpoint job missing but downstream file exists#4124
johanneskoester merged 7 commits into
mainfrom
fix/checkpoint-rerun2

Hocnonsense commented Mar 20, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

coderabbitai Bot commented Mar 20, 2026 •

edited

Loading

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Uh oh!

github-actions Bot commented Mar 20, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

themavik left a comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

Hocnonsense commented Mar 20, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

QC

AI-assistance disclosure

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 20, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

❌ Failed checks (1 warning)

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Mar 20, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

themavik left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Hocnonsense commented Mar 20, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 20, 2026 •

edited

Loading