fix: fix checkpoint handling corner cases (#3870 and #3559) #4015
johanneskoester merged 11 commits into main from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the CodeRabbit settings.
📝 Walkthrough
Tracks outputs of checkpoint jobs evicted when their inputs are missing by adding DAG._evicted_checkpoint_outputs and propagating those outputs into created_output during checkpoint dependency updates; adds a test exercising a checkpoint with an uncreatable input.
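The tracking described in the walkthrough can be sketched in miniature. `MiniDAG`, its method names, and the set-based bookkeeping below are illustrative assumptions for this sketch, not snakemake's actual classes:

```python
# Hedged sketch of the walkthrough's idea: the DAG records outputs of
# evicted checkpoint jobs and later folds them into created_output.
# Names mirror the description, not the real snakemake implementation.

class MiniDAG:
    def __init__(self):
        self._evicted_checkpoint_outputs = set()

    def evict_checkpoint(self, outputs_on_disk):
        # A checkpoint job is removed because an input is missing;
        # remember its already-existing outputs.
        self._evicted_checkpoint_outputs.update(outputs_on_disk)

    def update_checkpoint_dependencies(self, created_output):
        # Propagate evicted outputs so downstream checkpoint resolution
        # can still see them as created.
        created_output.update(self._evicted_checkpoint_outputs)
        return created_output

dag = MiniDAG()
dag.evict_checkpoint({"a.txt"})
print(sorted(dag.update_checkpoint_dependencies(set())))  # ['a.txt']
```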
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: 3 passed, 2 failed (1 warning, 1 inconclusive)
Please format your code with
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/snakemake/dag.py`:
- Around line 1357-1360: The current logic adds files to
_evicted_checkpoint_outputs just because out.exists(), which can let stale files
satisfy downstream inputs. Change the guard so we only record outputs for
checkpoints that have actually completed previously: for example, require that
the output is present in the checkpoint's recorded created_output (or otherwise
verify checkpoint completion via Checkpoint.get()/a checkpoint.completed flag)
before adding to self._evicted_checkpoint_outputs. Update the block around
job.is_checkpoint/job.output/out.exists() to check for authoritative completion
evidence (e.g. out in checkpoint.created_output) instead of relying on plain
filesystem existence.
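A minimal sketch of the guard this comment suggests. The `Checkpoint` class, its `created_output` attribute, and the helper below are modeled on the comment's wording, not on snakemake's real API:

```python
# Hypothetical sketch: only record evicted outputs backed by
# authoritative completion evidence, not mere filesystem existence.

class Checkpoint:
    def __init__(self):
        # Outputs recorded when the checkpoint actually completed.
        self.created_output = set()

def record_evicted_outputs(job_outputs, checkpoint, existing_files, evicted):
    for out in job_outputs:
        # Plain existence is not enough: a stale file could satisfy
        # downstream inputs even though the checkpoint never finished.
        if out in existing_files and out in checkpoint.created_output:
            evicted.add(out)

cp = Checkpoint()
cp.created_output = {"a.txt"}
evicted = set()
# "stale.txt" exists on disk but was never recorded as created.
record_evicted_outputs(["a.txt", "stale.txt"], cp,
                       {"a.txt", "stale.txt"}, evicted)
print(evicted)  # {'a.txt'}
```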
In `@tests/test_uncreatable_checkpoint_input/Snakefile`:
- Around line 1-2: The Snakefile currently mutates the checkpoint output by
opening and writing "a.txt" during parsing (the two-line block that writes
"a\nb\n"), which updates its mtime every Snakemake run; remove that file-writing
from the Snakefile and instead perform the setup in the test harness or an
explicit preparation step (e.g., test setup function or fixture) that
creates/updates "a.txt" before invoking Snakemake so checkpoint behavior is
modeled by external changes rather than by Snakefile evaluation.
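A hedged sketch of the relocation this comment suggests: the setup helper name and temp-directory handling below are illustrative, not part of snakemake's test harness.

```python
# Sketch: create/update a.txt in test setup rather than during
# Snakefile parsing, so checkpoint behavior is driven by external
# changes. `prepare_checkpoint_input` is a hypothetical helper.
import os
import tempfile

def prepare_checkpoint_input(workdir):
    """Write a.txt before invoking Snakemake, keeping the Snakefile
    free of side effects at parse time."""
    path = os.path.join(workdir, "a.txt")
    with open(path, "w") as f:
        f.write("a\nb\n")
    return path

workdir = tempfile.mkdtemp()
path = prepare_checkpoint_input(workdir)
with open(path) as f:
    print(f.read().splitlines())  # ['a', 'b']
```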
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: b7e94157-d903-4bfa-93e2-7e31f5b40aa6
📒 Files selected for processing (3)
- src/snakemake/dag.py
- tests/test_uncreatable_checkpoint_input/Snakefile
- tests/test_uncreatable_checkpoint_input/expected-results/files/b
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/tests.py`:
- Around line 1499-1500: Update test_uncreatable_checkpoint_input to reproduce
the rerun regression: perform the first successful run via
run(dpath("test_uncreatable_checkpoint_input")) while preserving the workdir
(use the existing preserve_workdir helper or avoid cleanup), then update the
timestamp on the checkpoint input file with os.utime(<path_to_a.txt>, None), and
then invoke a second dry-run/run (e.g., run(..., dry_run=True) or run(...)
again) and assert that the downstream rule is NOT rescheduled by checking the
run output/return (e.g., assert "Input files updated by another job" not in
output or ensure no downstream job appears in the scheduling output). Ensure you
reference the test function name test_uncreatable_checkpoint_input, use run and
dpath to locate the test tree, and touch a.txt with os.utime.
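The snakemake test helpers `run` and `dpath` referenced above are not available outside the test suite, so the runnable part below only demonstrates the `os.utime` touch step the comment prescribes; the surrounding test flow is described in comments as an assumption:

```python
# Sketch of the mtime-touch step: the actual test would first run the
# workflow via run(dpath("test_uncreatable_checkpoint_input")) with the
# workdir preserved, then touch a.txt as below, then assert that a
# second (dry) run does not reschedule the downstream rule.
import os
import tempfile
import time

fd, path = tempfile.mkstemp()
os.close(fd)
old = os.stat(path).st_mtime
time.sleep(0.01)
os.utime(path, None)  # touch: set mtime to "now", as the test would do to a.txt
new = os.stat(path).st_mtime
print(new >= old)  # True
os.remove(path)
```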
📒 Files selected for processing (2)
- src/snakemake/dag.py
- tests/tests.py
🚧 Files skipped from review as they are similar to previous changes (1)
- src/snakemake/dag.py
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/snakemake/dag.py`:
- Around line 1357-1363: The async method
self.workflow.persistence.incomplete(job) is being called without awaiting,
causing is_incomplete to be a coroutine and the condition to behave incorrectly;
update the checkpoint handling in the block that checks job.is_checkpoint (which
iterates over job.output and calls out.exists()) to await the incomplete call
(i.e., use await self.workflow.persistence.incomplete(job)) and ensure has_meta
is evaluated before awaiting incomplete if you want to avoid unnecessary awaits;
this will allow the subsequent if not is_incomplete: branch to run and correctly
add outputs to self._evicted_checkpoint_outputs.
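The failure mode described here is easy to reproduce standalone: calling an async method without `await` yields a coroutine object, which is always truthy. The `Persistence` stub below is an illustrative stand-in, not snakemake's persistence layer:

```python
# Demo: why `persistence.incomplete(job)` must be awaited.
import asyncio

class Persistence:
    async def incomplete(self, job):
        # Pretend this job is NOT incomplete.
        return False

async def main():
    p = Persistence()
    # Bug: without await, the call returns a coroutine object, which is
    # always truthy, so `is_incomplete` can never be False.
    is_incomplete = p.incomplete("job")
    buggy_truthy = bool(is_incomplete)
    is_incomplete.close()  # silence the "never awaited" warning

    # Fix: await the coroutine to obtain the real boolean.
    fixed = await p.incomplete("job")
    return buggy_truthy, fixed

print(asyncio.run(main()))  # (True, False)
```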
📒 Files selected for processing (1)
src/snakemake/dag.py
♻️ Duplicate comments (1)
src/snakemake/dag.py (1)
1357-1363: ⚠️ Potential issue | 🟠 Major: Don't ignore incomplete markers when metadata is missing.
Line 1361 only checks `persistence.incomplete(job)` if `has_metadata(job)` is true. But `incomplete(job)` already detects existing outputs that still have incomplete markers, so a checkpoint that failed before writing metadata can pass this guard, get added here, and then be promoted into `checkpoints.created_output` at lines 2187-2189. That lets `Checkpoint.get()` resolve against incomplete files instead of raising.
Proposed fix:

```diff
 if missing_input:
     if job.is_checkpoint:
-        for out in job.output:
-            if await out.exists():
-                has_meta = self.workflow.persistence.has_metadata(job)  # type: ignore[reportOptionalMemberAccess]
-                is_incomplete = has_meta and await self.workflow.persistence.incomplete(job)  # type: ignore[reportOptionalMemberAccess]
-                if not is_incomplete:
-                    self._evicted_checkpoint_outputs.add(out)
+        is_incomplete = bool(
+            await self.workflow.persistence.incomplete(job)  # type: ignore[reportOptionalMemberAccess]
+        )
+        if not is_incomplete:
+            for out in job.output:
+                if await out.exists():
+                    self._evicted_checkpoint_outputs.add(out)
         self.delete_job(job, recursive=False)  # delete job from tree
     raise MissingInputException(job, missing_input)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/dag.py` around lines 1357 - 1363, The code currently only calls self.workflow.persistence.incomplete(job) when has_metadata(job) is true, which allows checkpoint outputs without metadata to be treated as complete; instead call await self.workflow.persistence.incomplete(job) unconditionally for each out in job.output (i.e., remove the has_metadata gate and compute is_incomplete = await self.workflow.persistence.incomplete(job)), and only add outputs to self._evicted_checkpoint_outputs when that unconditional incomplete check is false so incomplete-marker files are not promoted into checkpoints.created_output (this touches job.is_checkpoint handling and uses self.workflow.persistence.incomplete(job) and self._evicted_checkpoint_outputs).
📒 Files selected for processing (1)
src/snakemake/dag.py
Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de>
🤖 I have created a release *beep* *boop*

---

## [9.17.0](v9.16.3...v9.17.0) (2026-03-13)

### Features

* Allow storing snakemake metadata in files or databases ([#4012](#4012)) ([dd75f31](dd75f31))
* Allow to specify comparison command per-unit test ([#3956](#3956)) ([b88171c](b88171c))
* job table orderd topological when run is started ([#4018](#4018)) ([75cf506](75cf506))
* lambda functions for priority in rules ([#3253](#3253)) ([d2aa226](d2aa226))
* Make on... directive of modules accessible ([#4050](#4050)) ([e9f2e1c](e9f2e1c))

### Bug Fixes

* adjust conda tests to not fail on apple silicon; fix [#4040](#4040) ([#4049](#4049)) ([f5b0142](f5b0142))
* allow "--containerize apptainer" to output apptainer format instead of dockerfile ([#4030](#4030)) ([f5cac30](f5cac30))
* apptainer command not recognized when singularity is absent ([#4010](#4010)) ([b8162e2](b8162e2))
* capture stderr when tests fail ([#3995](#3995)) ([97d74ba](97d74ba))
* **docs:** make Data-dependent conditional execution a complete example ([#4043](#4043)) ([3a1d7f2](3a1d7f2))
* don't build the DAG when running unlock. Fixes [#4000](#4000) and [#198](#198) ([#4007](#4007)) ([acf79fd](acf79fd))
* Ensure pixi tasks may be run as advertised ([#4046](#4046)) ([88253c2](88253c2))
* fix checkpoint handling corner cases ([#3870](#3870) and [#3559](#3559)) ([#4015](#4015)) ([63f4257](63f4257))
* issue 3642 ([#4054](#4054)) ([76e6fc2](76e6fc2))
* issue 3815 ([#4026](#4026)) ([b0eec96](b0eec96))
* logging None in shellcmd context causes error ([#4064](#4064)) ([d0652cd](d0652cd))
* lookup function returns default value for empty DataFrame queries ([#4056](#4056)) ([f71de97](f71de97))
* make `cache: omit-software` a rule specific property ([#4085](#4085)) ([034a9e7](034a9e7))
* reduce number of tests leaving temporary files behind ([#4033](#4033)) ([a3a1c97](a3a1c97))
* regression in dynamic resource handling ([#4038](#4038)) ([f2c554a](f2c554a))
* somewhat shorter announce message ([#4080](#4080)) ([57efc71](57efc71))

### Performance Improvements

* switch reretry with tenacity; decouple container classes (with Python 3.7 compat for old scripts) from rest of the codebase (enabling moving to newer python versions) ([#4032](#4032)) ([ffb19e7](ffb19e7))

### Documentation

* Add AI-assisted contributions policy to contributing guidelines ([#4051](#4051)) ([dd70526](dd70526))
* **codebase:** Update & simplify plugin architecture section ([#4052](#4052)) ([176cf63](176cf63))
* Correct workflow.source_path() description in documentation ([#4036](#4036)) ([45883c5](45883c5))
* fixed wrong code example for collect() function ([#4037](#4037)) ([5c85ed8](5c85ed8))
* Minor docs improvements ([#4089](#4089)) ([29ea226](29ea226))
* switch to sphinx_design for tabs ([#3976](#3976)) ([9674614](9674614))
* typo in the migration table breaking a pip install command ([#4024](#4024)) ([66f9dda](66f9dda))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Will fix #3870 and fix #3559.
QC
Summary by CodeRabbit
Bug Fixes
- Outputs of checkpoint jobs evicted due to missing inputs are now tracked and propagated into created_output during checkpoint dependency updates.
Tests
- Added a test exercising a checkpoint with an uncreatable input.