Skip to content

feat: allow Snakemake to discover all required checkpoints in a single evaluation#4125

Open
Hocnonsense wants to merge 7 commits intomainfrom
feat/checkpoint-many
Open

feat: allow Snakemake to discover all required checkpoints in a single evaluation#4125
Hocnonsense wants to merge 7 commits intomainfrom
feat/checkpoint-many

Conversation

@Hocnonsense
Copy link
Copy Markdown
Contributor

@Hocnonsense Hocnonsense commented Mar 20, 2026

will close #439, close #3475, and also see #3562

Problem

Once a checkpoint output is missing, Snakemake immediately re-evaluates the DAG.
Multiple checkpoints in an input function raise exceptions one by one, leading to unnecessary sequential scheduling.

Solution

To allow Snakemake to discover all required checkpoints in a single evaluation, here introduce the with checkpoints: context manager, which defers exception raising until all .get() calls in the block have been evaluated.
Inside the block, incomplete checkpoints return a placeholder CheckpointJob instead of raising immediately.
All missing targets are collected and raised together at the end of the block, so Snakemake can discover and schedule all missing checkpoints and their upstream dependencies, in a single DAG re-evaluation.

def aggregate_input(wildcards):
    with checkpoints:
        # multiple checkpoints of the same rule can be accessed in the same block
        series = {checkpoints.step_a.get(**wildcards, a=a).output[0] for a in range(3)}
        # different checkpoint rules can also be accessed together
        b = checkpoints.step_b.get(**wildcards).output[0]
    # both step_a and step_b are guaranteed complete beyond this point
    ...

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).
  • I, as a human being, have checked each line of code in this pull request

AI-assistance disclosure

I used AI assistance for:

  • Code generation (e.g., when writing an implementation or fixing a bug)
  • Test/benchmark generation
  • Documentation (including examples)
  • Research and understanding

Summary by CodeRabbit

  • New Features

    • Added a with checkpoints: context manager to detect multiple checkpoints together and enable parallel scheduling.
  • Improvements

    • Checkpoint misses inside the context are aggregated instead of raising immediately, reducing unnecessary sequential scheduling.
    • Improved error reporting and stronger type handling around checkpoint evaluation.
  • Documentation

    • Expanded guide with examples and a note to perform file I/O only after the with checkpoints: block completes.
  • Tests

    • Updated tests to precompute checkpoint outputs within the context to avoid repeated evaluations.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 20, 2026

📝 Walkthrough

Walkthrough

Adds a with checkpoints: context manager and related checkpoint proxy/refactor so .get() calls inside the block aggregate missing checkpoint outputs (and upstream deps) and raise a single aggregated IncompleteCheckpointException after exiting, allowing parallel scheduling of required checkpoint jobs.

Changes

Cohort / File(s) Summary
Documentation
docs/snakefiles/rules.rst
Documented with checkpoints: semantics, example usage for batching multiple checkpoint accesses in one block, and advised performing file I/O on checkpoint outputs only after the block completes.
Checkpoint core
src/snakemake/checkpoints.py
Refactored CheckpointsProxy (no longer subclassing Checkpoints): uses a _rules registry and dynamic __getattr__, adds register(), context manager __enter__/__exit__ with thread-local exception cache and nesting, and switches runtime behavior to return CheckpointLater within the context. Added Checkpoint.expect() and new CheckpointLater type that appends IncompleteCheckpointExceptions to the cache and returns CheckpointJob instead of immediately raising.
Exceptions
src/snakemake/exceptions.py
IncompleteCheckpointException now accepts optional extra exceptions and exposes a targetfile property that flattens primary and aggregated targets; InputFunctionException signature changed to require wildcards; MissingInputException.files typed as Sequence[str]; InputOpenException.rule annotated.
Rules / control flow & typing
src/snakemake/rules.py
Strengthened type hints (e.g., apply_input_function(func: Callable), non_derived_items: Optional[...]), added targeted # type: ignore comments, and updated checkpoint handling in expand_params/concretize_iofile to iterate over aggregated exception.targetfile and only return TBDString() when all targets are present; otherwise raise WorkflowError for the first missing target.
Tests / examples
tests/test_checkpoints_many/Snakefile
Updated aggregate() to resolve multiple checkpoint outputs inside a single with checkpoints: block and precompute an outputs mapping to avoid per-iteration .get() calls, reflecting batched checkpoint detection behavior.

Sequence Diagram(s)

sequenceDiagram
    participant Rule
    participant CheckpointsProxy
    participant ThreadCache as "Thread-Local Cache"
    participant CheckpointLater
    participant IncompleteExc as "IncompleteCheckpointException"

    Rule->>CheckpointsProxy: __enter__()
    activate CheckpointsProxy
    CheckpointsProxy->>ThreadCache: init exceptions list & depth

    Rule->>CheckpointsProxy: access checkpoint by name
    CheckpointsProxy-->>Rule: CheckpointLater (when in context)

    Rule->>CheckpointLater: expect(output)
    activate CheckpointLater
    alt output missing
        CheckpointLater->>IncompleteExc: create exception
        IncompleteExc->>ThreadCache: append exception
        CheckpointLater-->>Rule: return CheckpointJob
    else output present
        CheckpointLater-->>Rule: return output/path
    end
    deactivate CheckpointLater

    Rule->>CheckpointsProxy: __exit__(no-exc)
    ThreadCache->>CheckpointsProxy: provide cached exceptions
    alt cached exceptions exist
        CheckpointsProxy->>IncompleteExc: attach extras into first exception
        CheckpointsProxy-->>Rule: raise aggregated IncompleteCheckpointException
    else
        CheckpointsProxy-->>Rule: exit normally
    end
    deactivate CheckpointsProxy
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main feature being introduced: a mechanism for Snakemake to discover all required checkpoints in a single evaluation via the with checkpoints: context manager.
Description check ✅ Passed The PR description adequately covers the problem, solution, and includes all required QC checklist items and AI-assistance disclosure with proper detail.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/checkpoint-many

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/snakefiles/rules.rst`:
- Around line 3133-3137: The example uses Python's builtin open() for Snakemake
file objects (variables like a, b in the loop over series) which bypasses
Snakemake's storage-aware IO; change these calls to use the Snakemake file
object's .open() method (e.g., a.open(), b.open()) so reads use IOFile.open()
and work with remote/non-shared storage, keeping the same read/strip/max logic
for val2 and val.

In `@src/snakemake/checkpoints.py`:
- Around line 59-83: The current __enter__/__exit__ replaces self._local.cache
causing nested checkpoint blocks to lose the outer cache; change the semantics
so __enter__ pushes a new list onto a stack (or wraps the existing cache in a
list-of-lists) instead of overwriting, and __exit__ pops the inner cache and, if
a parent cache exists, extend/merge the popped cache into the parent rather than
discarding it; only when popping the outermost cache and exc_type is None should
you combine exceptions (set e.extra = _e) and raise the
IncompleteCheckpointException. Ensure you update uses of self._local.cache in
checkpoints.rule*.get paths to work with the stack shape and restore the
previous cache on all exit paths.

In `@src/snakemake/exceptions.py`:
- Around line 595-601: The property targetfile currently returns a list,
breaking callers that expect a single path; change the property targetfile on
the exception class to return the singular path (self._targetfile) and add a new
property named targetfiles that returns the aggregated list (including extra
entries) so batched logic can use it; then update the batched input-expansion
branch in Rule.expand_params() to reference exception.targetfiles instead of
exception.targetfile while leaving single-path checks using exception.targetfile
intact.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5b8cf63d-429c-45a3-818e-e0f8cb97ec6e

📥 Commits

Reviewing files that changed from the base of the PR and between 2432a6c and 2f2a8be.

📒 Files selected for processing (5)
  • docs/snakefiles/rules.rst
  • src/snakemake/checkpoints.py
  • src/snakemake/exceptions.py
  • src/snakemake/rules.py
  • tests/test_checkpoints_many/Snakefile

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/snakemake/checkpoints.py`:
- Around line 78-86: The __exit__ method references an undefined local variable
cache; change the logic to read the stored exceptions from self._local.cache
(e.g., cache = self._local.cache), check that cache is truthy when exc_type is
None, and then aggregate/raise as before (set e.extra = _e and raise e). Ensure
you read self._local.cache before you clear/reset it and still decrement/reset
self._local.depth correctly in the __exit__ implementation so the exception
aggregation using e.extra uses the stored cache instead of an undefined name.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6145fc43-d734-4d66-9de5-b3a81bfdecfd

📥 Commits

Reviewing files that changed from the base of the PR and between 2f2a8be and 4236ede.

📒 Files selected for processing (3)
  • docs/snakefiles/rules.rst
  • src/snakemake/checkpoints.py
  • src/snakemake/rules.py
🚧 Files skipped from review as they are similar to previous changes (2)
  • docs/snakefiles/rules.rst
  • src/snakemake/rules.py

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
src/snakemake/checkpoints.py (3)

76-76: Consider using English for code comments.

The comment "不再需要 as cp" (meaning "no longer need as cp") should be in English for consistency with the rest of the codebase.

💬 Suggested change
-        return self  # 不再需要 as cp
+        return self  # no longer need `as cp`
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/snakemake/checkpoints.py` at line 76, Replace the Chinese inline comment
after the "return self" statement in src/snakemake/checkpoints.py with an
English comment for consistency; for example change "不再需要 as cp" to "no longer
need 'as cp'" (locate the comment adjacent to the return self statement in the
relevant method/class in checkpoints.py).

117-118: Type annotation NoReturn | CheckpointJob is misleading.

As flagged by static analysis, NoReturn | T is equivalent to T. The NoReturn component has no effect on the type. This method always raises, so for the base class NoReturn alone would be accurate, but since CheckpointLater overrides this to return a value, the annotation should simply be CheckpointJob.

🔧 Suggested fix
-    def expect(self, output) -> "NoReturn | CheckpointJob":
+    def expect(self, output) -> "CheckpointJob":
         raise IncompleteCheckpointException(self.rule, checkpoint_target(output[0]))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/snakemake/checkpoints.py` around lines 117 - 118, The return type on
expect is incorrect: replace the misleading union NoReturn | CheckpointJob with
just CheckpointJob on the expect method so it correctly reflects that subclasses
(e.g., CheckpointLater) may return a CheckpointJob; update the signature of
expect(self, output) -> CheckpointJob and ensure any necessary imports or
forward references remain valid (the method still raises
IncompleteCheckpointException here, but the declared type must be
CheckpointJob).

83-83: Redundant assignment.

self._local.depth = 0 is redundant since depth is already 0 at this point (decremented at line 79, and we only enter this block when it equals 0).

🧹 Suggested removal
         if self._local.depth == 0:
             cache: "list[IncompleteCheckpointException]" = self._local.cache
             self._local.cache = None
-            self._local.depth = 0
             if exc_type is None and cache:
                 e, *_e = cache
                 e.extra = _e
                 raise e
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/snakemake/checkpoints.py` at line 83, Remove the redundant reset of
self._local.depth to 0: since _local.depth is decremented earlier (the decrement
at the block that sets depth to 0) and this branch is only entered when it is
already zero, delete the assignment "self._local.depth = 0" to avoid pointless
code; update any nearby comment if present in the same method that references
this reset (look for the decrement and the branch using self._local.depth in the
same function/method).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/snakemake/checkpoints.py`:
- Line 76: Replace the Chinese inline comment after the "return self" statement
in src/snakemake/checkpoints.py with an English comment for consistency; for
example change "不再需要 as cp" to "no longer need 'as cp'" (locate the comment
adjacent to the return self statement in the relevant method/class in
checkpoints.py).
- Around line 117-118: The return type on expect is incorrect: replace the
misleading union NoReturn | CheckpointJob with just CheckpointJob on the expect
method so it correctly reflects that subclasses (e.g., CheckpointLater) may
return a CheckpointJob; update the signature of expect(self, output) ->
CheckpointJob and ensure any necessary imports or forward references remain
valid (the method still raises IncompleteCheckpointException here, but the
declared type must be CheckpointJob).
- Line 83: Remove the redundant reset of self._local.depth to 0: since
_local.depth is decremented earlier (the decrement at the block that sets depth
to 0) and this branch is only entered when it is already zero, delete the
assignment "self._local.depth = 0" to avoid pointless code; update any nearby
comment if present in the same method that references this reset (look for the
decrement and the branch using self._local.depth in the same function/method).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: da71daae-d26e-4799-b7f8-c34fb77871c2

📥 Commits

Reviewing files that changed from the base of the PR and between 4236ede and 7df256a.

📒 Files selected for processing (1)
  • src/snakemake/checkpoints.py

@Hocnonsense Hocnonsense force-pushed the feat/checkpoint-many branch from 3896e97 to 0c3bc0e Compare March 20, 2026 15:32
@Hocnonsense Hocnonsense force-pushed the feat/checkpoint-many branch from 0c3bc0e to 13b3287 Compare March 20, 2026 15:34
@github-actions
Copy link
Copy Markdown
Contributor

Please format your code with pixi run format

@Hocnonsense Hocnonsense self-assigned this Mar 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Checkpoint "get" still only resolvable once per aggregate function Checkpoints that can be done in parallel are evaluated and executed sequentially

1 participant