feat: allow Snakemake to discover all required checkpoints in a single evaluation#4125
feat: allow Snakemake to discover all required checkpoints in a single evaluation#4125Hocnonsense wants to merge 7 commits intomainfrom
Conversation
📝 WalkthroughWalkthroughAdds a Changes
Sequence Diagram(s)sequenceDiagram
participant Rule
participant CheckpointsProxy
participant ThreadCache as "Thread-Local Cache"
participant CheckpointLater
participant IncompleteExc as "IncompleteCheckpointException"
Rule->>CheckpointsProxy: __enter__()
activate CheckpointsProxy
CheckpointsProxy->>ThreadCache: init exceptions list & depth
Rule->>CheckpointsProxy: access checkpoint by name
CheckpointsProxy-->>Rule: CheckpointLater (when in context)
Rule->>CheckpointLater: expect(output)
activate CheckpointLater
alt output missing
CheckpointLater->>IncompleteExc: create exception
IncompleteExc->>ThreadCache: append exception
CheckpointLater-->>Rule: return CheckpointJob
else output present
CheckpointLater-->>Rule: return output/path
end
deactivate CheckpointLater
Rule->>CheckpointsProxy: __exit__(no-exc)
ThreadCache->>CheckpointsProxy: provide cached exceptions
alt cached exceptions exist
CheckpointsProxy->>IncompleteExc: attach extras into first exception
CheckpointsProxy-->>Rule: raise aggregated IncompleteCheckpointException
else
CheckpointsProxy-->>Rule: exit normally
end
deactivate CheckpointsProxy
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/snakefiles/rules.rst`:
- Around line 3133-3137: The example uses Python's builtin open() for Snakemake
file objects (variables like a, b in the loop over series) which bypasses
Snakemake's storage-aware IO; change these calls to use the Snakemake file
object's .open() method (e.g., a.open(), b.open()) so reads use IOFile.open()
and work with remote/non-shared storage, keeping the same read/strip/max logic
for val2 and val.
In `@src/snakemake/checkpoints.py`:
- Around line 59-83: The current __enter__/__exit__ replaces self._local.cache
causing nested checkpoint blocks to lose the outer cache; change the semantics
so __enter__ pushes a new list onto a stack (or wraps the existing cache in a
list-of-lists) instead of overwriting, and __exit__ pops the inner cache and, if
a parent cache exists, extend/merge the popped cache into the parent rather than
discarding it; only when popping the outermost cache and exc_type is None should
you combine exceptions (set e.extra = _e) and raise the
IncompleteCheckpointException. Ensure you update uses of self._local.cache in
checkpoints.rule*.get paths to work with the stack shape and restore the
previous cache on all exit paths.
In `@src/snakemake/exceptions.py`:
- Around line 595-601: The property targetfile currently returns a list,
breaking callers that expect a single path; change the property targetfile on
the exception class to return the singular path (self._targetfile) and add a new
property named targetfiles that returns the aggregated list (including extra
entries) so batched logic can use it; then update the batched input-expansion
branch in Rule.expand_params() to reference exception.targetfiles instead of
exception.targetfile while leaving single-path checks using exception.targetfile
intact.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 5b8cf63d-429c-45a3-818e-e0f8cb97ec6e
📒 Files selected for processing (5)
docs/snakefiles/rules.rstsrc/snakemake/checkpoints.pysrc/snakemake/exceptions.pysrc/snakemake/rules.pytests/test_checkpoints_many/Snakefile
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/snakemake/checkpoints.py`:
- Around line 78-86: The __exit__ method references an undefined local variable
cache; change the logic to read the stored exceptions from self._local.cache
(e.g., cache = self._local.cache), check that cache is truthy when exc_type is
None, and then aggregate/raise as before (set e.extra = _e and raise e). Ensure
you read self._local.cache before you clear/reset it and still decrement/reset
self._local.depth correctly in the __exit__ implementation so the exception
aggregation using e.extra uses the stored cache instead of an undefined name.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6145fc43-d734-4d66-9de5-b3a81bfdecfd
📒 Files selected for processing (3)
docs/snakefiles/rules.rstsrc/snakemake/checkpoints.pysrc/snakemake/rules.py
🚧 Files skipped from review as they are similar to previous changes (2)
- docs/snakefiles/rules.rst
- src/snakemake/rules.py
There was a problem hiding this comment.
🧹 Nitpick comments (3)
src/snakemake/checkpoints.py (3)
76-76: Consider using English for code comments.The comment "不再需要 as cp" (meaning "no longer need
as cp") should be in English for consistency with the rest of the codebase.💬 Suggested change
- return self # 不再需要 as cp + return self # no longer need `as cp`🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/checkpoints.py` at line 76, Replace the Chinese inline comment after the "return self" statement in src/snakemake/checkpoints.py with an English comment for consistency; for example change "不再需要 as cp" to "no longer need 'as cp'" (locate the comment adjacent to the return self statement in the relevant method/class in checkpoints.py).
117-118: Type annotationNoReturn | CheckpointJobis misleading.As flagged by static analysis,
NoReturn | Tis equivalent toT. TheNoReturncomponent has no effect on the type. This method always raises, so for the base classNoReturnalone would be accurate, but sinceCheckpointLateroverrides this to return a value, the annotation should simply beCheckpointJob.🔧 Suggested fix
- def expect(self, output) -> "NoReturn | CheckpointJob": + def expect(self, output) -> "CheckpointJob": raise IncompleteCheckpointException(self.rule, checkpoint_target(output[0]))🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/checkpoints.py` around lines 117 - 118, The return type on expect is incorrect: replace the misleading union NoReturn | CheckpointJob with just CheckpointJob on the expect method so it correctly reflects that subclasses (e.g., CheckpointLater) may return a CheckpointJob; update the signature of expect(self, output) -> CheckpointJob and ensure any necessary imports or forward references remain valid (the method still raises IncompleteCheckpointException here, but the declared type must be CheckpointJob).
83-83: Redundant assignment.
self._local.depth = 0is redundant sincedepthis already 0 at this point (decremented at line 79, and we only enter this block when it equals 0).🧹 Suggested removal
if self._local.depth == 0: cache: "list[IncompleteCheckpointException]" = self._local.cache self._local.cache = None - self._local.depth = 0 if exc_type is None and cache: e, *_e = cache e.extra = _e raise e🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/checkpoints.py` at line 83, Remove the redundant reset of self._local.depth to 0: since _local.depth is decremented earlier (the decrement at the block that sets depth to 0) and this branch is only entered when it is already zero, delete the assignment "self._local.depth = 0" to avoid pointless code; update any nearby comment if present in the same method that references this reset (look for the decrement and the branch using self._local.depth in the same function/method).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/snakemake/checkpoints.py`:
- Line 76: Replace the Chinese inline comment after the "return self" statement
in src/snakemake/checkpoints.py with an English comment for consistency; for
example change "不再需要 as cp" to "no longer need 'as cp'" (locate the comment
adjacent to the return self statement in the relevant method/class in
checkpoints.py).
- Around line 117-118: The return type on expect is incorrect: replace the
misleading union NoReturn | CheckpointJob with just CheckpointJob on the expect
method so it correctly reflects that subclasses (e.g., CheckpointLater) may
return a CheckpointJob; update the signature of expect(self, output) ->
CheckpointJob and ensure any necessary imports or forward references remain
valid (the method still raises IncompleteCheckpointException here, but the
declared type must be CheckpointJob).
- Line 83: Remove the redundant reset of self._local.depth to 0: since
_local.depth is decremented earlier (the decrement at the block that sets depth
to 0) and this branch is only entered when it is already zero, delete the
assignment "self._local.depth = 0" to avoid pointless code; update any nearby
comment if present in the same method that references this reset (look for the
decrement and the branch using self._local.depth in the same function/method).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: da71daae-d26e-4799-b7f8-c34fb77871c2
📒 Files selected for processing (1)
src/snakemake/checkpoints.py
3896e97 to
0c3bc0e
Compare
0c3bc0e to
13b3287
Compare
|
Please format your code with |
will close #439, close #3475, and also see #3562
Problem
Once a checkpoint output is missing, Snakemake immediately re-evaluates the DAG.
Multiple checkpoints in an input function raise exceptions one by one, leading to unnecessary sequential scheduling.
Solution
To allow Snakemake to discover all required checkpoints in a single evaluation, here introduce the
with checkpoints:context manager, which defers exception raising until all.get()calls in the block have been evaluated.Inside the block, incomplete checkpoints return a placeholder
CheckpointJobinstead of raising immediately.All missing targets are collected and raised together at the end of the block, so Snakemake can discover and schedule all missing checkpoints and their upstream dependencies, in a single DAG re-evaluation.
QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).AI-assistance disclosure
I used AI assistance for:
Summary by CodeRabbit
New Features
Improvements
Documentation
Tests