fix: make cache: omit-software a rule specific property#4085
fix: make cache: omit-software a rule specific property#4085johanneskoester merged 11 commits intosnakemake:mainfrom
cache: omit-software a rule specific property#4085Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (2)
📝 WalkthroughWalkthroughRemoved Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Workflow
participant Rule
participant RuleCache
participant OutputFileCache
participant ProvenanceHashMap
User->>Workflow: define workflow (workflow_settings.cache?)
Workflow->>Rule: create rule (ruleinfo.cache)
Rule->>RuleCache: RuleCache.from_rule(rule, cache)
Note over RuleCache,Workflow: assign rule.cache
User->>Workflow: prepare DAG
Workflow->>Rule: mark rule.cache for workflow-wide cache if applicable
Rule->>RuleCache: rule.cache.check()
Note over Rule,RuleCache: validation (outputs, multiext, benchmarks)
Workflow->>Rule: execute job
Rule->>RuleCache: call exists(job) / fetch(job) / store(job)
RuleCache->>OutputFileCache: delegate exists/store/fetch(job)
OutputFileCache->>ProvenanceHashMap: get_provenance_hash(job)
ProvenanceHashMap->>OutputFileCache: return provenance hash
OutputFileCache->>OutputFileCache: perform storage operations (upload/download)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/snakemake/caching/storage.py (1)
56-60:⚠️ Potential issue | 🟡 MinorMissing format argument for
outputfilein error message.The error message contains a
{}placeholder butoutputfileis never passed to format it. This will produce an unhelpful error message that doesn't identify which file is missing.Proposed fix
if check_output_exists and not os.path.exists(outputfile): raise WorkflowError( - "Cannot move output file {} to cache. It does not exist " + f"Cannot move output file {outputfile} to cache. It does not exist " "(maybe it was not created by the job?)." )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/caching/storage.py` around lines 56 - 60, The WorkflowError raised in the check_output_exists branch of move-to-cache logic uses a "{}" placeholder but never supplies outputfile; update the raise in the block that checks "if check_output_exists and not os.path.exists(outputfile)" to include the missing filename (e.g., use an f-string or .format to inject outputfile into the error message) so the error shows which outputfile is missing.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/snakemake/caching/rule.py`:
- Around line 34-49: The code is mistakenly comparing the accumulated flag
variable to the string "hash-omit-storage-content" instead of the token; in the
loop over cache tokens (variable x) update the conditional that reads `elif flag
== "hash-omit-storage-content":` to `elif x == "hash-omit-storage-content":` and
then set `flag |= CacheFlag.omit_storage_content` (consistent with the other
hash-* branch). This change is in the loop handling `cache` tokens where `x`,
`flag`, `CacheFlag`, and `rule` are referenced and will allow the
"hash-omit-storage-content" token to be accepted rather than raising the
WorkflowError.
- Around line 10-12: Add the missing Job import to the TYPE_CHECKING block so
the forward annotations on the Job-typed parameters resolve for linters: update
the imports inside the TYPE_CHECKING section to include "from snakemake.jobs
import Job" (alongside the existing "from typing import Self, Optional, Any" and
"from snakemake.rules import Rule") so the annotations referenced in the
functions/methods at the Job annotation sites (the annotations mentioning Job on
lines showing use in this module) no longer trigger F821.
In `@src/snakemake/executors/dryrun.py`:
- Around line 56-61: The log currently calls RuleCache.exists without the
required job argument, causing a TypeError; update the call site where you use
job.rule.cache.exists() inside the logger.info (and the self.workflow.async_run
wrapper) to pass the current job (i.e., call exists with the job parameter) so
the cache check uses the job context; keep the surrounding async_run usage and
conditional logic unchanged.
In `@src/snakemake/workflow.py`:
- Around line 408-409: The rule validation misses rules promoted to
output-cached after initial checks; after the code path that sets
CacheFlag.output (the logic that promotes a rule to output-cached), re-run the
existing validation by calling RuleCache.check() on the affected rule (i.e.,
invoke rule.cache.check() again for that rule) so multiext/benchmark constraints
are enforced before later code (e.g., AbstractOutputFileCache.get_outputfiles())
runs; update the promotion logic to trigger this revalidation wherever
CacheFlag.output is enabled (references: RuleCache.check, CacheFlag.output,
rule.cache, AbstractOutputFileCache.get_outputfiles).
---
Outside diff comments:
In `@src/snakemake/caching/storage.py`:
- Around line 56-60: The WorkflowError raised in the check_output_exists branch
of move-to-cache logic uses a "{}" placeholder but never supplies outputfile;
update the raise in the block that checks "if check_output_exists and not
os.path.exists(outputfile)" to include the missing filename (e.g., use an
f-string or .format to inject outputfile into the error message) so the error
shows which outputfile is missing.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 7d5dd34f-bf9e-4f92-9e84-dd3f010a47ef
📒 Files selected for processing (9)
src/snakemake/caching/__init__.pysrc/snakemake/caching/hash.pysrc/snakemake/caching/local.pysrc/snakemake/caching/rule.pysrc/snakemake/caching/storage.pysrc/snakemake/executors/dryrun.pysrc/snakemake/executors/local.pysrc/snakemake/rules.pysrc/snakemake/workflow.py
|
Great work!! |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/snakemake/caching/rule.py`:
- Line 86: Replace the truthiness check on the workflow cache with an explicit
None check: change the condition "if not rule.workflow.workflow_settings.cache:"
to "if rule.workflow.workflow_settings.cache is None:" so the code uses an
explicit None test on rule.workflow.workflow_settings.cache and does not treat
empty-but-enabled cache selections as disabled.
- Around line 102-112: Replace the assert guards in exists, store, and fetch
with explicit runtime checks that raise a clear, descriptive exception when the
cache backend (self.rule.workflow.output_file_cache) is missing; locate the
three methods (exists, store, fetch) and before delegating to
self.rule.workflow.output_file_cache, check for None and raise a RuntimeError
(or custom CacheUnavailableError) with a message including the rule identity
(self.rule) and the operation name to aid debugging, then proceed to await the
corresponding exists/store/fetch call.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 38dc645d-1ef2-4b28-a854-c99c231ccaac
📒 Files selected for processing (2)
src/snakemake/caching/rule.pysrc/snakemake/executors/dryrun.py
There was a problem hiding this comment.
🧹 Nitpick comments (1)
src/snakemake/caching/storage.py (1)
45-50: Pre-existing logic issue:exists()only checks the first output file.The method returns immediately after checking the first entry's existence. If a job produces multiple outputs, only the first is verified. This differs from the local cache implementation which checks all files and returns
Trueonly if all exist.♻️ Suggested fix to check all outputs
async def exists(self, job: Job): for entry in self._get_storage_objects(job): try: - return await entry.managed_exists() + if not await entry.managed_exists(): + return False except Exception as e: self.raise_read_error(entry, exception=e) + return True🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@src/snakemake/caching/storage.py` around lines 45 - 50, The exists() method currently returns after checking the first storage object; change it to verify all outputs by iterating every entry from self._get_storage_objects(job), awaiting entry.managed_exists() for each, and only return True if every call returns True (i.e., use a short-circuit that returns False on the first missing object). Preserve error handling: if managed_exists() raises, call self.raise_read_error(entry, exception=e) and then return False (or re-raise if your semantics require), rather than letting the exception leak or stopping after the first entry; update the exists function accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@src/snakemake/caching/storage.py`:
- Around line 45-50: The exists() method currently returns after checking the
first storage object; change it to verify all outputs by iterating every entry
from self._get_storage_objects(job), awaiting entry.managed_exists() for each,
and only return True if every call returns True (i.e., use a short-circuit that
returns False on the first missing object). Preserve error handling: if
managed_exists() raises, call self.raise_read_error(entry, exception=e) and then
return False (or re-raise if your semantics require), rather than letting the
exception leak or stopping after the first entry; update the exists function
accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: ea55eae1-c0e8-47a2-b599-5af1e025add1
📒 Files selected for processing (1)
src/snakemake/caching/storage.py
🤖 I have created a release *beep* *boop* --- ## [9.17.0](v9.16.3...v9.17.0) (2026-03-13) ### Features * Allow storing snakemake metadata in files or databases ([#4012](#4012)) ([dd75f31](dd75f31)) * Allow to specify comparison command per-unit test ([#3956](#3956)) ([b88171c](b88171c)) * job table orderd topological when run is started ([#4018](#4018)) ([75cf506](75cf506)) * lambda functions for priority in rules ([#3253](#3253)) ([d2aa226](d2aa226)) * Make on... directive of modules accessible ([#4050](#4050)) ([e9f2e1c](e9f2e1c)) ### Bug Fixes * adjust conda tests to not fail on apple silicon; fix [#4040](#4040) ([#4049](#4049)) ([f5b0142](f5b0142)) * allow "--containerize apptainer" to output apptainer format instead of dockerfile ([#4030](#4030)) ([f5cac30](f5cac30)) * apptainer command not recognized when singularity is absent ([#4010](#4010)) ([b8162e2](b8162e2)) * capture stderr when tests fail ([#3995](#3995)) ([97d74ba](97d74ba)) * **docs:** make Data-dependent conditional execution a complete example ([#4043](#4043)) ([3a1d7f2](3a1d7f2)) * don't build the DAG when running unlock. Fixes [#4000](#4000) and [#198](#198) ([#4007](#4007)) ([acf79fd](acf79fd)) * Ensure pixi tasks may be run as advertised ([#4046](#4046)) ([88253c2](88253c2)) * fix checkpoint handling corner cases ([#3870](#3870) and [#3559](#3559)) ([#4015](#4015)) ([63f4257](63f4257)) * issue 3642 ([#4054](#4054)) ([76e6fc2](76e6fc2)) * issue 3815 ([#4026](#4026)) ([b0eec96](b0eec96)) * logging None in shellcmd context causes error ([#4064](#4064)) ([d0652cd](d0652cd)) * lookup function returns default value for empty DataFrame queries ([#4056](#4056)) ([f71de97](f71de97)) * make `cache: omit-software` a rule specific property ([#4085](#4085)) ([034a9e7](034a9e7)) * reduce number of tests leaving temporary files behind ([#4033](#4033)) ([a3a1c97](a3a1c97)) * regression in dynamic resource handling ([#4038](#4038)) ([f2c554a](f2c554a)) * somewhat shorter announce message ([#4080](#4080)) ([57efc71](57efc71)) ### Performance Improvements * switch reretry with tenacity; decouple container classes (with Python 3.7 compat for old scripts) from rest of the codebase (enabling moving to newer python versions) ([#4032](#4032)) ([ffb19e7](ffb19e7)) ### Documentation * Add AI-assisted contributions policy to contributing guidelines ([#4051](#4051)) ([dd70526](dd70526)) * **codebase:** Update & simplify plugin architecture section ([#4052](#4052)) ([176cf63](176cf63)) * Correct workflow.source_path() description in documentation ([#4036](#4036)) ([45883c5](45883c5)) * fixed wrong code example for collect() function ([#4037](#4037)) ([5c85ed8](5c85ed8)) * Minor docs improvements ([#4089](#4089)) ([29ea226](29ea226)) * switch to sphinx_design for tabs ([#3976](#3976)) ([9674614](9674614)) * typo in the migration table breaking a pip install command ([#4024](#4024)) ([66f9dda](66f9dda)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Previously a
cache: omit-softwaredirective meant that all upstream rules were considered to be immutable to the used software deployment and were not accounted for in the between workflow provenance hash.Here we introduce a CacheFlag which lives on the rule, which has the following flags:
The cache directive is then mapped for each rule to this flag:
get_provenance_hash is then using this flag.
QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
New Features
Refactor
Bug Fixes