feat: Fix keep_local in storage directive and more freedom over remote retrieval behaviour#3410
Conversation
📝 WalkthroughWalkthroughThe changes introduce a new command-line flag to control the retrieval of remote storage files, documented enhancements outlining how to configure remote file retrieval and local retention via storage directives and examples, modifications in the DAG cleanup process to preserve locally flagged files, and updates to storage settings and registry methods to allow flexible handling of the Changes
Sequence Diagram(s)sequenceDiagram
participant U as User
participant CLI as Command-Line Interface
participant API as Snakemake API
participant REG as Storage Registry
participant DAG as DAG Cleanup
U->>CLI: Run snakemake with flags (\n--not-retrieve-storage, etc.)
CLI->>API: Parse arguments (set retrieve_storage accordingly)
API->>REG: Pass storage settings (retrieve, keep_local)
REG->>REG: Register storage provider using new configuration
API->>DAG: Execute workflow including file processing
DAG->>DAG: Cleanup storage objects (skip those marked keep_local)
Possibly related PRs
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
2773ab9 to
7056b8d
Compare
|
Requires snakemake/snakemake-interface-storage-plugins#60 to be merged first as that adds the |
There was a problem hiding this comment.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
tests/tests.py (1)
2113-2157: 💡 Verification agent❓ Verification inconclusive
Comprehensive test coverage for retrieve functionality
The test function properly verifies the behavior of the
retrieveparameter in various scenarios, ensuring that remote files are retrieved or not retrieved as expected.There appears to be a pipeline failure in CI with this test. Let's check if there might be an issue with the test:
The variable
pon line 2153 is assigned but never used. Consider removing the assignment:- p = sp.check_output( + sp.check_output( ["snakemake", "-s", snakefile, "-c1", "retrieve_true.flag"], cwd=tmpdir )
🏁 Script executed:
#!/bin/bash # Check if the error is related to the --not-retrieve-storage flag being unknown cd tests/test_local_and_retrieve snakemake --help | grep retrieveLength of output: 169
Attention: Address CI Environment Issue and Remove Unused Variable Assignment
The test function in
tests/tests.pyoffers comprehensive coverage for the retrieve functionality. However, two issues need addressing:
CI Environment Issue: The executed script indicates that the
snakemakecommand is not found in the CI environment. Please verify that Snakemake is correctly installed and available in the CI PATH. Adjust the CI configuration or installation setup as needed.Unused Variable Assignment: The variable
pon line 2153 is assigned but never used. Please remove the assignment. For example, update the code as follows:- p = sp.check_output( + sp.check_output( ["snakemake", "-s", snakefile, "-c1", "retrieve_true.flag"], cwd=tmpdir )Please fix these issues and verify that the CI pipeline passes.
🧰 Tools
🪛 Ruff (0.8.2)
2153-2153: Local variable
pis assigned to but never usedRemove assignment to unused variable
p(F841)
🪛 GitHub Actions: CI
[error] 2121-2121: subprocess.CalledProcessError: Command '['snakemake', '-s', '/home/runner/work/snakemake/snakemake/tests/test_local_and_retrieve/retrieve.smk', '-c1', 'retrieve_default.flag', '--not-retrieve-storage']' returned non-zero exit status 1.
🧹 Nitpick comments (2)
tests/tests.py (1)
2067-2111: Good test coverage for keep_local functionalityThe test function thoroughly checks various combinations of the
keep_localparameter, validating that files are kept or removed as expected depending on the parameter values and command-line flags.The variable
pon line 2106 is assigned but never used. Consider removing the assignment:- p = sp.check_output( + sp.check_output( ["snakemake", "-s", snakefile, "-c1", "keep_local_true_directive.flag"], cwd=tmpdir, )🧰 Tools
🪛 Ruff (0.8.2)
2106-2106: Local variable
pis assigned to but never usedRemove assignment to unused variable
p(F841)
src/snakemake/storage.py (1)
98-107: Simplify variable assignment using dictionary get() methodThe code correctly extracts
keep_localandretrievefrom settings with fallback values, but there's a more concise way to write this.Consider using Python's dictionary
get()method for a more concise implementation:- keep_local = ( - settings["keep_local"] - if "keep_local" in settings - else self.workflow.storage_settings.keep_storage_local - ) - retrieve = ( - settings["retrieve"] - if "retrieve" in settings - else self.workflow.storage_settings.retrieve_storage - ) + keep_local = settings.get("keep_local", self.workflow.storage_settings.keep_storage_local) + retrieve = settings.get("retrieve", self.workflow.storage_settings.retrieve_storage)🧰 Tools
🪛 Ruff (0.8.2)
99-101: Use
settings.get("keep_local", self.workflow.storage_settings.keep_storage_local)instead of anifblockReplace with
settings.get("keep_local", self.workflow.storage_settings.keep_storage_local)(SIM401)
104-106: Use
settings.get("retrieve", self.workflow.storage_settings.retrieve_storage)instead of anifblockReplace with
settings.get("retrieve", self.workflow.storage_settings.retrieve_storage)(SIM401)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
docs/snakefiles/storage.rst(1 hunks)src/snakemake/cli.py(2 hunks)src/snakemake/dag.py(1 hunks)src/snakemake/settings/types.py(1 hunks)src/snakemake/storage.py(4 hunks)tests/test_local_and_retrieve/keep_local.smk(1 hunks)tests/test_local_and_retrieve/retrieve.smk(1 hunks)tests/tests.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
src/snakemake/settings/types.pysrc/snakemake/dag.pysrc/snakemake/cli.pytests/tests.pysrc/snakemake/storage.py
🪛 Ruff (0.8.2)
tests/tests.py
2106-2106: Local variable p is assigned to but never used
Remove assignment to unused variable p
(F841)
2153-2153: Local variable p is assigned to but never used
Remove assignment to unused variable p
(F841)
src/snakemake/storage.py
99-101: Use settings.get("keep_local", self.workflow.storage_settings.keep_storage_local) instead of an if block
Replace with settings.get("keep_local", self.workflow.storage_settings.keep_storage_local)
(SIM401)
104-106: Use settings.get("retrieve", self.workflow.storage_settings.retrieve_storage) instead of an if block
Replace with settings.get("retrieve", self.workflow.storage_settings.retrieve_storage)
(SIM401)
🪛 GitHub Actions: CI
tests/tests.py
[error] 1960-1960: AssertionError: expected successful execution
[error] 2121-2121: subprocess.CalledProcessError: Command '['snakemake', '-s', '/home/runner/work/snakemake/snakemake/tests/test_local_and_retrieve/retrieve.smk', '-c1', 'retrieve_default.flag', '--not-retrieve-storage']' returned non-zero exit status 1.
tests/test_local_and_retrieve/retrieve.smk
[error] 1-1: TypeError: StorageProviderBase.init() got an unexpected keyword argument 'retrieve'
🔇 Additional comments (20)
src/snakemake/settings/types.py (1)
230-230: New setting for storage retrieval behavior.Adding the
retrieve_storageboolean attribute with a default value ofTruealigns with the PR objective to set the default behavior to always retrieve remote files unless explicitly stated otherwise.src/snakemake/dag.py (1)
452-456: Enhanced local storage retention implementation.The addition of the
not f.should_keep_localcondition properly implements the PR objective to respect thekeep_localflag in storage directives. This ensures that files marked to be kept locally won't be deleted during cleanup operations.src/snakemake/cli.py (2)
1354-1358: The new command-line option enhances storage retrieval controlAdding the
--not-retrieve-storageflag allows users to disable automatic retrieval of remote files, which is a valuable option for performance and bandwidth optimization when files are just being processed remotely.
1956-1956: Good implementation of storage retrieval controlThe code correctly implements the negation of the flag to set
retrieve_storage, maintaining the intuitive behavior where the default is to retrieve remote files unless explicitly told not to.docs/snakefiles/storage.rst (3)
141-154: Clear documentation of remote file behavior and command-line optionsThe documentation clearly explains the default behavior for remote files (download then remove) and how to control this with the new command-line arguments.
155-180: Good examples of storage directive configurationThe examples demonstrate how to configure storage behavior at the directive level, showing both
retrieveandkeep_localoptions.
181-202: Comprehensive examples of rule-level storage configurationThe examples show how to set
retrieveandkeep_localparameters directly in the storage plugin calls within rules, providing flexibility in controlling remote file behavior on a per-rule basis.tests/test_local_and_retrieve/keep_local.smk (3)
1-7: Good storage provider configurationThe file correctly defines an HTTP storage provider and a tagged storage provider with
keep_local=Truefor testing purposes.
8-22: Well-structured test rules for keep_local parameterThe rules test different configurations of the keep_local parameter: default, explicit true, and explicit false. Each rule properly uses the HTTP storage provider with the appropriate parameter.
23-27: Good test for tagged storage providerThis rule properly tests the tagged storage provider that was configured with
keep_local=True, ensuring that the directive-level setting is applied correctly.tests/test_local_and_retrieve/retrieve.smk (5)
1-9: Well-structured storage directivesThe storage directives are properly defined with appropriate parameters for testing retrieval behavior. The first directive sets up a standard HTTP provider with
keep_local=True, while the second creates a named directivehttp_retthat also setsretrieve=False.Note that the pipeline failure indicates this code depends on the PR #60 from snakemake-interface-storage-plugins repository which introduces the
retrieveparameter to theStorageProvider.🧰 Tools
🪛 GitHub Actions: CI
[error] 1-1: TypeError: StorageProviderBase.init() got an unexpected keyword argument 'retrieve'
11-15: Appropriate test case for default retrieval behaviorThis rule tests the default retrieval behavior without explicitly specifying the
retrieveparameter, which is a good baseline test case.
16-20: Explicit retrieve=True test caseGood test case that explicitly sets
retrieve=Trueto ensure this works as expected.
21-25: Testing explicit retrieve=False behaviorThis rule properly tests the ability to override the default retrieval behavior by explicitly setting
retrieve=False.
26-30: Testing retrieval through named storage directiveThis rule effectively tests using a pre-configured storage directive with
retrieve=False, ensuring the directive settings are properly applied.src/snakemake/storage.py (5)
108-114: Successfully updated provider parametersThe parameters have been correctly updated to pass
keep_localandretrieveto the storage provider. This aligns with the PR objective to ensure these settings are respected.
116-130: Well-implemented tagged provider handlingThis is a good addition that ensures tagged storage providers maintain their settings when registered before the untagged provider. The implementation correctly preserves the
keep_localandretrievesettings, addressing one of the PR objectives.
165-167: Updated method signature for improved flexibilityChanging from hard-coded defaults to
Optional[bool] = Noneprovides better flexibility, allowing the method to handle cases where the parameters are not specified. This aligns with the PR objective to allow these settings to be specified in both the storage directive and method calls.
176-178: Consistent parameter update in _storage_object methodThe parameter updates here match those in the
__call__method, maintaining consistency throughout the codebase.
219-221:Details
❓ Verification inconclusive
Consistent parameter update in StorageProviderProxy
The parameter updates here ensure that the StorageProviderProxy class maintains the same interface as the StorageRegistry class, which is important for consistency.
Since this PR depends on another PR in the
snakemake-interface-storage-pluginsrepository that introduces theretrieveparameter, let's verify the status of that dependency:
🏁 Script executed:
#!/bin/bash # Check for the status of PR #60 in snakemake-interface-storage-plugins repository gh pr view 60 --repo snakemake/snakemake-interface-storage-plugins --json state,title,url,mergedAtLength of output: 280
Consistent Parameter Update Verified – Dependency Still Pending
The changes in the
StorageProviderProxyclass (specifically the addition of theretrieveparameter alongsidekeep_local) correctly maintain a consistent interface with theStorageRegistryclass. However, note that the dependent PR (#60) in thesnakemake-interface-storage-pluginsrepository—which introduces theretrieveparameter—is still open. Please verify once that dependency is merged to ensure continued interface alignment.
e10ffd9 to
2db7761
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/tests.py (2)
2081-2125: Thorough test coverage for keep_local functionality.The test verifies various scenarios for the
keep_localoption, including:
- Default behavior (files not kept)
- Explicit
keep_local=false- Using command-line flag
--keep-storage-local-copies- Explicit
keep_local=truein function call- Setting the option at directive level
The test checks for the presence/absence of local files after each operation, ensuring the feature works as expected.
The subprocess output variable
pon lines 2111 and 2120 is assigned but never used. Consider removing the assignment:- p = sp.check_output( + sp.check_output(🧰 Tools
🪛 Ruff (0.8.2)
2120-2120: Local variable
pis assigned to but never usedRemove assignment to unused variable
p(F841)
2127-2171: Comprehensive test coverage for retrieve functionality.The test thoroughly checks multiple scenarios for the
retrieveoption:
- Default behavior with
--not-retrieve-storageflag- Explicit
retrieve=falsein function call- Setting the option at directive level
- Default behavior (retrieve=true) with
--force- Explicit
retrieve=trueThe test verifies file presence/absence after each operation, providing good coverage of the feature.
The subprocess output variable
pon line 2167 is assigned but never used. Consider removing the assignment:- p = sp.check_output( + sp.check_output(🧰 Tools
🪛 Ruff (0.8.2)
2167-2167: Local variable
pis assigned to but never usedRemove assignment to unused variable
p(F841)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
docs/snakefiles/storage.rst(1 hunks)src/snakemake/cli.py(2 hunks)src/snakemake/dag.py(1 hunks)src/snakemake/settings/types.py(1 hunks)src/snakemake/storage.py(4 hunks)tests/test_local_and_retrieve/keep_local.smk(1 hunks)tests/test_local_and_retrieve/retrieve.smk(1 hunks)tests/tests.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- src/snakemake/settings/types.py
- src/snakemake/dag.py
- tests/test_local_and_retrieve/keep_local.smk
- src/snakemake/storage.py
- tests/test_local_and_retrieve/retrieve.smk
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
src/snakemake/cli.pytests/tests.py
🪛 Ruff (0.8.2)
tests/tests.py
2120-2120: Local variable p is assigned to but never used
Remove assignment to unused variable p
(F841)
2167-2167: Local variable p is assigned to but never used
Remove assignment to unused variable p
(F841)
🔇 Additional comments (3)
src/snakemake/cli.py (2)
1354-1358: New command-line flag for remote storage retrieval behavior.The new
--not-retrieve-storageflag allows users to prevent downloading remote files, providing better control over storage behavior.
1956-1956: Correctly integrates the flag into storage settings.The flag is properly negated here (
not args.not_retrieve_storage) to match the parameter name (retrieve_storage), which ensures the expected behavior - when the flag is set, retrieval is disabled.docs/snakefiles/storage.rst (1)
141-203: Comprehensive documentation for the new storage behavior options.The documentation clearly explains the default behavior and how to customize it, with well-structured examples showing three different ways to control file retrieval and local retention:
- Command-line arguments
- Storage directive level settings
- Rule-specific settings within storage function calls
The examples for both remote and local files are helpful for understanding the practical usage.
2db7761 to
ec27d97
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/tests.py (2)
2156-2200: Comprehensive tests for the keep_local functionality.The tests cover all relevant scenarios: default behavior, explicit setting, command-line flag, and directive-level settings.
There's an unused variable assignment at line 2195. You could remove the
p =part since the return value isn't used:- p = sp.check_output( + sp.check_output(🧰 Tools
🪛 Ruff (0.8.2)
2195-2195: Local variable
pis assigned to but never usedRemove assignment to unused variable
p(F841)
2202-2246: Good test coverage for the retrieve functionality.The test cases appropriately verify the retrieval behavior under different configurations, confirming that files are only retrieved when expected.
Similar to the previous test, there's an unused variable assignment at line 2242:
- p = sp.check_output( + sp.check_output(🧰 Tools
🪛 Ruff (0.8.2)
2242-2242: Local variable
pis assigned to but never usedRemove assignment to unused variable
p(F841)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
docs/snakefiles/storage.rst(1 hunks)src/snakemake/cli.py(2 hunks)src/snakemake/dag.py(1 hunks)src/snakemake/settings/types.py(1 hunks)src/snakemake/storage.py(4 hunks)tests/test_local_and_retrieve/keep_local.smk(1 hunks)tests/test_local_and_retrieve/retrieve.smk(1 hunks)tests/tests.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- src/snakemake/settings/types.py
- tests/test_local_and_retrieve/retrieve.smk
- tests/test_local_and_retrieve/keep_local.smk
- src/snakemake/storage.py
- src/snakemake/dag.py
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
tests/tests.pysrc/snakemake/cli.py
🪛 Ruff (0.8.2)
tests/tests.py
2195-2195: Local variable p is assigned to but never used
Remove assignment to unused variable p
(F841)
2242-2242: Local variable p is assigned to but never used
Remove assignment to unused variable p
(F841)
🔇 Additional comments (5)
src/snakemake/cli.py (2)
1371-1375: New flag to control storage retrieval looks good.The addition of the
--not-retrieve-storageflag follows the typical Snakemake pattern for flags that disable default behavior. The help text is clear and informative.
1973-1973: Implementation of the new flag in the API is correct.The code properly negates the flag value to set the
retrieve_storageparameter in theStorageSettingsobject.docs/snakefiles/storage.rst (3)
141-154: Good documentation of the new remote file behavior.This section clearly explains the default behavior and how to modify it using command-line arguments. The explanation is concise and accurate.
155-180: Well-documented directive-level configuration examples.The examples showing how to configure retrieval and local storage retention at the directive level are clear and helpful.
181-202: Helpful rule-level configuration examples.This section properly demonstrates how to override the retrieval and local storage settings on a per-rule basis, which is an important use case.
ec27d97 to
da760f7
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/tests.py (2)
2204-2205: Remove unused variable assignmentThe variable
pis assigned but never used.- p = sp.check_output( + sp.check_output(🧰 Tools
🪛 Ruff (0.8.2)
2205-2205: Local variable
pis assigned to but never usedRemove assignment to unused variable
p(F841)
2251-2252: Remove unused variable assignmentThe variable
pis assigned but never used.- p = sp.check_output( + sp.check_output(🧰 Tools
🪛 Ruff (0.8.2)
2252-2252: Local variable
pis assigned to but never usedRemove assignment to unused variable
p(F841)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
docs/snakefiles/storage.rst(1 hunks)src/snakemake/cli.py(2 hunks)src/snakemake/dag.py(1 hunks)src/snakemake/settings/types.py(1 hunks)src/snakemake/storage.py(4 hunks)tests/test_local_and_retrieve/keep_local.smk(1 hunks)tests/test_local_and_retrieve/retrieve.smk(1 hunks)tests/tests.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- src/snakemake/settings/types.py
- docs/snakefiles/storage.rst
- tests/test_local_and_retrieve/keep_local.smk
- tests/test_local_and_retrieve/retrieve.smk
- src/snakemake/dag.py
- src/snakemake/storage.py
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
src/snakemake/cli.pytests/tests.py
🪛 Ruff (0.8.2)
tests/tests.py
2205-2205: Local variable p is assigned to but never used
Remove assignment to unused variable p
(F841)
2252-2252: Local variable p is assigned to but never used
Remove assignment to unused variable p
(F841)
⏰ Context from checks skipped due to timeout of 90000ms (41)
- GitHub Check: tests (10, windows-latest, py312, bash)
- GitHub Check: tests (10, windows-latest, py311, bash)
- GitHub Check: tests (10, ubuntu-latest, py312, bash)
- GitHub Check: tests (10, ubuntu-latest, py311, bash)
- GitHub Check: tests (9, windows-latest, py312, bash)
- GitHub Check: tests (9, windows-latest, py311, bash)
- GitHub Check: tests (9, ubuntu-latest, py312, bash)
- GitHub Check: tests (9, ubuntu-latest, py311, bash)
- GitHub Check: tests (8, windows-latest, py312, bash)
- GitHub Check: tests (8, windows-latest, py311, bash)
- GitHub Check: tests (8, ubuntu-latest, py312, bash)
- GitHub Check: tests (8, ubuntu-latest, py311, bash)
- GitHub Check: tests (7, windows-latest, py312, bash)
- GitHub Check: tests (7, windows-latest, py311, bash)
- GitHub Check: tests (7, ubuntu-latest, py312, bash)
- GitHub Check: tests (7, ubuntu-latest, py311, bash)
- GitHub Check: tests (6, windows-latest, py312, bash)
- GitHub Check: tests (6, windows-latest, py311, bash)
- GitHub Check: tests (6, ubuntu-latest, py312, bash)
- GitHub Check: tests (6, ubuntu-latest, py311, bash)
- GitHub Check: tests (5, windows-latest, py312, bash)
- GitHub Check: tests (5, windows-latest, py311, bash)
- GitHub Check: tests (5, ubuntu-latest, py312, bash)
- GitHub Check: tests (5, ubuntu-latest, py311, bash)
- GitHub Check: tests (4, windows-latest, py312, bash)
- GitHub Check: tests (4, windows-latest, py311, bash)
- GitHub Check: tests (4, ubuntu-latest, py312, bash)
- GitHub Check: tests (4, ubuntu-latest, py311, bash)
- GitHub Check: tests (3, windows-latest, py312, bash)
- GitHub Check: tests (3, windows-latest, py311, bash)
- GitHub Check: tests (3, ubuntu-latest, py312, bash)
- GitHub Check: tests (3, ubuntu-latest, py311, bash)
- GitHub Check: tests (2, windows-latest, py312, bash)
- GitHub Check: tests (2, windows-latest, py311, bash)
- GitHub Check: tests (2, ubuntu-latest, py312, bash)
- GitHub Check: tests (2, ubuntu-latest, py311, bash)
- GitHub Check: tests (1, windows-latest, py312, bash)
- GitHub Check: tests (1, windows-latest, py311, bash)
- GitHub Check: tests (1, ubuntu-latest, py312, bash)
- GitHub Check: apidocs
- GitHub Check: tests (1, ubuntu-latest, py311, bash)
🔇 Additional comments (4)
src/snakemake/cli.py (2)
1377-1381: Appropriate addition of--not-retrieve-storageflagThis change adds a new command-line flag to control whether remote files should be retrieved, which aligns with the PR objective to enhance the functionality of
keep_localandretrieveoptions in remote storage plugins. The default behavior is to retrieve files unless this flag is specified, which is clearly documented in the help text.
1984-1984: Correctly implemented storage settings integrationThe setting is properly integrated with the
StorageSettingsinitialization, ensuring that the command-line flag's value is correctly passed to the storage system. The negation ofargs.not_retrieve_storageensures that the default behavior (when flag is not used) is to retrieve remote files.tests/tests.py (2)
2166-2210: Well-structured test forkeep_localfunctionalityThe test thoroughly checks various scenarios for the
keep_localoption:
- Default behavior - local files should be cleaned up
- Explicit
keep_local=False- local files should be cleaned up- Default with
--keep-storage-local-copiesflag - local files should be kept- Explicit
keep_local=True- local files should be keptkeep_localin storage directive - local files should be kept in the specified locationThis provides comprehensive coverage of the functionality described in the PR objectives.
🧰 Tools
🪛 Ruff (0.8.2)
2205-2205: Local variable
pis assigned to but never usedRemove assignment to unused variable
p(F841)
2212-2256: Comprehensive test for the new retrieval behaviorThe test covers all important scenarios for the
retrieveoption:
- Default with
--not-retrieve-storageflag - files should not be retrieved- Explicit
retrieve=False- files should not be retrievedretrieve=Falsein directive - files should not be retrieved- Default behavior (without flags) - files should be retrieved
- Explicit
retrieve=True- files should be retrievedThis test ensures that both the command-line flag and the storage directive options work as expected.
🧰 Tools
🪛 Ruff (0.8.2)
2252-2252: Local variable
pis assigned to but never usedRemove assignment to unused variable
p(F841)
da760f7 to
eb6363d
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/tests.py (2)
2181-2225: Good test implementation for keep_local functionality.This test comprehensively verifies the behavior of the
keep_localoption in storage directives by testing multiple scenarios:
- Default behavior (not keeping local copies)
- Explicit
keep_local=falsebehavior- CLI flag
--keep-storage-local-copiesoverriding default behavior- Explicit
keep_local=truebehaviorkeep_localwithin storage directiveThe variable
pis assigned but never used in multiple places. Consider removing the assignment:- p = sp.check_output( + sp.check_output(🧰 Tools
🪛 Ruff (0.8.2)
2220-2220: Local variable
pis assigned to but never usedRemove assignment to unused variable
p(F841)
2227-2270: Well-structured test for the retrieve functionality.This test effectively verifies the behavior of the
retrieveoption across different scenarios:
- Default behavior with
--not-retrieve-storageCLI flag- Explicit
retrieve=falsebehavior- Directive-level
retrieve=falsebehavior- Default behavior with
--force- Explicit
retrieve=truebehaviorThe test checks both positive and negative cases appropriately.
The variable
pis assigned but never used in multiple places. Consider removing the assignment:- p = sp.check_output( + sp.check_output(🧰 Tools
🪛 Ruff (0.8.2)
2267-2267: Local variable
pis assigned to but never usedRemove assignment to unused variable
p(F841)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pyproject.tomlis excluded by!pyproject.toml
📒 Files selected for processing (8)
docs/snakefiles/storage.rst(1 hunks)src/snakemake/cli.py(2 hunks)src/snakemake/dag.py(1 hunks)src/snakemake/settings/types.py(1 hunks)src/snakemake/storage.py(4 hunks)tests/test_local_and_retrieve/keep_local.smk(1 hunks)tests/test_local_and_retrieve/retrieve.smk(1 hunks)tests/tests.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- docs/snakefiles/storage.rst
- src/snakemake/settings/types.py
- src/snakemake/dag.py
- tests/test_local_and_retrieve/keep_local.smk
- tests/test_local_and_retrieve/retrieve.smk
- src/snakemake/storage.py
- src/snakemake/cli.py
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
tests/tests.py
🪛 Ruff (0.8.2)
tests/tests.py
2220-2220: Local variable p is assigned to but never used
Remove assignment to unused variable p
(F841)
2267-2267: Local variable p is assigned to but never used
Remove assignment to unused variable p
(F841)
⏰ Context from checks skipped due to timeout of 90000ms (40)
- GitHub Check: tests (10, windows-latest, py312)
- GitHub Check: tests (10, windows-latest, py311)
- GitHub Check: tests (10, ubuntu-latest, py312)
- GitHub Check: tests (10, ubuntu-latest, py311)
- GitHub Check: tests (9, windows-latest, py312)
- GitHub Check: tests (9, windows-latest, py311)
- GitHub Check: tests (9, ubuntu-latest, py312)
- GitHub Check: tests (9, ubuntu-latest, py311)
- GitHub Check: tests (8, windows-latest, py312)
- GitHub Check: tests (8, windows-latest, py311)
- GitHub Check: tests (8, ubuntu-latest, py312)
- GitHub Check: tests (8, ubuntu-latest, py311)
- GitHub Check: tests (7, windows-latest, py312)
- GitHub Check: tests (7, windows-latest, py311)
- GitHub Check: tests (7, ubuntu-latest, py312)
- GitHub Check: tests (7, ubuntu-latest, py311)
- GitHub Check: tests (6, windows-latest, py312)
- GitHub Check: tests (6, windows-latest, py311)
- GitHub Check: tests (6, ubuntu-latest, py312)
- GitHub Check: tests (6, ubuntu-latest, py311)
- GitHub Check: tests (5, windows-latest, py312)
- GitHub Check: tests (5, windows-latest, py311)
- GitHub Check: tests (5, ubuntu-latest, py312)
- GitHub Check: tests (5, ubuntu-latest, py311)
- GitHub Check: tests (4, windows-latest, py312)
- GitHub Check: tests (4, windows-latest, py311)
- GitHub Check: tests (4, ubuntu-latest, py312)
- GitHub Check: tests (4, ubuntu-latest, py311)
- GitHub Check: tests (3, windows-latest, py312)
- GitHub Check: tests (3, windows-latest, py311)
- GitHub Check: tests (3, ubuntu-latest, py312)
- GitHub Check: tests (3, ubuntu-latest, py311)
- GitHub Check: tests (2, windows-latest, py312)
- GitHub Check: tests (2, windows-latest, py311)
- GitHub Check: tests (2, ubuntu-latest, py312)
- GitHub Check: tests (2, ubuntu-latest, py311)
- GitHub Check: tests (1, windows-latest, py312)
- GitHub Check: tests (1, windows-latest, py311)
- GitHub Check: tests (1, ubuntu-latest, py312)
- GitHub Check: tests (1, ubuntu-latest, py311)
🤖 I have created a release *beep* *boop* --- ## [9.0.0](v8.30.0...v9.0.0) (2025-03-14) ### ⚠ BREAKING CHANGES * Logging refactor & add LoggerPluginInterface ([#3107](#3107)) ### Features * [#3412](#3412) - keep shadow folder of failed job if --keep-incomplete flag is set. ([#3430](#3430)) ([22978c3](22978c3)) * add flag --report-after-run to automatically generate the report after a successfull workflow run ([#3428](#3428)) ([b0a7f03](b0a7f03)) * add flatten function to IO utils ([#3424](#3424)) ([67fa392](67fa392)) * add helper functions to parse input files ([#2918](#2918)) ([63e45a7](63e45a7)) * Add option to print redacted file names ([#3089](#3089)) ([ba4d264](ba4d264)) * add support for validation of polars dataframe and lazyframe ([#3262](#3262)) ([c7473a6](c7473a6)) * added support for rendering dag with mermaid js ([#3409](#3409)) ([7bf8381](7bf8381)) * adding --replace-workflow-config to fully replace workflow configs (from config: directive) with --configfile, instead of merging them ([#3381](#3381)) ([47504a0](47504a0)) * Dynamic module name ([#3401](#3401)) ([024dc32](024dc32)) * Enable saving and reloading IOCache object ([#3386](#3386)) ([c935953](c935953)) * files added in rule params with workflow.source_path will be available in used containers ([#3385](#3385)) ([a6e45bf](a6e45bf)) * Fix keep_local in storage directive and more freedom over remote retrieval behaviour ([#3410](#3410)) ([67b4739](67b4739)) * inherit parameters of use rule and extend/replace individual items them when using 'with' directive ([#3365](#3365)) ([93e4b92](93e4b92)) * Logging refactor & add LoggerPluginInterface ([#3107](#3107)) ([86f1d6e](86f1d6e)) * Maximal file size for checksums ([#3368](#3368)) ([b039f8a](b039f8a)) * Modernize package configuration using Pixi ([#3369](#3369)) ([77992d8](77992d8)) * multiext support for named input/output ([#3372](#3372)) ([05e1378](05e1378)) * optionally auto-group jobs via temp files in case of remote execution ([#3378](#3378)) ([cc9bba2](cc9bba2)) ### Bug Fixes * `--delete-all-output` ignores `--dry-run` ([#3265](#3265)) ([23fef82](23fef82)) * 3342 faster touch runs and warning messages for non-existing files ([#3398](#3398)) ([cd9c3c3](cd9c3c3)) * add default value to max-jobs-per-timespan ([#3043](#3043)) ([2959abe](2959abe)) * checkpoints inside modules are overwritten ([#3359](#3359)) ([fba3ac7](fba3ac7)) * Convert Path to IOFile ([#3405](#3405)) ([c58684c](c58684c)) * Do not perform storage object cleanup with --keep-storage-local-copies set ([#3358](#3358)) ([9a6d14b](9a6d14b)) * edgecases of source deployment in case of remote execution ([#3396](#3396)) ([5da13be](5da13be)) * enhance error message formatting for strict DAG-building mode ([#3376](#3376)) ([a1c39ee](a1c39ee)) * fix bug in checkpoint handling that led to exceptions in case checkpoint output was missing upon rerun ([#3423](#3423)) ([8cf4a2f](8cf4a2f)) * force check all required outputs ([#3341](#3341)) ([495a4e7](495a4e7)) * group job formatting ([#3442](#3442)) ([f0b10a3](f0b10a3)) * in remote jobs, upload storage in topological order such that modification dates are preserved (e.g. in case of group jobs) ([#3377](#3377)) ([eace08f](eace08f)) * only skip eval when resource depends on input ([#3374](#3374)) ([4574c92](4574c92)) * Prevent execution of conda in apptainer when not explicitly requested in software deployment method ([#3388](#3388)) ([c43c5c0](c43c5c0)) * print filenames with quotes around them in RuleException ([#3269](#3269)) ([6baeda5](6baeda5)) * Re-evaluation of free resources ([#3399](#3399)) ([6371293](6371293)) * ReadTheDocs layout issue due to src directory change ([#3419](#3419)) ([695b127](695b127)) * robustly escaping quotes in generated bash scripts (v2) ([#3297](#3297)) ([#3389](#3389)) ([58720bd](58720bd)) * Show apptainer image URL in snakemake report ([#3407](#3407)) ([45f0450](45f0450)) * Update ReadTheDocs configuration for documentation build to use Pixi ([#3433](#3433)) ([3f227a6](3f227a6)) ### Documentation * Add pixi setup instructions to general use tutorial ([#3382](#3382)) ([115e81b](115e81b)) * fix contribution section heading levels, fix docs testing setup order ([#3360](#3360)) ([051dc53](051dc53)) * fix link to github.com/snakemake/poetry-snakemake-plugin ([#3436](#3436)) ([ec6d97c](ec6d97c)) * fix quoting ([#3394](#3394)) ([b40f599](b40f599)) * fix rerun-triggers default ([#3403](#3403)) ([4430e23](4430e23)) * fix typo 'safe' -> 'save' ([#3384](#3384)) ([7755861](7755861)) * mention code formatting in the contribution section ([#3431](#3431)) ([e8682b7](e8682b7)) * remove duplicated 'functions'. ([#3356](#3356)) ([7c595db](7c595db)) * update broken links documentation ([#3437](#3437)) ([e3d0d88](e3d0d88)) * Updating contributing guidelines with new pixi dev setup ([#3415](#3415)) ([8e95a12](8e95a12)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: snakemake-bot <snakemake-bot-admin@googlegroups.com>
…e retrieval behaviour (snakemake#3410) This PR enhances the behaviour of `keep_local` and `retrieve` when using remote storage plugins: * `keep_local` was not always respected when used in a storage directive (see snakemake#2844 e.g.) * This allows `keep_local` to be specified in the storage directive (which overrides the CLI argument) and in the call to `storage` (which overrides the specification, if any, in the storage directive) * The default behaviour for retrieving remote files is to always do so unless `retrieve=False` is used in the call to `storage` * This PR moves the behaviour more inline with `keep_local` i.e. a default behaviour can be specified with a new CLI argument `--not-retrieve-local` (not a very graceful name, open to suggestions!) * The default behaviour is to keep `retrieve=True` as is currently the case in snakemake but it can be overridden in a storage directive * This requires some small changes in `snakemake-interface-storage-plugins` which I will open another PR for This PR also changes the behaviour when defining a tagged storage provider without specifying the default: * If a tagged storage provider is defined but not the default one, then the settings of the tagged provider are not properly respected as the underlying provider is registered later only when the `storage_object` is accessed i.e. it has no access to the tagged settings * This is changed by checking, when a tagged storage provider is registered, if the underlying provider has not yet been registered then it is registered using the same settings as the tagged provider ### QC * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **Documentation** - Expanded documentation to guide users on managing remote files and controlling local file retention through new options. - **New Features** - Introduced a command-line option to disable remote file downloads. - Enhanced the workflow's cleanup process to preserve local files when desired. - Added configurable settings for controlling file retrieval and local storage behavior. - **Tests** - Included new example workflows and automated tests to showcase and verify the updated storage and retrieval functionalities. - Added tests for local storage retention and retrieval behaviors under various conditions. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: Johannes Koester <johannes.koester@uni-due.de>
🤖 I have created a release *beep* *boop* --- ## [9.0.0](snakemake/snakemake@v8.30.0...v9.0.0) (2025-03-14) ### ⚠ BREAKING CHANGES * Logging refactor & add LoggerPluginInterface ([snakemake#3107](snakemake#3107)) ### Features * [snakemake#3412](snakemake#3412) - keep shadow folder of failed job if --keep-incomplete flag is set. ([snakemake#3430](snakemake#3430)) ([22978c3](snakemake@22978c3)) * add flag --report-after-run to automatically generate the report after a successfull workflow run ([snakemake#3428](snakemake#3428)) ([b0a7f03](snakemake@b0a7f03)) * add flatten function to IO utils ([snakemake#3424](snakemake#3424)) ([67fa392](snakemake@67fa392)) * add helper functions to parse input files ([snakemake#2918](snakemake#2918)) ([63e45a7](snakemake@63e45a7)) * Add option to print redacted file names ([snakemake#3089](snakemake#3089)) ([ba4d264](snakemake@ba4d264)) * add support for validation of polars dataframe and lazyframe ([snakemake#3262](snakemake#3262)) ([c7473a6](snakemake@c7473a6)) * added support for rendering dag with mermaid js ([snakemake#3409](snakemake#3409)) ([7bf8381](snakemake@7bf8381)) * adding --replace-workflow-config to fully replace workflow configs (from config: directive) with --configfile, instead of merging them ([snakemake#3381](snakemake#3381)) ([47504a0](snakemake@47504a0)) * Dynamic module name ([snakemake#3401](snakemake#3401)) ([024dc32](snakemake@024dc32)) * Enable saving and reloading IOCache object ([snakemake#3386](snakemake#3386)) ([c935953](snakemake@c935953)) * files added in rule params with workflow.source_path will be available in used containers ([snakemake#3385](snakemake#3385)) ([a6e45bf](snakemake@a6e45bf)) * Fix keep_local in storage directive and more freedom over remote retrieval behaviour ([snakemake#3410](snakemake#3410)) ([67b4739](snakemake@67b4739)) * inherit parameters of use rule and extend/replace individual items them when using 'with' directive ([snakemake#3365](snakemake#3365)) ([93e4b92](snakemake@93e4b92)) * Logging refactor & add LoggerPluginInterface ([snakemake#3107](snakemake#3107)) ([86f1d6e](snakemake@86f1d6e)) * Maximal file size for checksums ([snakemake#3368](snakemake#3368)) ([b039f8a](snakemake@b039f8a)) * Modernize package configuration using Pixi ([snakemake#3369](snakemake#3369)) ([77992d8](snakemake@77992d8)) * multiext support for named input/output ([snakemake#3372](snakemake#3372)) ([05e1378](snakemake@05e1378)) * optionally auto-group jobs via temp files in case of remote execution ([snakemake#3378](snakemake#3378)) ([cc9bba2](snakemake@cc9bba2)) ### Bug Fixes * `--delete-all-output` ignores `--dry-run` ([snakemake#3265](snakemake#3265)) ([23fef82](snakemake@23fef82)) * 3342 faster touch runs and warning messages for non-existing files ([snakemake#3398](snakemake#3398)) ([cd9c3c3](snakemake@cd9c3c3)) * add default value to max-jobs-per-timespan ([snakemake#3043](snakemake#3043)) ([2959abe](snakemake@2959abe)) * checkpoints inside modules are overwritten ([snakemake#3359](snakemake#3359)) ([fba3ac7](snakemake@fba3ac7)) * Convert Path to IOFile ([snakemake#3405](snakemake#3405)) ([c58684c](snakemake@c58684c)) * Do not perform storage object cleanup with --keep-storage-local-copies set ([snakemake#3358](snakemake#3358)) ([9a6d14b](snakemake@9a6d14b)) * edgecases of source deployment in case of remote execution ([snakemake#3396](snakemake#3396)) ([5da13be](snakemake@5da13be)) * enhance error message formatting for strict DAG-building mode ([snakemake#3376](snakemake#3376)) ([a1c39ee](snakemake@a1c39ee)) * fix bug in checkpoint handling that led to exceptions in case checkpoint output was missing upon rerun ([snakemake#3423](snakemake#3423)) ([8cf4a2f](snakemake@8cf4a2f)) * force check all required outputs ([snakemake#3341](snakemake#3341)) ([495a4e7](snakemake@495a4e7)) * group job formatting ([snakemake#3442](snakemake#3442)) ([f0b10a3](snakemake@f0b10a3)) * in remote jobs, upload storage in topological order such that modification dates are preserved (e.g. in case of group jobs) ([snakemake#3377](snakemake#3377)) ([eace08f](snakemake@eace08f)) * only skip eval when resource depends on input ([snakemake#3374](snakemake#3374)) ([4574c92](snakemake@4574c92)) * Prevent execution of conda in apptainer when not explicitly requested in software deployment method ([snakemake#3388](snakemake#3388)) ([c43c5c0](snakemake@c43c5c0)) * print filenames with quotes around them in RuleException ([snakemake#3269](snakemake#3269)) ([6baeda5](snakemake@6baeda5)) * Re-evaluation of free resources ([snakemake#3399](snakemake#3399)) ([6371293](snakemake@6371293)) * ReadTheDocs layout issue due to src directory change ([snakemake#3419](snakemake#3419)) ([695b127](snakemake@695b127)) * robustly escaping quotes in generated bash scripts (v2) ([snakemake#3297](snakemake#3297)) ([snakemake#3389](snakemake#3389)) ([58720bd](snakemake@58720bd)) * Show apptainer image URL in snakemake report ([snakemake#3407](snakemake#3407)) ([45f0450](snakemake@45f0450)) * Update ReadTheDocs configuration for documentation build to use Pixi ([snakemake#3433](snakemake#3433)) ([3f227a6](snakemake@3f227a6)) ### Documentation * Add pixi setup instructions to general use tutorial ([snakemake#3382](snakemake#3382)) ([115e81b](snakemake@115e81b)) * fix contribution section heading levels, fix docs testing setup order ([snakemake#3360](snakemake#3360)) ([051dc53](snakemake@051dc53)) * fix link to github.com/snakemake/poetry-snakemake-plugin ([snakemake#3436](snakemake#3436)) ([ec6d97c](snakemake@ec6d97c)) * fix quoting ([snakemake#3394](snakemake#3394)) ([b40f599](snakemake@b40f599)) * fix rerun-triggers default ([snakemake#3403](snakemake#3403)) ([4430e23](snakemake@4430e23)) * fix typo 'safe' -> 'save' ([snakemake#3384](snakemake#3384)) ([7755861](snakemake@7755861)) * mention code formatting in the contribution section ([snakemake#3431](snakemake#3431)) ([e8682b7](snakemake@e8682b7)) * remove duplicated 'functions'. ([snakemake#3356](snakemake#3356)) ([7c595db](snakemake@7c595db)) * update broken links documentation ([snakemake#3437](snakemake#3437)) ([e3d0d88](snakemake@e3d0d88)) * Updating contributing guidelines with new pixi dev setup ([snakemake#3415](snakemake#3415)) ([8e95a12](snakemake@8e95a12)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: snakemake-bot <snakemake-bot-admin@googlegroups.com>



This PR enhances the behaviour of
keep_localandretrievewhen using remote storage plugins:keep_localwas not always respected when used in a storage directive (seekeep_localdoes not keep remote files locally #2844 e.g.)keep_localto be specified in the storage directive (which overrides the CLI argument) and in the call tostorage(which overrides the specification, if any, in the storage directive)retrieve=Falseis used in the call tostoragekeep_locali.e. a default behaviour can be specified with a new CLI argument--not-retrieve-local(not a very graceful name, open to suggestions!)retrieve=Trueas is currently the case in snakemake but it can be overridden in a storage directivesnakemake-interface-storage-pluginswhich I will open another PR forThis PR also changes the behaviour when defining a tagged storage provider without specifying the default:
storage_objectis accessed i.e. it has no access to the tagged settingsQC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
Documentation
New Features
Tests