fix: omit storage downloads during dryrun in workflows with checkpoints#3100
Conversation
WalkthroughThe changes introduced in this pull request modify the control flow within the Changes
Assessment against linked issues
Possibly related PRs
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 1
Outside diff range and nitpick comments (1)
snakemake/dag.py (1)
Line range hint
1-1862: Summary of changes and recommendationsThe changes made to the
postprocess_after_updatemethod insnakemake/dag.pysuccessfully implement the PR objective of omitting storage downloads during dry runs in workflows with checkpoints. The modification is minimal and focused, which reduces the risk of introducing bugs.However, to ensure the robustness of this change, I recommend the following:
- Add unit tests specifically for this new behavior, covering various scenarios of dry runs with and without shared input/output settings.
- Update the documentation to reflect this new behavior in dry run scenarios.
- Consider adding a comment in the code explaining the rationale behind this change, which will help future maintainers understand the intent.
To improve the overall architecture and maintainability of the code:
- Consider extracting the condition for storage download into a separate method, e.g.,
should_retrieve_storage_inputs(). This would improve readability and make it easier to test this logic independently.- Evaluate if this change should be reflected in other parts of the DAG implementation or related classes to ensure consistent behavior across the entire workflow.
The implementation is approved, but please address the recommendations to ensure the change is fully integrated and tested within the Snakemake ecosystem.
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- snakemake/dag.py (1 hunks)
Additional context used
Path-based instructions (1)
snakemake/dag.py (1)
Pattern
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
| if not self.workflow.dryrun and ( | ||
| (self.workflow.is_main_process and shared_input_output) | ||
| or self.workflow.remote_exec | ||
| ): |
There was a problem hiding this comment.
💡 Codebase verification
Add Tests to Verify Storage Behavior During Dry Runs
The implementation correctly introduces a condition to prevent storage downloads during dry runs, aligning with the PR objective to omit storage downloads in workflows with checkpoints.
However, the current test suite lacks specific tests that verify this behavior. To ensure the change works as intended and to prevent future regressions, please add tests that cover storage operations during dry runs.
- Location to Add Tests:
tests/tests.py
Analysis chain
Ensure dry run behavior is consistent with PR objectives
The changes introduce a new condition to prevent storage downloads during dry runs. This aligns with the PR objective to "omit storage downloads during dryrun in workflows with checkpoints".
However, there are a few points to consider:
- The condition now includes both dry run and shared input/output checks.
- The indentation suggests that this condition applies to both the main process and remote execution scenarios.
To ensure this change doesn't introduce unintended side effects, please run the following verification:
The implementation looks correct and aligns with the PR objectives. However, please ensure that this change doesn't affect other parts of the workflow that might rely on storage downloads during dry runs.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for any other occurrences of self.workflow.dryrun in the file
# to ensure consistent behavior across the DAG implementation
rg --type python 'self\.workflow\.dryrun' snakemake/dag.py
# Verify if there are any tests covering dry run scenarios with storage
rg --type python 'def test.*dryrun.*storage' tests/
Length of output: 176
Script:
#!/bin/bash
# Search for occurrences of self.workflow.dryrun in Python files within snakemake/dag.py
fd --extension py --type file | grep "^snakemake/dag\.py$" | xargs rg 'self\.workflow\.dryrun'
# Search for tests covering dry run scenarios with storage in the tests/ directory
fd --extension py --type file tests/ | xargs rg 'def test.*dryrun.*storage'
Length of output: 772
Script:
#!/bin/bash
# Search for occurrences of self.workflow.dryrun in snakemake/dag.py
fd --extension py --type file | grep "^snakemake/dag\.py$" | xargs rg 'self\.workflow\.dryrun'
# Search for tests covering dry run scenarios with storage in the tests/ directory using full paths
fd --extension py --type file --full-path 'tests/' | xargs rg 'def test.*dryrun.*storage'
Length of output: 454
Script:
#!/bin/bash
# Search for any tests related to dry run scenarios in the tests/ directory
fd --extension py --type file --full-path 'tests/' | xargs rg 'def test.*dryrun'
Length of output: 173
There was a problem hiding this comment.
Actionable comments posted: 1
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- tests/test_storage_noretrieve_dryrun/Snakefile (1 hunks)
- tests/test_storage_noretrieve_dryrun/fs/test.txt (1 hunks)
- tests/tests.py (2 hunks)
Files skipped from review due to trivial changes (1)
- tests/test_storage_noretrieve_dryrun/fs/test.txt
Additional context used
Path-based instructions (1)
tests/tests.py (1)
Pattern
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Additional comments not posted (6)
tests/test_storage_noretrieve_dryrun/Snakefile (4)
2-3: LGTM: Storage configuration is appropriate for testing.The storage configuration using the "fs" provider is suitable for testing filesystem operations in the context of dry runs.
6-8: LGTM: Ruleallcorrectly defines the workflow's final output.The rule
allappropriately uses thestorage()function to specify the final output, which is essential for testing storage behavior during dry runs.
18-24: LGTM: Rulebarcorrectly implements storage operations.The rule
barappropriately uses thestorage()function for both input and output, and the shell command correctly utilizes the{input}and{output}placeholders. This setup is well-suited for testing storage behavior during dry runs.
1-24: Overall assessment: Well-structured test case with a minor issue.This Snakefile provides a well-structured test case for examining storage behavior during dry runs, which aligns perfectly with the PR objectives. The implementation includes appropriate use of storage configurations, checkpoints, and rules. However, there's a minor issue in the
foocheckpoint where the shell command doesn't write to the correct storage location.Once the checkpoint issue is addressed, this test case should effectively validate the changes related to omitting storage downloads during dry runs in workflows with checkpoints.
To ensure that this test case is properly integrated into the test suite, please run the following command:
This will help confirm that the new test case is referenced in the test suite configuration.
Verification successful
Test case is correctly included in the test suite.
The execution of the verification script confirms that
test_storage_noretrieve_dryrunis referenced intests/tests.py, ensuring the test case is properly integrated into the test suite.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Verify that the test case is included in the test suite grep -r "test_storage_noretrieve_dryrun" testsLength of output: 182
tests/tests.py (2)
12-12: New import added for mocking.The
AsyncMockclass has been imported from theunittest.mockmodule. This is appropriate for mocking asynchronous functions or methods.
1471-1474: New test function added to simulate storage retrieval failure during dry run.This test function,
test_storage_noretrieve_dryrun, uses the@patchdecorator to mock theretrieve_from_storagemethod ofsnakemake.io._IOFile. The mock is set to raise an exception, simulating a failure in storage retrieval. This is a good approach to test the behavior of Snakemake during a dry run when storage retrieval fails.A few observations and suggestions:
- The test function name clearly describes its purpose.
- The use of
AsyncMockis appropriate for mocking an asynchronous method.- The test runs Snakemake in dry run mode, which is correct for this scenario.
To ensure that this test is correctly integrated and doesn't interfere with other tests, we can check for any related configurations or setup in the test directory:
Verification successful
Test Integration Verified Successfully
The new test function
test_storage_noretrieve_dryrunis correctly integrated and does not interfere with other tests. No related configurations or modifications were found that could affect the test's behavior.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for any related configurations or setup files echo "Checking for related test configurations:" find . -maxdepth 1 -name "test_storage_noretrieve_dryrun*" echo "Checking for any modifications to test runner or configuration files:" grep -n "test_storage_noretrieve_dryrun" *.pyLength of output: 366
Script:
#!/bin/bash # Searching for related test configurations recursively echo "Searching for related test configurations recursively:" fd "test_storage_noretrieve_dryrun" . -t f --extension py # Searching for any modifications to test runner or configuration files recursively echo "Searching for modifications related to 'test_storage_noretrieve_dryrun' in Python files:" rg -n "test_storage_noretrieve_dryrun" --type pyLength of output: 564
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
|
🤖 I have created a release *beep* *boop* --- ## [8.20.5](v8.20.4...v8.20.5) (2024-09-25) ### Bug Fixes * fixed check for remote conda env pinning and post-deploy files; fixed conda env cleanup ([#3103](#3103)) ([4d0a7e9](4d0a7e9)) * omit storage downloads during dryrun in workflows with checkpoints ([#3100](#3100)) ([151216a](151216a)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>



fixes snakemake/snakemake-storage-plugin-s3#29
QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation