fix: omit storage downloads during dryrun in workflows with checkpoints by johanneskoester · Pull Request #3100 · snakemake/snakemake

johanneskoester · 2024-09-24T14:46:37Z

fixes snakemake/snakemake-storage-plugin-s3#29

QC

The PR contains a test case for the changes or the changes are already covered by an existing test case.
The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change modifies neither the language nor the behavior or functionality of Snakemake).

Summary by CodeRabbit

New Features
- Introduced a new Snakemake workflow for managing file storage operations.
- Added a checkpoint and rules to ensure proper execution flow for file operations.
Bug Fixes
- Enhanced control flow to prevent storage input retrieval during dry runs.
Tests
- Added a new test to validate storage retrieval failure scenarios during dry runs.
Documentation
- New test file and configuration details added for clarity on workflow operations.

coderabbitai · 2024-09-24T14:46:46Z

Walkthrough

The changes introduced in this pull request modify the control flow within the postprocess_after_update asynchronous method in snakemake/dag.py. The update adds a condition to check if the workflow is not in a dry run state before retrieving storage inputs. This adjustment refines when the retrieve_storage_inputs method is invoked, specifically ensuring it only executes during actual runs rather than dry runs. Additionally, new test cases and a Snakemake workflow are introduced to validate this behavior.

Changes

| File | Change Summary |
| --- | --- |
| snakemake/dag.py | Added a condition in `postprocess_after_update` to check for `not self.workflow.dryrun` before calling `retrieve_storage_inputs`. |
| tests/test_storage_noretrieve_dryrun/Snakefile | Introduced a Snakemake workflow with rules and checkpoints for file storage operations. |
| tests/test_storage_noretrieve_dryrun/fs/test.txt | Added a new file `test.txt` containing the text "test". |
| tests/tests.py | Added a test function `test_storage_noretrieve_dryrun` to simulate dry run behavior and validate storage retrieval failure. |

Assessment against linked issues

| Objective | Addressed | Explanation |
| --- | --- | --- |
| Not download file during dry-run (#29) | ✅ | |

Possibly related PRs

fix: improve runtime complexity of post-job checkpoint handling #3096: Also changes snakemake/dag.py, improving the handling of checkpoint jobs; it touches control flow and execution conditions similar to those updated in this PR.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between c67ba83 and 4d4db25.

📒 Files selected for processing (1)

tests/test_storage_noretrieve_dryrun/Snakefile (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

tests/test_storage_noretrieve_dryrun/Snakefile

coderabbitai

Actionable comments posted: 1

Outside diff range and nitpick comments (1)

snakemake/dag.py (1)

Line range hint 1-1862: Summary of changes and recommendations

The changes made to the postprocess_after_update method in snakemake/dag.py successfully implement the PR objective of omitting storage downloads during dry runs in workflows with checkpoints. The modification is minimal and focused, which reduces the risk of introducing bugs.

However, to ensure the robustness of this change, I recommend the following:

Add unit tests specifically for this new behavior, covering various scenarios of dry runs with and without shared input/output settings.

Update the documentation to reflect this new behavior in dry run scenarios.

Consider adding a comment in the code explaining the rationale behind this change, which will help future maintainers understand the intent.

To improve the overall architecture and maintainability of the code:

Consider extracting the condition for storage download into a separate method, e.g., should_retrieve_storage_inputs(). This would improve readability and make it easier to test this logic independently.

Evaluate if this change should be reflected in other parts of the DAG implementation or related classes to ensure consistent behavior across the entire workflow.
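The extraction suggested above could look roughly like the sketch below. The name should_retrieve_storage_inputs comes from the recommendation; writing it as a free function with explicit boolean parameters is an assumption made here so the logic can be tested in isolation. In snakemake/dag.py the values would come from `self.workflow` and the local `shared_input_output`.

```python
def should_retrieve_storage_inputs(
    dryrun: bool,
    is_main_process: bool,
    shared_input_output: bool,
    remote_exec: bool,
) -> bool:
    """Hypothetical helper mirroring the condition added in this PR.

    Storage inputs are retrieved only outside of dry runs, and then only
    if this is the main process with shared input/output or a remote
    execution.
    """
    return not dryrun and (
        (is_main_process and shared_input_output) or remote_exec
    )
```

Because `not dryrun` sits outside the parenthesized disjunction, a dry run suppresses retrieval in both the main-process and the remote-execution case.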

The implementation is approved, but please address the recommendations to ensure the change is fully integrated and tested within the Snakemake ecosystem.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 4baebfb and f986e63.

Files selected for processing (1)

snakemake/dag.py (1 hunks)

Additional context used

Path-based instructions (1)

snakemake/dag.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

coderabbitai · 2024-09-24T14:48:41Z

snakemake/dag.py

+        if not self.workflow.dryrun and (
+            (self.workflow.is_main_process and shared_input_output)
+            or self.workflow.remote_exec
+        ):


💡 Codebase verification

Add Tests to Verify Storage Behavior During Dry Runs

The implementation correctly introduces a condition to prevent storage downloads during dry runs, aligning with the PR objective to omit storage downloads in workflows with checkpoints.

However, the current test suite lacks specific tests that verify this behavior. To ensure the change works as intended and to prevent future regressions, please add tests that cover storage operations during dry runs.

Location to Add Tests:

tests/tests.py

Analysis chain

Ensure dry run behavior is consistent with PR objectives

The changes introduce a new condition to prevent storage downloads during dry runs. This aligns with the PR objective to "omit storage downloads during dryrun in workflows with checkpoints".

However, there are a few points to consider:

The condition now includes both dry run and shared input/output checks.

The indentation suggests that this condition applies to both the main process and remote execution scenarios.

To ensure this change doesn't introduce unintended side effects, please run the following verification:

The implementation looks correct and aligns with the PR objectives. However, please ensure that this change doesn't affect other parts of the workflow that might rely on storage downloads during dry runs.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for any other occurrences of self.workflow.dryrun in the file
# to ensure consistent behavior across the DAG implementation
rg --type python 'self\.workflow\.dryrun' snakemake/dag.py

# Verify if there are any tests covering dry run scenarios with storage
rg --type python 'def test.*dryrun.*storage' tests/

Length of output: 176

Script:

#!/bin/bash
# Search for occurrences of self.workflow.dryrun in Python files within snakemake/dag.py
fd --extension py --type file | grep "^snakemake/dag\.py$" | xargs rg 'self\.workflow\.dryrun'

# Search for tests covering dry run scenarios with storage in the tests/ directory
fd --extension py --type file tests/ | xargs rg 'def test.*dryrun.*storage'

Length of output: 772

Script:

#!/bin/bash
# Search for occurrences of self.workflow.dryrun in snakemake/dag.py
fd --extension py --type file | grep "^snakemake/dag\.py$" | xargs rg 'self\.workflow\.dryrun'

# Search for tests covering dry run scenarios with storage in the tests/ directory using full paths
fd --extension py --type file --full-path 'tests/' | xargs rg 'def test.*dryrun.*storage'

Length of output: 454

Script:

#!/bin/bash
# Search for any tests related to dry run scenarios in the tests/ directory
fd --extension py --type file --full-path 'tests/' | xargs rg 'def test.*dryrun'

Length of output: 173

coderabbitai

Actionable comments posted: 1

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between f986e63 and c67ba83.

Files selected for processing (3)

tests/test_storage_noretrieve_dryrun/Snakefile (1 hunks)
tests/test_storage_noretrieve_dryrun/fs/test.txt (1 hunks)
tests/tests.py (2 hunks)

Files skipped from review due to trivial changes (1)

tests/test_storage_noretrieve_dryrun/fs/test.txt

Additional context used

Path-based instructions (1)

tests/tests.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Additional comments not posted (6)

tests/test_storage_noretrieve_dryrun/Snakefile (4)
2-3: LGTM: Storage configuration is appropriate for testing.

The storage configuration using the "fs" provider is suitable for testing filesystem operations in the context of dry runs.

6-8: LGTM: Rule all correctly defines the workflow's final output.

The rule all appropriately uses the storage() function to specify the final output, which is essential for testing storage behavior during dry runs.

18-24: LGTM: Rule bar correctly implements storage operations.

The rule bar appropriately uses the storage() function for both input and output, and the shell command correctly utilizes the {input} and {output} placeholders. This setup is well-suited for testing storage behavior during dry runs.

1-24: Overall assessment: Well-structured test case with a minor issue.

This Snakefile provides a well-structured test case for examining storage behavior during dry runs, which aligns perfectly with the PR objectives. The implementation includes appropriate use of storage configurations, checkpoints, and rules. However, there's a minor issue in the foo checkpoint where the shell command doesn't write to the correct storage location.

Once the checkpoint issue is addressed, this test case should effectively validate the changes related to omitting storage downloads during dry runs in workflows with checkpoints.

To ensure that this test case is properly integrated into the test suite, please run the following command:

This will help confirm that the new test case is referenced in the test suite configuration.

Verification successful

Test case is correctly included in the test suite.

The execution of the verification script confirms that test_storage_noretrieve_dryrun is referenced in tests/tests.py, ensuring the test case is properly integrated into the test suite.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Verify that the test case is included in the test suite
grep -r "test_storage_noretrieve_dryrun" tests
Length of output: 182

tests/tests.py (2)
12-12: New import added for mocking.

The AsyncMock class has been imported from the unittest.mock module. This is appropriate for mocking asynchronous functions or methods.

1471-1474: New test function added to simulate storage retrieval failure during dry run.

This test function, test_storage_noretrieve_dryrun, uses the @patch decorator to mock the retrieve_from_storage method of snakemake.io._IOFile. The mock is set to raise an exception, simulating a failure in storage retrieval. This is a good approach to test the behavior of Snakemake during a dry run when storage retrieval fails.

A few observations and suggestions:

The test function name clearly describes its purpose.

The use of AsyncMock is appropriate for mocking an asynchronous method.

The test runs Snakemake in dry run mode, which is correct for this scenario.
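The mocking pattern described above can be sketched without Snakemake. The classes and functions below (FakeIOFile, postprocess, run_with_failing_retrieval) are hypothetical stand-ins, not Snakemake's `_IOFile` or the real test in tests/tests.py. The sketch only shows how an AsyncMock with a `side_effect` makes any awaited retrieval fail, so a guarded dry run passes and an unguarded run does not.

```python
import asyncio
from unittest.mock import AsyncMock, patch


class FakeIOFile:
    """Stand-in for a storage-backed file (not Snakemake's _IOFile)."""

    async def retrieve_from_storage(self):
        return "downloaded"


async def postprocess(files, dryrun):
    """Toy stand-in for the guarded step: skip retrieval on dry runs."""
    if not dryrun:
        for f in files:
            await f.retrieve_from_storage()


def run_with_failing_retrieval(dryrun):
    """Return True if the step completes while retrieval is patched to fail."""
    # Awaiting the mock raises, so any attempted download surfaces as an error.
    failing = AsyncMock(side_effect=RuntimeError("storage must not be touched"))
    with patch.object(FakeIOFile, "retrieve_from_storage", failing):
        try:
            asyncio.run(postprocess([FakeIOFile()], dryrun=dryrun))
        except RuntimeError:
            return False
    return True
```

With this setup, `run_with_failing_retrieval(dryrun=True)` returns True because the guard never awaits the mock, while `run_with_failing_retrieval(dryrun=False)` returns False.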

To ensure that this test is correctly integrated and doesn't interfere with other tests, we can check for any related configurations or setup in the test directory:

Verification successful

Test Integration Verified Successfully

The new test function test_storage_noretrieve_dryrun is correctly integrated and does not interfere with other tests. No related configurations or modifications were found that could affect the test's behavior.
Scripts executed

The following scripts were executed for the analysis:

Script:
#!/bin/bash
# Check for any related configurations or setup files
echo "Checking for related test configurations:"
find . -maxdepth 1 -name "test_storage_noretrieve_dryrun*"
echo "Checking for any modifications to test runner or configuration files:"
grep -n "test_storage_noretrieve_dryrun" *.py
Length of output: 366

Script:
#!/bin/bash
# Searching for related test configurations recursively
echo "Searching for related test configurations recursively:"
fd "test_storage_noretrieve_dryrun" . -t f --extension py

# Searching for any modifications to test runner or configuration files recursively
echo "Searching for modifications related to 'test_storage_noretrieve_dryrun' in Python files:"
rg -n "test_storage_noretrieve_dryrun" --type py
Length of output: 564

tests/test_storage_noretrieve_dryrun/Snakefile

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

sonarqubecloud · 2024-09-25T07:51:06Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

🤖 I have created a release *beep* *boop*

---

## [8.20.5](v8.20.4...v8.20.5) (2024-09-25)

### Bug Fixes

* fixed check for remote conda env pinning and post-deploy files; fixed conda env cleanup ([#3103](#3103)) ([4d0a7e9](4d0a7e9))
* omit storage downloads during dryrun in workflows with checkpoints ([#3100](#3100)) ([151216a](151216a))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

fix: omit storage downloads during dryrun in workflows with checkpoints

f986e63

johanneskoester mentioned this pull request Sep 24, 2024

Not download file durring dry-run snakemake/snakemake-storage-plugin-s3#29

Closed

coderabbitai bot reviewed Sep 24, 2024

add testcase

c67ba83

coderabbitai bot reviewed Sep 24, 2024

tests/test_storage_noretrieve_dryrun/Snakefile (outdated, resolved)

Update tests/test_storage_noretrieve_dryrun/Snakefile

4d4db25

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

johanneskoester merged commit 151216a into main Sep 25, 2024

johanneskoester deleted the fix/checkpoint-dryrun-no-storage-download branch September 25, 2024 08:01

github-actions bot mentioned this pull request Sep 25, 2024

chore(main): release 8.20.5 #3101

Merged

coderabbitai bot mentioned this pull request Nov 13, 2024

fix: checkpoints inside modules are overwritten #2843

Closed

2 tasks

coderabbitai bot mentioned this pull request Nov 22, 2024

feat: assume checkpoint safe temp files #3218

Open

2 tasks

This was referenced Mar 12, 2025

fix: --delete-all-output ignores --dry-run #3265

Merged

fix: edgecases of source deployment in case of remote execution #3396

Merged

feat: Fix keep_local in storage directive and more freedom over remote retrieval behaviour #3410

Merged

coderabbitai bot mentioned this pull request Mar 24, 2025

feat: allow input url passthrough to executor backend #2639

Open

2 tasks

coderabbitai bot mentioned this pull request Dec 23, 2025

Combining False/True values for retrieving storage on connected DAG nodes can lead to unexpected file deletion. #3898

Open

johanneskoester merged 3 commits into main from fix/checkpoint-dryrun-no-storage-download
