fix: properly handle remote storage when waiting for pipes#3317
fix: properly handle remote storage when waiting for pipes#3317johanneskoester merged 3 commits intomainfrom
Conversation
📝 WalkthroughWalkthroughThis pull request refines file handling and job processing logic. In the DAG, the Changes
Sequence Diagram(s)sequenceDiagram
participant U as User/Job
participant D as DAG.handle_storage
participant S as Storage Service
U->>D: Trigger file storage
D->>D: Evaluate file condition (is storage & not flagged as pipe/service)
alt Eligible for storage
D->>S: Store file
else Not eligible (flagged as pipe/service)
D-->>U: Skip storage operation
end
sequenceDiagram
participant A as Async Caller
participant I as wait_for_files
participant L as Logger
A->>I: Call wait_for_files(files, consider_local)
I->>I: Iterate over files and check if _IOFile not in consider_local
I->>L: Log detailed missing files information
I-->>A: Return formatted missing files list
sequenceDiagram
participant J as Job
participant P as Pipe/Service Processor
participant E as Error Handler
J->>J: Initialize Job (populate pipe_or_service_output)
J->>P: Process outputs, flagging pipe/service files
J->>J: Invoke is_pipe_or_service_input() during job preparation
J->>E: In postprocess, catch ExceptionGroup for error handling
Possibly related PRs
✨ Finishing Touches
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
snakemake/jobs.py (1)
1598-1602: Enhanced error handling for group job postprocessing.The code now properly catches
ExceptionGroupexceptions, which provides better error reporting when multiple exceptions occur during parallel postprocessing of group jobs.Consider using
raise ... from eto preserve the exception chain, which helps with debugging by showing the original cause:- except ExceptionGroup as e: - raise WorkflowError( - f"Error postprocessing group job {self.jobid}.", - *e.exceptions, - ) + except ExceptionGroup as e: + raise WorkflowError( + f"Error postprocessing group job {self.jobid}.", + *e.exceptions, + ) from e🧰 Tools
🪛 Ruff (0.8.2)
1599-1602: Within an
exceptclause, raise exceptions withraise ... from errorraise ... from Noneto distinguish them from errors in exception handling(B904)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
snakemake/dag.py(1 hunks)snakemake/io.py(3 hunks)snakemake/jobs.py(6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest ...
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
snakemake/dag.pysnakemake/jobs.pysnakemake/io.py
🪛 Ruff (0.8.2)
snakemake/jobs.py
1599-1602: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling
(B904)
⏰ Context from checks skipped due to timeout of 90000ms (31)
- GitHub Check: testing (10, 3.12)
- GitHub Check: testing (10, 3.11)
- GitHub Check: testing (9, 3.12)
- GitHub Check: testing (9, 3.11)
- GitHub Check: testing (8, 3.12)
- GitHub Check: testing (8, 3.11)
- GitHub Check: testing (7, 3.12)
- GitHub Check: testing (7, 3.11)
- GitHub Check: testing (6, 3.12)
- GitHub Check: testing (6, 3.11)
- GitHub Check: testing (5, 3.12)
- GitHub Check: testing (5, 3.11)
- GitHub Check: testing (4, 3.12)
- GitHub Check: testing-windows (10)
- GitHub Check: testing-windows (9)
- GitHub Check: testing (4, 3.11)
- GitHub Check: testing-windows (8)
- GitHub Check: testing (3, 3.12)
- GitHub Check: testing-windows (7)
- GitHub Check: testing (3, 3.11)
- GitHub Check: testing (2, 3.12)
- GitHub Check: testing-windows (6)
- GitHub Check: testing (2, 3.11)
- GitHub Check: testing-windows (5)
- GitHub Check: testing-windows (4)
- GitHub Check: testing (1, 3.12)
- GitHub Check: testing-windows (3)
- GitHub Check: testing (1, 3.11)
- GitHub Check: testing-windows (2)
- GitHub Check: testing-windows (1)
- GitHub Check: apidocs
🔇 Additional comments (11)
snakemake/dag.py (1)
916-921: Improved handling of pipe and service files in storage operations.The additional conditions to exclude files flagged as "pipe" or "service" from being stored in storage is a valuable improvement. These file types serve as communication channels between jobs and don't need to be persisted in storage, avoiding unnecessary storage operations.
snakemake/jobs.py (4)
291-291: Good addition for tracking pipe or service outputs.Adding a dedicated set to track outputs flagged as pipe or service will make it easier to handle these special file types throughout the job lifecycle.
301-302: Properly populating the pipe_or_service_output set.The code correctly identifies pipe or service outputs during initialization and adds them to the dedicated tracking set.
949-953: Well-implemented helper method for identifying pipe or service inputs.This method efficiently determines if an input file is a pipe or service output from one of the job's dependencies by checking against the newly added tracking set.
858-860: Improved handling of pipe/service inputs during job preparation.The
consider_localparameter now properly includes pipe and service inputs, which ensures these files are correctly handled when waiting for inputs during job preparation.snakemake/io.py (6)
944-945: Good addition of a new constant.This constant provides a clean default value for the new
consider_localparameter.
947-953: Function signature updated appropriately with type annotation.The new parameter
consider_localhas been added with proper type annotation and default value, maintaining backward compatibility.
964-966: Good addition to handle pipe/service files as local files.This check now considers files in the
consider_localset to bypass storage checks, which properly addresses the issue with pipe/service files in remote storage scenarios.
970-971: Improved error message formatting.The error message for missing files in storage now uses f-string formatting, which is more readable and Pythonic.
988-994: Enhanced logging with better formatting.The log message for missing files has been improved with better formatting and structure.
1001-1002: Consistent error formatting.This change maintains consistency with the earlier logging improvements by using the same formatting approach for missing files in the error message.
🤖 I have created a release *beep* *boop* --- ## [8.29.0](v8.28.0...v8.29.0) (2025-02-27) ### Features * provide mechanism to link between report items (snakemake.report_href, see docs) ([#3224](#3224)) ([a024e60](a024e60)) * toggles for binary labels in report ([#3296](#3296)) ([7690998](7690998)) ### Bug Fixes * add missing [@conda](https://github.com/conda) annotations ([#3298](#3298)) ([0b541cc](0b541cc)) * properly handle remote storage when waiting for pipes ([#3317](#3317)) ([9f7e092](9f7e092)) * provide proper error messages if output file preparation (pipe, mkdir) fails ([#3309](#3309)) ([77ef176](77ef176)) * record file extension (e.g. .tsv) in between workflow cache records in order to avoid that files of the wrong format are erroneously returned by the cache. This will lead to some cache misses. But avoiding errors induced by the previous behavior of ignoring file extensions when seeking for cache records is more important. ([#3314](#3314)) ([4912f00](4912f00)) * sort results lexicographically by labels in their order of appearance ([#3293](#3293)) ([a19a0ac](a19a0ac)) ### Documentation * update best practices ([7270eb3](7270eb3)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
…#3317) Before it could fail with certain storage plugins (e.g. fs), in combination with local execution. ### QC <!-- Make sure that you can tick the boxes below. --> * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit - **New Features** - Refined file storage behavior by excluding certain flagged files from retention. - Enhanced file waiting functionality with an additional option for local file evaluation and clearer notifications when files are missing. - Upgraded job processing by tracking special outputs and providing improved error reporting for smoother operations. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
🤖 I have created a release *beep* *boop* --- ## [8.29.0](snakemake/snakemake@v8.28.0...v8.29.0) (2025-02-27) ### Features * provide mechanism to link between report items (snakemake.report_href, see docs) ([snakemake#3224](snakemake#3224)) ([a024e60](snakemake@a024e60)) * toggles for binary labels in report ([snakemake#3296](snakemake#3296)) ([7690998](snakemake@7690998)) ### Bug Fixes * add missing [@conda](https://github.com/conda) annotations ([snakemake#3298](snakemake#3298)) ([0b541cc](snakemake@0b541cc)) * properly handle remote storage when waiting for pipes ([snakemake#3317](snakemake#3317)) ([9f7e092](snakemake@9f7e092)) * provide proper error messages if output file preparation (pipe, mkdir) fails ([snakemake#3309](snakemake#3309)) ([77ef176](snakemake@77ef176)) * record file extension (e.g. .tsv) in between workflow cache records in order to avoid that files of the wrong format are erroneously returned by the cache. This will lead to some cache misses. But avoiding errors induced by the previous behavior of ignoring file extensions when seeking for cache records is more important. ([snakemake#3314](snakemake#3314)) ([4912f00](snakemake@4912f00)) * sort results lexicographically by labels in their order of appearance ([snakemake#3293](snakemake#3293)) ([a19a0ac](snakemake@a19a0ac)) ### Documentation * update best practices ([7270eb3](snakemake@7270eb3)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>



Before it could fail with certain storage plugins (e.g. fs), in combination with local execution.
QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit