fix: ensure that intermediate files in job groups do not cause spurious mtime errors when checking for consistency with output files#3220
Conversation
…us mtime errors when checking for consistency with output files
📝 Walkthrough📝 WalkthroughWalkthroughThe pull request introduces modifications to the Changes
Possibly related PRs
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
snakemake/dag.py(2 hunks)snakemake/jobs.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
snakemake/dag.py (1)
Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.
snakemake/jobs.py (1)
Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.
🔇 Additional comments (3)
snakemake/jobs.py (1)
1121-1126: Ensure all calls to postprocess include the new check_output_mtime parameter
The postprocess method now includes the check_output_mtime parameter with a default value. Please verify that all calls to postprocess are updated to include this parameter as needed to avoid unexpected behavior.
Run the following script to find all calls to postprocess and check for the check_output_mtime parameter:
✅ Verification successful
The check_output_mtime parameter is properly handled in the codebase
The verification shows that all calls to postprocess are correctly handling the check_output_mtime parameter:
- In
snakemake/jobs.py, there's an explicit call withcheck_output_mtime=Falsewhere needed for job groups - Other calls in
snakemake/scheduler.pydon't specify the parameter, which is fine as they'll use the default value ofTrue - The recursive call in
snakemake/dag.pycorrectly propagates all keyword arguments using**kwargs
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Find all calls to `postprocess` and check if they include the `check_output_mtime` argument.
# Search for method calls to `postprocess` that may span multiple lines
rg --multiline --multiline-dotall '(?<!def )(?<!self\.|super\.)postprocess\((?:[^)]|\n)*\)' --glob '*.py'
Length of output: 367
Script:
#!/bin/bash
# Search for method calls to `postprocess` using ast-grep
ast-grep --pattern 'postprocess($$$)'
# Also search for the method definition to understand its usage
ast-grep --pattern 'def postprocess($$$)'
Length of output: 143
Script:
#!/bin/bash
# Let's search with more context around the postprocess call
rg -B 5 -A 5 'postprocess\(' --glob '*.py' --pcre2
# Also search for any class that might have postprocess as a method
ast-grep --pattern 'class $_ {
$$$
postprocess($$$) {
$$$
}
$$$
}'
Length of output: 9159
snakemake/dag.py (2)
680-685: Method signature updated correctly with 'check_output_mtime' parameter
The addition of the check_output_mtime: bool = True parameter to the check_and_touch_output method is appropriate and maintains backward compatibility with existing code by providing a default value.
737-738: Conditional execution of output modification time check is logically sound
The conditional statement correctly ensures that the check_output_mtime check is performed only when both wait_for_local and check_output_mtime are True, aligning with the intended functionality.
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Outside diff range and nitpick comments (2)
snakemake/jobs.py (2)
17-17: Remove unused importThe
timemodule is no longer used after replacingtime.sleep()withasyncio.sleep().-import time🧰 Tools
🪛 Ruff (0.7.0)
17-17:
timeimported but unusedRemove unused import:
time(F401)
1520-1535: LGTM: Improved handling of modification times in group jobsThe changes effectively prevent spurious mtime errors by:
- Processing jobs in toposorted order
- Adding delay between levels to ensure distinct modification times
- Disabling output mtime checks to prevent interference with intermediate file updates
Consider adding a docstring to explain this behavior for future maintainers.
Add a docstring explaining the modification time handling:
async def postprocess(self, error=False, **kwargs): """Process jobs in toposorted order with controlled modification times. Jobs are processed level by level with a small delay between levels to ensure distinct modification times. Output mtime checks are disabled to prevent interference with intermediate file updates from previous levels. Args: error (bool): Whether an error occurred during execution **kwargs: Additional arguments passed to individual job postprocess """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (1)
snakemake/jobs.py(4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
snakemake/jobs.py (1)
Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.
🪛 Ruff (0.7.0)
snakemake/jobs.py
17-17: time imported but unused
Remove unused import: time
(F401)
🔇 Additional comments (1)
snakemake/jobs.py (1)
1121-1126: LGTM: New parameter to control mtime checking
The addition of the check_output_mtime parameter with a default value of True maintains backward compatibility while allowing control over modification time checks. This change effectively addresses the issue of spurious mtime errors.
Also applies to: 1159-1159
🤖 I have created a release *beep* *boop* --- ## [8.25.4](v8.25.3...v8.25.4) (2024-11-27) ### Bug Fixes * clean env vars in apptainer ([#3199](#3199)) ([76d5329](76d5329)) * ensure that intermediate files in job groups do not cause spurious mtime errors when checking for consistency with output files ([#3220](#3220)) ([4ba2bdf](4ba2bdf)) * Remove incomplete marker also when drop-metadata is active ([#3215](#3215)) ([a4f2e5c](a4f2e5c)) * Remove incomplete marker for job finished only after metadata is written ([#3197](#3197)) ([6567e5f](6567e5f)) * Support versioned URLs in Asset class and fix missing versions in Snakemake report ([#3203](#3203)) ([f086f6c](f086f6c)) * update rust-script usage to recent version (v0.35.0) [#3183](#3183) ([#3208](#3208)) ([43885d7](43885d7)) ### Documentation * clarify continuously updated input section ([#3219](#3219)) ([72a6994](72a6994)) * Fix typo in CHANGELOG.md ([#3198](#3198)) ([0e445ed](0e445ed)) * refer to Merkle trees instead of "blockchain" in caching.rst ([#3216](#3216)) ([282e5d9](282e5d9)) * remove twitter in favor of bluesky and mastodon ([#3217](#3217)) ([231c6df](231c6df)) * use "dictionary" not "array" wording in config docs ([#3156](#3156)) ([17aed41](17aed41)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>



QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
New Features
Bug Fixes
Refactor
DAGandJobclasses to include new parameters for better functionality.