fix: backup output files marked for update and restore them in case of job failures#3833
Conversation
|
Please format your code with |
📝 WalkthroughWalkthroughAdds a backup-and-restore workflow for outputs marked with the Changes
Sequence Diagram(s)sequenceDiagram
participant Job
participant Persistence
participant FS as File System
rect rgb(245, 250, 255)
note over Job,Persistence: Prepare
Job->>Persistence: backup_output(path) [for outputs with update]
Persistence->>FS: copy/move output -> backups/...
end
rect rgb(255, 255, 240)
note over Job: Execute job
Job->>Job: run()
end
alt Success
rect rgb(220, 255, 220)
note over Job,Persistence: Postprocess (success)
Job->>Persistence: cleanup_backup(path)
Persistence->>FS: remove backups/...
end
else Failure
rect rgb(255, 230, 230)
note over Job,Persistence: Postprocess (failure)
Job->>Persistence: restore_output(path)
Persistence->>FS: remove failed target (if dir), move backup -> original
Persistence->>FS: remove backups/...
end
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes
Possibly related PRs
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (5)
✅ Files skipped from review due to trivial changes (2)
🧰 Additional context used📓 Path-based instructions (1)**/*.py⚙️ CodeRabbit configuration file
Files:
🧠 Learnings (9)📚 Learning: 2024-10-06T14:09:54.370ZApplied to files:
📚 Learning: 2024-12-21T15:10:31.992ZApplied to files:
📚 Learning: 2025-01-17T12:00:09.368ZApplied to files:
📚 Learning: 2025-09-08T12:39:09.146ZApplied to files:
📚 Learning: 2024-11-12T12:08:20.342ZApplied to files:
📚 Learning: 2024-10-04T16:12:18.927ZApplied to files:
📚 Learning: 2024-11-05T09:41:56.469ZApplied to files:
📚 Learning: 2025-05-23T09:40:24.474ZApplied to files:
📚 Learning: 2025-05-06T01:37:23.382ZApplied to files:
🧬 Code graph analysis (1)src/snakemake/jobs.py (5)
🪛 LanguageTooltests/test_update_flag_fail/in.txt[uncategorized] ~1-~1: "sedem" (STEVILA_DO_10) [uncategorized] ~1-~1: Zamenjajte z ustrezno ubeseditvijo (STEVILA_DO_100) [uncategorized] ~1-~1: Zamenjajte z ustrezno ubeseditvijo (STEVILA_DO_100) [uncategorized] ~1-~1: Zamenjajte z ustrezno ubeseditvijo (STEVILA_DO_100) ⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (39)
🔇 Additional comments (6)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (3)
docs/snakefiles/rules.rst (1)
3374-3376: Clarify update()+remote storage and large directory caveats.The new guarantees look good. Recommend adding one sentence that update() applies to local files/dirs, or that Snakemake will retrieve remote copies before backing up when a default storage provider is used. Also note that backing up large directories can be expensive and suggest using more granular outputs when practical.
src/snakemake/jobs.py (1)
23-23: Avoid runtime import for type hint to prevent cycles.Use TYPE_CHECKING + forward reference to keep imports light and cycle-safe.
-from snakemake.rules import Rule +from typing import TYPE_CHECKING +if TYPE_CHECKING: + from snakemake.rules import Rule @@ - self.rule: Rule = rule + self.rule: "Rule" = ruleAlso applies to: 233-233
src/snakemake/persistence.py (1)
311-322: Consider consistent handling of existing paths.For directories, the code explicitly removes the existing path before restoring, but for files, it relies on
shutil.copyto overwrite. While functionally correct, this inconsistency could cause issues ifpathexists as the wrong type (e.g., a directory when a file is expected).Apply this diff for more robust handling:
def restore_output(self, path: Path) -> None: backup_path = self._get_backup_path(path) if not backup_path.exists(): raise WorkflowError(f"Cannot restore {path}: no backup found.") + # Remove existing path regardless of type for consistency + if path.exists(): + if path.is_dir(): + shutil.rmtree(path) + else: + path.unlink() if backup_path.is_dir(): - if path.exists(): - shutil.rmtree(path) shutil.copytree(backup_path, path) shutil.rmtree(backup_path) else: shutil.copy(backup_path, path) backup_path.unlink()
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
docs/snakefiles/rules.rst(1 hunks)src/snakemake/jobs.py(5 hunks)src/snakemake/persistence.py(2 hunks)tests/tests.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
tests/tests.pysrc/snakemake/jobs.pysrc/snakemake/persistence.py
🧠 Learnings (3)
📚 Learning: 2024-10-06T14:09:54.370Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-10-06T14:09:54.370Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.
Applied to files:
docs/snakefiles/rules.rsttests/tests.py
📚 Learning: 2024-12-14T13:10:48.450Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 2927
File: tests/test_subpath/Snakefile:11-11
Timestamp: 2024-12-14T13:10:48.450Z
Learning: In Snakemake, when a rule defines a single output file (even if the output is specified as a dictionary), `{output}` in the shell command can be used directly to refer to that file.
Applied to files:
docs/snakefiles/rules.rst
📚 Learning: 2025-05-23T09:40:24.474Z
Learnt from: johanneskoester
Repo: snakemake/snakemake PR: 3600
File: src/snakemake/jobs.py:960-964
Timestamp: 2025-05-23T09:40:24.474Z
Learning: In the `cleanup` method of the `Job` class in `src/snakemake/jobs.py`, files in the `to_remove` list should be formatted with `fmt_iofile` without specifying `as_output=True` or `as_input=True` parameters, as these files should be displayed as generic files rather than specifically as output files.
Applied to files:
src/snakemake/jobs.py
🧬 Code graph analysis (3)
tests/tests.py (1)
tests/common.py (2)
run(152-503)dpath(33-36)
src/snakemake/jobs.py (5)
src/snakemake/workflow.py (6)
rules(610-611)rules(775-776)rule(1812-2080)output(2096-2101)persistence(516-517)dag(520-521)src/snakemake/rules.py (3)
Rule(78-1354)output(315-316)output(1404-1405)src/snakemake/persistence.py (4)
rule(470-471)backup_output(303-309)restore_output(311-322)cleanup_backup(324-330)src/snakemake/io/__init__.py (2)
is_flagged(979-982)exists(515-519)src/snakemake/io/fmt.py (1)
fmt_iofile(5-50)
src/snakemake/persistence.py (1)
tests/common.py (1)
copy(117-121)
🪛 Ruff (0.14.3)
src/snakemake/persistence.py
314-314: Avoid specifying long messages outside the exception class
(TRY003)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (15)
- GitHub Check: tests (10, windows-2022, py313)
- GitHub Check: tests (10, macos-latest, py313)
- GitHub Check: tests (2, windows-2022, py313)
- GitHub Check: tests (8, macos-latest, py313)
- GitHub Check: tests (7, ubuntu-latest, py312)
- GitHub Check: tests (7, ubuntu-latest, py313)
- GitHub Check: tests (7, ubuntu-latest, py311)
- GitHub Check: tests (2, ubuntu-latest, py313)
- GitHub Check: tests (4, ubuntu-latest, py312)
- GitHub Check: tests (4, ubuntu-latest, py313)
- GitHub Check: tests (3, macos-latest, py313)
- GitHub Check: tests (4, ubuntu-latest, py311)
- GitHub Check: tests (3, windows-2022, py313)
- GitHub Check: tests (2, ubuntu-latest, py311)
- GitHub Check: tests (2, ubuntu-latest, py312)
🔇 Additional comments (5)
tests/tests.py (1)
2304-2306: Mirror Windows skip to match test_update_flag.Given test_update_flag is skipped on Windows, consider applying @skip_on_windows here for consistency unless you verified it’s stable on Windows.
Apply if needed:
@skip_on_windows # OS agnostic def test_update_flag(): run(dpath("test_update_flag")) +@skip_on_windows def test_update_flag_fail(): run(dpath("test_update_flag_fail"), shouldfail=True, check_results=True)src/snakemake/jobs.py (1)
1235-1242: LGTM: restore on error, cleanup backup on success.The update backup lifecycle is correct and runs before touch/log handling. Nice.
Edge case: if backup is missing (e.g., storage-only output not retrieved), restore_output raises; see suggested guard in prepare().
src/snakemake/persistence.py (3)
73-73: LGTM!The backup path initialization is clean and follows Python path conventions.
324-330: LGTM!The cleanup logic correctly handles both files and directories, with appropriate existence checks.
332-336: LGTM! Clever handling of absolute paths.The logic correctly converts absolute paths to relative by stripping the root using
path.relative_to(path.parents[-1]). ThePathclass handles cross-platform differences (Unix/vs WindowsC:\), so this approach should work correctly on all platforms.
| for f in self.output: | ||
| if is_flagged(f, "update"): | ||
| self.rule.workflow.persistence.backup_output(Path(f)) | ||
| f.prepare() | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
Handle update()+storage: retrieve local copy before backup.
If an update-marked output lives in a storage provider and isn’t present locally, backup_output(Path(f)) will fail, and later restore_output will raise “no backup found.” Retrieve first.
- for f in self.output:
- if is_flagged(f, "update"):
- self.rule.workflow.persistence.backup_output(Path(f))
+ for f in self.output:
+ if is_flagged(f, "update"):
+ # Ensure local copy exists for storage-backed outputs
+ if f.is_storage and not await f.exists_local():
+ await f.retrieve_from_storage()
+ self.rule.workflow.persistence.backup_output(Path(f))
f.prepare()Please confirm whether update() is intended to support storage outputs; if not, consider enforcing a guard that disallows update+storage with a clear error.
---
🏁 Script executed:
```shell
#!/bin/bash
# Get context around lines 841-845 in jobs.py to see the containing function
sed -n '820,860p' src/snakemake/jobs.py
Length of output: 1453
🏁 Script executed:
#!/bin/bash
# Find the containing function name and its async/sync status
sed -n '800,850p' src/snakemake/jobs.py | grep -B 50 "for f in self.output" | grep -E "^\s*async\s+def|^\s*def " | tail -1Length of output: 28
🏁 Script executed:
#!/bin/bash
# Search for backup_output and restore_output implementations
rg -n "def backup_output|def restore_output" --type=pyLength of output: 164
🏁 Script executed:
#!/bin/bash
# Search for the output class definition and storage-related attributes
rg -n "class.*Output|is_storage|retrieve_from_storage" --type=py -C 2 | head -100Length of output: 5794
🏁 Script executed:
#!/bin/bash
# Search for existing update flag handling with storage
rg -n "is_flagged.*update" --type=py -C 3Length of output: 4369
---
🏁 Script executed:
```shell
#!/bin/bash
# Check backup_output and restore_output implementations
sed -n '303,330p' src/snakemake/persistence.py
Length of output: 1048
🏁 Script executed:
#!/bin/bash
# Find f.prepare() method - search for prepare() in output-related files
rg -n "def prepare\(" --type=py src/snakemake/io/ -A 5Length of output: 396
🏁 Script executed:
#!/bin/bash
# Check if there are any async loops or taskgroups used for output handling
rg -n "for f in.*output|for f in self.output" --type=py src/snakemake/jobs.py -A 10 | head -80Length of output: 3074
🏁 Script executed:
#!/bin/bash
# Search for any error messages or issues related to backup_output with storage
rg -n "backup_output|restore_output" --type=py -B 2 -A 5Length of output: 2343
---
🏁 Script executed:
```shell
#!/bin/bash
# Search for test cases or examples with update flag on storage outputs
rg -n "update.*storage|storage.*update" --type=py tests/
Length of output: 0
🏁 Script executed:
#!/bin/bash
# Search for any validation or guards preventing update + storage combination
rg -n "update" --type=py src/snakemake/rules.py -B 2 -A 2 | head -60Length of output: 2297
🏁 Script executed:
#!/bin/bash
# Check if there's any special handling in DAG for update-flagged files
rg -n "update_jobs|is_flagged.*update" --type=py src/snakemake/dag.py -B 1 -A 3 | head -80Length of output: 1424
🏁 Script executed:
#!/bin/bash
# Check what Path(f) actually does - is f converting to local path or storage path?
rg -n "def __str__|def __fspath__" --type=py src/snakemake/io/ -A 3Length of output: 434
🏁 Script executed:
#!/bin/bash
# Look for any existing patterns of checking storage before backup in the codebase
rg -n "is_storage.*backup|backup.*is_storage" --type=py -B 2 -A 2Length of output: 0
---
🏁 Script executed:
```shell
#!/bin/bash
# Check the IOFile class __str__ and how Path(f) resolves
sed -n '1900,1930p' src/snakemake/io/__init__.py
Length of output: 951
🏁 Script executed:
#!/bin/bash
# Check if storage files are retrieved/localized before prepare() is called
rg -n "await.*retrieve|keep_storage_local" --type=py src/snakemake/jobs.py -B 2 -A 2 | head -100Length of output: 860
🏁 Script executed:
#!/bin/bash
# Check the full prepare method to understand the execution flow
sed -n '798,860p' src/snakemake/jobs.pyLength of output: 2273
🏁 Script executed:
#!/bin/bash
# Search for any comments or documentation about update flag and storage
rg -n "update.*flag|update.*marker" --type=py src/snakemake/ | grep -i "doc\|comment\|#" | head -20Length of output: 113
🏁 Script executed:
#!/bin/bash
# Check how files enter this prepare() - are they already local?
sed -n '750,800p' src/snakemake/jobs.pyLength of output: 1476
Storage-backed update outputs will fail at backup/restore—needs fix with proper async handling.
The concern is valid: backup_output(Path(f)) calls shutil.copy which requires a local filesystem path, but storage-backed files may not be present locally. For update-flagged storage outputs, this will fail, and restore_output will raise "no backup found."
However, the proposed diff won't work as-is—it uses await in a synchronous loop. Since prepare() is async (line 813), the fix should use the existing TaskGroup pattern (as seen at line 790-791 in the same file):
if not self.is_norun:
await self.remove_existing_output()
if self.resources.get("tmpdir"):
os.makedirs(self.resources.tmpdir, exist_ok=True)
+ # Backup storage-backed update outputs before prepare
+ async with asyncio.TaskGroup() as tg:
+ for f in self.output:
+ if is_flagged(f, "update") and f.is_storage:
+ async def backup_storage_output(f):
+ if not await f.exists_local():
+ await f.retrieve_from_storage()
+ self.rule.workflow.persistence.backup_output(Path(f))
+ tg.create_task(backup_storage_output(f))
+
for f in self.output:
if is_flagged(f, "update"):
- self.rule.workflow.persistence.backup_output(Path(f))
f.prepare()Alternatively, consider whether update is intended to support storage outputs; if not, add a validation check in the DAG or rule setup.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In src/snakemake/jobs.py around lines 841-845, the loop calls synchronous
backup_output for update-flagged outputs which will fail for storage-backed
files and the proposed diff used await inside a sync loop; fix by either
validating that update-flagged outputs are local during DAG/rule setup and raise
a clear error if not, or implement async backups using the existing TaskGroup
pattern: collect update-flagged outputs, open a TaskGroup, spawn tasks that
await self.rule.workflow.persistence.backup_output(Path(f)) for storage-backed
outputs, await completion of the group, then call await f.prepare() (or schedule
prepare() tasks in the same TaskGroup if needed) so all async work runs
correctly without using await in a plain for-loop.
| def backup_output(self, path: Path) -> None: | ||
| backup_path = self._get_backup_path(path) | ||
| backup_path.parent.mkdir(parents=True, exist_ok=True) | ||
| if path.is_dir(): | ||
| shutil.copytree(path, backup_path) | ||
| else: | ||
| shutil.copy(path, backup_path) |
There was a problem hiding this comment.
🧩 Analysis chain
Add existence check and handle pre-existing backups.
The method has two potential issues:
- If
pathdoesn't exist,path.is_dir()returnsFalse, causingshutil.copyto fail with aFileNotFoundError. - For directories,
shutil.copytreewill raiseFileExistsErrorifbackup_pathalready exists (e.g., from a previous failed backup that wasn't cleaned up).
Apply this diff to add validation and handle existing backups:
def backup_output(self, path: Path) -> None:
+ if not path.exists():
+ raise WorkflowError(f"Cannot backup {path}: file does not exist.")
backup_path = self._get_backup_path(path)
backup_path.parent.mkdir(parents=True, exist_ok=True)
+ # Clean up any existing backup first
+ if backup_path.exists():
+ if backup_path.is_dir():
+ shutil.rmtree(backup_path)
+ else:
+ backup_path.unlink()
if path.is_dir():
shutil.copytree(path, backup_path)
else:
shutil.copy(path, backup_path)Run the following script to verify that outputs marked with the "update" flag exist before backup_output is called:
🏁 Script executed:
#!/bin/bash
# Description: Verify that backup_output is only called on existing files
# Search for calls to backup_output in the codebase
rg -nP -C5 'backup_output\s*\(' --type=pyLength of output: 1528
[/run_scripts]
🏁 Script executed:
#!/bin/bash
# Search for is_flagged function and update flag handling
rg -nP 'is_flagged|update.*flag' --type=py -A2 -B2 | head -100Length of output: 5283
🏁 Script executed:
#!/bin/bash
# Search for prepare() method on output files
rg -nP 'def prepare\s*\(' --type=py -A5 -B1 | head -150Length of output: 848
🏁 Script executed:
#!/bin/bash
# Search for how backup_output is used in context of file existence
rg -nP '\.backup_output\(' --type=py -B10 -A5Length of output: 919
[/run_scripts]
🏁 Script executed:
#!/bin/bash
# Search for restore/revert logic for update-flagged files
rg -nP 'Restoring previous version|update.*restore|backup.*restore' --type=py -B5 -A10Length of output: 1206
🏁 Script executed:
#!/bin/bash
# Look at the restore logic more carefully
rg -nP 'if error:|if is_flagged.*update' --type=py -B3 -A15 | grep -A20 'if is_flagged.*update'Length of output: 7792
🏁 Script executed:
#!/bin/bash
# Check if there's any file existence check before backup_output
rg -nP '(if|and).*\.exists\(\).*backup_output|backup_output.*exists' --type=py -B5 -A5Length of output: 0
[/run_scripts]
🏁 Script executed:
#!/bin/bash
# Search for restore_output and cleanup_backup methods in persistence.py
rg -nP 'def (restore_output|cleanup_backup)' --type=py -A15Length of output: 2021
🏁 Script executed:
#!/bin/bash
# Search for _get_backup_path to understand backup path structure
rg -nP 'def _get_backup_path' --type=py -A8Length of output: 690
Handle missing output files when backing up "update"-flagged outputs.
The method does not check whether path exists before calling path.is_dir(). On the first run of a rule with the "update" flag, the output file doesn't exist yet, causing path.is_dir() to return False and shutil.copy() to fail with FileNotFoundError.
Additionally, if a previous backup exists (e.g., from a crashed job), shutil.copytree() will raise FileExistsError without the dirs_exist_ok parameter.
Apply this diff to validate file existence and handle pre-existing backups:
def backup_output(self, path: Path) -> None:
+ if not path.exists():
+ return # No existing file to backup on first run
backup_path = self._get_backup_path(path)
backup_path.parent.mkdir(parents=True, exist_ok=True)
+ # Clean up any existing backup first (edge case: crashed job)
+ if backup_path.exists():
+ if backup_path.is_dir():
+ shutil.rmtree(backup_path)
+ else:
+ backup_path.unlink()
if path.is_dir():
shutil.copytree(path, backup_path)
else:
shutil.copy(path, backup_path)🤖 Prompt for AI Agents
In src/snakemake/persistence.py around lines 303 to 309, the backup_output
method calls path.is_dir() and shutil.copy/copytree without verifying the source
exists and without handling pre-existing backups; first check if path.exists()
and return early (no-op) if it does not exist; when copying directories use
shutil.copytree(path, backup_path, dirs_exist_ok=True) to allow overwriting
existing backup directories (or alternatively remove existing backup_path before
copytree on older Python versions); keep the existing parent.mkdir(parents=True,
exist_ok=True) for the backup path.
🤖 I have created a release *beep* *boop* --- ## [9.13.7](v9.13.6...v9.13.7) (2025-11-07) ### Bug Fixes * backup output files marked for update and restore them in case of job failures ([#3833](#3833)) ([3e0904d](3e0904d)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
…f job failures (snakemake#3833) ### Description <!--Add a description of your PR here--> ### QC <!-- Make sure that you can tick the boxes below. --> * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Documentation** * Clarified update flag: files/directories are not removed before execution and previous versions are restored if a job fails. * **New Features** * Added backup/restore handling for outputs marked with "update" to preserve and recover prior content on failure. * **Bug Fixes** * Changed cleanup behavior to avoid removing outputs flagged as updates prematurely. * **Tests** * Added a test covering a failing update-flag scenario. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
🤖 I have created a release *beep* *boop* --- ## [9.13.7](snakemake/snakemake@v9.13.6...v9.13.7) (2025-11-07) ### Bug Fixes * backup output files marked for update and restore them in case of job failures ([snakemake#3833](snakemake#3833)) ([3e0904d](snakemake@3e0904d)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Description
QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
Documentation
New Features
Bug Fixes
Tests