fix: properly handle temp files in group jobs that are not needed outside of the group (do not upload them to storage and delete them early)#3730
Conversation
…side of the group (do not upload them to storage and delete them early)
📝 WalkthroughWalkthroughBroadened DAG tempfile handling to be group-aware: Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant Scheduler
participant DAG
participant Job
participant GroupJob
participant FS as TempFile
Scheduler->>DAG: get_unneeded_temp_files(job)
alt job is GroupJob
DAG->>GroupJob: iterate members, call helper per member
GroupJob-->>DAG: tempfile candidates
else job is Job
DAG->>Job: collect tempfile candidates
Job-->>DAG: tempfile candidates
end
loop for each tempfile
DAG->>DAG: is_needed_tempfile(job, tempfile, outside_of_group_job?)
alt subprocess_exec active
Note right of DAG: subprocess mode → keep tempfile
DAG-->>FS: retain
else remote_exec active
DAG->>DAG: consult storage_settings.unneeded_temp_files for outside-of-group
alt outside-of-group marked unneeded
DAG-->>FS: eligible for removal
else
DAG-->>FS: retain
end
else normal execution
alt tempfile is derived target or needed by downstream outside group
DAG-->>FS: retain
else
DAG-->>FS: eligible for removal
end
end
end
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
src/snakemake/dag.py (1)
947-947: Avoid assert for runtime invariant (S101); fail safe insteadAsserts can be stripped with -O, risking None dereference later. Guard explicitly and behave conservatively.
- assert self.workflow.storage_settings is not None + if self.workflow.storage_settings is None: + # Be conservative: do not delete when storage settings are unavailable. + return True
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/snakemake/dag.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/dag.py
🪛 Ruff (0.12.2)
src/snakemake/dag.py
947-947: Use of assert detected
(S101)
🔇 Additional comments (1)
src/snakemake/dag.py (1)
290-296: LGTM: group-aware filtering for temp filesThe scoped helper and passing the group context into is_needed_tempfile are the right approach for avoiding uploads of group-internal temps.
Also applies to: 299-302
|
Please format your code with black: |
There was a problem hiding this comment.
Actionable comments posted: 0
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
src/snakemake/dag.py (1)
479-487: Ensure unneeded_temp_files is computed/populated before storage upload checksstorage_settings.unneeded_temp_files is only set from the CLI (src/snakemake/cli.py:2005) and defaults to empty (src/snakemake/settings/types.py:239); no runtime writes/updates were found. store_storage_outputs' tostore check reads that set (src/snakemake/dag.py:468–520) and is_needed_tempfile also consults it (src/snakemake/dag.py:946–957). get_unneeded_temp_files is defined separately (src/snakemake/dag.py:289).
Action: either populate storage_settings.unneeded_temp_files with the current job/group’s computed unneeded files before running store_storage_outputs, or change store_storage_outputs to evaluate per-file necessity via get_unneeded_temp_files/is_needed_tempfile (per-job or per-group) instead of relying on a global settings set.
♻️ Duplicate comments (1)
src/snakemake/dag.py (1)
941-945: Previous predicate bug is fixedReturning True when no group context is provided restores the outside‑group “any(...)” checks in nongroup contexts.
🧹 Nitpick comments (1)
src/snakemake/dag.py (1)
289-296: Return type mismatch and minor naming clarity
- The function yields _IOFile instances, but the signature says Iterable[str].
- Rename the helper’s parameter to align with is_needed_tempfile’s argument name for readability.
Apply:
- def get_unneeded_temp_files(self, job: Union[Job, GroupJob]) -> Iterable[str]: - def get_files(job, group_job=None): + def get_unneeded_temp_files(self, job: Union[Job, GroupJob]) -> Iterable[_IOFile]: + def get_files(job, outside_of_group_job=None): for f in job.output: - if is_flagged(f, "temp") and not self.is_needed_tempfile( - job, f, outside_of_group_job=group_job - ): + if is_flagged(f, "temp") and not self.is_needed_tempfile( + job, f, outside_of_group_job=outside_of_group_job + ): yield f if isinstance(job, GroupJob): for j in job: - yield from get_files(j, group_job=job) + yield from get_files(j, outside_of_group_job=job) else: yield from get_files(job)Also applies to: 299-301
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/snakemake/dag.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/dag.py
🧬 Code graph analysis (1)
src/snakemake/dag.py (2)
src/snakemake/jobs.py (7)
Job(183-1310)GroupJob(1328-1771)output(345-346)output(349-350)output(1546-1552)jobs(1367-1368)jobs(1371-1372)src/snakemake/io/__init__.py (1)
is_flagged(960-963)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (53)
- GitHub Check: tests (10, windows-2022, py313)
- GitHub Check: tests (10, windows-2022, py312)
- GitHub Check: tests (10, ubuntu-latest, py312)
- GitHub Check: tests (8, windows-2022, py312)
- GitHub Check: tests (8, ubuntu-latest, py313)
- GitHub Check: tests (8, ubuntu-latest, py312)
- GitHub Check: tests (10, ubuntu-latest, py313)
- GitHub Check: tests (10, ubuntu-latest, py311)
- GitHub Check: tests (9, ubuntu-latest, py313)
- GitHub Check: tests (8, windows-2022, py313)
- GitHub Check: tests (9, windows-2022, py313)
- GitHub Check: tests (9, ubuntu-latest, py312)
- GitHub Check: tests (7, ubuntu-latest, py311)
- GitHub Check: tests (8, ubuntu-latest, py311)
- GitHub Check: tests (7, windows-2022, py312)
- GitHub Check: tests (9, windows-2022, py312)
- GitHub Check: tests (9, ubuntu-latest, py311)
- GitHub Check: tests (6, ubuntu-latest, py312)
- GitHub Check: tests (7, ubuntu-latest, py312)
- GitHub Check: tests (5, windows-2022, py312)
- GitHub Check: tests (7, windows-2022, py313)
- GitHub Check: tests (7, ubuntu-latest, py313)
- GitHub Check: tests (6, ubuntu-latest, py313)
- GitHub Check: tests (3, windows-2022, py313)
- GitHub Check: tests (5, ubuntu-latest, py313)
- GitHub Check: tests (6, ubuntu-latest, py311)
- GitHub Check: tests (6, windows-2022, py312)
- GitHub Check: tests (6, windows-2022, py313)
- GitHub Check: tests (5, windows-2022, py313)
- GitHub Check: tests (4, ubuntu-latest, py313)
- GitHub Check: tests (4, windows-2022, py312)
- GitHub Check: tests (3, ubuntu-latest, py313)
- GitHub Check: tests (2, macos-latest, py312)
- GitHub Check: tests (5, ubuntu-latest, py311)
- GitHub Check: tests (4, ubuntu-latest, py312)
- GitHub Check: tests (5, ubuntu-latest, py312)
- GitHub Check: tests (2, windows-2022, py313)
- GitHub Check: tests (4, ubuntu-latest, py311)
- GitHub Check: tests (3, windows-2022, py312)
- GitHub Check: tests (4, windows-2022, py313)
- GitHub Check: tests (3, ubuntu-latest, py312)
- GitHub Check: tests (2, ubuntu-latest, py312)
- GitHub Check: tests (1, macos-latest, py313)
- GitHub Check: tests (2, windows-2022, py312)
- GitHub Check: tests (3, ubuntu-latest, py311)
- GitHub Check: tests (2, ubuntu-latest, py313)
- GitHub Check: tests (1, windows-2022, py313)
- GitHub Check: tests (1, ubuntu-latest, py312)
- GitHub Check: tests (1, macos-latest, py312)
- GitHub Check: tests (2, ubuntu-latest, py311)
- GitHub Check: tests (1, windows-2022, py312)
- GitHub Check: tests (1, ubuntu-latest, py313)
- GitHub Check: tests (1, ubuntu-latest, py311)
🔇 Additional comments (1)
src/snakemake/dag.py (1)
297-301: LGTM: group-aware tempfile filtering behaves as intendedIterating member jobs and passing the GroupJob as context ensures temp files used only within the group are treated as unneeded for upload, matching the PR goal.
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (4)
tests/test_group_temp/qsub (1)
1-6: Quote args and prefer modern subshells (optional hardening)Safer for odd paths; no behavior change.
Apply:
#!/bin/bash +set -euo pipefail -echo `date` >> qsub.log -tail -n1 $1 >> qsub.log +echo "$(date)" >> qsub.log +tail -n1 -- "$1" >> qsub.log # simulate printing of job id by a random number -echo $RANDOM -sh $1 +echo "$RANDOM" +sh "$1"tests/tests.py (1)
2248-2250: Consider adding a storage-backed variant (optional)To cover “do not upload temp files,” add a sibling test guarded by needs_s3 that asserts no storage objects for temp files are created.
tests/test_group_temp/Snakefile (2)
19-27: Optionally consume the declared input to make the dependency explicit in the shellNot required for correctness, but avoids “unused input” patterns in demos.
Apply:
rule b: input: "a.txt" output: "b.txt" group: "group1" shell: - "echo b > {output}" + "cat {input} >/dev/null; echo b > {output}"
37-44: Stricter shell block (optional)Fail fast if any command errors and avoid partial writes.
Apply:
shell: """ + set -euo pipefail if [ -f a.txt ]; then echo "temp a.txt still exists!" exit 1 fi echo c > {output} """
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
src/snakemake/dag.py(2 hunks)tests/test_group_temp/Snakefile(1 hunks)tests/test_group_temp/expected-results/b.txt(1 hunks)tests/test_group_temp/expected-results/c.txt(1 hunks)tests/test_group_temp/expected-results/d.txt(1 hunks)tests/test_group_temp/qsub(1 hunks)tests/tests.py(1 hunks)
✅ Files skipped from review due to trivial changes (3)
- tests/test_group_temp/expected-results/d.txt
- tests/test_group_temp/expected-results/b.txt
- tests/test_group_temp/expected-results/c.txt
🚧 Files skipped from review as they are similar to previous changes (1)
- src/snakemake/dag.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
tests/tests.py
🧠 Learnings (2)
📚 Learning: 2024-11-12T20:22:54.184Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3204
File: tests/test_cores_cluster/qsub:1-6
Timestamp: 2024-11-12T20:22:54.184Z
Learning: In the Snakemake codebase, the `tests/test_cores_cluster/qsub` script is a dummy script for testing, and input validation and error handling are not required in such scripts.
Applied to files:
tests/test_group_temp/qsub
📚 Learning: 2024-10-06T14:09:54.370Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-10-06T14:09:54.370Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.
Applied to files:
tests/test_group_temp/Snakefile
🧬 Code graph analysis (1)
tests/tests.py (1)
tests/common.py (2)
run(152-503)dpath(33-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (56)
- GitHub Check: tests (10, windows-2022, py312)
- GitHub Check: tests (10, windows-2022, py313)
- GitHub Check: tests (9, windows-2022, py313)
- GitHub Check: tests (10, ubuntu-latest, py313)
- GitHub Check: tests (10, ubuntu-latest, py311)
- GitHub Check: tests (9, ubuntu-latest, py313)
- GitHub Check: tests (9, windows-2022, py312)
- GitHub Check: tests (10, ubuntu-latest, py312)
- GitHub Check: tests (8, ubuntu-latest, py313)
- GitHub Check: tests (8, windows-2022, py312)
- GitHub Check: tests (7, ubuntu-latest, py311)
- GitHub Check: tests (9, ubuntu-latest, py311)
- GitHub Check: tests (9, ubuntu-latest, py312)
- GitHub Check: tests (8, windows-2022, py313)
- GitHub Check: tests (8, ubuntu-latest, py311)
- GitHub Check: tests (7, ubuntu-latest, py313)
- GitHub Check: tests (7, windows-2022, py313)
- GitHub Check: tests (8, ubuntu-latest, py312)
- GitHub Check: tests (7, macos-latest, py312)
- GitHub Check: tests (7, windows-2022, py312)
- GitHub Check: tests (7, ubuntu-latest, py312)
- GitHub Check: tests (5, windows-2022, py313)
- GitHub Check: tests (6, windows-2022, py313)
- GitHub Check: tests (6, ubuntu-latest, py313)
- GitHub Check: tests (6, windows-2022, py312)
- GitHub Check: tests (6, ubuntu-latest, py311)
- GitHub Check: tests (6, ubuntu-latest, py312)
- GitHub Check: tests (5, windows-2022, py312)
- GitHub Check: tests (5, ubuntu-latest, py312)
- GitHub Check: tests (5, ubuntu-latest, py313)
- GitHub Check: tests (4, windows-2022, py313)
- GitHub Check: tests (4, windows-2022, py312)
- GitHub Check: tests (5, ubuntu-latest, py311)
- GitHub Check: tests (4, ubuntu-latest, py313)
- GitHub Check: tests (4, ubuntu-latest, py312)
- GitHub Check: tests (3, ubuntu-latest, py311)
- GitHub Check: tests (3, windows-2022, py312)
- GitHub Check: tests (3, windows-2022, py313)
- GitHub Check: tests (2, macos-latest, py313)
- GitHub Check: tests (4, ubuntu-latest, py311)
- GitHub Check: tests (3, ubuntu-latest, py313)
- GitHub Check: tests (3, ubuntu-latest, py312)
- GitHub Check: tests (2, ubuntu-latest, py313)
- GitHub Check: tests (2, windows-2022, py313)
- GitHub Check: tests (2, ubuntu-latest, py312)
- GitHub Check: tests (1, windows-2022, py312)
- GitHub Check: tests (2, windows-2022, py312)
- GitHub Check: tests (2, macos-latest, py312)
- GitHub Check: tests (2, ubuntu-latest, py311)
- GitHub Check: tests (1, macos-latest, py313)
- GitHub Check: tests (1, windows-2022, py313)
- GitHub Check: tests (1, ubuntu-latest, py311)
- GitHub Check: tests (1, macos-latest, py312)
- GitHub Check: tests (1, ubuntu-latest, py312)
- GitHub Check: tests (1, ubuntu-latest, py313)
- GitHub Check: apidocs
🔇 Additional comments (2)
tests/test_group_temp/qsub (1)
1-6: Good minimal qsub stub for testsMatches the pattern used elsewhere; fine as a dummy submit script.
tests/test_group_temp/Snakefile (1)
1-7: Scenario reads well; exercises early temp cleanup via group boundariesThe aggregator rule ensures c.txt is realized before writing d.txt. Looks good.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/snakemake/dag.py (1)
943-945: Predicate fix LGTM (matches earlier feedback).
is_other_group_or_no_group()now correctly handles the no‑group case. This resolves the earlier “always False” bug whenoutside_of_group_job is None.
🧹 Nitpick comments (1)
src/snakemake/dag.py (1)
289-301: Fix return type: yields IOFiles, not strings.
get_unneeded_temp_files()yields_IOFileobjects, but the annotation saysIterable[str]. Adjust to avoid downstream typing drift.Apply:
- def get_unneeded_temp_files(self, job: Union[Job, GroupJob]) -> Iterable[str]: + def get_unneeded_temp_files(self, job: Union[Job, GroupJob]) -> Iterable[_IOFile]:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
src/snakemake/dag.py(2 hunks)src/snakemake/spawn_jobs.py(0 hunks)
💤 Files with no reviewable changes (1)
- src/snakemake/spawn_jobs.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/dag.py
🧠 Learnings (2)
📚 Learning: 2024-11-12T12:08:20.342Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.
Applied to files:
src/snakemake/dag.py
📚 Learning: 2024-09-19T18:47:50.192Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3096
File: snakemake/dag.py:1943-1943
Timestamp: 2024-09-19T18:47:50.192Z
Learning: In the temp file deletion logic, temp files are only deleted after confirming that no other job (including non-checkpoint jobs) still needs them.
Applied to files:
src/snakemake/dag.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
- GitHub Check: tests (7, windows-2022, py312)
- GitHub Check: tests (4, ubuntu-latest, py311)
- GitHub Check: tests (1, windows-2022, py313)
- GitHub Check: tests (3, ubuntu-latest, py312)
- GitHub Check: tests (1, ubuntu-latest, py311)
- GitHub Check: tests (2, ubuntu-latest, py313)
- GitHub Check: tests (1, ubuntu-latest, py312)
🔇 Additional comments (2)
src/snakemake/dag.py (2)
975-978: Good: explicit notemp early‑exit with debug.Clear message; avoids surprising temp handling when
--notempis set.
991-992: LGTM: deletion filter is correct post‑fix.Using
filterfalse(partial(self.is_needed_tempfile, ...), ...)is appropriate. Once the remote_exec fix lands, behavior will align with the PR’s goals.To be safe, please run the targeted tests that exercise remote/group temp handling and confirm no unintended deletions occur.
Also applies to: 1002-1002
| if self.workflow.remote_exec: | ||
| is_unneeded_outside = ( | ||
| tempfile in self.workflow.storage_settings.unneeded_temp_files | ||
| ) | ||
| or tempfile in self.derived_targetfiles | ||
| else: | ||
| is_unneeded_outside = True | ||
|
|
||
| is_derived_target = tempfile in self.derived_targetfiles | ||
| is_needed_by_subsequent_job = any( | ||
| tempfile in files | ||
| for j, files in self.depending[job].items() | ||
| if not self.finished(j) | ||
| and self.needrun(j) | ||
| and j != job | ||
| and is_other_group_or_no_group(j) | ||
| ) | ||
| logger.debug( | ||
| f"Temp file {tempfile}: {is_unneeded_outside=}, {is_derived_target=}, " | ||
| f"{is_needed_by_subsequent_job=}" | ||
| ) | ||
| return ( | ||
| is_derived_target or is_needed_by_subsequent_job | ||
| ) and is_unneeded_outside | ||
|
|
There was a problem hiding this comment.
Bug: remote_exec logic inverts “needed outside” and can delete required temps.
is_needed_tempfile() currently returns True only when is_unneeded_outside is True, which means files “needed outside” (i.e., not in unneeded_temp_files) are treated as not needed and can be deleted/upload‑skipped incorrectly under remote execution.
Use “keep if derived OR needed by downstream OR needed outside (remote only)”:
- if self.workflow.remote_exec:
- is_unneeded_outside = (
- tempfile in self.workflow.storage_settings.unneeded_temp_files
- )
- else:
- is_unneeded_outside = True
+ if self.workflow.remote_exec:
+ is_unneeded_outside = (
+ tempfile in self.workflow.storage_settings.unneeded_temp_files
+ )
+ else:
+ is_unneeded_outside = True # unused in local path; keep for logging
@@
- return (
- is_derived_target or is_needed_by_subsequent_job
- ) and is_unneeded_outside
+ if self.workflow.remote_exec:
+ # Keep if derived, needed by downstream outside the group, or needed outside the group.
+ return (
+ is_derived_target
+ or is_needed_by_subsequent_job
+ or not is_unneeded_outside
+ )
+ # Local/subprocess path: classic semantics.
+ return is_derived_target or is_needed_by_subsequent_jobPlease add a quick regression test where a temp output of a grouped job is required by a downstream job outside the group in remote mode; it must not be deleted or skipped for upload.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if self.workflow.remote_exec: | |
| is_unneeded_outside = ( | |
| tempfile in self.workflow.storage_settings.unneeded_temp_files | |
| ) | |
| or tempfile in self.derived_targetfiles | |
| else: | |
| is_unneeded_outside = True | |
| is_derived_target = tempfile in self.derived_targetfiles | |
| is_needed_by_subsequent_job = any( | |
| tempfile in files | |
| for j, files in self.depending[job].items() | |
| if not self.finished(j) | |
| and self.needrun(j) | |
| and j != job | |
| and is_other_group_or_no_group(j) | |
| ) | |
| logger.debug( | |
| f"Temp file {tempfile}: {is_unneeded_outside=}, {is_derived_target=}, " | |
| f"{is_needed_by_subsequent_job=}" | |
| ) | |
| return ( | |
| is_derived_target or is_needed_by_subsequent_job | |
| ) and is_unneeded_outside | |
| if self.workflow.remote_exec: | |
| is_unneeded_outside = ( | |
| tempfile in self.workflow.storage_settings.unneeded_temp_files | |
| ) | |
| else: | |
| is_unneeded_outside = True # unused in local path; keep for logging | |
| is_derived_target = tempfile in self.derived_targetfiles | |
| is_needed_by_subsequent_job = any( | |
| tempfile in files | |
| for j, files in self.depending[job].items() | |
| if not self.finished(j) | |
| and self.needrun(j) | |
| and j != job | |
| and is_other_group_or_no_group(j) | |
| ) | |
| logger.debug( | |
| f"Temp file {tempfile}: {is_unneeded_outside=}, {is_derived_target=}, " | |
| f"{is_needed_by_subsequent_job=}" | |
| ) | |
| if self.workflow.remote_exec: | |
| # Keep if derived, needed by downstream outside the group, or needed outside the group. | |
| return ( | |
| is_derived_target | |
| or is_needed_by_subsequent_job | |
| or not is_unneeded_outside | |
| ) | |
| # Local/subprocess path: classic semantics. | |
| return is_derived_target or is_needed_by_subsequent_job |
🤖 Prompt for AI Agents
In src/snakemake/dag.py around lines 948–971, the remote_exec logic inverts
"needed outside": currently the code computes is_unneeded_outside and then
requires it to be True to keep the tempfile, which causes files that are
actually needed outside to be deleted; change the logic to compute
is_needed_outside = (tempfile not in
self.workflow.storage_settings.unneeded_temp_files) when
self.workflow.remote_exec is True (and True when not remote), and then return
(is_derived_target or is_needed_by_subsequent_job or is_needed_outside). Also
add a regression test that runs in remote mode where a temp output from a
grouped job is consumed by a downstream job outside the group and assert the
temp file is not deleted or skipped for upload.
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
…ake into fix/unneeded-temp-files
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (2)
src/snakemake/dag.py (2)
940-945: Fix confirmed: Group-aware dependency predicate correctly handles both cases.The
is_other_group_or_no_grouppredicate now correctly returnsTruewhenoutside_of_group_jobis None (no group context) and when a job is outside the group. This ensures downstream dependency checks work properly in both scenarios.
948-971: Remote execution logic correctly preserves needed temp files.The logic for handling
unneeded_temp_filesin remote execution mode is now correct:
- Files NOT in
unneeded_temp_filesare needed outside and should be preserved- The final return statement properly keeps files that are either derived targets OR needed by downstream jobs when they're unneeded outside
- When files are needed outside (
is_unneeded_outside=False), they're always keptThis prevents incorrect deletion of temp files that are required by downstream jobs in remote execution scenarios.
🧹 Nitpick comments (2)
src/snakemake/dag.py (2)
936-971: Consider documenting the execution-mode behaviors more clearly.The logic handles three different execution modes (
subprocess_exec,remote_exec, and local) with different semantics for temp file retention. While the implementation appears correct, consider adding more detailed docstring documentation about these behaviors to help future maintainers understand the rationale.Apply this improvement to the docstring:
def is_needed_tempfile(self, job, tempfile, outside_of_group_job=None): - """Return whether a temp file is still needed by jobs other than the - given and not part of the eventually given group. + """Return whether a temp file is still needed by jobs other than the + given and not part of the eventually given group. + + Execution mode behaviors: + - subprocess_exec: Always returns True (temp files always needed) + - remote_exec: Considers unneeded_temp_files from storage_settings + - local: Temp files can be deleted if not needed by downstream jobs + + Args: + job: The job producing the temp file + tempfile: The temp file to check + outside_of_group_job: Optional GroupJob context for group-aware checking """
964-967: Consider adjusting log level for diagnostic output.The diagnostic logging at line 964-967 might be too verbose at DEBUG level for production use. Consider using TRACE level or making it conditional on a more verbose debug flag.
- logger.debug( + logger.log(logging.DEBUG - 5, # TRACE level f"Temp file {tempfile}: {is_unneeded_outside=}, {is_derived_target=}, " f"{is_needed_by_subsequent_job=}" )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/snakemake/dag.py(2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/dag.py
🧠 Learnings (2)
📚 Learning: 2024-11-12T12:08:20.342Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.
Applied to files:
src/snakemake/dag.py
📚 Learning: 2024-09-19T18:47:50.192Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3096
File: snakemake/dag.py:1943-1943
Timestamp: 2024-09-19T18:47:50.192Z
Learning: In the temp file deletion logic, temp files are only deleted after confirming that no other job (including non-checkpoint jobs) still needs them.
Applied to files:
src/snakemake/dag.py
🧬 Code graph analysis (1)
src/snakemake/dag.py (3)
src/snakemake/jobs.py (7)
Job(183-1310)GroupJob(1328-1771)output(345-346)output(349-350)output(1546-1552)jobs(1367-1368)jobs(1371-1372)src/snakemake/workflow.py (3)
output(2086-2091)subprocess_exec(455-456)remote_exec(451-452)src/snakemake/io/__init__.py (1)
is_flagged(960-963)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (52)
- GitHub Check: tests (8, ubuntu-latest, py311)
- GitHub Check: tests (2, ubuntu-latest, py312)
- GitHub Check: tests (9, ubuntu-latest, py313)
- GitHub Check: tests (9, windows-2022, py312)
- GitHub Check: tests (10, ubuntu-latest, py312)
- GitHub Check: tests (10, windows-2022, py312)
- GitHub Check: tests (10, windows-2022, py313)
- GitHub Check: tests (7, windows-2022, py313)
- GitHub Check: tests (10, ubuntu-latest, py313)
- GitHub Check: tests (9, windows-2022, py313)
- GitHub Check: tests (6, windows-2022, py312)
- GitHub Check: tests (8, ubuntu-latest, py312)
- GitHub Check: tests (8, windows-2022, py313)
- GitHub Check: tests (6, windows-2022, py313)
- GitHub Check: tests (10, ubuntu-latest, py311)
- GitHub Check: tests (7, windows-2022, py312)
- GitHub Check: tests (7, ubuntu-latest, py313)
- GitHub Check: tests (9, ubuntu-latest, py312)
- GitHub Check: tests (5, ubuntu-latest, py313)
- GitHub Check: tests (9, ubuntu-latest, py311)
- GitHub Check: tests (7, ubuntu-latest, py312)
- GitHub Check: tests (6, ubuntu-latest, py312)
- GitHub Check: tests (8, ubuntu-latest, py313)
- GitHub Check: tests (8, windows-2022, py312)
- GitHub Check: tests (6, ubuntu-latest, py311)
- GitHub Check: tests (6, ubuntu-latest, py313)
- GitHub Check: tests (3, windows-2022, py313)
- GitHub Check: tests (4, ubuntu-latest, py311)
- GitHub Check: tests (7, ubuntu-latest, py311)
- GitHub Check: tests (5, windows-2022, py313)
- GitHub Check: tests (5, ubuntu-latest, py311)
- GitHub Check: tests (5, ubuntu-latest, py312)
- GitHub Check: tests (5, windows-2022, py312)
- GitHub Check: tests (3, ubuntu-latest, py311)
- GitHub Check: tests (3, ubuntu-latest, py312)
- GitHub Check: tests (3, ubuntu-latest, py313)
- GitHub Check: tests (4, windows-2022, py313)
- GitHub Check: tests (3, windows-2022, py312)
- GitHub Check: tests (4, windows-2022, py312)
- GitHub Check: tests (4, ubuntu-latest, py313)
- GitHub Check: tests (4, ubuntu-latest, py312)
- GitHub Check: tests (2, ubuntu-latest, py313)
- GitHub Check: tests (2, windows-2022, py312)
- GitHub Check: tests (1, ubuntu-latest, py312)
- GitHub Check: tests (2, windows-2022, py313)
- GitHub Check: tests (2, ubuntu-latest, py311)
- GitHub Check: tests (1, windows-2022, py313)
- GitHub Check: tests (1, ubuntu-latest, py311)
- GitHub Check: tests (1, windows-2022, py312)
- GitHub Check: tests (1, ubuntu-latest, py313)
- GitHub Check: formatting
- GitHub Check: apidocs
🔇 Additional comments (3)
src/snakemake/dag.py (3)
289-301: LGTM! Clean implementation of group-aware temp file collection.The refactored
get_unneeded_temp_filesmethod elegantly handles both regular jobs and group jobs through the nestedget_fileshelper. The recursive processing of group members with the appropriate group context is well-structured.
973-983: Enhanced temp file handling with proper group recursion.Good addition of debug logging and proper handling of group jobs through recursion. The early return for
--notempis appropriately placed.
289-301: No changes needed for GroupJob iteration
Verified thatGroupJobinsrc/snakemake/jobs.pydefines an__iter__method (and__len__), sofor j in jobcorrectly iterates its constituent jobs.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
.github/workflows/main.yml (2)
55-56: Add rationale and re-enable TODO for the new Windows/py312 exclusion.Without a note, future maintainers won’t know why py312 is excluded on windows-2022 or when to revisit it.
Apply this diff to document intent:
exclude: - - os: windows-2022 - env: "py312" + # TODO: Revisit once root cause is resolved; temporarily exclude py312 on Windows (windows-2022). + # Rationale: currently unstable/unsupported in CI for this matrix (see linked issue/PR). + - os: windows-2022 + env: "py312"If there’s a tracking issue/PR, please add its number in the comment. Also confirm this exclusion is required only for tests (and not for other jobs).
61-62: Mirror the rationale for macOS/py312 exclusion and add tracking info.Same reasoning as Windows: add a brief comment so we know when to remove this exclude.
- - os: macos-latest - env: "py312" + # TODO: Revisit once fixed; temporarily exclude py312 on macOS (macos-latest). + # Rationale: currently unstable/unsupported in CI for this matrix (see linked issue/PR). + - os: macos-latest + env: "py312"If flakiness (not hard incompatibility) is the reason, consider keep-running coverage via
continue-on-errorin a separate, non-blocking job instead of excluding.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
.github/workflows/main.yml(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (40)
- GitHub Check: tests (10, ubuntu-latest, py313)
- GitHub Check: tests (10, windows-2022, py313)
- GitHub Check: tests (10, ubuntu-latest, py311)
- GitHub Check: tests (7, windows-2022, py313)
- GitHub Check: tests (10, ubuntu-latest, py312)
- GitHub Check: tests (9, ubuntu-latest, py313)
- GitHub Check: tests (9, windows-2022, py313)
- GitHub Check: tests (4, ubuntu-latest, py313)
- GitHub Check: tests (9, ubuntu-latest, py312)
- GitHub Check: tests (8, windows-2022, py313)
- GitHub Check: tests (9, ubuntu-latest, py311)
- GitHub Check: tests (6, windows-2022, py313)
- GitHub Check: tests (8, ubuntu-latest, py312)
- GitHub Check: tests (7, ubuntu-latest, py312)
- GitHub Check: tests (8, ubuntu-latest, py313)
- GitHub Check: tests (7, ubuntu-latest, py311)
- GitHub Check: tests (7, ubuntu-latest, py313)
- GitHub Check: tests (8, ubuntu-latest, py311)
- GitHub Check: tests (6, ubuntu-latest, py311)
- GitHub Check: tests (6, ubuntu-latest, py313)
- GitHub Check: tests (5, ubuntu-latest, py312)
- GitHub Check: tests (2, windows-2022, py313)
- GitHub Check: tests (3, ubuntu-latest, py312)
- GitHub Check: tests (5, windows-2022, py313)
- GitHub Check: tests (6, ubuntu-latest, py312)
- GitHub Check: tests (2, ubuntu-latest, py313)
- GitHub Check: tests (5, ubuntu-latest, py311)
- GitHub Check: tests (5, ubuntu-latest, py313)
- GitHub Check: tests (3, windows-2022, py313)
- GitHub Check: tests (3, ubuntu-latest, py311)
- GitHub Check: tests (4, ubuntu-latest, py311)
- GitHub Check: tests (3, ubuntu-latest, py313)
- GitHub Check: tests (4, windows-2022, py313)
- GitHub Check: tests (2, ubuntu-latest, py312)
- GitHub Check: tests (4, ubuntu-latest, py312)
- GitHub Check: tests (1, windows-2022, py313)
- GitHub Check: tests (2, ubuntu-latest, py311)
- GitHub Check: tests (1, ubuntu-latest, py313)
- GitHub Check: tests (1, ubuntu-latest, py312)
- GitHub Check: tests (1, ubuntu-latest, py311)
🤖 I have created a release *beep* *boop* --- ## [9.11.4](v9.11.3...v9.11.4) (2025-09-19) ### Bug Fixes * Confusion with Overriding input After Snakemake Modularization ([#3714](#3714)) ([bd94dc1](bd94dc1)) * properly handle temp files in group jobs that are not needed outside of the group (do not upload them to storage and delete them early) ([#3730](#3730)) ([3ffd8e1](3ffd8e1)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
…side of the group (do not upload them to storage and delete them early) (snakemake#3730) ### Description <!--Add a description of your PR here--> ### QC <!-- Make sure that you can tick the boxes below. --> * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Temp-file cleanup is now group-aware and recursively handles grouped jobs, respecting group boundaries. * **Bug Fixes** * Prevents removal of temp files still needed by downstream jobs outside a group, avoiding downstream failures. * Improved diagnostic logging for temp-file decisions and group handling. * **Tests** * Added an end-to-end test, expected-result assets, and a helper script to validate group-aware temp-file behavior. * **Chores** * Spawned job invocations no longer force a notemp flag, restoring default temp-file behavior. <!-- end of auto-generated comment: release notes by coderabbit.ai --> --------- Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
🤖 I have created a release *beep* *boop* --- ## [9.11.4](snakemake/snakemake@v9.11.3...v9.11.4) (2025-09-19) ### Bug Fixes * Confusion with Overriding input After Snakemake Modularization ([snakemake#3714](snakemake#3714)) ([bd94dc1](snakemake@bd94dc1)) * properly handle temp files in group jobs that are not needed outside of the group (do not upload them to storage and delete them early) ([snakemake#3730](snakemake#3730)) ([3ffd8e1](snakemake@3ffd8e1)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
Description
QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
New Features
Bug Fixes
Tests
Chores