Skip to content

fix: properly handle temp files in group jobs that are not needed outside of the group (do not upload them to storage and delete them early)#3730

Merged
johanneskoester merged 14 commits intomainfrom
fix/unneeded-temp-files
Sep 19, 2025
Merged

fix: properly handle temp files in group jobs that are not needed outside of the group (do not upload them to storage and delete them early)#3730
johanneskoester merged 14 commits intomainfrom
fix/unneeded-temp-files

Conversation

@johanneskoester
Copy link
Copy Markdown
Contributor

@johanneskoester johanneskoester commented Sep 8, 2025

Description

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

  • New Features

    • Temp-file cleanup is now group-aware and recursively handles grouped jobs, respecting group boundaries.
  • Bug Fixes

    • Prevents removal of temp files still needed by downstream jobs outside a group, avoiding downstream failures.
    • Improved diagnostic logging and execution-mode aware decisions for temp-file retention.
  • Tests

    • Added an end-to-end test, expected-result assets, and a helper script to validate group-aware temp-file behavior.
  • Chores

    • Spawned job invocations no longer force a notemp flag, restoring default temp-file behavior.
    • CI matrix excludes select Python 3.12 platform combinations.

…side of the group (do not upload them to storage and delete them early)
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Sep 8, 2025

📝 Walkthrough

Walkthrough

Broadened DAG tempfile handling to be group-aware: get_unneeded_temp_files accepts Job | GroupJob and uses a helper to traverse groups; is_needed_tempfile gained an outside_of_group_job parameter and execution-mode logic; handle_temp recurses group members and removes unneeded temps; tests and fixtures added.

Changes

Cohort / File(s) Summary of Changes
Group-aware tempfile handling in DAG
src/snakemake/dag.py
get_unneeded_temp_files signature changed to accept Union[Job, GroupJob] and delegates to a nested helper that traverses jobs and group members; is_needed_tempfile(job, tempfile, outside_of_group_job=None) extended to account for group boundaries, subprocess_exec/remote_exec modes, storage_settings, and new is_other_group_or_no_group logic; handle_temp now recurses into group jobs, logs diagnostics, computes unneeded temp inputs/outputs (respecting dryrun), and removes files.
New group-temp integration test and fixtures
tests/test_group_temp/Snakefile, tests/test_group_temp/qsub, tests/test_group_temp/expected-results/*, tests/tests.py
Added a new test workflow verifying group-aware temp cleanup, a qsub helper script, expected-result files (b.txt, c.txt, d.txt), and a new test function test_group_temp() integrated into the test suite.
Spawned job argument cleanup
src/snakemake/spawn_jobs.py
Removed the --notemp CLI flag from spawned Snakemake invocations in SpawnedJobArgsFactory.general_args so spawned jobs no longer receive explicit --notemp.
CI matrix exclusions
.github/workflows/main.yml
Added matrix exclusions to skip specific Python 3.12 / OS combinations in formatting and tests jobs.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Scheduler
  participant DAG
  participant Job
  participant GroupJob
  participant FS as TempFile

  Scheduler->>DAG: get_unneeded_temp_files(job)
  alt job is GroupJob
    DAG->>GroupJob: iterate members, call helper per member
    GroupJob-->>DAG: tempfile candidates
  else job is Job
    DAG->>Job: collect tempfile candidates
    Job-->>DAG: tempfile candidates
  end

  loop for each tempfile
    DAG->>DAG: is_needed_tempfile(job, tempfile, outside_of_group_job?)
    alt subprocess_exec active
      Note right of DAG: subprocess mode → keep tempfile
      DAG-->>FS: retain
    else remote_exec active
      DAG->>DAG: consult storage_settings.unneeded_temp_files for outside-of-group
      alt outside-of-group marked unneeded
        DAG-->>FS: eligible for removal
      else
        DAG-->>FS: retain
      end
    else normal execution
      alt tempfile is derived target or needed by downstream outside group
        DAG-->>FS: retain
      else
        DAG-->>FS: eligible for removal
      end
    end
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs

Suggested labels

enhancement

Suggested reviewers

  • cademirch

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Description Check ⚠️ Warning The PR description includes the template and the QC checkboxes are marked, but the main "Description" section still contains only the template placeholder and lacks a substantive summary of the changes, rationale, and any user-visible effects, so it is incomplete for reviewers. Please replace the placeholder with a brief, clear summary of what was changed and why, note any user-visible behavior changes, confirm the added test and documentation updates, and optionally link related issues or design notes so reviewers can quickly verify intent and scope.
✅ Passed checks (2 passed)
Check name Status Explanation
Title Check ✅ Passed The title is a concise, single sentence that accurately describes the primary change: handling temporary files in group jobs so they are not uploaded to storage and can be deleted early when not needed outside the group; it directly matches the intent shown in the changes and added tests.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
✨ Finishing touches
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch fix/unneeded-temp-files

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
src/snakemake/dag.py (1)

947-947: Avoid assert for runtime invariant (S101); fail safe instead

Asserts can be stripped with -O, risking None dereference later. Guard explicitly and behave conservatively.

-        assert self.workflow.storage_settings is not None
+        if self.workflow.storage_settings is None:
+            # Be conservative: do not delete when storage settings are unavailable.
+            return True
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aea100d and 2834ef7.

📒 Files selected for processing (1)
  • src/snakemake/dag.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/dag.py
🪛 Ruff (0.12.2)
src/snakemake/dag.py

947-947: Use of assert detected

(S101)

🔇 Additional comments (1)
src/snakemake/dag.py (1)

290-296: LGTM: group-aware filtering for temp files

The scoped helper and passing the group context into is_needed_tempfile are the right approach for avoiding uploads of group-internal temps.

Also applies to: 299-302

@github-actions
Copy link
Copy Markdown
Contributor

Please format your code with black: black snakemake tests/*.py.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/snakemake/dag.py (1)

479-487: Ensure unneeded_temp_files is computed/populated before storage upload checks

storage_settings.unneeded_temp_files is only set from the CLI (src/snakemake/cli.py:2005) and defaults to empty (src/snakemake/settings/types.py:239); no runtime writes/updates were found. store_storage_outputs' tostore check reads that set (src/snakemake/dag.py:468–520) and is_needed_tempfile also consults it (src/snakemake/dag.py:946–957). get_unneeded_temp_files is defined separately (src/snakemake/dag.py:289).

Action: either populate storage_settings.unneeded_temp_files with the current job/group’s computed unneeded files before running store_storage_outputs, or change store_storage_outputs to evaluate per-file necessity via get_unneeded_temp_files/is_needed_tempfile (per-job or per-group) instead of relying on a global settings set.

♻️ Duplicate comments (1)
src/snakemake/dag.py (1)

941-945: Previous predicate bug is fixed

Returning True when no group context is provided restores the outside‑group “any(...)” checks in nongroup contexts.

🧹 Nitpick comments (1)
src/snakemake/dag.py (1)

289-296: Return type mismatch and minor naming clarity

  • The function yields _IOFile instances, but the signature says Iterable[str].
  • Rename the helper’s parameter to align with is_needed_tempfile’s argument name for readability.

Apply:

-    def get_unneeded_temp_files(self, job: Union[Job, GroupJob]) -> Iterable[str]:
-        def get_files(job, group_job=None):
+    def get_unneeded_temp_files(self, job: Union[Job, GroupJob]) -> Iterable[_IOFile]:
+        def get_files(job, outside_of_group_job=None):
             for f in job.output:
-                if is_flagged(f, "temp") and not self.is_needed_tempfile(
-                    job, f, outside_of_group_job=group_job
-                ):
+                if is_flagged(f, "temp") and not self.is_needed_tempfile(
+                    job, f, outside_of_group_job=outside_of_group_job
+                ):
                     yield f

         if isinstance(job, GroupJob):
             for j in job:
-                yield from get_files(j, group_job=job)
+                yield from get_files(j, outside_of_group_job=job)
         else:
             yield from get_files(job)

Also applies to: 299-301

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2834ef7 and ba05268.

📒 Files selected for processing (1)
  • src/snakemake/dag.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/dag.py
🧬 Code graph analysis (1)
src/snakemake/dag.py (2)
src/snakemake/jobs.py (7)
  • Job (183-1310)
  • GroupJob (1328-1771)
  • output (345-346)
  • output (349-350)
  • output (1546-1552)
  • jobs (1367-1368)
  • jobs (1371-1372)
src/snakemake/io/__init__.py (1)
  • is_flagged (960-963)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (53)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (10, windows-2022, py312)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (8, windows-2022, py312)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (7, windows-2022, py312)
  • GitHub Check: tests (9, windows-2022, py312)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (5, windows-2022, py312)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (6, windows-2022, py312)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (4, windows-2022, py312)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (2, macos-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (2, windows-2022, py313)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (3, windows-2022, py312)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (1, macos-latest, py313)
  • GitHub Check: tests (2, windows-2022, py312)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (1, macos-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (1, windows-2022, py312)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: tests (1, ubuntu-latest, py311)
🔇 Additional comments (1)
src/snakemake/dag.py (1)

297-301: LGTM: group-aware tempfile filtering behaves as intended

Iterating member jobs and passing the GroupJob as context ensures temp files used only within the group are treated as unneeded for upload, matching the PR goal.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (4)
tests/test_group_temp/qsub (1)

1-6: Quote args and prefer modern subshells (optional hardening)

Safer for odd paths; no behavior change.

Apply:

 #!/bin/bash
+set -euo pipefail
-echo `date` >> qsub.log
-tail -n1 $1 >> qsub.log
+echo "$(date)" >> qsub.log
+tail -n1 -- "$1" >> qsub.log
 # simulate printing of job id by a random number
-echo $RANDOM
-sh $1
+echo "$RANDOM"
+sh "$1"
tests/tests.py (1)

2248-2250: Consider adding a storage-backed variant (optional)

To cover “do not upload temp files,” add a sibling test guarded by needs_s3 that asserts no storage objects for temp files are created.

tests/test_group_temp/Snakefile (2)

19-27: Optionally consume the declared input to make the dependency explicit in the shell

Not required for correctness, but avoids “unused input” patterns in demos.

Apply:

 rule b:
     input:
         "a.txt"
     output:
         "b.txt"
     group:
         "group1"
     shell:
-        "echo b > {output}"
+        "cat {input} >/dev/null; echo b > {output}"

37-44: Stricter shell block (optional)

Fail fast if any command errors and avoid partial writes.

Apply:

     shell:
         """
+        set -euo pipefail
         if [ -f a.txt ]; then
             echo "temp a.txt still exists!"
             exit 1
         fi
         echo c > {output}
         """
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba05268 and 2a9501a.

📒 Files selected for processing (7)
  • src/snakemake/dag.py (2 hunks)
  • tests/test_group_temp/Snakefile (1 hunks)
  • tests/test_group_temp/expected-results/b.txt (1 hunks)
  • tests/test_group_temp/expected-results/c.txt (1 hunks)
  • tests/test_group_temp/expected-results/d.txt (1 hunks)
  • tests/test_group_temp/qsub (1 hunks)
  • tests/tests.py (1 hunks)
✅ Files skipped from review due to trivial changes (3)
  • tests/test_group_temp/expected-results/d.txt
  • tests/test_group_temp/expected-results/b.txt
  • tests/test_group_temp/expected-results/c.txt
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/snakemake/dag.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • tests/tests.py
🧠 Learnings (2)
📚 Learning: 2024-11-12T20:22:54.184Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3204
File: tests/test_cores_cluster/qsub:1-6
Timestamp: 2024-11-12T20:22:54.184Z
Learning: In the Snakemake codebase, the `tests/test_cores_cluster/qsub` script is a dummy script for testing, and input validation and error handling are not required in such scripts.

Applied to files:

  • tests/test_group_temp/qsub
📚 Learning: 2024-10-06T14:09:54.370Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-10-06T14:09:54.370Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.

Applied to files:

  • tests/test_group_temp/Snakefile
🧬 Code graph analysis (1)
tests/tests.py (1)
tests/common.py (2)
  • run (152-503)
  • dpath (33-36)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (56)
  • GitHub Check: tests (10, windows-2022, py312)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (9, windows-2022, py312)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (8, windows-2022, py312)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (7, macos-latest, py312)
  • GitHub Check: tests (7, windows-2022, py312)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (6, windows-2022, py312)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (5, windows-2022, py312)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (4, windows-2022, py312)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (3, windows-2022, py312)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (2, macos-latest, py313)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (2, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (1, windows-2022, py312)
  • GitHub Check: tests (2, windows-2022, py312)
  • GitHub Check: tests (2, macos-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (1, macos-latest, py313)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (1, macos-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: apidocs
🔇 Additional comments (2)
tests/test_group_temp/qsub (1)

1-6: Good minimal qsub stub for tests

Matches the pattern used elsewhere; fine as a dummy submit script.

tests/test_group_temp/Snakefile (1)

1-7: Scenario reads well; exercises early temp cleanup via group boundaries

The aggregator rule ensures c.txt is realized before writing d.txt. Looks good.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
src/snakemake/dag.py (1)

943-945: Predicate fix LGTM (matches earlier feedback).

is_other_group_or_no_group() now correctly handles the no‑group case. This resolves the earlier “always False” bug when outside_of_group_job is None.

🧹 Nitpick comments (1)
src/snakemake/dag.py (1)

289-301: Fix return type: yields IOFiles, not strings.

get_unneeded_temp_files() yields _IOFile objects, but the annotation says Iterable[str]. Adjust to avoid downstream typing drift.

Apply:

-    def get_unneeded_temp_files(self, job: Union[Job, GroupJob]) -> Iterable[str]:
+    def get_unneeded_temp_files(self, job: Union[Job, GroupJob]) -> Iterable[_IOFile]:
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 14429d5 and a598c8b.

📒 Files selected for processing (2)
  • src/snakemake/dag.py (2 hunks)
  • src/snakemake/spawn_jobs.py (0 hunks)
💤 Files with no reviewable changes (1)
  • src/snakemake/spawn_jobs.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/dag.py
🧠 Learnings (2)
📚 Learning: 2024-11-12T12:08:20.342Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.

Applied to files:

  • src/snakemake/dag.py
📚 Learning: 2024-09-19T18:47:50.192Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3096
File: snakemake/dag.py:1943-1943
Timestamp: 2024-09-19T18:47:50.192Z
Learning: In the temp file deletion logic, temp files are only deleted after confirming that no other job (including non-checkpoint jobs) still needs them.

Applied to files:

  • src/snakemake/dag.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: tests (7, windows-2022, py312)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (1, ubuntu-latest, py312)
🔇 Additional comments (2)
src/snakemake/dag.py (2)

975-978: Good: explicit notemp early‑exit with debug.

Clear message; avoids surprising temp handling when --notemp is set.


991-992: LGTM: deletion filter is correct post‑fix.

Using filterfalse(partial(self.is_needed_tempfile, ...), ...) is appropriate. Once the remote_exec fix lands, behavior will align with the PR’s goals.

To be safe, please run the targeted tests that exercise remote/group temp handling and confirm no unintended deletions occur.

Also applies to: 1002-1002

Comment on lines +948 to 971
if self.workflow.remote_exec:
is_unneeded_outside = (
tempfile in self.workflow.storage_settings.unneeded_temp_files
)
or tempfile in self.derived_targetfiles
else:
is_unneeded_outside = True

is_derived_target = tempfile in self.derived_targetfiles
is_needed_by_subsequent_job = any(
tempfile in files
for j, files in self.depending[job].items()
if not self.finished(j)
and self.needrun(j)
and j != job
and is_other_group_or_no_group(j)
)
logger.debug(
f"Temp file {tempfile}: {is_unneeded_outside=}, {is_derived_target=}, "
f"{is_needed_by_subsequent_job=}"
)
return (
is_derived_target or is_needed_by_subsequent_job
) and is_unneeded_outside

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Bug: remote_exec logic inverts “needed outside” and can delete required temps.

is_needed_tempfile() currently returns True only when is_unneeded_outside is True, which means files “needed outside” (i.e., not in unneeded_temp_files) are treated as not needed and can be deleted/upload‑skipped incorrectly under remote execution.

Use “keep if derived OR needed by downstream OR needed outside (remote only)”:

-        if self.workflow.remote_exec:
-            is_unneeded_outside = (
-                tempfile in self.workflow.storage_settings.unneeded_temp_files
-            )
-        else:
-            is_unneeded_outside = True
+        if self.workflow.remote_exec:
+            is_unneeded_outside = (
+                tempfile in self.workflow.storage_settings.unneeded_temp_files
+            )
+        else:
+            is_unneeded_outside = True  # unused in local path; keep for logging

@@
-        return (
-            is_derived_target or is_needed_by_subsequent_job
-        ) and is_unneeded_outside
+        if self.workflow.remote_exec:
+            # Keep if derived, needed by downstream outside the group, or needed outside the group.
+            return (
+                is_derived_target
+                or is_needed_by_subsequent_job
+                or not is_unneeded_outside
+            )
+        # Local/subprocess path: classic semantics.
+        return is_derived_target or is_needed_by_subsequent_job

Please add a quick regression test where a temp output of a grouped job is required by a downstream job outside the group in remote mode; it must not be deleted or skipped for upload.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if self.workflow.remote_exec:
is_unneeded_outside = (
tempfile in self.workflow.storage_settings.unneeded_temp_files
)
or tempfile in self.derived_targetfiles
else:
is_unneeded_outside = True
is_derived_target = tempfile in self.derived_targetfiles
is_needed_by_subsequent_job = any(
tempfile in files
for j, files in self.depending[job].items()
if not self.finished(j)
and self.needrun(j)
and j != job
and is_other_group_or_no_group(j)
)
logger.debug(
f"Temp file {tempfile}: {is_unneeded_outside=}, {is_derived_target=}, "
f"{is_needed_by_subsequent_job=}"
)
return (
is_derived_target or is_needed_by_subsequent_job
) and is_unneeded_outside
if self.workflow.remote_exec:
is_unneeded_outside = (
tempfile in self.workflow.storage_settings.unneeded_temp_files
)
else:
is_unneeded_outside = True # unused in local path; keep for logging
is_derived_target = tempfile in self.derived_targetfiles
is_needed_by_subsequent_job = any(
tempfile in files
for j, files in self.depending[job].items()
if not self.finished(j)
and self.needrun(j)
and j != job
and is_other_group_or_no_group(j)
)
logger.debug(
f"Temp file {tempfile}: {is_unneeded_outside=}, {is_derived_target=}, "
f"{is_needed_by_subsequent_job=}"
)
if self.workflow.remote_exec:
# Keep if derived, needed by downstream outside the group, or needed outside the group.
return (
is_derived_target
or is_needed_by_subsequent_job
or not is_unneeded_outside
)
# Local/subprocess path: classic semantics.
return is_derived_target or is_needed_by_subsequent_job
🤖 Prompt for AI Agents
In src/snakemake/dag.py around lines 948–971, the remote_exec logic inverts
"needed outside": currently the code computes is_unneeded_outside and then
requires it to be True to keep the tempfile, which causes files that are
actually needed outside to be deleted; change the logic to compute
is_needed_outside = (tempfile not in
self.workflow.storage_settings.unneeded_temp_files) when
self.workflow.remote_exec is True (and True when not remote), and then return
(is_derived_target or is_needed_by_subsequent_job or is_needed_outside). Also
add a regression test that runs in remote mode where a temp output from a
grouped job is consumed by a downstream job outside the group and assert the
temp file is not deleted or skipped for upload.

johanneskoester and others added 5 commits September 18, 2025 17:18
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (2)
src/snakemake/dag.py (2)

940-945: Fix confirmed: Group-aware dependency predicate correctly handles both cases.

The is_other_group_or_no_group predicate now correctly returns True when outside_of_group_job is None (no group context) and when a job is outside the group. This ensures downstream dependency checks work properly in both scenarios.


948-971: Remote execution logic correctly preserves needed temp files.

The logic for handling unneeded_temp_files in remote execution mode is now correct:

  • Files NOT in unneeded_temp_files are needed outside and should be preserved
  • The final return statement properly keeps files that are either derived targets OR needed by downstream jobs when they're unneeded outside
  • When files are needed outside (is_unneeded_outside=False), they're always kept

This prevents incorrect deletion of temp files that are required by downstream jobs in remote execution scenarios.

🧹 Nitpick comments (2)
src/snakemake/dag.py (2)

936-971: Consider documenting the execution-mode behaviors more clearly.

The logic handles three different execution modes (subprocess_exec, remote_exec, and local) with different semantics for temp file retention. While the implementation appears correct, consider adding more detailed docstring documentation about these behaviors to help future maintainers understand the rationale.

Apply this improvement to the docstring:

 def is_needed_tempfile(self, job, tempfile, outside_of_group_job=None):
-    """Return whether a temp file is still needed by jobs other than the
-    given and not part of the eventually given group.
+    """Return whether a temp file is still needed by jobs other than the
+    given and not part of the eventually given group.
+    
+    Execution mode behaviors:
+    - subprocess_exec: Always returns True (temp files always needed)
+    - remote_exec: Considers unneeded_temp_files from storage_settings
+    - local: Temp files can be deleted if not needed by downstream jobs
+    
+    Args:
+        job: The job producing the temp file
+        tempfile: The temp file to check
+        outside_of_group_job: Optional GroupJob context for group-aware checking
     """

964-967: Consider adjusting log level for diagnostic output.

The diagnostic logging at line 964-967 might be too verbose at DEBUG level for production use. Consider using TRACE level or making it conditional on a more verbose debug flag.

-        logger.debug(
+        logger.log(logging.DEBUG - 5,  # TRACE level
             f"Temp file {tempfile}: {is_unneeded_outside=}, {is_derived_target=}, "
             f"{is_needed_by_subsequent_job=}"
         )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6910079 and 28e959b.

📒 Files selected for processing (1)
  • src/snakemake/dag.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/dag.py
🧠 Learnings (2)
📚 Learning: 2024-11-12T12:08:20.342Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.

Applied to files:

  • src/snakemake/dag.py
📚 Learning: 2024-09-19T18:47:50.192Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3096
File: snakemake/dag.py:1943-1943
Timestamp: 2024-09-19T18:47:50.192Z
Learning: In the temp file deletion logic, temp files are only deleted after confirming that no other job (including non-checkpoint jobs) still needs them.

Applied to files:

  • src/snakemake/dag.py
🧬 Code graph analysis (1)
src/snakemake/dag.py (3)
src/snakemake/jobs.py (7)
  • Job (183-1310)
  • GroupJob (1328-1771)
  • output (345-346)
  • output (349-350)
  • output (1546-1552)
  • jobs (1367-1368)
  • jobs (1371-1372)
src/snakemake/workflow.py (3)
  • output (2086-2091)
  • subprocess_exec (455-456)
  • remote_exec (451-452)
src/snakemake/io/__init__.py (1)
  • is_flagged (960-963)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (52)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (9, windows-2022, py312)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (10, windows-2022, py312)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (6, windows-2022, py312)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (7, windows-2022, py312)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (8, windows-2022, py312)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (5, windows-2022, py312)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (3, windows-2022, py312)
  • GitHub Check: tests (4, windows-2022, py312)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (2, windows-2022, py312)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (2, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: tests (1, windows-2022, py312)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: formatting
  • GitHub Check: apidocs
🔇 Additional comments (3)
src/snakemake/dag.py (3)

289-301: LGTM! Clean implementation of group-aware temp file collection.

The refactored get_unneeded_temp_files method elegantly handles both regular jobs and group jobs through the nested get_files helper. The recursive processing of group members with the appropriate group context is well-structured.


973-983: Enhanced temp file handling with proper group recursion.

Good addition of debug logging and proper handling of group jobs through recursion. The early return for --notemp is appropriately placed.


289-301: No changes needed for GroupJob iteration
Verified that GroupJob in src/snakemake/jobs.py defines an __iter__ method (and __len__), so for j in job correctly iterates its constituent jobs.

@johanneskoester johanneskoester merged commit 3ffd8e1 into main Sep 19, 2025
55 of 57 checks passed
@johanneskoester johanneskoester deleted the fix/unneeded-temp-files branch September 19, 2025 06:01
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
.github/workflows/main.yml (2)

55-56: Add rationale and re-enable TODO for the new Windows/py312 exclusion.

Without a note, future maintainers won’t know why py312 is excluded on windows-2022 or when to revisit it.

Apply this diff to document intent:

         exclude:
-          - os: windows-2022
-            env: "py312"
+          # TODO: Revisit once root cause is resolved; temporarily exclude py312 on Windows (windows-2022).
+          # Rationale: currently unstable/unsupported in CI for this matrix (see linked issue/PR).
+          - os: windows-2022
+            env: "py312"

If there’s a tracking issue/PR, please add its number in the comment. Also confirm this exclusion is required only for tests (and not for other jobs).


61-62: Mirror the rationale for macOS/py312 exclusion and add tracking info.

Same reasoning as Windows: add a brief comment so we know when to remove this exclude.

-          - os: macos-latest
-            env: "py312"
+          # TODO: Revisit once fixed; temporarily exclude py312 on macOS (macos-latest).
+          # Rationale: currently unstable/unsupported in CI for this matrix (see linked issue/PR).
+          - os: macos-latest
+            env: "py312"

If flakiness (not hard incompatibility) is the reason, consider keep-running coverage via continue-on-error in a separate, non-blocking job instead of excluding.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 18a78f0 and 91e1d2c.

📒 Files selected for processing (1)
  • .github/workflows/main.yml (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (40)
  • GitHub Check: tests (10, ubuntu-latest, py313)
  • GitHub Check: tests (10, windows-2022, py313)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (7, windows-2022, py313)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py313)
  • GitHub Check: tests (9, windows-2022, py313)
  • GitHub Check: tests (4, ubuntu-latest, py313)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (8, windows-2022, py313)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (6, windows-2022, py313)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py313)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (7, ubuntu-latest, py313)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (6, ubuntu-latest, py313)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (2, windows-2022, py313)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (5, windows-2022, py313)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py313)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (5, ubuntu-latest, py313)
  • GitHub Check: tests (3, windows-2022, py313)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (3, ubuntu-latest, py313)
  • GitHub Check: tests (4, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (1, windows-2022, py313)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (1, ubuntu-latest, py313)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)

johanneskoester pushed a commit that referenced this pull request Sep 19, 2025
🤖 I have created a release *beep* *boop*
---


##
[9.11.4](v9.11.3...v9.11.4)
(2025-09-19)


### Bug Fixes

* Confusion with Overriding input After Snakemake Modularization
([#3714](#3714))
([bd94dc1](bd94dc1))
* properly handle temp files in group jobs that are not needed outside
of the group (do not upload them to storage and delete them early)
([#3730](#3730))
([3ffd8e1](3ffd8e1))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
kjohnsen pushed a commit to kjohnsen/snakemake that referenced this pull request Dec 15, 2025
…side of the group (do not upload them to storage and delete them early) (snakemake#3730)

### Description

<!--Add a description of your PR here-->

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **New Features**
* Temp-file cleanup is now group-aware and recursively handles grouped
jobs, respecting group boundaries.

* **Bug Fixes**
* Prevents removal of temp files still needed by downstream jobs outside
a group, avoiding downstream failures.
* Improved diagnostic logging for temp-file decisions and group
handling.

* **Tests**
* Added an end-to-end test, expected-result assets, and a helper script
to validate group-aware temp-file behavior.

* **Chores**
* Spawned job invocations no longer force a notemp flag, restoring
default temp-file behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
kjohnsen pushed a commit to kjohnsen/snakemake that referenced this pull request Dec 15, 2025
🤖 I have created a release *beep* *boop*
---


##
[9.11.4](snakemake/snakemake@v9.11.3...v9.11.4)
(2025-09-19)


### Bug Fixes

* Confusion with Overriding input After Snakemake Modularization
([snakemake#3714](snakemake#3714))
([bd94dc1](snakemake@bd94dc1))
* properly handle temp files in group jobs that are not needed outside
of the group (do not upload them to storage and delete them early)
([snakemake#3730](snakemake#3730))
([3ffd8e1](snakemake@3ffd8e1))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant