fix: avoid checking output files in immediate-submit mode by landerlini · Pull Request #3569 · snakemake/snakemake

landerlini · 2025-05-12T17:22:15Z

Disable checking output files when snakemake is run with flag --immediate-submit.
Notes:

The flag --notemp should be additionally indicated (the CLI suggests for it)
The flag --not-retrieve-storage should be indicated to avoid snakemake to try retrieving remote data that may not yet exist
localrules break the DAG because their dependency pattern can not be handled by cluster logics. A warning is printed and the scheduler stops processing jobs requiring their output.

QC

The PR contains a test case for the changes or the changes are already covered by an existing test case.
The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

New Features
- Improved immediate-submit behavior for remote execution, including skipping local rules with a warning and not waiting for missing outputs when appropriate.
- More consistent postprocessing across executors when jobs are submitted immediately.
Bug Fixes
- Prevent premature waiting/retrieval of outputs when using immediate submit, reducing failures on non-shared filesystems.
Documentation
- Expanded help for --immediate-submit with guidance on skipping localrules, using --notemp, and often --not-retrieve-storage.
Tests
- Added end-to-end test simulating cluster execution without a shared filesystem, including expected outputs and a submission runner.

coderabbitai · 2025-05-12T17:22:22Z

Warning

Rate limit exceeded

@johanneskoester has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 9 minutes and 20 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 986ffbb and a65a0b8.

📒 Files selected for processing (3)

src/snakemake/jobs.py (1 hunks)
src/snakemake/scheduling/job_scheduler.py (4 hunks)
tests/tests.py (1 hunks)

📝 Walkthrough

Walkthrough

Implements immediate-submit semantics across scheduling and postprocessing, adds a CLI help clarification, and introduces a non-shared-filesystem integration test using a custom cluster submission/execution script. The scheduler now skips local rules under immediate-submit and propagates ignore-missing-output behavior into job postprocessing. Tests verify end-to-end behavior.

Changes

Cohort / File(s)	Summary
Immediate-submit handling and postprocessing `src/snakemake/jobs.py`, `src/snakemake/scheduling/job_scheduler.py`	jobs.py: tighten an `elif` in `Job.postprocess` with `and not ignore_missing_output`. job_scheduler.py: explicit skip of local rules when `--immediate-submit`; propagate `immediate_submit` into postprocessing across execution modes (DEFAULT/SUBPROCESS/remote), setting `ignore_missing_output` accordingly and passing it on error paths.
CLI help text `src/snakemake/cli.py`	Expand `--immediate-submit` help: warns localrules are skipped, recommends `--notemp` and often `--not-retrieve-storage`.
New test suite: immediate submit without shared FS `tests/test_immediate_submit_without_shared_fs/Snakefile`, `tests/test_immediate_submit_without_shared_fs/clustersubmit`, `tests/test_immediate_submit_without_shared_fs/expected-results/report.yaml`, `tests/tests.py`	Add workflow (produce/count/sum/all), expected `report.yaml`, a cluster submission/execution script using SQLite to manage dependencies and run jobs, and a test invoking Snakemake with cluster-generic executor, immediate-submit, `--notemp`, then executing queued jobs.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as CLI
  participant Scheduler as JobScheduler
  participant Exec as Executor
  participant Job as Job
  participant FS as Storage/FS

  User->>CLI: snakemake --immediate-submit [--notemp ...]
  CLI->>Scheduler: parse workflow, flags
  Scheduler->>Scheduler: plan DAG
  alt immediate-submit
    note over Scheduler: Skip local rules (warn)
  end
  loop For each runnable job
    Scheduler->>Exec: submit job (may be remote/subprocess)
  end
  Exec->>Job: run
  Job->>FS: produce outputs (async)
  Job-->>Scheduler: done (status)
  Scheduler->>Job: postprocess(ignore_missing_output = immediate_submit or touch)
  alt ignore_missing_output = true
    note over Job: Do not wait for outputs
  else
    Job->>FS: wait for outputs/benchmark
  end

sequenceDiagram
  autonumber
  participant Snk as Snakemake (cluster-generic)
  participant Sub as clustersubmit (submit)
  participant DB as SQLite DB
  participant Exe as clustersubmit (--execute)
  participant Sh as Shell

  Snk->>Sub: submit jobscript [--dependencies ...]
  Sub->>DB: insert job + edges
  loop execute cycle
    Exe->>DB: fetch runnable jobs
    DB-->>Exe: job(s) with no unfinished parents
    par async job execution
      Exe->>Sh: run jobscript
      Sh-->>Exe: exit code
      Exe->>DB: mark job done
    and possibly others
    end
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

fix: rerun jobs if previously failed but rule was changed afterwards (thanks to @laf070810 for bringing this up) #3237 — Also alters Job.postprocess condition handling around errors/ignore_missing_output, touching the same function.
fix: fix bug with --edit-notebook sessions causing output files marked as incomplete, fix bug leading to missing log file after edit notebook sessions #3162 — Modifies Job.postprocess semantics related to ignore_missing_output; overlaps with this PR’s adjustments.

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

landerlini · 2025-05-12T17:25:21Z

See also:

snakemake exit with MissingOutputException when using --immediate-submit #2992
snakemake exit with MissingOutputException when using --immediate-submit snakemake-executor-plugin-slurm#133
Immediate submit with cluster generic doesn't work with v8/v9 of snakemake snakemake-executor-plugin-cluster-generic#21

\cc @cmeesters

landerlini · 2025-05-15T17:12:48Z

I modified the scheduler so that local rules break the DAG without raising an error for missing output file, but logging a warning.

landerlini · 2025-06-17T16:20:02Z

@cmeesters with my apologies for the long delay, I have finally implemented a test (test_immediate_submit_without_shared_fs) failing on the main branch and succeeding with this fix.
I have implemented a script mocking the execution of the DAG as controlled by some external engine, after the submission. The scheduler is simplistic, but sufficient to reproduce the bug on checking for output files before submitting jobs.

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (7)

tests/test_immediate_submit_without_shared_fs/clustersubmit (2)
68-77: Hard-coded 100 iteration cap can leave jobs unprocessed

The loop stops after 100 passes even if new runnable jobs keep appearing (large DAGs).
Use a while True: loop with an exit condition instead.
-    for _ in range(100):
+    while True:
@@
-        else:
-            logging.info("Could not find any other job to execute.")
-            return 
+        if not result:
+            logging.info("No runnable jobs left – exiting.")
+            return
104-106: Dependency parsing discards empty strings – cleaner expression possible

Current list-comp is harder to read; using split() without an argument already removes duplicates of whitespace.
-dependencies = [s for s in args.dependencies.split(" ") if s not in [' ', '', None]]
+dependencies = args.dependencies.split()
tests/tests.py (2)
2476-2477: Drop the stray blank lines

These two empty lines introduce an unnecessary diff hunk and break the otherwise tight grouping of tests.
Feel free to remove them – it keeps the history cleaner.

2478-2489: Inline-concatenated shell command is hard to read & easy to mis-escape

The multi-line literal relies on implicit string concatenation and heavy escaping for the embedded quotes. This is easy to break the next time someone touches it.

Consider building the command as a single triple-quoted string (or "\n".join([...])) so that quoting is obvious and you only escape once. Example:
-        shellcmd="snakemake "
-        "--executor cluster-generic "
-        "--cluster-generic-submit-cmd \"./clustersubmit --dependencies \\\"{dependencies}\\\"\" "
-        "--forceall "
-        "--immediate-submit "
-        "--notemp "
-        "--jobs 10 "
-        "; ./clustersubmit --execute"
+        shellcmd=(
+            "snakemake "
+            "--executor cluster-generic "
+            "--cluster-generic-submit-cmd \"./clustersubmit --dependencies \\\"{dependencies}\\\"\" "
+            "--forceall "
+            "--immediate-submit "
+            "--notemp "
+            "--jobs 10 "
+            "; ./clustersubmit --execute"
+        )
This keeps the final string identical while being far less brittle.
tests/test_immediate_submit_without_shared_fs/Snakefile (3)
10-11: Prefer printf over echo for portable, predictable output

echo interpretation (e.g. -n, backslash escapes) varies between shells.
printf is portable and lets you control the trailing newline explicitly:
-    shell: "echo {wildcards.fname} > {output.txt}"
+    shell: "printf '%s\n' {wildcards.fname} > {output.txt}"
18-19: Minor: avoid UUOC (“useless use of cat”)

You can pass the file directly to wc and drop the extra process:
-    shell: "cat {input.txt} | wc -m > {output.dat}"
+    shell: "wc -m < {input.txt} > {output.dat}"
This slightly speeds up the rule and is a common shell best practice.

29-33: Quote loop variable to survive spaces in file names

If a file path ever contains spaces the current loop breaks.
Quoting fixes that and removes a useless subshell:
-        for i in {input.counts}; do
-            SUM=$(($SUM + $(cat $i)));
+        for i in {input.counts}; do
+            SUM=$((SUM + $(cat "$i")))
         done;
Not critical for the current test files, but makes the rule future-proof.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d3bb5e and 7cdeae8.

📒 Files selected for processing (7)

src/snakemake/cli.py (1 hunks)
src/snakemake/jobs.py (1 hunks)
src/snakemake/scheduler.py (4 hunks)
tests/test_immediate_submit_without_shared_fs/Snakefile (1 hunks)
tests/test_immediate_submit_without_shared_fs/clustersubmit (1 hunks)
tests/test_immediate_submit_without_shared_fs/expected-results/report.yaml (1 hunks)
tests/tests.py (1 hunks)

✅ Files skipped from review due to trivial changes (1)

tests/test_immediate_submit_without_shared_fs/expected-results/report.yaml

🚧 Files skipped from review as they are similar to previous changes (2)

src/snakemake/scheduler.py
src/snakemake/jobs.py

🧰 Additional context used

📓 Path-based instructions (1)

`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

src/snakemake/cli.py
tests/tests.py

🧬 Code Graph Analysis (1)

tests/tests.py (1)

tests/common.py (2)

run (151-495)

dpath (32-35)

🔇 Additional comments (1)

src/snakemake/cli.py (1)

1589-1593: Help text may be outdated w.r.t. new localrule handling

The message still says “localrules will fail”, but the scheduler change in this PR now skips localrules with a warning rather than failing.
Please align the help text with the current behaviour to avoid user confusion.

tests/test_immediate_submit_without_shared_fs/clustersubmit

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (3)

tests/test_immediate_submit_without_shared_fs/clustersubmit (3)

101-104: Risk of duplicate job names with current generation strategy.

The random job name generation could create duplicates, leading to SQLite constraint violations if a PRIMARY KEY is added (as suggested above).

Consider using a more robust job ID generation:

-job_name = ''.join(
-    [properties['rule'], '-'] + 
-    [random.choice(string.ascii_lowercase) for _ in range(5)]
-)
+import time
+job_name = f"{properties['rule']}-{int(time.time())}-{''.join(random.choices(string.ascii_lowercase, k=5))}"

70-79: Arbitrary iteration limit may cause premature termination.

The hardcoded limit of 100 iterations could prevent all jobs from completing in complex DAGs. Consider using a more flexible approach.

-    for _ in range(100):
+    max_iterations = 1000  # or make configurable
+    iteration = 0
+    consecutive_empty_rounds = 0
+    
+    while iteration < max_iterations and consecutive_empty_rounds < 3:
+        iteration += 1
         with sqlite3.connect(DATABASE_FILE) as db:
             result = db.execute(QUERY_RUNNABLE).fetchall()
         if len(result):
+            consecutive_empty_rounds = 0
             async with asyncio.TaskGroup() as tg:
                 for row in result:
                     tg.create_task(execute_single_job(*row))
         else:
+            consecutive_empty_rounds += 1
-            logging.info("Could not find any other job to execute.")
-            return 
+            if consecutive_empty_rounds >= 3:
+                logging.info("No runnable jobs found after 3 attempts. Stopping.")
+                return
+            await asyncio.sleep(1)  # Brief pause before retrying

12-14: Hardcoded paths may cause issues in different environments.

Using hardcoded /tmp/ paths may not work reliably across all systems or when running multiple tests concurrently.

Consider using environment variables or temporary directories:

+import tempfile
+
 ## Configuration 
-DATABASE_FILE = "/tmp/clustersubmit.sqlite3"
-LOG_FILE = "/tmp/clustersubmit.log"
+DATABASE_FILE = os.environ.get("CLUSTER_DB_FILE", os.path.join(tempfile.gettempdir(), "clustersubmit.sqlite3"))
+LOG_FILE = os.environ.get("CLUSTER_LOG_FILE", os.path.join(tempfile.gettempdir(), "clustersubmit.log"))

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 941003c and e59e02f.

📒 Files selected for processing (1)

tests/test_immediate_submit_without_shared_fs/clustersubmit (1 hunks)

tests/test_immediate_submit_without_shared_fs/clustersubmit

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (4)

tests/test_immediate_submit_without_shared_fs/clustersubmit (4)
84-84: Fix typo in error message.

The error message contains a typo: "Falied" should be "Failed".
-        raise RuntimeError(f"Falied executing job {jobid}")
+        raise RuntimeError(f"Failed executing job {jobid}")
82-84: Consider sanitizing logged output to prevent information disclosure.

Logging stdout/stderr at CRITICAL level could expose sensitive information or create excessively large log files. Consider truncating or sanitizing the output.
-        logging.critical(stdout)
-        logging.critical(stderr)
+        logging.error(f"Job {jobid} failed with return code {proc.returncode}")
+        if stdout:
+            logging.debug(f"Job {jobid} stdout: {stdout.decode()[:500]}...")  # Truncate long output
+        if stderr:
+            logging.error(f"Job {jobid} stderr: {stderr.decode()[:500]}...")
133-136: Improve job name uniqueness for concurrent submissions.

Using time.time() alone could create naming conflicts if multiple jobs are submitted simultaneously. Consider using uuid or more granular timing.
+import uuid
+
-job_name = ''.join(
-    [properties['rule'], '-', str(time.time()), '-'] + 
-    [random.choice(string.ascii_lowercase) for _ in range(5)]
-)
+job_name = f"{properties['rule']}-{uuid.uuid4().hex[:8]}"
66-67: Optimize database cleanup frequency.

Database cleanup runs on every script invocation, which could be inefficient for frequent submissions. Consider cleanup only when executing jobs or periodically.
-    db.execute("DELETE FROM jobs WHERE created_at <= datetime('now', '-1 hour')")
-    db.execute("DELETE FROM edges WHERE created_at <= datetime('now', '-1 hour')")
+    # Only cleanup during execution mode to reduce overhead
+    if args.execute:
+        db.execute("DELETE FROM jobs WHERE created_at <= datetime('now', '-1 hour')")
+        db.execute("DELETE FROM edges WHERE created_at <= datetime('now', '-1 hour')")

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e59e02f and 099e8d5.

📒 Files selected for processing (4)

src/snakemake/cli.py (1 hunks)
tests/test_immediate_submit_without_shared_fs/Snakefile (1 hunks)
tests/test_immediate_submit_without_shared_fs/clustersubmit (1 hunks)
tests/tests.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (3)

src/snakemake/cli.py
tests/tests.py
tests/test_immediate_submit_without_shared_fs/Snakefile

🔇 Additional comments (3)

tests/test_immediate_submit_without_shared_fs/clustersubmit (3)

46-64: Database schema looks well-designed with proper constraints.

The SQLite schema now includes proper primary keys, foreign key constraints, and appropriate column types. The WAL mode configuration addresses concurrency concerns effectively.

74-91: Excellent concurrency handling and error management.

The async job execution function properly handles subprocess errors, uses database locking to prevent race conditions, and includes appropriate logging. The WAL mode and timeout settings address the concurrency issues from previous reviews.

122-128: Efficient single-pass file reading implementation.

The job script is now read once and reused for both property extraction and database storage, eliminating the redundant file operations from previous versions.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

… DB lock Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

tests/test_immediate_submit_without_shared_fs/clustersubmit (2)
88-88: Fix typo in error message.

"Faied" should be "Failed".
-        raise RuntimeError(f"Faied executing job {jobid}")
+        raise RuntimeError(f"Failed executing job {jobid}")
162-162: Remove stray line number.

The standalone "162" appears to be a formatting artifact and should be removed.
-162

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6230905 and b638c98.

📒 Files selected for processing (7)

src/snakemake/cli.py (1 hunks)
src/snakemake/jobs.py (1 hunks)
src/snakemake/scheduler.py (4 hunks)
tests/test_immediate_submit_without_shared_fs/Snakefile (1 hunks)
tests/test_immediate_submit_without_shared_fs/clustersubmit (1 hunks)
tests/test_immediate_submit_without_shared_fs/expected-results/report.yaml (1 hunks)
tests/tests.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (6)

tests/test_immediate_submit_without_shared_fs/expected-results/report.yaml
src/snakemake/jobs.py
src/snakemake/cli.py
tests/tests.py
tests/test_immediate_submit_without_shared_fs/Snakefile
src/snakemake/scheduler.py

🧰 Additional context used

🧠 Learnings (2)

📓 Common learnings

Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.

Learnt from: johanneskoester
PR: snakemake/snakemake#2960
File: CHANGELOG.md:6-6
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Do not review pull requests generated by Release Please for the Snakemake project.

Learnt from: johanneskoester
PR: snakemake/snakemake#2960
File: CHANGELOG.md:6-6
Timestamp: 2024-08-13T11:05:23.821Z
Learning: Do not review pull requests generated by Release Please for the Snakemake project.

Learnt from: johanneskoester
PR: snakemake/snakemake#3140
File: snakemake/dag.py:1308-1308
Timestamp: 2024-10-14T09:42:11.571Z
Learning: In `snakemake/dag.py`, avoid flagging missing lines or indentation issues when there is no clear syntax or logical error to prevent false positives.

Learnt from: leoschwarz
PR: snakemake/snakemake#3176
File: tests/test_output_index.py:99-157
Timestamp: 2025-01-17T12:00:09.368Z
Learning: New test dependencies for Snakemake should be introduced in separate PRs rather than being added as part of feature or refactoring PRs.

Learnt from: johanneskoester
PR: snakemake/snakemake#3014
File: snakemake/workflow.py:1250-1251
Timestamp: 2024-08-13T16:22:58.430Z
Learning: Avoid commenting on trailing commas in code reviews for the Snakemake repository as per the user's preference.

Learnt from: johanneskoester
PR: snakemake/snakemake#3014
File: snakemake/workflow.py:1250-1251
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Avoid commenting on trailing commas in code reviews for the Snakemake repository as per the user's preference.

Learnt from: johanneskoester
PR: snakemake/snakemake#3204
File: tests/test_cores_cluster/qsub:1-6
Timestamp: 2024-11-12T20:22:54.184Z
Learning: In the Snakemake codebase, the `tests/test_cores_cluster/qsub` script is a dummy script for testing, and input validation and error handling are not required in such scripts.

Learnt from: lczech
PR: snakemake/snakemake#3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In `snakemake/scheduler.py`, avoid suggesting the use of `asyncio.gather` in the `jobs_rewards` method due to overhead concerns and the need for immediate results.

Learnt from: kdm9
PR: snakemake/snakemake#3562
File: src/snakemake/checkpoints.py:90-90
Timestamp: 2025-05-06T01:37:23.382Z
Learning: In Snakemake checkpoints implementation, tracking only the first missing output for each checkpoint is sufficient, because if one output is missing, all outputs for that checkpoint are considered incomplete. This was the behavior before PR #3562 and maintained in the pluralized `checkpoints.get()` implementation.

Learnt from: johanneskoester
PR: snakemake/snakemake#3600
File: src/snakemake/jobs.py:960-964
Timestamp: 2025-05-23T09:40:24.474Z
Learning: In the `cleanup` method of the `Job` class in `src/snakemake/jobs.py`, files in the `to_remove` list should be formatted with `fmt_iofile` without specifying `as_output=True` or `as_input=True` parameters, as these files should be displayed as generic files rather than specifically as output files.

Learnt from: johanneskoester
PR: snakemake/snakemake#3096
File: snakemake/dag.py:1943-1943
Timestamp: 2024-10-08T17:41:54.542Z
Learning: In the temp file deletion logic, temp files are only deleted after confirming that no other job (including non-checkpoint jobs) still needs them.

Learnt from: johanneskoester
PR: snakemake/snakemake#3096
File: snakemake/dag.py:1943-1943
Timestamp: 2024-09-19T18:47:50.192Z
Learning: In the temp file deletion logic, temp files are only deleted after confirming that no other job (including non-checkpoint jobs) still needs them.

tests/test_immediate_submit_without_shared_fs/clustersubmit (14)

Learnt from: johanneskoester
PR: snakemake/snakemake#3204
File: tests/test_cores_cluster/qsub:1-6
Timestamp: 2024-11-12T20:22:54.184Z
Learning: In the Snakemake codebase, the `tests/test_cores_cluster/qsub` script is a dummy script for testing, and input validation and error handling are not required in such scripts.

Learnt from: lczech
PR: snakemake/snakemake#3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In `snakemake/scheduler.py`, avoid suggesting the use of `asyncio.gather` in the `jobs_rewards` method due to overhead concerns and the need for immediate results.

Learnt from: johanneskoester
PR: snakemake/snakemake#3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-10-06T14:09:54.370Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.

Learnt from: leoschwarz
PR: snakemake/snakemake#3176
File: tests/test_output_index.py:99-157
Timestamp: 2025-01-17T12:00:09.368Z
Learning: New test dependencies for Snakemake should be introduced in separate PRs rather than being added as part of feature or refactoring PRs.

Learnt from: johanneskoester
PR: snakemake/snakemake#2963
File: snakemake/scheduler.py:199-199
Timestamp: 2024-10-08T17:41:54.542Z
Learning: In the Snakemake project, do not suggest adding trailing commas in multi-line logging statements, as Black autoformatter handles this.

Learnt from: johanneskoester
PR: snakemake/snakemake#2963
File: snakemake/scheduler.py:199-199
Timestamp: 2024-08-27T14:31:21.449Z
Learning: In the Snakemake project, do not suggest adding trailing commas in multi-line logging statements, as Black autoformatter handles this.

Learnt from: landerlini
PR: snakemake/snakemake#3569
File: tests/test_immediate_submit_without_shared_fs/clustersubmit:0-0
Timestamp: 2025-06-19T13:58:18.447Z
Learning: When reviewing SQL schemas, user landerlini prefers "created_at" over "timestamp" for column naming and prefers to avoid DEFAULT CURRENT_TIMESTAMP in favor of manual timestamp insertion.

Learnt from: johanneskoester
PR: snakemake/snakemake#3014
File: snakemake/workflow.py:1250-1251
Timestamp: 2024-08-13T16:22:58.430Z
Learning: Avoid commenting on trailing commas in code reviews for the Snakemake repository as per the user's preference.

Learnt from: johanneskoester
PR: snakemake/snakemake#3014
File: snakemake/workflow.py:1250-1251
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Avoid commenting on trailing commas in code reviews for the Snakemake repository as per the user's preference.

Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.

Learnt from: mbhall88
PR: snakemake/snakemake#3188
File: tests/test_script/scripts/test.sh:4-4
Timestamp: 2024-11-07T00:32:44.137Z
Learning: In test scripts within the Snakemake project, concise code is preferred over verbose error handling when simplicity suffices.

Learnt from: johanneskoester
PR: snakemake/snakemake#2960
File: CHANGELOG.md:6-6
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Do not review pull requests generated by Release Please for the Snakemake project.

Learnt from: johanneskoester
PR: snakemake/snakemake#2960
File: CHANGELOG.md:6-6
Timestamp: 2024-08-13T11:05:23.821Z
Learning: Do not review pull requests generated by Release Please for the Snakemake project.

Learnt from: johanneskoester
PR: snakemake/snakemake#3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: Avoid adding input validation or error handling that unnecessarily complicates the code in the `snakemake` codebase, especially when the cases handled don't make sense.

🔇 Additional comments (1)

tests/test_immediate_submit_without_shared_fs/clustersubmit (1)

1-161: LGTM! Well-implemented test script.

The cluster submission script effectively simulates a cluster-managed DAG execution environment. The implementation addresses all the key requirements:

Proper database schema with constraints and appropriate column types

Asynchronous job execution with proper error handling

SQLite concurrency handling with locks and WAL mode

Efficient file reading and dependency management

Comprehensive logging for debugging

The script serves its purpose well as a test utility for validating immediate submission without shared filesystems.

johanneskoester

Thanks a lot, in particular also for the well designed test case!

coderabbitai

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b638c98 and 986ffbb.

📒 Files selected for processing (4)

src/snakemake/cli.py (1 hunks)
src/snakemake/jobs.py (1 hunks)
src/snakemake/scheduling/job_scheduler.py (4 hunks)
tests/tests.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (2)

src/snakemake/jobs.py
src/snakemake/cli.py

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

tests/tests.py
src/snakemake/scheduling/job_scheduler.py

🧠 Learnings (2)

📚 Learning: 2024-10-06T14:09:54.370Z

Learnt from: johanneskoester
PR: snakemake/snakemake#3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-10-06T14:09:54.370Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.

Applied to files:

tests/tests.py

📚 Learning: 2024-11-12T20:22:54.184Z

Learnt from: johanneskoester
PR: snakemake/snakemake#3204
File: tests/test_cores_cluster/qsub:1-6
Timestamp: 2024-11-12T20:22:54.184Z
Learning: In the Snakemake codebase, the `tests/test_cores_cluster/qsub` script is a dummy script for testing, and input validation and error handling are not required in such scripts.

Applied to files:

tests/tests.py

🧬 Code graph analysis (2)

tests/tests.py (1)

tests/common.py (2)

run (152-503)

dpath (33-36)

src/snakemake/scheduling/job_scheduler.py (5)

src/snakemake/jobs.py (5)

dag (1387-1388)

jobs (1362-1363)

jobs (1366-1367)

postprocess (1180-1274)

postprocess (1600-1660)

src/snakemake/workflow.py (8)

remote_exec (449-450)

local_exec (434-438)

dag (511-512)

run (2314-2315)

dryrun (410-414)

touch (417-422)

exec_mode (457-458)

linemaps (503-504)

src/snakemake/common/__init__.py (1)

async_run (91-108)

src/snakemake/dag.py (5)

retrieve_storage_inputs (401-454)

jobs (610-612)

postprocess (1866-1910)

handle_log (984-989)

handle_touch (909-916)

tests/common.py (1)

run (152-503)

🔇 Additional comments (4)

src/snakemake/scheduling/job_scheduler.py (4)

341-365: Immediate-submit: correct to skip local rules and warn

Skipping local_runjobs under --immediate-submit matches the intended semantics and avoids premature local execution; the warning is helpful.

369-369: No functional change

Looks fine.

401-403: Minor: local alias for immediate_submit

Good micro-optimization/readability; avoids repeated property lookups.

410-416: Postprocess now ignores missing outputs under immediate-submit

Passing ignore_missing_output for DEFAULT/SUBPROCESS/REMOTE and in the error path aligns postprocessing with submit-only behavior. This prevents premature failures before the external executor produces outputs.

Also applies to: 417-423, 426-434, 439-439

tests/tests.py

🤖 I have created a release *beep* *boop* --- ## [9.11.0](v9.10.1...v9.11.0) (2025-09-05) ### Features * add landing page to the report containing custom metadata defined with a YTE yaml template ([#3452](#3452)) ([c754d80](c754d80)) * Issue a warning when unsupported characters are used in wildcard names ([#1587](#1587)) ([fa57355](fa57355)) ### Bug Fixes * avoid checking output files in immediate-submit mode ([#3569](#3569)) ([58c42cf](58c42cf)) * clarify --keep-going flag help text to distinguish runtime vs DAG construction errors ([#3705](#3705)) ([a3a8ef4](a3a8ef4)) * enable docstring assignment in `use rule ... with:` block ([#3710](#3710)) ([2191962](2191962)) * fix missing attribute error in greedy scheduler settings when using touch, dryrun or immediate submit. ([6471004](6471004)) * only trigger script with CODE label ([#3707](#3707)) ([2d01f8c](2d01f8c)) * parser.py incorrectly parsing multiline f-strings ([#3701](#3701)) ([06ece76](06ece76)) * parsing ordinary yaml strings ([#3506](#3506)) ([ddf334e](ddf334e)) * remove temp files when using checkpoints ([#3716](#3716)) ([5ff3e20](5ff3e20)) * Restore python rules changes triggering reruns. (GH: [#3213](#3213)) ([#3485](#3485)) ([2f663be](2f663be)) * unit testing ([#3680](#3680)) ([06ba7e6](06ba7e6)) * use queue_input_wait_time when updating queue input jobs ([#3712](#3712)) ([a945a19](a945a19)) ### Documentation * output files within output directories is an error ([#2848](#2848)) ([#2913](#2913)) ([37272e5](37272e5)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

…3569)  Disable checking output files when snakemake is run with flag `--immediate-submit`. Notes: * The flag `--notemp` should be additionally indicated (the CLI suggests for it) * The flag `--not-retrieve-storage` should be indicated to avoid snakemake to try retrieving remote data that may not yet exist * `localrule`s break the DAG because their dependency pattern can not be handled by cluster logics. A warning is printed and the scheduler stops processing jobs requiring their output. ### QC  * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).  ## Summary by CodeRabbit ## Summary by CodeRabbit - **New Features** - Added support for immediate job submission mode that skips local job execution and adjusts output handling accordingly. - Introduced a new test workflow and cluster submission script to validate execution without a shared filesystem. - **Bug Fixes** - Improved handling of missing outputs during job postprocessing to avoid waiting for intentionally ignored files. - **Documentation** - Expanded help text for the immediate submission option with additional usage guidance.  --------- Co-authored-by: Lucio Anderlini <Lucio.Anderlini@fi.infn.it> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Johannes Koester <johannes.koester@uni-due.de>

🤖 I have created a release *beep* *boop* --- ## [9.11.0](snakemake/snakemake@v9.10.1...v9.11.0) (2025-09-05) ### Features * add landing page to the report containing custom metadata defined with a YTE yaml template ([snakemake#3452](snakemake#3452)) ([c754d80](snakemake@c754d80)) * Issue a warning when unsupported characters are used in wildcard names ([snakemake#1587](snakemake#1587)) ([fa57355](snakemake@fa57355)) ### Bug Fixes * avoid checking output files in immediate-submit mode ([snakemake#3569](snakemake#3569)) ([58c42cf](snakemake@58c42cf)) * clarify --keep-going flag help text to distinguish runtime vs DAG construction errors ([snakemake#3705](snakemake#3705)) ([a3a8ef4](snakemake@a3a8ef4)) * enable docstring assignment in `use rule ... with:` block ([snakemake#3710](snakemake#3710)) ([2191962](snakemake@2191962)) * fix missing attribute error in greedy scheduler settings when using touch, dryrun or immediate submit. ([6471004](snakemake@6471004)) * only trigger script with CODE label ([snakemake#3707](snakemake#3707)) ([2d01f8c](snakemake@2d01f8c)) * parser.py incorrectly parsing multiline f-strings ([snakemake#3701](snakemake#3701)) ([06ece76](snakemake@06ece76)) * parsing ordinary yaml strings ([snakemake#3506](snakemake#3506)) ([ddf334e](snakemake@ddf334e)) * remove temp files when using checkpoints ([snakemake#3716](snakemake#3716)) ([5ff3e20](snakemake@5ff3e20)) * Restore python rules changes triggering reruns. (GH: [snakemake#3213](snakemake#3213)) ([snakemake#3485](snakemake#3485)) ([2f663be](snakemake@2f663be)) * unit testing ([snakemake#3680](snakemake#3680)) ([06ba7e6](snakemake@06ba7e6)) * use queue_input_wait_time when updating queue input jobs ([snakemake#3712](snakemake#3712)) ([a945a19](snakemake@a945a19)) ### Documentation * output files within output directories is an error ([snakemake#2848](snakemake#2848)) ([snakemake#2913](snakemake#2913)) ([37272e5](snakemake@37272e5)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

landerlini requested a review from johanneskoester as a code owner May 12, 2025 17:22

landerlini marked this pull request as draft May 12, 2025 17:26

landerlini force-pushed the main branch from 0c0ed6b to f200f21 Compare June 17, 2025 12:12

landerlini changed the title ~~WIP: fix: avoid checking output files in immediate-submit mode~~ fix: avoid checking output files in immediate-submit mode Jun 17, 2025

landerlini marked this pull request as ready for review June 17, 2025 16:14

coderabbitai bot reviewed Jun 17, 2025

View reviewed changes

tests/test_immediate_submit_without_shared_fs/clustersubmit Outdated Show resolved Hide resolved

tests/test_immediate_submit_without_shared_fs/clustersubmit Outdated Show resolved Hide resolved

tests/test_immediate_submit_without_shared_fs/clustersubmit Show resolved Hide resolved

coderabbitai bot reviewed Jun 19, 2025

View reviewed changes

tests/test_immediate_submit_without_shared_fs/clustersubmit Outdated Show resolved Hide resolved

tests/test_immediate_submit_without_shared_fs/clustersubmit Outdated Show resolved Hide resolved

tests/test_immediate_submit_without_shared_fs/clustersubmit Outdated Show resolved Hide resolved

coderabbitai bot reviewed Jun 19, 2025

View reviewed changes

Lucio Anderlini and others added 11 commits July 7, 2025 08:05

fix: avoid checking output files in immediate-submit mode

b094fcb

docs: updated docs for immediate-submit

05c77ae

docs: updated docs for --immediate-submit

7326c95

clean-up merge left-over

b3cdf32

skip local rules if --immediate-submit is specified

8f8a300

drafted test for immediate_submit

84736be

Change /bin/env into /usr/bin/env

bd3c267

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

avoid open without close for logging to file

96d685b

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Enabled WAL mode and introduced journal_mode=WAL to prevent potential…

0b86031

… DB lock Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

improved test quality following coderabbit suggestions

3507c4a

improvements to the clustersubmit script

b638c98

landerlini force-pushed the main branch from 6230905 to b638c98 Compare July 7, 2025 06:05

coderabbitai bot reviewed Jul 7, 2025

View reviewed changes

johanneskoester approved these changes Sep 4, 2025

View reviewed changes

Merge branch 'main' into pr/landerlini/3569

986ffbb

johanneskoester enabled auto-merge (squash) September 4, 2025 19:03

coderabbitai bot reviewed Sep 4, 2025

View reviewed changes

tests/tests.py Show resolved Hide resolved

Merge branch 'main' into main

5dcd43d

johanneskoester added 2 commits September 4, 2025 21:21

fmt

fdce9d5

fix lint

a65a0b8

johanneskoester merged commit 58c42cf into snakemake:main Sep 4, 2025
74 of 79 checks passed

snakemake-bot mentioned this pull request Sep 4, 2025

chore(main): release 9.11.0 #3722

Merged

coderabbitai bot mentioned this pull request Nov 7, 2025

fix: backup output files marked for update and restore them in case of job failures #3833

Merged

2 tasks

coderabbitai bot mentioned this pull request Nov 27, 2025

fix: cleanup update-marked output files of failed jobs if there was no backup to restore them #3843

Merged

2 tasks

coderabbitai bot mentioned this pull request Dec 23, 2025

Combining False/True values for retrieving storage on connected DAG nodes can lead to unexpected file deletion. #3898

Open

Conversation

landerlini commented May 12, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

QC

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented May 12, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Rate limit exceeded

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Status, Documentation and Community

Uh oh!

landerlini commented May 12, 2025

Uh oh!

landerlini commented May 15, 2025

Uh oh!

landerlini commented Jun 17, 2025

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

johanneskoester left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

landerlini commented May 12, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented May 12, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)