Skip to content

fix: avoid checking output files in immediate-submit mode#3569

Merged
johanneskoester merged 15 commits intosnakemake:mainfrom
landerlini:main
Sep 4, 2025
Merged

fix: avoid checking output files in immediate-submit mode#3569
johanneskoester merged 15 commits intosnakemake:mainfrom
landerlini:main

Conversation

@landerlini
Copy link
Copy Markdown
Contributor

@landerlini landerlini commented May 12, 2025

Disable checking output files when snakemake is run with flag --immediate-submit.
Notes:

  • The flag --notemp should be additionally indicated (the CLI suggests for it)
  • The flag --not-retrieve-storage should be indicated to avoid snakemake to try retrieving remote data that may not yet exist
  • localrules break the DAG because their dependency pattern can not be handled by cluster logics. A warning is printed and the scheduler stops processing jobs requiring their output.

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

  • New Features

    • Improved immediate-submit behavior for remote execution, including skipping local rules with a warning and not waiting for missing outputs when appropriate.
    • More consistent postprocessing across executors when jobs are submitted immediately.
  • Bug Fixes

    • Prevent premature waiting/retrieval of outputs when using immediate submit, reducing failures on non-shared filesystems.
  • Documentation

    • Expanded help for --immediate-submit with guidance on skipping localrules, using --notemp, and often --not-retrieve-storage.
  • Tests

    • Added end-to-end test simulating cluster execution without a shared filesystem, including expected outputs and a submission runner.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented May 12, 2025

Warning

Rate limit exceeded

@johanneskoester has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 9 minutes and 20 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

📥 Commits

Reviewing files that changed from the base of the PR and between 986ffbb and a65a0b8.

📒 Files selected for processing (3)
  • src/snakemake/jobs.py (1 hunks)
  • src/snakemake/scheduling/job_scheduler.py (4 hunks)
  • tests/tests.py (1 hunks)
📝 Walkthrough

Walkthrough

Implements immediate-submit semantics across scheduling and postprocessing, adds a CLI help clarification, and introduces a non-shared-filesystem integration test using a custom cluster submission/execution script. The scheduler now skips local rules under immediate-submit and propagates ignore-missing-output behavior into job postprocessing. Tests verify end-to-end behavior.

Changes

Cohort / File(s) Summary
Immediate-submit handling and postprocessing
src/snakemake/jobs.py, src/snakemake/scheduling/job_scheduler.py
jobs.py: tighten an elif in Job.postprocess with and not ignore_missing_output. job_scheduler.py: explicit skip of local rules when --immediate-submit; propagate immediate_submit into postprocessing across execution modes (DEFAULT/SUBPROCESS/remote), setting ignore_missing_output accordingly and passing it on error paths.
CLI help text
src/snakemake/cli.py
Expand --immediate-submit help: warns localrules are skipped, recommends --notemp and often --not-retrieve-storage.
New test suite: immediate submit without shared FS
tests/test_immediate_submit_without_shared_fs/Snakefile, tests/test_immediate_submit_without_shared_fs/clustersubmit, tests/test_immediate_submit_without_shared_fs/expected-results/report.yaml, tests/tests.py
Add workflow (produce/count/sum/all), expected report.yaml, a cluster submission/execution script using SQLite to manage dependencies and run jobs, and a test invoking Snakemake with cluster-generic executor, immediate-submit, --notemp, then executing queued jobs.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor User
  participant CLI as CLI
  participant Scheduler as JobScheduler
  participant Exec as Executor
  participant Job as Job
  participant FS as Storage/FS

  User->>CLI: snakemake --immediate-submit [--notemp ...]
  CLI->>Scheduler: parse workflow, flags
  Scheduler->>Scheduler: plan DAG
  alt immediate-submit
    note over Scheduler: Skip local rules (warn)
  end
  loop For each runnable job
    Scheduler->>Exec: submit job (may be remote/subprocess)
  end
  Exec->>Job: run
  Job->>FS: produce outputs (async)
  Job-->>Scheduler: done (status)
  Scheduler->>Job: postprocess(ignore_missing_output = immediate_submit or touch)
  alt ignore_missing_output = true
    note over Job: Do not wait for outputs
  else
    Job->>FS: wait for outputs/benchmark
  end
Loading
sequenceDiagram
  autonumber
  participant Snk as Snakemake (cluster-generic)
  participant Sub as clustersubmit (submit)
  participant DB as SQLite DB
  participant Exe as clustersubmit (--execute)
  participant Sh as Shell

  Snk->>Sub: submit jobscript [--dependencies ...]
  Sub->>DB: insert job + edges
  loop execute cycle
    Exe->>DB: fetch runnable jobs
    DB-->>Exe: job(s) with no unfinished parents
    par async job execution
      Exe->>Sh: run jobscript
      Sh-->>Exe: exit code
      Exe->>DB: mark job done
    and possibly others
    end
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@landerlini landerlini marked this pull request as draft May 12, 2025 17:26
@landerlini
Copy link
Copy Markdown
Contributor Author

I modified the scheduler so that local rules break the DAG without raising an error for missing output file, but logging a warning.

@landerlini landerlini changed the title WIP: fix: avoid checking output files in immediate-submit mode fix: avoid checking output files in immediate-submit mode Jun 17, 2025
@landerlini landerlini marked this pull request as ready for review June 17, 2025 16:14
@landerlini
Copy link
Copy Markdown
Contributor Author

@cmeesters with my apologies for the long delay, I have finally implemented a test (test_immediate_submit_without_shared_fs) failing on the main branch and succeeding with this fix.
I have implemented a script mocking the execution of the DAG as controlled by some external engine, after the submission. The scheduler is simplistic, but sufficient to reproduce the bug on checking for output files before submitting jobs.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (7)
tests/test_immediate_submit_without_shared_fs/clustersubmit (2)

68-77: Hard-coded 100 iteration cap can leave jobs unprocessed

The loop stops after 100 passes even if new runnable jobs keep appearing (large DAGs).
Use a while True: loop with an exit condition instead.

-    for _ in range(100):
+    while True:
@@
-        else:
-            logging.info("Could not find any other job to execute.")
-            return 
+        if not result:
+            logging.info("No runnable jobs left – exiting.")
+            return

104-106: Dependency parsing discards empty strings – cleaner expression possible

Current list-comp is harder to read; using split() without an argument already removes duplicates of whitespace.

-dependencies = [s for s in args.dependencies.split(" ") if s not in [' ', '', None]]
+dependencies = args.dependencies.split()
tests/tests.py (2)

2476-2477: Drop the stray blank lines

These two empty lines introduce an unnecessary diff hunk and break the otherwise tight grouping of tests.
Feel free to remove them – it keeps the history cleaner.


2478-2489: Inline-concatenated shell command is hard to read & easy to mis-escape

The multi-line literal relies on implicit string concatenation and heavy escaping for the embedded quotes. This is easy to break the next time someone touches it.

Consider building the command as a single triple-quoted string (or "\n".join([...])) so that quoting is obvious and you only escape once. Example:

-        shellcmd="snakemake "
-        "--executor cluster-generic "
-        "--cluster-generic-submit-cmd \"./clustersubmit --dependencies \\\"{dependencies}\\\"\" "
-        "--forceall "
-        "--immediate-submit "
-        "--notemp "
-        "--jobs 10 "
-        "; ./clustersubmit --execute"
+        shellcmd=(
+            "snakemake "
+            "--executor cluster-generic "
+            "--cluster-generic-submit-cmd \"./clustersubmit --dependencies \\\"{dependencies}\\\"\" "
+            "--forceall "
+            "--immediate-submit "
+            "--notemp "
+            "--jobs 10 "
+            "; ./clustersubmit --execute"
+        )

This keeps the final string identical while being far less brittle.

tests/test_immediate_submit_without_shared_fs/Snakefile (3)

10-11: Prefer printf over echo for portable, predictable output

echo interpretation (e.g. -n, backslash escapes) varies between shells.
printf is portable and lets you control the trailing newline explicitly:

-    shell: "echo {wildcards.fname} > {output.txt}"
+    shell: "printf '%s\n' {wildcards.fname} > {output.txt}"

18-19: Minor: avoid UUOC (“useless use of cat”)

You can pass the file directly to wc and drop the extra process:

-    shell: "cat {input.txt} | wc -m > {output.dat}"
+    shell: "wc -m < {input.txt} > {output.dat}"

This slightly speeds up the rule and is a common shell best practice.


29-33: Quote loop variable to survive spaces in file names

If a file path ever contains spaces the current loop breaks.
Quoting fixes that and removes a useless subshell:

-        for i in {input.counts}; do
-            SUM=$(($SUM + $(cat $i)));
+        for i in {input.counts}; do
+            SUM=$((SUM + $(cat "$i")))
         done;

Not critical for the current test files, but makes the rule future-proof.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d3bb5e and 7cdeae8.

📒 Files selected for processing (7)
  • src/snakemake/cli.py (1 hunks)
  • src/snakemake/jobs.py (1 hunks)
  • src/snakemake/scheduler.py (4 hunks)
  • tests/test_immediate_submit_without_shared_fs/Snakefile (1 hunks)
  • tests/test_immediate_submit_without_shared_fs/clustersubmit (1 hunks)
  • tests/test_immediate_submit_without_shared_fs/expected-results/report.yaml (1 hunks)
  • tests/tests.py (1 hunks)
✅ Files skipped from review due to trivial changes (1)
  • tests/test_immediate_submit_without_shared_fs/expected-results/report.yaml
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/snakemake/scheduler.py
  • src/snakemake/jobs.py
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • src/snakemake/cli.py
  • tests/tests.py
🧬 Code Graph Analysis (1)
tests/tests.py (1)
tests/common.py (2)
  • run (151-495)
  • dpath (32-35)
🔇 Additional comments (1)
src/snakemake/cli.py (1)

1589-1593: Help text may be outdated w.r.t. new localrule handling

The message still says “localrules will fail”, but the scheduler change in this PR now skips localrules with a warning rather than failing.
Please align the help text with the current behaviour to avoid user confusion.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (3)
tests/test_immediate_submit_without_shared_fs/clustersubmit (3)

101-104: Risk of duplicate job names with current generation strategy.

The random job name generation could create duplicates, leading to SQLite constraint violations if a PRIMARY KEY is added (as suggested above).

Consider using a more robust job ID generation:

-job_name = ''.join(
-    [properties['rule'], '-'] + 
-    [random.choice(string.ascii_lowercase) for _ in range(5)]
-)
+import time
+job_name = f"{properties['rule']}-{int(time.time())}-{''.join(random.choices(string.ascii_lowercase, k=5))}"

70-79: Arbitrary iteration limit may cause premature termination.

The hardcoded limit of 100 iterations could prevent all jobs from completing in complex DAGs. Consider using a more flexible approach.

-    for _ in range(100):
+    max_iterations = 1000  # or make configurable
+    iteration = 0
+    consecutive_empty_rounds = 0
+    
+    while iteration < max_iterations and consecutive_empty_rounds < 3:
+        iteration += 1
         with sqlite3.connect(DATABASE_FILE) as db:
             result = db.execute(QUERY_RUNNABLE).fetchall()
         if len(result):
+            consecutive_empty_rounds = 0
             async with asyncio.TaskGroup() as tg:
                 for row in result:
                     tg.create_task(execute_single_job(*row))
         else:
+            consecutive_empty_rounds += 1
-            logging.info("Could not find any other job to execute.")
-            return 
+            if consecutive_empty_rounds >= 3:
+                logging.info("No runnable jobs found after 3 attempts. Stopping.")
+                return
+            await asyncio.sleep(1)  # Brief pause before retrying

12-14: Hardcoded paths may cause issues in different environments.

Using hardcoded /tmp/ paths may not work reliably across all systems or when running multiple tests concurrently.

Consider using environment variables or temporary directories:

+import tempfile
+
 ## Configuration 
-DATABASE_FILE = "/tmp/clustersubmit.sqlite3"
-LOG_FILE = "/tmp/clustersubmit.log"
+DATABASE_FILE = os.environ.get("CLUSTER_DB_FILE", os.path.join(tempfile.gettempdir(), "clustersubmit.sqlite3"))
+LOG_FILE = os.environ.get("CLUSTER_LOG_FILE", os.path.join(tempfile.gettempdir(), "clustersubmit.log"))
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 941003c and e59e02f.

📒 Files selected for processing (1)
  • tests/test_immediate_submit_without_shared_fs/clustersubmit (1 hunks)

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
tests/test_immediate_submit_without_shared_fs/clustersubmit (4)

84-84: Fix typo in error message.

The error message contains a typo: "Falied" should be "Failed".

-        raise RuntimeError(f"Falied executing job {jobid}")
+        raise RuntimeError(f"Failed executing job {jobid}")

82-84: Consider sanitizing logged output to prevent information disclosure.

Logging stdout/stderr at CRITICAL level could expose sensitive information or create excessively large log files. Consider truncating or sanitizing the output.

-        logging.critical(stdout)
-        logging.critical(stderr)
+        logging.error(f"Job {jobid} failed with return code {proc.returncode}")
+        if stdout:
+            logging.debug(f"Job {jobid} stdout: {stdout.decode()[:500]}...")  # Truncate long output
+        if stderr:
+            logging.error(f"Job {jobid} stderr: {stderr.decode()[:500]}...")

133-136: Improve job name uniqueness for concurrent submissions.

Using time.time() alone could create naming conflicts if multiple jobs are submitted simultaneously. Consider using uuid or more granular timing.

+import uuid
+
-job_name = ''.join(
-    [properties['rule'], '-', str(time.time()), '-'] + 
-    [random.choice(string.ascii_lowercase) for _ in range(5)]
-)
+job_name = f"{properties['rule']}-{uuid.uuid4().hex[:8]}"

66-67: Optimize database cleanup frequency.

Database cleanup runs on every script invocation, which could be inefficient for frequent submissions. Consider cleanup only when executing jobs or periodically.

-    db.execute("DELETE FROM jobs WHERE created_at <= datetime('now', '-1 hour')")
-    db.execute("DELETE FROM edges WHERE created_at <= datetime('now', '-1 hour')")
+    # Only cleanup during execution mode to reduce overhead
+    if args.execute:
+        db.execute("DELETE FROM jobs WHERE created_at <= datetime('now', '-1 hour')")
+        db.execute("DELETE FROM edges WHERE created_at <= datetime('now', '-1 hour')")
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e59e02f and 099e8d5.

📒 Files selected for processing (4)
  • src/snakemake/cli.py (1 hunks)
  • tests/test_immediate_submit_without_shared_fs/Snakefile (1 hunks)
  • tests/test_immediate_submit_without_shared_fs/clustersubmit (1 hunks)
  • tests/tests.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • src/snakemake/cli.py
  • tests/tests.py
  • tests/test_immediate_submit_without_shared_fs/Snakefile
🔇 Additional comments (3)
tests/test_immediate_submit_without_shared_fs/clustersubmit (3)

46-64: Database schema looks well-designed with proper constraints.

The SQLite schema now includes proper primary keys, foreign key constraints, and appropriate column types. The WAL mode configuration addresses concurrency concerns effectively.


74-91: Excellent concurrency handling and error management.

The async job execution function properly handles subprocess errors, uses database locking to prevent race conditions, and includes appropriate logging. The WAL mode and timeout settings address the concurrency issues from previous reviews.


122-128: Efficient single-pass file reading implementation.

The job script is now read once and reused for both property extraction and database storage, eliminating the redundant file operations from previous versions.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
tests/test_immediate_submit_without_shared_fs/clustersubmit (2)

88-88: Fix typo in error message.

"Faied" should be "Failed".

-        raise RuntimeError(f"Faied executing job {jobid}")
+        raise RuntimeError(f"Failed executing job {jobid}")

162-162: Remove stray line number.

The standalone "162" appears to be a formatting artifact and should be removed.

-162
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6230905 and b638c98.

📒 Files selected for processing (7)
  • src/snakemake/cli.py (1 hunks)
  • src/snakemake/jobs.py (1 hunks)
  • src/snakemake/scheduler.py (4 hunks)
  • tests/test_immediate_submit_without_shared_fs/Snakefile (1 hunks)
  • tests/test_immediate_submit_without_shared_fs/clustersubmit (1 hunks)
  • tests/test_immediate_submit_without_shared_fs/expected-results/report.yaml (1 hunks)
  • tests/tests.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • tests/test_immediate_submit_without_shared_fs/expected-results/report.yaml
  • src/snakemake/jobs.py
  • src/snakemake/cli.py
  • tests/tests.py
  • tests/test_immediate_submit_without_shared_fs/Snakefile
  • src/snakemake/scheduler.py
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.
Learnt from: johanneskoester
PR: snakemake/snakemake#2960
File: CHANGELOG.md:6-6
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Do not review pull requests generated by Release Please for the Snakemake project.
Learnt from: johanneskoester
PR: snakemake/snakemake#2960
File: CHANGELOG.md:6-6
Timestamp: 2024-08-13T11:05:23.821Z
Learning: Do not review pull requests generated by Release Please for the Snakemake project.
Learnt from: johanneskoester
PR: snakemake/snakemake#3140
File: snakemake/dag.py:1308-1308
Timestamp: 2024-10-14T09:42:11.571Z
Learning: In `snakemake/dag.py`, avoid flagging missing lines or indentation issues when there is no clear syntax or logical error to prevent false positives.
Learnt from: leoschwarz
PR: snakemake/snakemake#3176
File: tests/test_output_index.py:99-157
Timestamp: 2025-01-17T12:00:09.368Z
Learning: New test dependencies for Snakemake should be introduced in separate PRs rather than being added as part of feature or refactoring PRs.
Learnt from: johanneskoester
PR: snakemake/snakemake#3014
File: snakemake/workflow.py:1250-1251
Timestamp: 2024-08-13T16:22:58.430Z
Learning: Avoid commenting on trailing commas in code reviews for the Snakemake repository as per the user's preference.
Learnt from: johanneskoester
PR: snakemake/snakemake#3014
File: snakemake/workflow.py:1250-1251
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Avoid commenting on trailing commas in code reviews for the Snakemake repository as per the user's preference.
Learnt from: johanneskoester
PR: snakemake/snakemake#3204
File: tests/test_cores_cluster/qsub:1-6
Timestamp: 2024-11-12T20:22:54.184Z
Learning: In the Snakemake codebase, the `tests/test_cores_cluster/qsub` script is a dummy script for testing, and input validation and error handling are not required in such scripts.
Learnt from: lczech
PR: snakemake/snakemake#3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In `snakemake/scheduler.py`, avoid suggesting the use of `asyncio.gather` in the `jobs_rewards` method due to overhead concerns and the need for immediate results.
Learnt from: kdm9
PR: snakemake/snakemake#3562
File: src/snakemake/checkpoints.py:90-90
Timestamp: 2025-05-06T01:37:23.382Z
Learning: In Snakemake checkpoints implementation, tracking only the first missing output for each checkpoint is sufficient, because if one output is missing, all outputs for that checkpoint are considered incomplete. This was the behavior before PR #3562 and maintained in the pluralized `checkpoints.get()` implementation.
Learnt from: johanneskoester
PR: snakemake/snakemake#3600
File: src/snakemake/jobs.py:960-964
Timestamp: 2025-05-23T09:40:24.474Z
Learning: In the `cleanup` method of the `Job` class in `src/snakemake/jobs.py`, files in the `to_remove` list should be formatted with `fmt_iofile` without specifying `as_output=True` or `as_input=True` parameters, as these files should be displayed as generic files rather than specifically as output files.
Learnt from: johanneskoester
PR: snakemake/snakemake#3096
File: snakemake/dag.py:1943-1943
Timestamp: 2024-10-08T17:41:54.542Z
Learning: In the temp file deletion logic, temp files are only deleted after confirming that no other job (including non-checkpoint jobs) still needs them.
Learnt from: johanneskoester
PR: snakemake/snakemake#3096
File: snakemake/dag.py:1943-1943
Timestamp: 2024-09-19T18:47:50.192Z
Learning: In the temp file deletion logic, temp files are only deleted after confirming that no other job (including non-checkpoint jobs) still needs them.
tests/test_immediate_submit_without_shared_fs/clustersubmit (14)
Learnt from: johanneskoester
PR: snakemake/snakemake#3204
File: tests/test_cores_cluster/qsub:1-6
Timestamp: 2024-11-12T20:22:54.184Z
Learning: In the Snakemake codebase, the `tests/test_cores_cluster/qsub` script is a dummy script for testing, and input validation and error handling are not required in such scripts.
Learnt from: lczech
PR: snakemake/snakemake#3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In `snakemake/scheduler.py`, avoid suggesting the use of `asyncio.gather` in the `jobs_rewards` method due to overhead concerns and the need for immediate results.
Learnt from: johanneskoester
PR: snakemake/snakemake#3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-10-06T14:09:54.370Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.
Learnt from: leoschwarz
PR: snakemake/snakemake#3176
File: tests/test_output_index.py:99-157
Timestamp: 2025-01-17T12:00:09.368Z
Learning: New test dependencies for Snakemake should be introduced in separate PRs rather than being added as part of feature or refactoring PRs.
Learnt from: johanneskoester
PR: snakemake/snakemake#2963
File: snakemake/scheduler.py:199-199
Timestamp: 2024-10-08T17:41:54.542Z
Learning: In the Snakemake project, do not suggest adding trailing commas in multi-line logging statements, as Black autoformatter handles this.
Learnt from: johanneskoester
PR: snakemake/snakemake#2963
File: snakemake/scheduler.py:199-199
Timestamp: 2024-08-27T14:31:21.449Z
Learning: In the Snakemake project, do not suggest adding trailing commas in multi-line logging statements, as Black autoformatter handles this.
Learnt from: landerlini
PR: snakemake/snakemake#3569
File: tests/test_immediate_submit_without_shared_fs/clustersubmit:0-0
Timestamp: 2025-06-19T13:58:18.447Z
Learning: When reviewing SQL schemas, user landerlini prefers "created_at" over "timestamp" for column naming and prefers to avoid DEFAULT CURRENT_TIMESTAMP in favor of manual timestamp insertion.
Learnt from: johanneskoester
PR: snakemake/snakemake#3014
File: snakemake/workflow.py:1250-1251
Timestamp: 2024-08-13T16:22:58.430Z
Learning: Avoid commenting on trailing commas in code reviews for the Snakemake repository as per the user's preference.
Learnt from: johanneskoester
PR: snakemake/snakemake#3014
File: snakemake/workflow.py:1250-1251
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Avoid commenting on trailing commas in code reviews for the Snakemake repository as per the user's preference.
Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.
Learnt from: mbhall88
PR: snakemake/snakemake#3188
File: tests/test_script/scripts/test.sh:4-4
Timestamp: 2024-11-07T00:32:44.137Z
Learning: In test scripts within the Snakemake project, concise code is preferred over verbose error handling when simplicity suffices.
Learnt from: johanneskoester
PR: snakemake/snakemake#2960
File: CHANGELOG.md:6-6
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Do not review pull requests generated by Release Please for the Snakemake project.
Learnt from: johanneskoester
PR: snakemake/snakemake#2960
File: CHANGELOG.md:6-6
Timestamp: 2024-08-13T11:05:23.821Z
Learning: Do not review pull requests generated by Release Please for the Snakemake project.
Learnt from: johanneskoester
PR: snakemake/snakemake#3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: Avoid adding input validation or error handling that unnecessarily complicates the code in the `snakemake` codebase, especially when the cases handled don't make sense.
🔇 Additional comments (1)
tests/test_immediate_submit_without_shared_fs/clustersubmit (1)

1-161: LGTM! Well-implemented test script.

The cluster submission script effectively simulates a cluster-managed DAG execution environment. The implementation addresses all the key requirements:

  • Proper database schema with constraints and appropriate column types
  • Asynchronous job execution with proper error handling
  • SQLite concurrency handling with locks and WAL mode
  • Efficient file reading and dependency management
  • Comprehensive logging for debugging

The script serves its purpose well as a test utility for validating immediate submission without shared filesystems.

Copy link
Copy Markdown
Contributor

@johanneskoester johanneskoester left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks a lot, in particular also for the well designed test case!

@johanneskoester johanneskoester enabled auto-merge (squash) September 4, 2025 19:03
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between b638c98 and 986ffbb.

📒 Files selected for processing (4)
  • src/snakemake/cli.py (1 hunks)
  • src/snakemake/jobs.py (1 hunks)
  • src/snakemake/scheduling/job_scheduler.py (4 hunks)
  • tests/tests.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/snakemake/jobs.py
  • src/snakemake/cli.py
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • tests/tests.py
  • src/snakemake/scheduling/job_scheduler.py
🧠 Learnings (2)
📚 Learning: 2024-10-06T14:09:54.370Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-10-06T14:09:54.370Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.

Applied to files:

  • tests/tests.py
📚 Learning: 2024-11-12T20:22:54.184Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3204
File: tests/test_cores_cluster/qsub:1-6
Timestamp: 2024-11-12T20:22:54.184Z
Learning: In the Snakemake codebase, the `tests/test_cores_cluster/qsub` script is a dummy script for testing, and input validation and error handling are not required in such scripts.

Applied to files:

  • tests/tests.py
🧬 Code graph analysis (2)
tests/tests.py (1)
tests/common.py (2)
  • run (152-503)
  • dpath (33-36)
src/snakemake/scheduling/job_scheduler.py (5)
src/snakemake/jobs.py (5)
  • dag (1387-1388)
  • jobs (1362-1363)
  • jobs (1366-1367)
  • postprocess (1180-1274)
  • postprocess (1600-1660)
src/snakemake/workflow.py (8)
  • remote_exec (449-450)
  • local_exec (434-438)
  • dag (511-512)
  • run (2314-2315)
  • dryrun (410-414)
  • touch (417-422)
  • exec_mode (457-458)
  • linemaps (503-504)
src/snakemake/common/__init__.py (1)
  • async_run (91-108)
src/snakemake/dag.py (5)
  • retrieve_storage_inputs (401-454)
  • jobs (610-612)
  • postprocess (1866-1910)
  • handle_log (984-989)
  • handle_touch (909-916)
tests/common.py (1)
  • run (152-503)
🔇 Additional comments (4)
src/snakemake/scheduling/job_scheduler.py (4)

341-365: Immediate-submit: correct to skip local rules and warn

Skipping local_runjobs under --immediate-submit matches the intended semantics and avoids premature local execution; the warning is helpful.


369-369: No functional change

Looks fine.


401-403: Minor: local alias for immediate_submit

Good micro-optimization/readability; avoids repeated property lookups.


410-416: Postprocess now ignores missing outputs under immediate-submit

Passing ignore_missing_output for DEFAULT/SUBPROCESS/REMOTE and in the error path aligns postprocessing with submit-only behavior. This prevents premature failures before the external executor produces outputs.

Also applies to: 417-423, 426-434, 439-439

@johanneskoester johanneskoester merged commit 58c42cf into snakemake:main Sep 4, 2025
74 of 79 checks passed
johanneskoester pushed a commit that referenced this pull request Sep 5, 2025
🤖 I have created a release *beep* *boop*
---


##
[9.11.0](v9.10.1...v9.11.0)
(2025-09-05)


### Features

* add landing page to the report containing custom metadata defined with
a YTE yaml template
([#3452](#3452))
([c754d80](c754d80))
* Issue a warning when unsupported characters are used in wildcard names
([#1587](#1587))
([fa57355](fa57355))


### Bug Fixes

* avoid checking output files in immediate-submit mode
([#3569](#3569))
([58c42cf](58c42cf))
* clarify --keep-going flag help text to distinguish runtime vs DAG
construction errors
([#3705](#3705))
([a3a8ef4](a3a8ef4))
* enable docstring assignment in `use rule ... with:` block
([#3710](#3710))
([2191962](2191962))
* fix missing attribute error in greedy scheduler settings when using
touch, dryrun or immediate submit.
([6471004](6471004))
* only trigger script with CODE label
([#3707](#3707))
([2d01f8c](2d01f8c))
* parser.py incorrectly parsing multiline f-strings
([#3701](#3701))
([06ece76](06ece76))
* parsing ordinary yaml strings
([#3506](#3506))
([ddf334e](ddf334e))
* remove temp files when using checkpoints
([#3716](#3716))
([5ff3e20](5ff3e20))
* Restore python rules changes triggering reruns. (GH:
[#3213](#3213))
([#3485](#3485))
([2f663be](2f663be))
* unit testing
([#3680](#3680))
([06ba7e6](06ba7e6))
* use queue_input_wait_time when updating queue input jobs
([#3712](#3712))
([a945a19](a945a19))


### Documentation

* output files within output directories is an error
([#2848](#2848))
([#2913](#2913))
([37272e5](37272e5))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
kjohnsen pushed a commit to kjohnsen/snakemake that referenced this pull request Dec 15, 2025
…3569)

<!--Add a description of your PR here-->

Disable checking output files when snakemake is run with flag
`--immediate-submit`.
Notes:
* The flag `--notemp` should be additionally indicated (the CLI suggests
for it)
* The flag `--not-retrieve-storage` should be indicated to avoid
snakemake to try retrieving remote data that may not yet exist
* `localrule`s break the DAG because their dependency pattern can not be
handled by cluster logics. A warning is printed and the scheduler stops
processing jobs requiring their output.

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [x] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

## Summary by CodeRabbit

- **New Features**
- Added support for immediate job submission mode that skips local job
execution and adjusts output handling accordingly.
- Introduced a new test workflow and cluster submission script to
validate execution without a shared filesystem.
- **Bug Fixes**
- Improved handling of missing outputs during job postprocessing to
avoid waiting for intentionally ignored files.
- **Documentation**
- Expanded help text for the immediate submission option with additional
usage guidance.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->

---------

Co-authored-by: Lucio Anderlini <Lucio.Anderlini@fi.infn.it>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: Johannes Koester <johannes.koester@uni-due.de>
kjohnsen pushed a commit to kjohnsen/snakemake that referenced this pull request Dec 15, 2025
🤖 I have created a release *beep* *boop*
---


##
[9.11.0](snakemake/snakemake@v9.10.1...v9.11.0)
(2025-09-05)


### Features

* add landing page to the report containing custom metadata defined with
a YTE yaml template
([snakemake#3452](snakemake#3452))
([c754d80](snakemake@c754d80))
* Issue a warning when unsupported characters are used in wildcard names
([snakemake#1587](snakemake#1587))
([fa57355](snakemake@fa57355))


### Bug Fixes

* avoid checking output files in immediate-submit mode
([snakemake#3569](snakemake#3569))
([58c42cf](snakemake@58c42cf))
* clarify --keep-going flag help text to distinguish runtime vs DAG
construction errors
([snakemake#3705](snakemake#3705))
([a3a8ef4](snakemake@a3a8ef4))
* enable docstring assignment in `use rule ... with:` block
([snakemake#3710](snakemake#3710))
([2191962](snakemake@2191962))
* fix missing attribute error in greedy scheduler settings when using
touch, dryrun or immediate submit.
([6471004](snakemake@6471004))
* only trigger script with CODE label
([snakemake#3707](snakemake#3707))
([2d01f8c](snakemake@2d01f8c))
* parser.py incorrectly parsing multiline f-strings
([snakemake#3701](snakemake#3701))
([06ece76](snakemake@06ece76))
* parsing ordinary yaml strings
([snakemake#3506](snakemake#3506))
([ddf334e](snakemake@ddf334e))
* remove temp files when using checkpoints
([snakemake#3716](snakemake#3716))
([5ff3e20](snakemake@5ff3e20))
* Restore python rules changes triggering reruns. (GH:
[snakemake#3213](snakemake#3213))
([snakemake#3485](snakemake#3485))
([2f663be](snakemake@2f663be))
* unit testing
([snakemake#3680](snakemake#3680))
([06ba7e6](snakemake@06ba7e6))
* use queue_input_wait_time when updating queue input jobs
([snakemake#3712](snakemake#3712))
([a945a19](snakemake@a945a19))


### Documentation

* output files within output directories is an error
([snakemake#2848](snakemake#2848))
([snakemake#2913](snakemake#2913))
([37272e5](snakemake@37272e5))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants