fix: force check all required outputs by Hocnonsense · Pull Request #3341 · snakemake/snakemake

Hocnonsense · 2025-03-07T07:51:38Z

This may fix #3036

This fix discards totaly change future_output to created_output, and directly check if all output the rule wanted are created.
I'm somehow doubt if future_output used some elsewhere. Is it needed to add it back?

QC

The PR contains test case <tests/tests.py::test_checkpoints_many> for the changes.
the change does neither modify the language nor the behavior or functionalities of Snakemake.

Summary by CodeRabbit

New Features
- Enhanced workflow execution with refined job ordering, checkpoint validation, file parsing, checksum extraction, and updated CLI help text.
- Added support for xonsh scripts in the workflow.
Documentation
- Updated examples and guidelines for input parsing and configuration validations, with typographical corrections for improved clarity.
- Clarified usage of conda environments and apptainer integration in the documentation.
- Expanded documentation on report generation and customization options.
Tests
- Introduced new test cases for checkpoint and data format validation, and updated expected outputs and schema definitions.
- Added tests for conda deployment scripts and enhanced test coverage for validation processes.
Chores
- Revised CI workflows with conditional execution and updated environment dependencies to streamline the build process.
- Expanded asset management capabilities with new package entries and dependency tracking.

coderabbitai · 2025-03-07T07:51:47Z

📝 Walkthrough

Walkthrough

This pull request makes a series of modifications across the repository. In core modules, the checkpoint logic has been simplified by removing file existence checks, and the DAG logic has been reformatted for clarity. Conditional execution has been added for various GitHub Actions jobs based on pull request states. New I/O utility functions for parsing inputs and extracting checksums have been introduced, and the CLI help text has been clarified. Additionally, tests have been expanded, documentation has been updated, and asset metadata has been enhanced.

Changes

File(s)	Change Summary
`snakemake/checkpoints.py`	Removed the check for files in the `may_not_created` set in the `get` method; raises `IncompleteCheckpointException` if `may_not_created` is not empty.
`snakemake/dag.py`	Restructured the generator expression in `update_checkpoint_outputs` for clarity; no functional changes.
`snakemake/workflow.py`	Introduced a conditional check on `storage_settings.keep_storage_local` before performing cleanup of storage objects.
`snakemake/cli.py`	Updated help text for `--keep-storage-local-copies` to clarify retention of both remote input and output files.
`snakemake/ioutils/__init__.py` `snakemake/ioutils/input.py`	Added new functions `parse_input` and `extract_checksum`; updated global namespace registration to include these functions.
`snakemake/utils.py`	Introduced new internal methods `_validate_record`, `_validate_pandas`, and `_validate_polars` to handle JSON schema validation for dictionaries, pandas, and polars DataFrames.
`snakemake/assets/__init__.py` `snakemake/report/html_reporter/data/packages.py`	Added extensive new asset entries and package definitions with associated license metadata.
`setup.py`	Added `"unit_tests/templates/*"` to the package data in the `setup()` function.
`.github/workflows/codespell.yml` `.github/workflows/docs.yml` `.github/workflows/main.yml`	Added conditional execution (`if: github.event.pull_request.merged != true
`docs/snakefiles/rules.rst`	Added documentation for new helper functions `parse_input` and `extract_checksum` with examples.
`docs/snakefiles/configuration.rst`	Updated the function signature for `snakemake.utils.validate` to accept additional data types.
`docs/project_info/codebase.rst` `docs/project_info/contributing.rst`	Corrected typographical errors and restructured contributing guidelines; updated Conda environment naming.
`test-environment.yml`	Moved `cwltool` and `cwl-utils` into the main dependencies and added `setuptools` as a dependency.
`tests/tests.py`	Added a new test function `test_checkpoints_many()` to expand checkpoint testing coverage.
`tests/test_ioutils/Snakefile` `tests/test_ioutils/expected-results/*` `tests/test_ioutils/samples.md5`	Updated rules to include checksum input and output verifications; modified expected result files and introduced `samples.md5` for checksum references.
`tests/test_validate/Snakefile` `tests/test_validate/samples.schema.yaml`	Expanded test validations to cover Pandas and Polars DataFrame formats and updated schema properties (including new `n` and `tissue` fields, and simplified the `condition` description).
`tests/test_conda_run/Snakefile` `tests/test_conda_run/expected-results/test.txt` `tests/test_conda_run/test_python_env.yaml` `tests/test_conda_run/test_script_run.py`	Introduced new rule `random_python_conda_script` and associated files for testing Python scripts in a conda environment.
`tests/test_script_xsh/Snakefile` `tests/test_script_xsh/envs/xonsh.yaml` `tests/test_script_xsh/scripts/test.xsh`	Added new rules and scripts for testing `xonsh` scripts in a conda environment.
`tests/tests_using_conda.py`	Added new test functions to cover conda deployment scenarios.
`tests/test_issue1092/Snakefile`	Modified the `aggregate` rule to use a `run` directive instead of a direct shell command.

Sequence Diagram(s)

sequenceDiagram
    participant WF as Workflow
    participant SS as StorageSettings
    participant DAG as DAG
    WF->>SS: Check keep_storage_local attribute
    alt keep_storage_local is False
        WF->>DAG: Call cleanup_storage_objects()
    else
        WF->>WF: Skip cleanup
    end

sequenceDiagram
    participant PR as Pull Request Event
    participant CI as GitHub Actions
    participant Job as CI Job
    PR->>CI: Trigger event (push/PR)
    CI->>Job: Evaluate condition (merged != true || branch != main)
    alt Condition met
        Job->>CI: Execute job (formatting, testing, docs, etc.)
    else
        Job->>CI: Skip execution
    end

Suggested reviewers

johanneskoester

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6f39e6b and 4f72e5d.

📒 Files selected for processing (5)

snakemake/dag.py (0 hunks)
snakemake/dag.py (0 hunks)
snakemake/dag.py (1 hunks)
snakemake/dag.py (2 hunks)
snakemake/dag.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (4)

snakemake/dag.py
snakemake/dag.py
snakemake/dag.py
snakemake/dag.py

🧰 Additional context used

📓 Path-based instructions (1)

`**/*.py`: Do not try to improve formatting. Do not suggest ...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

snakemake/dag.py

⏰ Context from checks skipped due to timeout of 90000ms (30)

GitHub Check: testing (10, 3.12, bash)
GitHub Check: testing (10, 3.11, bash)
GitHub Check: testing (9, 3.12, bash)
GitHub Check: testing (9, 3.11, bash)
GitHub Check: testing (8, 3.12, bash)
GitHub Check: testing (8, 3.11, bash)
GitHub Check: testing (7, 3.12, bash)
GitHub Check: testing (7, 3.11, bash)
GitHub Check: testing (6, 3.12, bash)
GitHub Check: testing (6, 3.11, bash)
GitHub Check: testing-windows (10)
GitHub Check: testing (5, 3.12, bash)
GitHub Check: testing-windows (9)
GitHub Check: testing (5, 3.11, bash)
GitHub Check: testing-windows (8)
GitHub Check: testing (4, 3.12, bash)
GitHub Check: testing-windows (7)
GitHub Check: testing (4, 3.11, bash)
GitHub Check: testing-windows (6)
GitHub Check: testing (3, 3.12, bash)
GitHub Check: testing-windows (5)
GitHub Check: testing (3, 3.11, bash)
GitHub Check: testing-windows (4)
GitHub Check: testing (2, 3.12, bash)
GitHub Check: testing-windows (3)
GitHub Check: testing (2, 3.11, bash)
GitHub Check: testing-windows (2)
GitHub Check: testing (1, 3.12, bash)
GitHub Check: testing-windows (1)
GitHub Check: testing (1, 3.11, bash)

🔇 Additional comments (1)

snakemake/dag.py (1)

408-428: Improved structure for storing job outputs

The code has been refactored to use a more structured approach for storing job outputs in storage. The main changes are:

Implementing a two-level loop that first iterates over topologically sorted levels of jobs

Adding an async helper function tostore to determine if a file should be stored

Adding a conditional check to only process outputs from finished jobs

This approach ensures that all outputs specified by a rule are properly checked and stored, which resolves the issue mentioned in PR #3036 by eliminating the need for future_output.

✨ Finishing Touches

📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai generate docstrings to generate docstrings for this PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

snakemake/checkpoints.py (1)
37-41: Core checkpoint fix: checking all outputs against created_output.

This is the core fix of the PR - properly checking if all the outputs (rather than just any) are in created_output instead of checking against future_output. This ensures that all required outputs from the checkpoint rule are created before proceeding.

The nested if statements could be simplified to a single condition for better readability.
-        if self.checkpoints.created_output is not None:
-            if set(output) <= set(self.checkpoints.created_output):
+        if self.checkpoints.created_output is not None and set(output) <= set(self.checkpoints.created_output):
                return CheckpointJob(self.rule, output)
🧰 Tools

🪛 Ruff (0.8.2)

39-40: Use a single if statement instead of nested if statements

(SIM102)
tests/test_checkpoints_many/Snakefile (1)
56-56: Typo in checkpoint usage pattern.

There's a space between the ** and wildcards in the checkpoint call which should be removed for consistency with Python's unpacking syntax.
-    outputs_i = glob.glob(f"{checkpoints.first.get(** wildcards).output}/*/")
+    outputs_i = glob.glob(f"{checkpoints.first.get(**wildcards).output}/*/")

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 75afa38 and 587dcaa.

📒 Files selected for processing (4)

snakemake/checkpoints.py (4 hunks)
snakemake/dag.py (0 hunks)
tests/test_checkpoints_many/Snakefile (1 hunks)
tests/tests.py (1 hunks)

💤 Files with no reviewable changes (1)

snakemake/dag.py

🧰 Additional context used

📓 Path-based instructions (1)

`**/*.py`: Do not try to improve formatting. Do not suggest ...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

snakemake/checkpoints.py
tests/tests.py

🪛 Ruff (0.8.2)

snakemake/checkpoints.py

39-40: Use a single if statement instead of nested if statements

(SIM102)

⏰ Context from checks skipped due to timeout of 90000ms (40)

GitHub Check: testing (10, 3.12, dash)
GitHub Check: testing (10, 3.12, bash)
GitHub Check: testing (10, 3.11, bash)
GitHub Check: testing (9, 3.12, dash)
GitHub Check: testing (9, 3.12, bash)
GitHub Check: testing (9, 3.11, bash)
GitHub Check: testing (8, 3.12, dash)
GitHub Check: testing (8, 3.12, bash)
GitHub Check: testing (8, 3.11, bash)
GitHub Check: testing (7, 3.12, dash)
GitHub Check: testing (7, 3.12, bash)
GitHub Check: testing (7, 3.11, bash)
GitHub Check: testing (6, 3.12, dash)
GitHub Check: testing (6, 3.12, bash)
GitHub Check: testing (6, 3.11, bash)
GitHub Check: testing (5, 3.12, dash)
GitHub Check: testing (5, 3.12, bash)
GitHub Check: testing (5, 3.11, bash)
GitHub Check: testing (4, 3.12, dash)
GitHub Check: testing (4, 3.12, bash)
GitHub Check: testing (4, 3.11, bash)
GitHub Check: testing (3, 3.12, dash)
GitHub Check: testing-windows (10)
GitHub Check: testing (3, 3.12, bash)
GitHub Check: testing-windows (9)
GitHub Check: testing (3, 3.11, bash)
GitHub Check: testing-windows (8)
GitHub Check: testing-windows (7)
GitHub Check: testing (2, 3.12, dash)
GitHub Check: testing-windows (6)
GitHub Check: testing (2, 3.12, bash)
GitHub Check: testing-windows (5)
GitHub Check: testing (2, 3.11, bash)
GitHub Check: testing-windows (4)
GitHub Check: testing (1, 3.12, dash)
GitHub Check: testing-windows (3)
GitHub Check: testing (1, 3.12, bash)
GitHub Check: testing-windows (2)
GitHub Check: testing (1, 3.11, bash)
GitHub Check: testing-windows (1)

🔇 Additional comments (5)

tests/tests.py (1)

1353-1355: LGTM! Good test addition.

Adding a test for the more complex checkpoint scenario is a good practice to ensure the fix works with multiple checkpoints. This will help verify that the checkpoint output validation works correctly.

snakemake/checkpoints.py (3)

1-8: Type imports look good.

The addition of proper typing imports and the TYPE_CHECKING block for forward references is a good practice to avoid circular imports while providing type hints.

26-26: Adding type hints improves code readability.

The addition of type hints for the constructor parameters is a good practice that enhances code readability and enables better IDE support.

49-49: Type hint for CheckpointJob constructor.

Consistent with the additions to the Checkpoint class, adding type hints to the CheckpointJob constructor improves code quality.

tests/test_checkpoints_many/Snakefile (1)

1-78: Well-structured test Snakefile for multiple checkpoints.

This Snakefile creates a comprehensive test case with multiple checkpoints in sequence, which is excellent for verifying the fix. The workflow tests a variety of checkpoint scenarios:

Using multiple checkpoints in sequence

Glob-based dynamic output discovery

Nested wildcard propagation

Multiple output files from checkpoints

The aggregate function properly validates that checkpoint outputs exist and combines the results from both checkpoints.

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

tests/test_checkpoints_many/Snakefile (1)
74-74: Missing line termination.

Line 74 appears to be incomplete, lacking a newline character at the end of the file. This could potentially cause issues with certain tools or systems that expect files to end with a newline.
- touch("collect/{sample}/all_done.txt"),
+ touch("collect/{sample}/all_done.txt"),
+

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between d7d5730 and 6ecc84e.

📒 Files selected for processing (4)

snakemake/checkpoints.py (4 hunks)
snakemake/dag.py (0 hunks)
tests/test_checkpoints_many/Snakefile (1 hunks)
tests/tests.py (1 hunks)

💤 Files with no reviewable changes (1)

snakemake/dag.py

🚧 Files skipped from review as they are similar to previous changes (2)

tests/tests.py
snakemake/checkpoints.py

⏰ Context from checks skipped due to timeout of 90000ms (40)

GitHub Check: testing (10, 3.12, dash)
GitHub Check: testing (10, 3.12, bash)
GitHub Check: testing (10, 3.11, bash)
GitHub Check: testing (9, 3.12, dash)
GitHub Check: testing (9, 3.12, bash)
GitHub Check: testing (9, 3.11, bash)
GitHub Check: testing (8, 3.12, dash)
GitHub Check: testing (8, 3.12, bash)
GitHub Check: testing (8, 3.11, bash)
GitHub Check: testing (7, 3.12, dash)
GitHub Check: testing (7, 3.12, bash)
GitHub Check: testing (7, 3.11, bash)
GitHub Check: testing (6, 3.12, dash)
GitHub Check: testing (6, 3.12, bash)
GitHub Check: testing (6, 3.11, bash)
GitHub Check: testing (5, 3.12, dash)
GitHub Check: testing (5, 3.12, bash)
GitHub Check: testing (5, 3.11, bash)
GitHub Check: testing (4, 3.12, dash)
GitHub Check: testing (4, 3.12, bash)
GitHub Check: testing (4, 3.11, bash)
GitHub Check: testing (3, 3.12, dash)
GitHub Check: testing-windows (10)
GitHub Check: testing (3, 3.12, bash)
GitHub Check: testing-windows (9)
GitHub Check: testing (3, 3.11, bash)
GitHub Check: testing-windows (8)
GitHub Check: testing (2, 3.12, dash)
GitHub Check: testing-windows (7)
GitHub Check: testing (2, 3.12, bash)
GitHub Check: testing-windows (6)
GitHub Check: testing (2, 3.11, bash)
GitHub Check: testing-windows (5)
GitHub Check: testing (1, 3.12, dash)
GitHub Check: testing-windows (4)
GitHub Check: testing (1, 3.12, bash)
GitHub Check: testing-windows (3)
GitHub Check: testing (1, 3.11, bash)
GitHub Check: testing-windows (2)
GitHub Check: testing-windows (1)

🔇 Additional comments (2)

tests/test_checkpoints_many/Snakefile (2)

50-67: The aggregate function provides thorough checkpoint validation.

This function effectively demonstrates the intended usage pattern for checkpoint management, traversing through outputs from both checkpoints and creating a comprehensive collection of dependent files. The assertion on line 59 is particularly valuable for testing, ensuring that the expected checkpoint outputs exist.

1-74: Well-structured test for validating checkpoint functionality.

This test file effectively validates the PR objective of checking whether all outputs specified by rules are created, rather than relying on future_output. The workflow creates a multi-step dependency chain with two checkpoints and demonstrates proper aggregation of outputs.

The test structure enables validation of the checkpoint functionality by:

Creating initial sample files

Generating nested directory structures in two checkpoint stages

Copying and aggregating the results

Verifying all expected outputs exist

This provides good coverage for testing the changes made in PR #3341 to address issue #3036.

tests/test_checkpoints_many/Snakefile

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (3)

tests/test_checkpoints_many/Snakefile (3)
3-3: Remove unused import.

The random module is imported but never used in this Snakefile.
 import glob
-import random
 from pathlib import Path
15-15: Consider using expand() for output files.

Using a list directly as output is unusual in Snakemake. For consistency and clarity, consider using the expand() function or a list comprehension here.
-    output:
-        ALL_SAMPLES,
+    output:
+        expand("{sample}", sample=ALL_SAMPLES),
53-55: Remove unnecessary blank line.

You have an empty line between related operations. For better code readability, consider removing this blank line.
    outputs_i = glob.glob(f"{checkpoints.first.get(**wildcards).output}/*/")
-
    outputs_i = [output.split("/")[-2] for output in outputs_i]

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6ecc84e and dfe767a.

📒 Files selected for processing (4)

snakemake/checkpoints.py (4 hunks)
snakemake/dag.py (0 hunks)
tests/test_checkpoints_many/Snakefile (1 hunks)
tests/tests.py (1 hunks)

💤 Files with no reviewable changes (1)

snakemake/dag.py

🚧 Files skipped from review as they are similar to previous changes (2)

tests/tests.py
snakemake/checkpoints.py

⏰ Context from checks skipped due to timeout of 90000ms (40)

GitHub Check: testing (10, 3.12, dash)
GitHub Check: testing (10, 3.12, bash)
GitHub Check: testing (10, 3.11, bash)
GitHub Check: testing (9, 3.12, dash)
GitHub Check: testing (9, 3.12, bash)
GitHub Check: testing (9, 3.11, bash)
GitHub Check: testing (8, 3.12, dash)
GitHub Check: testing (8, 3.12, bash)
GitHub Check: testing (8, 3.11, bash)
GitHub Check: testing (7, 3.12, dash)
GitHub Check: testing (7, 3.12, bash)
GitHub Check: testing (7, 3.11, bash)
GitHub Check: testing (6, 3.12, dash)
GitHub Check: testing (6, 3.12, bash)
GitHub Check: testing (6, 3.11, bash)
GitHub Check: testing (5, 3.12, dash)
GitHub Check: testing (5, 3.12, bash)
GitHub Check: testing (5, 3.11, bash)
GitHub Check: testing (4, 3.12, dash)
GitHub Check: testing (4, 3.12, bash)
GitHub Check: testing (4, 3.11, bash)
GitHub Check: testing (3, 3.12, dash)
GitHub Check: testing-windows (10)
GitHub Check: testing (3, 3.12, bash)
GitHub Check: testing-windows (9)
GitHub Check: testing (3, 3.11, bash)
GitHub Check: testing-windows (8)
GitHub Check: testing (2, 3.12, dash)
GitHub Check: testing-windows (7)
GitHub Check: testing (2, 3.12, bash)
GitHub Check: testing-windows (6)
GitHub Check: testing (2, 3.11, bash)
GitHub Check: testing-windows (5)
GitHub Check: testing (1, 3.12, dash)
GitHub Check: testing-windows (4)
GitHub Check: testing (1, 3.12, bash)
GitHub Check: testing-windows (3)
GitHub Check: testing (1, 3.11, bash)
GitHub Check: testing-windows (2)
GitHub Check: testing-windows (1)

🔇 Additional comments (2)

tests/test_checkpoints_many/Snakefile (2)

52-52: Fix spacing in the get method parameter.

There's a syntax error with the extra space between the ** operator and wildcards. This would cause a syntax error when executing the workflow.
- outputs_i = glob.glob(f"{checkpoints.first.get(** wildcards).output}/*/")
+ outputs_i = glob.glob(f"{checkpoints.first.get(**wildcards).output}/*/")
56-66: The aggregate function correctly tests the checkpoint mechanism.

This implementation aligns well with the PR objective to directly check whether all outputs specified by the rule are created. It properly verifies the existence of outputs from both checkpoints and validates that the workflow is correctly propagating wildcards between checkpoint stages.

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (6)

tests/test_checkpoints_many/Snakefile (6)
1-3: Unused import detected.

The random module is imported but not used anywhere in this Snakefile.
 import glob
-import random
 from pathlib import Path
15-15: Consider using a more explicit output specification.

Using the variable directly as output is a bit unusual and might be confusing. Consider using a more explicit pattern that shows these are files.
-        ALL_SAMPLES,
+        expand("{sample}", sample=ALL_SAMPLES),
28-29: Good use of Path for directory creation, but consider adding error handling.

The code correctly uses Path().mkdir() with parents=True to create directories. Consider adding error handling for potential I/O errors.
-            Path(f"{output[0]}/{i}").mkdir(parents=True, exist_ok=True)
-            Path(f"{output[0]}/{i}/test.txt").touch()
+            try:
+                Path(f"{output[0]}/{i}").mkdir(parents=True, exist_ok=True)
+                Path(f"{output[0]}/{i}/test.txt").touch()
+            except IOError as e:
+                raise WorkflowError(f"Error creating checkpoint output: {e}")
58-59: Good checkpoint verification but consider enhancing error message.

The assertion verifies that the checkpoint output exists, which aligns with the PR objective of checking all required outputs. Consider adding a more descriptive error message.
-        assert Path(s2out).exists()
+        assert Path(s2out).exists(), f"Checkpoint output directory '{s2out}' does not exist"
63-65: Consider simplifying the expand pattern.

The expand call could be simplified since you're only expanding a single wildcard.
-            split_files.extend(
-                expand(f"copy/{{sample}}/{i}/{j}/test2.txt", sample=wildcards.sample)
-            )
+            split_files.append(f"copy/{wildcards.sample}/{i}/{j}/test2.txt")
73-74: Missing newline at end of file.

Add a newline at the end of the file to follow standard file format conventions.
        touch("collect/{sample}/all_done.txt"),
+

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dfe767a and b04742f.

📒 Files selected for processing (1)

tests/test_checkpoints_many/Snakefile (1 hunks)

⏰ Context from checks skipped due to timeout of 90000ms (18)

GitHub Check: testing (10, 3.12, dash)
GitHub Check: testing (10, 3.12, bash)
GitHub Check: testing (10, 3.11, bash)
GitHub Check: testing-windows (10)
GitHub Check: testing-windows (9)
GitHub Check: testing-windows (8)
GitHub Check: testing-windows (7)
GitHub Check: testing (2, 3.12, bash)
GitHub Check: testing-windows (6)
GitHub Check: testing (2, 3.11, bash)
GitHub Check: testing-windows (5)
GitHub Check: testing (1, 3.12, dash)
GitHub Check: testing-windows (4)
GitHub Check: testing (1, 3.12, bash)
GitHub Check: testing-windows (3)
GitHub Check: testing (1, 3.11, bash)
GitHub Check: testing-windows (2)
GitHub Check: testing-windows (1)

🔇 Additional comments (2)

tests/test_checkpoints_many/Snakefile (2)

52-52: Fix spacing in the get method parameter.

There's a syntax error with the extra space between the ** operator and wildcards. This would cause a syntax error when executing the workflow.
- outputs_i = glob.glob(f"{checkpoints.first.get(** wildcards).output}/*/")
+ outputs_i = glob.glob(f"{checkpoints.first.get(**wildcards).output}/*/")
50-66: Excellent implementation of checkpoint dependency tracking.

The aggregate function effectively implements the PR's objective by directly checking that required outputs from checkpoints exist. It correctly uses checkpoint.get().output to access the outputs and verifies their existence with Path().exists(), which is the core change mentioned in the PR objectives.

coderabbitai

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1

🧹 Nitpick comments (9)

.github/workflows/main.yml (2)

52-55: Empty 'exclude' Section

The exclude: [] block is currently empty. Static analysis (actionlint) signals that an empty exclude section may be unintentional or unnecessary. If no exclusions are needed, consider removing this block to simplify the configuration and avoid potential warnings. Otherwise, populate it with the intended exclusions.

🧰 Tools

🪛 actionlint (1.7.4)

52-52: "exclude" section should not be empty

(syntax-check)

🪛 YAMLlint (1.35.1)

[error] 53-53: trailing spaces

(trailing-spaces)

211-218: Addition of 'testing-done' Job and Final Newline Reminder

The new testing-done job serves as a final confirmation step by echoing "All tests passed" when the preceding jobs (testing and testing-windows) complete successfully. This addition enhances clarity in the CI pipeline.

Note: YAMLlint reported a missing newline at the end of the file (line 218). Please ensure that a newline character is added at the end to comply with the YAML specification and avoid linting warnings.

🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 218-218: no new line character at the end of file

(new-line-at-end-of-file)
snakemake/workflow.py (1)
1813-1814: Minor style improvement suggestion.

The inequality comparison with False could be replaced with a more Pythonic truth check.
-if not ruleinfo.template_engine and ruleinfo.container_img != False:
+if not ruleinfo.template_engine and ruleinfo.container_img:
🧰 Tools

🪛 Ruff (0.8.2)

1813-1813: Avoid inequality comparisons to False; use if ruleinfo.container_img: for truth checks

Replace with ruleinfo.container_img

(E712)
snakemake/ioutils/input.py (1)
18-28: Consider handling missing files more gracefully.

The current implementation will raise a KeyError if the specified file is not found in the checksum file, which might not provide a clear error message to users. Consider adding a more specific error handling.
         return (
             pd.read_csv(
                 infile,
                 sep="  ",
                 header=None,
                 engine="python",
                 converters={1: fix_file_name},
             )
             .set_index(1)
-            .loc[fix_file_name(kwargs.get("file"))]
-            .item()
+            .loc[fix_file_name(kwargs.get("file")), 0]
+            if fix_file_name(kwargs.get("file")) in pd.read_csv(
+                infile,
+                sep="  ",
+                header=None,
+                engine="python",
+                converters={1: fix_file_name},
+            ).set_index(1).index else
+            raise WorkflowError(f"File {kwargs.get('file')} not found in checksum file {infile}")
         )
docs/snakefiles/rules.rst (2)

550-567: Review of parse_input Documentation
The new documentation for the parse_input function is clear and concise. It explains the purpose, signature, and usage with an example. Consider explicitly mentioning that the third parameter, kwargs, is expected to be a dictionary (or that it supports arbitrary keyword arguments via **kwargs) for even greater clarity.

573-589: Review of extract_checksum Documentation
This section clearly outlines the purpose and signature of the extract_checksum function along with an example usage. It might be beneficial to briefly note (or link to) which checksum algorithm is applied, if that information is available, to help users understand potential performance or compatibility considerations.

docs/snakefiles/reporting.rst (3)

104-110: Defining File Labels Section
This section introduces the concept of assigning human‐friendly labels to output files. The explanation and accompanying example (the modified rule “b”) help illustrate how one can abstract away technical file details in the report. Consider adding a brief note on the benefits of this approach for nontechnical report consumers.

141-143: Dynamic Determination of Categories and Labels
The explanation of how to dynamically determine category, subcategory, and labels via functions is comprehensive and informative. The header is descriptive, although its length might be reduced for brevity.

151-153: Subsection: From Captions
The “From captions” subsection is a concise introduction to the hyperlink mechanism for captions. It could be even more helpful with a small inline example or a direct reference to further detailed documentation.

🛑 Comments failed to post (1)

snakemake/ioutils/input.py (1)

12-30: ⚠️ Potential issue

The extract_checksum function needs some improvements.

There are several issues with this implementation:

Missing import for WorkflowError
Using lambda for function definition instead of a regular def
Missing exception chaining for better error tracking
No docstring explaining the function's purpose and parameters

+from snakemake.exceptions import WorkflowError

 def extract_checksum(infile, **kwargs):
+    """Extract checksum from a file for a specified file.
+    
+    Args:
+        infile: Path to the file containing checksums
+        **kwargs: Additional arguments, including 'file' to specify target filename
+        
+    Returns:
+        The checksum for the specified file
+    """
     try:
         import pandas as pd
 
-        fix_file_name = lambda x: x.removeprefix("./")
+        def fix_file_name(x):
+            return x.removeprefix("./")
+            
         return (
             pd.read_csv(
                 infile,
                 sep="  ",
                 header=None,
                 engine="python",
                 converters={1: fix_file_name},
             )
             .set_index(1)
             .loc[fix_file_name(kwargs.get("file"))]
             .item()
         )
     except ImportError as err:
-        raise WorkflowError("Pandas is required to extract checksum from file.")
+        raise WorkflowError("Pandas is required to extract checksum from file.") from err

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from snakemake.exceptions import WorkflowError

def extract_checksum(infile, **kwargs):
    """Extract checksum from a file for a specified file.
    
    Args:
        infile: Path to the file containing checksums
        **kwargs: Additional arguments, including 'file' to specify target filename
        
    Returns:
        The checksum for the specified file
    """
    try:
        import pandas as pd

        def fix_file_name(x):
            return x.removeprefix("./")
            
        return (
            pd.read_csv(
                infile,
                sep="  ",
                header=None,
                engine="python",
                converters={1: fix_file_name},
            )
            .set_index(1)
            .loc[fix_file_name(kwargs.get("file"))]
            .item()
        )
    except ImportError as err:
        raise WorkflowError("Pandas is required to extract checksum from file.") from err

🧰 Tools

🪛 Ruff (0.8.2)

16-16: Do not assign a lambda expression, use a def

Rewrite fix_file_name as a def

(E731)

30-30: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

30-30: Undefined name WorkflowError

(F821)

coderabbitai

Caution

Inline review comments failed to post. This is likely due to GitHub's limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1

🧹 Nitpick comments (10)

snakemake/report/html_reporter/data/packages.py (1)

41-185: Avoid manual duplication of asset references.
All these newly introduced packages match the corresponding entries within snakemake/assets/__init__.py. To reduce the risk of mismatches, you could generate these references programmatically from Assets.spec, ensuring both files stay in sync as you update versions and checksums.
snakemake/ioutils/input.py (1)
1-9: Implementation looks good, but consider adding type hints and docstring

The parse_input function implementation is correct and provides a clean way to parse input files with a custom parser. Consider adding docstring and type hints to improve code readability and IDE support.
-def parse_input(infile, parser, **kwargs):
+def parse_input(infile, parser=None, **kwargs):
+    """Parse an input file with an optional parser function.
+    
+    Args:
+        infile: Path to the input file
+        parser: Optional function to parse the file content
+        **kwargs: Additional arguments passed to the parser
+        
+    Returns:
+        A function that reads the file and applies the parser if provided
+    """
     def inner(wildcards, input, output):
         with open(infile, "r") as fh:
             if parser is None:
                 return fh.read().strip()
             else:
                 return parser(fh, **kwargs)

     return inner
tests/test_validate/Snakefile (1)

41-50: Partial validation of Polars LazyFrame.

Validating only the first 1000 records is acceptable for large datasets, but consider whether additional rows need checking.
snakemake/utils.py (3)
118-149: Pandas DataFrame row-by-row validation is solid.

One minor improvement is to raise exceptions using Python’s preferred exception chaining.

Apply this diff in the except clause to follow best practices:
-raise WorkflowError(
-    f"Error validating row {i} of data frame.", e
-)
+raise WorkflowError(
+    f"Error validating row {i} of data frame."
+) from e
🧰 Tools

🪛 Ruff (0.8.2)

132-134: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

150-218: Polars DataFrame validation approach is well-structured.

Similar improvement for exception chaining is recommended in both exception blocks:
-raise WorkflowError(
-    f"Error validating row {i} of data frame.", e
-)
+raise WorkflowError(
+    f"Error validating row {i} of data frame."
+) from e
And similarly for the lazy validation block:
-raise WorkflowError(
-    f"Error validating row {i} of data frame.", e
-)
+raise WorkflowError(
+    f"Error validating row {i} of data frame."
+) from e
🧰 Tools

🪛 Ruff (0.8.2)

168-170: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

206-208: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

219-226: Config dictionary validation is straightforward.

Again, consider adopting the “raise ... from e” format for clarity:
-raise WorkflowError("Error validating config file.", e)
+raise WorkflowError("Error validating config file.") from e
🧰 Tools

🪛 Ruff (0.8.2)

224-224: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)
.github/workflows/main.yml (4)

51-54: Empty exclude List and Commented-Out Shell Option

An empty exclude: [] is now present, and the option for the dash shell has been commented out. Note that the static analysis tool flagged an issue with an empty exclude; if this key isn’t needed, consider removing it or adding an inline comment to clarify its purpose.

🧰 Tools

🪛 actionlint (1.7.4)

52-52: "exclude" section should not be empty

(syntax-check)

🪛 YAMLlint (1.35.1)

[error] 53-53: trailing spaces

(trailing-spaces)

211-218: New "testing-done" Job Addition

The new testing-done job, which echoes "All tests passed," follows the same conditional logic and depends on both the testing and testing-windows jobs. This addition appears useful for signaling overall test success, but ensure that its placement in the dependency chain doesn’t accidentally mask upstream failures.

🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 218-218: no new line character at the end of file

(new-line-at-end-of-file)

53-53: Trailing Spaces Detected

YAMLlint reported trailing spaces on this line. Please remove any trailing whitespace to conform to the YAML formatting standards.

🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 53-53: trailing spaces

(trailing-spaces)

218-218: Missing Newline at End of File

A newline character is missing at the end of this file. Adding a newline will help avoid potential issues with some parsers or tools that expect a final newline.

🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 218-218: no new line character at the end of file

(new-line-at-end-of-file)

🛑 Comments failed to post (1)

snakemake/ioutils/input.py (1)

12-30: ⚠️ Potential issue

Fix the undefined WorkflowError and lambda expression issues

This function has several issues that need to be addressed:

WorkflowError is undefined - need to import it
Using lambda assignment is discouraged (E731)
Exception handling should use from err syntax
Consider adding docstring and type hints

+from snakemake.exceptions import WorkflowError

 def extract_checksum(infile, **kwargs):
+    """Extract checksum from a file for a given filename.
+    
+    Args:
+        infile: Path to the checksum file
+        **kwargs: Must contain 'file' key with the filename to look up
+        
+    Returns:
+        The checksum for the specified file
+        
+    Raises:
+        WorkflowError: If pandas is not available
+    """
     try:
         import pandas as pd

-        fix_file_name = lambda x: x.removeprefix("./")
+        def fix_file_name(x):
+            return x.removeprefix("./")
+            
         return (
             pd.read_csv(
                 infile,
                 sep="  ",
                 header=None,
                 engine="python",
                 converters={1: fix_file_name},
             )
             .set_index(1)
             .loc[fix_file_name(kwargs.get("file"))]
             .item()
         )
-    except ImportError:
-        raise WorkflowError("Pandas is required to extract checksum from file.")
+    except ImportError as err:
+        raise WorkflowError("Pandas is required to extract checksum from file.") from err

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

from snakemake.exceptions import WorkflowError

def extract_checksum(infile, **kwargs):
    """Extract checksum from a file for a given filename.
    
    Args:
        infile: Path to the checksum file
        **kwargs: Must contain 'file' key with the filename to look up
        
    Returns:
        The checksum for the specified file
        
    Raises:
        WorkflowError: If pandas is not available
    """
    try:
        import pandas as pd

        def fix_file_name(x):
            return x.removeprefix("./")
            
        return (
            pd.read_csv(
                infile,
                sep="  ",
                header=None,
                engine="python",
                converters={1: fix_file_name},
            )
            .set_index(1)
            .loc[fix_file_name(kwargs.get("file"))]
            .item()
        )
    except ImportError as err:
        raise WorkflowError("Pandas is required to extract checksum from file.") from err

🧰 Tools

🪛 Ruff (0.8.2)

16-16: Do not assign a lambda expression, use a def

Rewrite fix_file_name as a def

(E731)

30-30: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

30-30: Undefined name WorkflowError

(F821)

… pr/Hocnonsense/3341

coderabbitai

Actionable comments posted: 0

🔭 Outside diff range comments (1)

snakemake/ioutils/input.py (1)
12-31: ⚠️ Potential issue

Missing exception handling and imports in extract_checksum function

The function contains several issues that need to be addressed:

WorkflowError is undefined - the import is missing

Lambda function is assigned to a variable instead of using a proper function definition

Exception should be raised with from err syntax to preserve the traceback

Apply these fixes:
+from snakemake.exceptions import WorkflowError

def extract_checksum(infile, **kwargs):
    try:
        import pandas as pd

-        fix_file_name = lambda x: x.removeprefix("./")
+        def fix_file_name(x):
+            return x.removeprefix("./")
        return (
            pd.read_csv(
                infile,
                sep="  ",
                header=None,
                engine="python",
                converters={1: fix_file_name},
            )
            .set_index(1)
            .loc[fix_file_name(kwargs.get("file"))]
            .item()
        )
    except ImportError as err:
-        raise WorkflowError("Pandas is required to extract checksum from file.")
+        raise WorkflowError("Pandas is required to extract checksum from file.") from err
🧰 Tools

🪛 Ruff (0.8.2)

16-16: Do not assign a lambda expression, use a def

Rewrite fix_file_name as a def

(E731)

30-30: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

30-30: Undefined name WorkflowError

(F821)

♻️ Duplicate comments (1)

tests/test_checkpoints_many/Snakefile (1)
56-56: ⚠️ Potential issue

Fix spacing in the get method parameter.

There's a syntax error with the extra space between the ** operator and wildcards. This would cause a syntax error when executing the workflow.
- outputs_i = glob.glob(f"{checkpoints.first.get(** wildcards).output}/*/")
+ outputs_i = glob.glob(f"{checkpoints.first.get(**wildcards).output}/*/")

🧹 Nitpick comments (9)

.github/workflows/main.yml (2)
50-54: Review the Matrix Configuration Exclude Section and Related Comments.

Within the testing job’s matrix configuration, an empty exclusion list is specified:
exclude: []
Additionally, there is a commented-out dash shell configuration. If no exclusions are required, consider removing the exclude key entirely for clarity. Also, static analysis identified trailing spaces on line 53—please remove these to comply with YAML formatting guidelines.

🧰 Tools

🪛 actionlint (1.7.4)

52-52: "exclude" section should not be empty

(syntax-check)

🪛 YAMLlint (1.35.1)

[error] 53-53: trailing spaces

(trailing-spaces)

211-218: New Testing-Done Job: Final Output Notification and Formatting Note.

The newly added testing-done job is a clear and concise way to notify that all tests have passed. Its conditional execution is consistent with the other jobs. However, static analysis indicates that there is no newline at the end of the file (line 218). Please add a newline for YAML compliance.

🧰 Tools

🪛 YAMLlint (1.35.1)

[error] 218-218: no new line character at the end of file

(new-line-at-end-of-file)
snakemake/workflow.py (1)
1813-1816: Consider using a more pythonic condition check

The code is using an inequality comparison with False, which is not the recommended Python style.
-if not ruleinfo.template_engine and ruleinfo.container_img != False:
+if not ruleinfo.template_engine and ruleinfo.container_img is not False:
Alternative approach:
-if not ruleinfo.template_engine and ruleinfo.container_img != False:
+if not ruleinfo.template_engine and ruleinfo.container_img:
Note: The second approach assumes that False is the only falsy value that should be treated differently from other falsy values (like None or empty strings).

🧰 Tools

🪛 Ruff (0.8.2)

1813-1813: Avoid inequality comparisons to False; use if ruleinfo.container_img: for truth checks

Replace with ruleinfo.container_img

(E712)
tests/test_checkpoints_many/Snakefile (2)
54-71: Consider adding error handling to the aggregate function.

While the function includes an assertion to verify path existence, it could benefit from more robust error handling, especially for the glob operations which may return empty lists.
def aggregate(wildcards):
    outputs_i = glob.glob(f"{checkpoints.first.get(**wildcards).output}/*/")
+    if not outputs_i:
+        raise ValueError(f"No outputs found for first checkpoint with wildcards {wildcards}")
    
    outputs_i = [output.split("/")[-2] for output in outputs_i]
    
    split_files = []
    for i in outputs_i:
        s2out = checkpoints.second.get(**wildcards, i=i).output[0]
        assert Path(s2out).exists()
        output_j = glob.glob(f"{s2out}/*/")
+        if not output_j:
+            raise ValueError(f"No outputs found for second checkpoint with wildcards {wildcards}, i={i}")
        
        outputs_j = [output.split("/")[-2] for output in output_j]
        for j in outputs_j:
            split_files.extend(
                expand(f"copy/{{sample}}/{i}/{j}/test2.txt", sample=wildcards.sample)
            )
+    if not split_files:
+        raise ValueError(f"No files to aggregate for wildcards {wildcards}")
    return split_files
43-51: Consider using a more efficient copying mechanism.

For copying files, using Python's shutil might be more cross-platform compatible than relying on the shell cp command.
rule copy:
    input:
        "second/{sample}/{i}/{j}/test2.txt",
    output:
        "copy/{sample}/{i}/{j}/test2.txt",
-    shell:
-        """
-        cp -f {input} {output}
-        """
+    run:
+        import shutil
+        Path(output[0]).parent.mkdir(parents=True, exist_ok=True)
+        shutil.copy2(input[0], output[0])
snakemake/utils.py (4)
132-134: Use exception chaining for clarity.

When re-raising an exception in an except block, you can use from e to clearly chain the original error:
- raise WorkflowError(f"Error validating row {i} of data frame.", e)
+ raise WorkflowError(f"Error validating row {i} of data frame.") from e
🧰 Tools

🪛 Ruff (0.8.2)

132-134: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

168-170: Use exception chaining here as well.

Similar to the previous suggestion, applying explicit exception chaining helps clarify error origins:
- raise WorkflowError(f"Error validating row {i} of data frame.", e)
+ raise WorkflowError(f"Error validating row {i} of data frame.") from e
🧰 Tools

🪛 Ruff (0.8.2)

168-170: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

206-208: Apply exception chaining for completeness.

Following the same pattern ensures consistent error propagation and traceability:
- raise WorkflowError(f"Error validating row {i} of data frame.", e)
+ raise WorkflowError(f"Error validating row {i} of data frame.") from e
🧰 Tools

🪛 Ruff (0.8.2)

206-208: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

224-224: Consider explicit exception chaining for config file error.

Likewise, you can chain the original exception to retain context:
- raise WorkflowError("Error validating config file.", e)
+ raise WorkflowError("Error validating config file.") from e
🧰 Tools

🪛 Ruff (0.8.2)

224-224: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6717b95 and 6f39e6b.

⛔ Files ignored due to path filters (7)

CHANGELOG.md is excluded by !CHANGELOG.md
pyproject.toml is excluded by !pyproject.toml
tests/test_script_xsh/expected-results/test.out is excluded by !**/*.out
CHANGELOG.md is excluded by !CHANGELOG.md
pyproject.toml is excluded by !pyproject.toml
tests/test_validate/samples.tsv is excluded by !**/*.tsv
tests/test_validate/samples.tsv is excluded by !**/*.tsv

📒 Files selected for processing (72)

snakemake/checkpoints.py (4 hunks)
snakemake/dag.py (0 hunks)
tests/test_checkpoints_many/Snakefile (1 hunks)
tests/tests.py (1 hunks)
snakemake/checkpoints.py (2 hunks)
tests/test_checkpoints_many/Snakefile (1 hunks)
tests/test_checkpoints_many/Snakefile (1 hunks)
.github/workflows/codespell.yml (1 hunks)
.github/workflows/docs.yml (2 hunks)
.github/workflows/main.yml (6 hunks)
docs/snakefiles/deployment.rst (2 hunks)
docs/snakefiles/reporting.rst (5 hunks)
docs/snakefiles/rules.rst (3 hunks)
setup.py (1 hunks)
snakemake/cli.py (2 hunks)
snakemake/ioutils/__init__.py (2 hunks)
snakemake/ioutils/input.py (1 hunks)
snakemake/script/__init__.py (3 hunks)
snakemake/workflow.py (2 hunks)
test-environment.yml (2 hunks)
tests/test_conda_python_3_7_script/Snakefile (1 hunks)
tests/test_conda_python_3_7_script/test_script_python_3_7.py (1 hunks)
tests/test_conda_run/Snakefile (1 hunks)
tests/test_conda_run/expected-results/test.txt (1 hunks)
tests/test_conda_run/test_python_env.yaml (1 hunks)
tests/test_conda_run/test_script_run.py (1 hunks)
tests/test_ioutils/Snakefile (4 hunks)
tests/test_ioutils/expected-results/c/1.txt (1 hunks)
tests/test_ioutils/expected-results/results/switch~someswitch.column~sample.txt (1 hunks)
tests/test_ioutils/samples.md5 (1 hunks)
tests/test_script_xsh/Snakefile (1 hunks)
tests/test_script_xsh/envs/xonsh.yaml (1 hunks)
tests/test_script_xsh/scripts/test.xsh (1 hunks)
tests/tests_using_conda.py (1 hunks)
snakemake/checkpoints.py (1 hunks)
snakemake/dag.py (0 hunks)
tests/tests.py (1 hunks)
.github/workflows/codespell.yml (1 hunks)
.github/workflows/docs.yml (2 hunks)
.github/workflows/main.yml (6 hunks)
docs/project_info/codebase.rst (1 hunks)
docs/project_info/contributing.rst (2 hunks)
docs/snakefiles/configuration.rst (1 hunks)
docs/snakefiles/rules.rst (2 hunks)
setup.py (1 hunks)
snakemake/assets/__init__.py (4 hunks)
snakemake/cli.py (1 hunks)
snakemake/dag.py (1 hunks)
snakemake/ioutils/__init__.py (2 hunks)
snakemake/ioutils/input.py (1 hunks)
snakemake/report/html_reporter/data/packages.py (1 hunks)
snakemake/utils.py (1 hunks)
snakemake/workflow.py (1 hunks)
test-environment.yml (2 hunks)
tests/test_ioutils/Snakefile (4 hunks)
tests/test_ioutils/expected-results/c/1.txt (1 hunks)
tests/test_ioutils/expected-results/results/switch~someswitch.column~sample.txt (1 hunks)
tests/test_ioutils/samples.md5 (1 hunks)
tests/test_validate/Snakefile (1 hunks)
tests/test_validate/samples.schema.yaml (1 hunks)
docs/project_info/codebase.rst (1 hunks)
docs/project_info/contributing.rst (2 hunks)
docs/snakefiles/configuration.rst (1 hunks)
snakemake/assets/__init__.py (4 hunks)
snakemake/checkpoints.py (0 hunks)
snakemake/dag.py (2 hunks)
snakemake/report/html_reporter/data/packages.py (1 hunks)
snakemake/utils.py (1 hunks)
tests/test_issue1092/Snakefile (1 hunks)
tests/test_validate/Snakefile (1 hunks)
tests/test_validate/samples.schema.yaml (1 hunks)
tests/test_issue1092/Snakefile (1 hunks)

✅ Files skipped from review due to trivial changes (7)

tests/test_conda_run/test_script_run.py
tests/test_script_xsh/envs/xonsh.yaml
tests/test_script_xsh/scripts/test.xsh
tests/test_conda_run/expected-results/test.txt
tests/test_conda_run/test_python_env.yaml
tests/test_conda_python_3_7_script/test_script_python_3_7.py
tests/test_checkpoints_many/Snakefile

🚧 Files skipped from review as they are similar to previous changes (35)

docs/project_info/codebase.rst
tests/test_ioutils/expected-results/c/1.txt
.github/workflows/codespell.yml
docs/project_info/codebase.rst
.github/workflows/codespell.yml
tests/test_ioutils/expected-results/results/switch~~someswitch.column~~sample.txt
.github/workflows/docs.yml
tests/test_ioutils/expected-results/results/switch~~someswitch.column~~sample.txt
setup.py
docs/snakefiles/configuration.rst
tests/test_ioutils/expected-results/c/1.txt
snakemake/checkpoints.py
snakemake/cli.py
tests/tests.py
docs/snakefiles/configuration.rst
.github/workflows/docs.yml
setup.py
snakemake/workflow.py
snakemake/cli.py
test-environment.yml
snakemake/dag.py
snakemake/ioutils/init.py
snakemake/ioutils/init.py
tests/tests.py
tests/test_ioutils/samples.md5
docs/project_info/contributing.rst
tests/test_validate/samples.schema.yaml
snakemake/dag.py
snakemake/dag.py
snakemake/checkpoints.py
.github/workflows/main.yml
test-environment.yml
tests/test_validate/samples.schema.yaml
tests/test_ioutils/samples.md5
snakemake/ioutils/input.py

🧰 Additional context used

📓 Path-based instructions (1)

`**/*.py`: Do not try to improve formatting. Do not suggest ...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

tests/tests_using_conda.py
snakemake/dag.py
snakemake/utils.py
snakemake/workflow.py
snakemake/checkpoints.py
snakemake/script/__init__.py
snakemake/assets/__init__.py
snakemake/report/html_reporter/data/packages.py
snakemake/ioutils/input.py

🪛 GitHub Actions: CI

snakemake/dag.py

[error] 272-272: Black formatting check failed. 1 file would be reformatted. Please run 'black' to fix code style issues in this file.

🪛 Ruff (0.8.2)

snakemake/utils.py

132-134: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

168-170: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

206-208: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

224-224: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

132-134: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

168-170: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

206-208: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

224-224: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

snakemake/workflow.py

1813-1813: Avoid inequality comparisons to False; use if ruleinfo.container_img: for truth checks

Replace with ruleinfo.container_img

(E712)

snakemake/ioutils/input.py

16-16: Do not assign a lambda expression, use a def

Rewrite fix_file_name as a def

(E731)

30-30: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

30-30: Undefined name WorkflowError

(F821)

🪛 actionlint (1.7.4)

.github/workflows/main.yml

52-52: "exclude" section should not be empty

(syntax-check)

🪛 YAMLlint (1.35.1)

.github/workflows/main.yml

[error] 53-53: trailing spaces

(trailing-spaces)

[error] 218-218: no new line character at the end of file

(new-line-at-end-of-file)

🔇 Additional comments (57)

.github/workflows/main.yml (4)

15-16: Ensure Consistent Conditional Execution for CI Jobs.

The condition
if: github.event.pull_request.merged != true || github.ref != 'refs/heads/main'
in the formatting job correctly restricts execution when the pull request has been merged or when running on the main branch. Please verify that this logic fully aligns with your intended CI trigger scenarios.

41-42: Revalidate the Conditional Check in the Testing Job.

The testing job now uses the same conditional as the formatting job. Confirm that this condition properly prevents job execution on merged PRs or on the main branch, as intended by your PR objectives.

156-157: Validate Conditional Execution in the Build-Container Image Job.

The build-container-image job now also includes:
if: github.event.pull_request.merged != true || github.ref != 'refs/heads/main'
This is consistent with the approach used in other jobs. Please confirm that this condition is correctly applied for your CI needs.

166-167: Confirm the Conditional Check in the Testing-Windows Job.

The testing-windows job leverages the same conditional check. Ensure that this condition maintains the desired behavior across all platforms (in this case, Windows).
tests/test_conda_python_3_7_script/Snakefile (1)

7-7: Script file path update looks good

The script file reference has been updated from "test_script.py" to "test_script_python_3_7.py", which appears to be providing a more specific Python 3.7 version of the test script. This change matches the testing approach that's being used in the other test files being added/modified in this PR.

tests/test_script_xsh/Snakefile (1)

1-12: New test workflow for xonsh script execution looks good

This new Snakefile creates a proper test workflow with clearly defined input/output dependencies. The file contains:

An "all" rule that establishes the final expected output

A "test_xonsh" rule that uses a conda environment and executes a xonsh script

The structure follows Snakemake best practices by clearly specifying the output and using the script directive to run the xonsh script within the conda environment defined in envs/xonsh.yaml.

tests/test_conda_run/Snakefile (1)

1-13: New test for conda script execution with explanatory comments

This new rule definition provides a good test case for running Python scripts with conda environments. The informative comments (lines 9-12) are particularly helpful for users, clarifying that this approach is only for testing purposes and that the script directive would normally be preferred in real workflows.

The pattern used here is valuable for testing the direct shell execution of Python scripts within conda environments, which appears to be a different execution path from the script directive approach.

tests/tests_using_conda.py (2)

309-315: New test for xonsh script execution looks good

This test function properly uses the appropriate decorators to:

Skip on Windows (since xonsh scripts may have platform-specific behavior)

Use the conda deployment method

The test definition is consistent with other similar test functions in the file, making it easy to understand and maintain.

318-320: New test for conda run functionality looks good

This test function correctly implements a test for the conda run functionality. Unlike the xonsh test, this one doesn't have the @skip_on_windows decorator, suggesting that this functionality should work across platforms.

The implementation is clean and follows the same pattern as other similar test functions in the file.

snakemake/script/__init__.py (4)

1600-1603: Well-structured Xonsh script execution implementation.

This implementation extends support for Xonsh scripts by inheriting from PythonScript and properly overriding the execute_script method to use the xonsh command.

1667-1668: LGTM - Good extension recognition implementation.

The detection logic for .xsh files follows the same pattern as other supported languages in the codebase.

1732-1732: LGTM - Proper registration of the XonshScript executor.

The XonshScript class is correctly added to the executor dictionary, allowing it to handle files with the .xsh extension.

1736-1736: LGTM - Comprehensive error message update.

The error message has been properly updated to include Xonsh (.xsh) in the list of supported script types.

snakemake/checkpoints.py (1)

33-35: Simple and effective approach for checking checkpoint outputs

This change implements the fix described in PR #3341, replacing the use of future_output with a direct check of whether all outputs specified by the rule are created, making the logic more straightforward.
snakemake/dag.py (3)

270-273: Method name change accurately reflects expanded functionality

The renamed method finished_and_not_needrun_checkpoint_jobs now explicitly describes what it does: yield jobs that are checkpoint jobs and are either finished or not marked for running. This change is more precise than the previous name and matches the implementation.

🧰 Tools

🪛 GitHub Actions: CI

[error] 272-272: Black formatting check failed. 1 file would be reformatted. Please run 'black' to fix code style issues in this file.

276-278: Update aligns checkpoint outputs with method changes

The update_checkpoint_outputs method now correctly uses the renamed and modified finished_and_not_needrun_checkpoint_jobs method, ensuring consistency in how checkpoint outputs are tracked throughout the DAG.

409-428:
Details

❓ Verification inconclusive

Improved storage output handling with topological sorting

The store_storage_outputs method has been significantly restructured to use a two-level loop that iterates over levels of jobs obtained from toposorted rather than a single loop over needrun_jobs. This approach allows for more organized processing of jobs based on their dependency levels, ensuring that storage outputs are handled in the correct order.

Let's verify that all outputs are still properly stored with the new implementation:

🏁 Script executed:
#!/bin/bash
# Check for any references to the old future_output attribute in the codebase
rg "future_output" --type py
Length of output: 28
Revised Storage Handling Implementation Verified – Please Confirm

The updated code in snakemake/dag.py (lines 409–428) now processes jobs in dependency levels using toposorted iterations. Our search for the legacy future_output attribute yielded no results, suggesting that the refactoring has successfully removed outdated references. Please manually verify that all outputs (including benchmarks and job logs) continue to be stored correctly under the new asynchronous TaskGroup setup.

Code now iterates over dependency levels from toposorted(…) rather than using a single flat loop.

No occurrences of future_output were found across the codebase.

Confirm proper storage of all outputs asynchronously under the new implementation.
docs/project_info/contributing.rst (2)

188-214: Documentation improvements align with Sphinx standards

The section heading structure has been updated to follow the Sphinx recommendations, which provides a more consistent and standardized approach to documentation. This will improve readability and maintainability of the docs.

228-230: Environment name updated for documentation build

The Conda environment name has been changed from snakemake to snakemake_docs, which better indicates the purpose of this environment and separates it from the general development environment.

docs/snakefiles/reporting.rst (7)

7-10: Enhanced report description improves user understanding

The documentation now provides a clearer explanation of Snakemake's reporting capabilities, highlighting that reports contain runtime statistics, provenance information, and workflow topology. It also introduces the two main report types: self-contained HTML files for smaller reports and ZIP archives for more complex ones.

13-17: New section improves organization of report documentation

Adding a dedicated section for "Including results in a report" with proper heading makes the documentation more structured and easier to navigate for users looking for specific information about report customization.

257-264: Added context about report data collection

This new section clarifies where the report metadata comes from (the .snakemake directory), which helps users better understand the reporting mechanism and troubleshoot potential issues.

265-285: New section on HTML reports provides clear usage instructions

The dedicated section for self-contained HTML reports clearly explains how to generate them, including customizing the report filename. The warning about suitability only for smaller reports helps users choose the appropriate report format for their needs.

286-303: ZIP archive report documentation addresses scalability

This section provides valuable guidance on using ZIP archive reports for more complex workflows, clearly explaining the benefits and usage. The information about the main entry point (report.html) is particularly helpful for users sharing reports with collaborators.

304-317: New section on partial reports enhances flexibility

Documentation on partial reports is a valuable addition that explains how to generate reports for specific targets, which is useful for workflows that haven't completed or when exploring intermediate results.

318-331: Custom layout section enhances report customization options

The new section on custom layouts provides clear instructions on how to apply custom stylesheets to reports, including references to example files. This will help institutions create branded reports that match their visual identity.

tests/test_issue1092/Snakefile (1)

32-35: Code refactored to use direct shell directive

The rule implementation has been refactored from using a run directive with a shell() function call to a direct shell: directive. This change simplifies the implementation while maintaining the same behavior.

snakemake/ioutils/input.py (1)

1-9: Simple but effective utility for parsing input files

The parse_input function provides a clean way to parse input files with an optional parser function. It returns a closure that handles file opening and applies the parser when needed.

tests/test_ioutils/Snakefile (4)

11-13: Added test for new extract_checksum function

Good test to verify the new extract_checksum functionality works correctly by checking the expected checksum value for a file.

22-31: Updated rule to use the new checksum functionality

The rule has been enhanced to use the new extract_checksum functionality, demonstrating how to parse a checksum file and use it in a rule.

58-58: Added trailing comma for consistent syntax

Adding the trailing comma is a good practice for maintainability as it makes future additions easier without causing diff changes to the line.

85-85: Reformatted shell command

No functional change, just reformatting.

snakemake/workflow.py (5)

1780-1788: Well-designed helper function to centralize software deployment checks

Good refactoring to extract common logic for checking if software deployment methods are allowed with template engines into a reusable helper function.

1794-1796: Replaced inline check with helper function call

Good refactoring to use the new helper function, improving code maintainability.

1800-1802: Replaced inline check with helper function call

Good refactoring to use the new helper function, improving code maintainability.

1808-1810: Replaced inline check with helper function call

Good refactoring to use the new helper function, improving code maintainability.

1293-1296: Added conditional execution for storage cleanup

This is the core fix of the PR that addresses the issue mentioned in the PR objective. Now it only cleans up storage objects if keep_storage_local is False. This aligns with the goal of "discarding the use of future_output in favor of directly checking whether all outputs specified by the rule are created."

tests/test_validate/Snakefile (6)

4-4: Good addition of Polars support.

Adding the Polars import expands the testing framework to cover more dataframe libraries beyond just Pandas.

11-20: Well-structured validation tests for dictionary data.

This new section properly tests dictionary-based validation with proper assertions for both complete data and data with null values filtered out. The code follows good practices by isolating test cases and asserting specific values.

22-27: Good addition of Pandas DataFrame validation without index.

The code provides additional test coverage for validating Pandas DataFrames without explicit indexing, which is a common use case.

29-39: Well-implemented Polars DataFrame validation.

The Polars DataFrame test case is a good addition covering schema definition, null value handling, and proper assertions. This enhances the test coverage for the validation functionality across different dataframe libraries.

41-49: Comprehensive Polars LazyFrame test case.

This section adds testing for Polars LazyFrame, which is important for ensuring the validation works with lazy evaluation patterns. The collect() call properly demonstrates how to handle lazy operations in validation contexts.

51-57: Good addition of index-based assertions.

The additional assertions for the indexed DataFrame test case complete the test coverage for the various DataFrame formats and access patterns.

docs/snakefiles/deployment.rst (2)

288-293: Clear documentation update for conda with run directive.

This documentation update properly clarifies that the run directive can use conda environments, with an important explanation of its special behavior. The note explains that the conda environment only affects shell calls within the run script, which is critical information for users.

464-467: Consistent documentation for apptainer with run directive.

The update mirrors the conda documentation to consistently explain how the run directive interacts with containers. This maintains a coherent documentation style and helps users understand the parallel behaviors between conda and apptainer integrations.

docs/snakefiles/rules.rst (2)

550-571: Documentation for new parse_input function looks good.

The added documentation clearly explains the purpose and usage of the parse_input function, which allows parsing an input file to extract values. The signature, parameters, and example usage are all well documented.

573-592: Documentation for new extract_checksum function is well structured.

The extract_checksum function is properly documented with clear explanation of its purpose, parameters, and usage example. The code sample showing how to apply it in a rule's params section is particularly helpful.

tests/test_checkpoints_many/Snakefile (4)

21-30: Checkpoint implementation looks good.

The first checkpoint correctly implements directory output handling and creates a predictable structure with numbered subdirectories. This establishes a good foundation for testing the checkpoint functionality.

32-41: Good use of nested checkpoint dependency.

The second checkpoint properly builds upon the first checkpoint, creating an additional level of directories. This helps test how Snakemake handles deep checkpoint dependencies.

8-11: The rule all target is properly defined.

The all rule correctly specifies the final target that should trigger the execution of the entire workflow, using expand to handle all samples.

13-19: Good initialization rule for creating sample files.

The before rule properly initializes the workflow by creating empty files for each sample, which is a clean way to set up prerequisites for subsequent rules.

snakemake/utils.py (4)

111-117: Looks good for _validate_record.

The logic to distinguish between using the default validator and the standard validator is clear and concise, with no detected issues.

118-149: Pandas DataFrame validation logic appears solid.

Your approach of excluding null values, validating each record, and conditionally updating the DataFrame with default values is well-organized. The repeated creation of a record list, followed by reconstructing the DataFrame, cleanly handles updated columns.

🧰 Tools

🪛 Ruff (0.8.2)

132-134: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

150-218: Polars DataFrame validation logic is similarly robust.

Great job mirroring the pandas strategy for null handling, record-based validation, partial checks for LazyFrame, and optional application of default values. This keeps the logic consistent across different DataFrame types.

🧰 Tools

🪛 Ruff (0.8.2)

168-170: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

206-208: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

219-226: Dictionary validation approach looks good.

The fallback logic for dictionary validation is straightforward.

🧰 Tools

🪛 Ruff (0.8.2)

224-224: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

snakemake/assets/__init__.py (1)

96-530: Extensive asset references are well-handled.

Each asset’s pinned version and SHA256 ensures reproducibility and explicit licensing coverage. No issues detected with the approach of referencing external resources.

snakemake/report/html_reporter/data/packages.py (1)

41-186: Package listings are methodical and consistent.

All newly added packages properly reference their corresponding license files, preserving clear licensing provenance. This looks great.

sonarqubecloud · 2025-03-11T19:23:39Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

🤖 I have created a release *beep* *boop* --- ## [9.0.0](v8.30.0...v9.0.0) (2025-03-14) ### ⚠ BREAKING CHANGES * Logging refactor & add LoggerPluginInterface ([#3107](#3107)) ### Features * [#3412](#3412) - keep shadow folder of failed job if --keep-incomplete flag is set. ([#3430](#3430)) ([22978c3](22978c3)) * add flag --report-after-run to automatically generate the report after a successfull workflow run ([#3428](#3428)) ([b0a7f03](b0a7f03)) * add flatten function to IO utils ([#3424](#3424)) ([67fa392](67fa392)) * add helper functions to parse input files ([#2918](#2918)) ([63e45a7](63e45a7)) * Add option to print redacted file names ([#3089](#3089)) ([ba4d264](ba4d264)) * add support for validation of polars dataframe and lazyframe ([#3262](#3262)) ([c7473a6](c7473a6)) * added support for rendering dag with mermaid js ([#3409](#3409)) ([7bf8381](7bf8381)) * adding --replace-workflow-config to fully replace workflow configs (from config: directive) with --configfile, instead of merging them ([#3381](#3381)) ([47504a0](47504a0)) * Dynamic module name ([#3401](#3401)) ([024dc32](024dc32)) * Enable saving and reloading IOCache object ([#3386](#3386)) ([c935953](c935953)) * files added in rule params with workflow.source_path will be available in used containers ([#3385](#3385)) ([a6e45bf](a6e45bf)) * Fix keep_local in storage directive and more freedom over remote retrieval behaviour ([#3410](#3410)) ([67b4739](67b4739)) * inherit parameters of use rule and extend/replace individual items them when using 'with' directive ([#3365](#3365)) ([93e4b92](93e4b92)) * Logging refactor & add LoggerPluginInterface ([#3107](#3107)) ([86f1d6e](86f1d6e)) * Maximal file size for checksums ([#3368](#3368)) ([b039f8a](b039f8a)) * Modernize package configuration using Pixi ([#3369](#3369)) ([77992d8](77992d8)) * multiext support for named input/output ([#3372](#3372)) ([05e1378](05e1378)) * optionally auto-group jobs via temp files in case of remote execution ([#3378](#3378)) ([cc9bba2](cc9bba2)) ### Bug Fixes * `--delete-all-output` ignores `--dry-run` ([#3265](#3265)) ([23fef82](23fef82)) * 3342 faster touch runs and warning messages for non-existing files ([#3398](#3398)) ([cd9c3c3](cd9c3c3)) * add default value to max-jobs-per-timespan ([#3043](#3043)) ([2959abe](2959abe)) * checkpoints inside modules are overwritten ([#3359](#3359)) ([fba3ac7](fba3ac7)) * Convert Path to IOFile ([#3405](#3405)) ([c58684c](c58684c)) * Do not perform storage object cleanup with --keep-storage-local-copies set ([#3358](#3358)) ([9a6d14b](9a6d14b)) * edgecases of source deployment in case of remote execution ([#3396](#3396)) ([5da13be](5da13be)) * enhance error message formatting for strict DAG-building mode ([#3376](#3376)) ([a1c39ee](a1c39ee)) * fix bug in checkpoint handling that led to exceptions in case checkpoint output was missing upon rerun ([#3423](#3423)) ([8cf4a2f](8cf4a2f)) * force check all required outputs ([#3341](#3341)) ([495a4e7](495a4e7)) * group job formatting ([#3442](#3442)) ([f0b10a3](f0b10a3)) * in remote jobs, upload storage in topological order such that modification dates are preserved (e.g. in case of group jobs) ([#3377](#3377)) ([eace08f](eace08f)) * only skip eval when resource depends on input ([#3374](#3374)) ([4574c92](4574c92)) * Prevent execution of conda in apptainer when not explicitly requested in software deployment method ([#3388](#3388)) ([c43c5c0](c43c5c0)) * print filenames with quotes around them in RuleException ([#3269](#3269)) ([6baeda5](6baeda5)) * Re-evaluation of free resources ([#3399](#3399)) ([6371293](6371293)) * ReadTheDocs layout issue due to src directory change ([#3419](#3419)) ([695b127](695b127)) * robustly escaping quotes in generated bash scripts (v2) ([#3297](#3297)) ([#3389](#3389)) ([58720bd](58720bd)) * Show apptainer image URL in snakemake report ([#3407](#3407)) ([45f0450](45f0450)) * Update ReadTheDocs configuration for documentation build to use Pixi ([#3433](#3433)) ([3f227a6](3f227a6)) ### Documentation * Add pixi setup instructions to general use tutorial ([#3382](#3382)) ([115e81b](115e81b)) * fix contribution section heading levels, fix docs testing setup order ([#3360](#3360)) ([051dc53](051dc53)) * fix link to github.com/snakemake/poetry-snakemake-plugin ([#3436](#3436)) ([ec6d97c](ec6d97c)) * fix quoting ([#3394](#3394)) ([b40f599](b40f599)) * fix rerun-triggers default ([#3403](#3403)) ([4430e23](4430e23)) * fix typo 'safe' -> 'save' ([#3384](#3384)) ([7755861](7755861)) * mention code formatting in the contribution section ([#3431](#3431)) ([e8682b7](e8682b7)) * remove duplicated 'functions'. ([#3356](#3356)) ([7c595db](7c595db)) * update broken links documentation ([#3437](#3437)) ([e3d0d88](e3d0d88)) * Updating contributing guidelines with new pixi dev setup ([#3415](#3415)) ([8e95a12](8e95a12)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: snakemake-bot <snakemake-bot-admin@googlegroups.com>

This may fix snakemake#3036 This fix discards totaly change `future_output` to `created_output`, and directly check if all output the rule wanted are created. I'm somehow doubt if `future_output` used some elsewhere. Is it needed to add it back? ### QC  * [x] The PR contains test case <`tests/tests.py::test_checkpoints_many`> for the changes. * [x] the change does neither modify the language nor the behavior or functionalities of Snakemake.  ## Summary by CodeRabbit - **New Features** - Enhanced workflow execution with refined job ordering, checkpoint validation, file parsing, checksum extraction, and updated CLI help text. - **Documentation** - Updated examples and guidelines for input parsing and configuration validations, with typographical corrections for improved clarity. - **Tests** - Introduced new test cases for checkpoint and data format validation, and updated expected outputs and schema definitions. - **Chores** - Revised CI workflows with conditional execution and updated environment dependencies to streamline the build process.  --------- Co-authored-by: Johannes Köster <johannes.koester@uni-due.de> Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de>

🤖 I have created a release *beep* *boop* --- ## [9.0.0](snakemake/snakemake@v8.30.0...v9.0.0) (2025-03-14) ### ⚠ BREAKING CHANGES * Logging refactor & add LoggerPluginInterface ([snakemake#3107](snakemake#3107)) ### Features * [snakemake#3412](snakemake#3412) - keep shadow folder of failed job if --keep-incomplete flag is set. ([snakemake#3430](snakemake#3430)) ([22978c3](snakemake@22978c3)) * add flag --report-after-run to automatically generate the report after a successfull workflow run ([snakemake#3428](snakemake#3428)) ([b0a7f03](snakemake@b0a7f03)) * add flatten function to IO utils ([snakemake#3424](snakemake#3424)) ([67fa392](snakemake@67fa392)) * add helper functions to parse input files ([snakemake#2918](snakemake#2918)) ([63e45a7](snakemake@63e45a7)) * Add option to print redacted file names ([snakemake#3089](snakemake#3089)) ([ba4d264](snakemake@ba4d264)) * add support for validation of polars dataframe and lazyframe ([snakemake#3262](snakemake#3262)) ([c7473a6](snakemake@c7473a6)) * added support for rendering dag with mermaid js ([snakemake#3409](snakemake#3409)) ([7bf8381](snakemake@7bf8381)) * adding --replace-workflow-config to fully replace workflow configs (from config: directive) with --configfile, instead of merging them ([snakemake#3381](snakemake#3381)) ([47504a0](snakemake@47504a0)) * Dynamic module name ([snakemake#3401](snakemake#3401)) ([024dc32](snakemake@024dc32)) * Enable saving and reloading IOCache object ([snakemake#3386](snakemake#3386)) ([c935953](snakemake@c935953)) * files added in rule params with workflow.source_path will be available in used containers ([snakemake#3385](snakemake#3385)) ([a6e45bf](snakemake@a6e45bf)) * Fix keep_local in storage directive and more freedom over remote retrieval behaviour ([snakemake#3410](snakemake#3410)) ([67b4739](snakemake@67b4739)) * inherit parameters of use rule and extend/replace individual items them when using 'with' directive ([snakemake#3365](snakemake#3365)) ([93e4b92](snakemake@93e4b92)) * Logging refactor & add LoggerPluginInterface ([snakemake#3107](snakemake#3107)) ([86f1d6e](snakemake@86f1d6e)) * Maximal file size for checksums ([snakemake#3368](snakemake#3368)) ([b039f8a](snakemake@b039f8a)) * Modernize package configuration using Pixi ([snakemake#3369](snakemake#3369)) ([77992d8](snakemake@77992d8)) * multiext support for named input/output ([snakemake#3372](snakemake#3372)) ([05e1378](snakemake@05e1378)) * optionally auto-group jobs via temp files in case of remote execution ([snakemake#3378](snakemake#3378)) ([cc9bba2](snakemake@cc9bba2)) ### Bug Fixes * `--delete-all-output` ignores `--dry-run` ([snakemake#3265](snakemake#3265)) ([23fef82](snakemake@23fef82)) * 3342 faster touch runs and warning messages for non-existing files ([snakemake#3398](snakemake#3398)) ([cd9c3c3](snakemake@cd9c3c3)) * add default value to max-jobs-per-timespan ([snakemake#3043](snakemake#3043)) ([2959abe](snakemake@2959abe)) * checkpoints inside modules are overwritten ([snakemake#3359](snakemake#3359)) ([fba3ac7](snakemake@fba3ac7)) * Convert Path to IOFile ([snakemake#3405](snakemake#3405)) ([c58684c](snakemake@c58684c)) * Do not perform storage object cleanup with --keep-storage-local-copies set ([snakemake#3358](snakemake#3358)) ([9a6d14b](snakemake@9a6d14b)) * edgecases of source deployment in case of remote execution ([snakemake#3396](snakemake#3396)) ([5da13be](snakemake@5da13be)) * enhance error message formatting for strict DAG-building mode ([snakemake#3376](snakemake#3376)) ([a1c39ee](snakemake@a1c39ee)) * fix bug in checkpoint handling that led to exceptions in case checkpoint output was missing upon rerun ([snakemake#3423](snakemake#3423)) ([8cf4a2f](snakemake@8cf4a2f)) * force check all required outputs ([snakemake#3341](snakemake#3341)) ([495a4e7](snakemake@495a4e7)) * group job formatting ([snakemake#3442](snakemake#3442)) ([f0b10a3](snakemake@f0b10a3)) * in remote jobs, upload storage in topological order such that modification dates are preserved (e.g. in case of group jobs) ([snakemake#3377](snakemake#3377)) ([eace08f](snakemake@eace08f)) * only skip eval when resource depends on input ([snakemake#3374](snakemake#3374)) ([4574c92](snakemake@4574c92)) * Prevent execution of conda in apptainer when not explicitly requested in software deployment method ([snakemake#3388](snakemake#3388)) ([c43c5c0](snakemake@c43c5c0)) * print filenames with quotes around them in RuleException ([snakemake#3269](snakemake#3269)) ([6baeda5](snakemake@6baeda5)) * Re-evaluation of free resources ([snakemake#3399](snakemake#3399)) ([6371293](snakemake@6371293)) * ReadTheDocs layout issue due to src directory change ([snakemake#3419](snakemake#3419)) ([695b127](snakemake@695b127)) * robustly escaping quotes in generated bash scripts (v2) ([snakemake#3297](snakemake#3297)) ([snakemake#3389](snakemake#3389)) ([58720bd](snakemake@58720bd)) * Show apptainer image URL in snakemake report ([snakemake#3407](snakemake#3407)) ([45f0450](snakemake@45f0450)) * Update ReadTheDocs configuration for documentation build to use Pixi ([snakemake#3433](snakemake#3433)) ([3f227a6](snakemake@3f227a6)) ### Documentation * Add pixi setup instructions to general use tutorial ([snakemake#3382](snakemake#3382)) ([115e81b](snakemake@115e81b)) * fix contribution section heading levels, fix docs testing setup order ([snakemake#3360](snakemake#3360)) ([051dc53](snakemake@051dc53)) * fix link to github.com/snakemake/poetry-snakemake-plugin ([snakemake#3436](snakemake#3436)) ([ec6d97c](snakemake@ec6d97c)) * fix quoting ([snakemake#3394](snakemake#3394)) ([b40f599](snakemake@b40f599)) * fix rerun-triggers default ([snakemake#3403](snakemake#3403)) ([4430e23](snakemake@4430e23)) * fix typo 'safe' -> 'save' ([snakemake#3384](snakemake#3384)) ([7755861](snakemake@7755861)) * mention code formatting in the contribution section ([snakemake#3431](snakemake#3431)) ([e8682b7](snakemake@e8682b7)) * remove duplicated 'functions'. ([snakemake#3356](snakemake#3356)) ([7c595db](snakemake@7c595db)) * update broken links documentation ([snakemake#3437](snakemake#3437)) ([e3d0d88](snakemake@e3d0d88)) * Updating contributing guidelines with new pixi dev setup ([snakemake#3415](snakemake#3415)) ([8e95a12](snakemake@8e95a12)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: snakemake-bot <snakemake-bot-admin@googlegroups.com>

Hocnonsense requested a review from johanneskoester as a code owner March 7, 2025 07:51

coderabbitai bot reviewed Mar 7, 2025

View reviewed changes

Hocnonsense force-pushed the fix-issue3036 branch from d7d5730 to 6ecc84e Compare March 9, 2025 21:00

coderabbitai bot reviewed Mar 9, 2025

View reviewed changes

tests/test_checkpoints_many/Snakefile Outdated Show resolved Hide resolved

Hocnonsense added 4 commits March 10, 2025 11:50

fix: force check all required outputs

b3454cb

test: add test

ba77433

fix: check file existance if not created

a016e6b

test: support windows

dfe767a

Hocnonsense force-pushed the fix-issue3036 branch from 6ecc84e to dfe767a Compare March 10, 2025 10:50

coderabbitai bot reviewed Mar 10, 2025

View reviewed changes

Update tests/test_checkpoints_many/Snakefile

b04742f

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

coderabbitai bot reviewed Mar 10, 2025

View reviewed changes

Merge branch 'main' into fix-issue3036

c6a773d

coderabbitai bot reviewed Mar 11, 2025

View reviewed changes

force check all required outputs without check if file existence

6426310

Hocnonsense force-pushed the fix-issue3036 branch from c6a773d to 6426310 Compare March 11, 2025 17:05

Merge branch 'main' into fix-issue3036

6717b95

coderabbitai bot reviewed Mar 11, 2025

View reviewed changes

johanneskoester added 2 commits March 11, 2025 18:40

Merge branch 'fix-issue3036' of github.com:Hocnonsense/snakemake into…

ca36fc1

… pr/Hocnonsense/3341

revert testcase to run

6f39e6b

johanneskoester approved these changes Mar 11, 2025

View reviewed changes

johanneskoester enabled auto-merge (squash) March 11, 2025 17:43

coderabbitai bot reviewed Mar 11, 2025

View reviewed changes

black

4f72e5d

johanneskoester merged commit 495a4e7 into snakemake:main Mar 11, 2025
41 checks passed

github-actions bot mentioned this pull request Mar 11, 2025

chore(main): release 9.0.0 #3380

Merged

Hocnonsense deleted the fix-issue3036 branch March 11, 2025 19:32

ezherman mentioned this pull request Mar 12, 2025

Using get() on checkpoints in input function multiple times only schedules the checkpoint once #3283

Closed

Hocnonsense mentioned this pull request Mar 12, 2025

Chained checkpoints are not evaluated correctly #2958

Closed

Hocnonsense mentioned this pull request Mar 15, 2025

Allow for nested checkpoints #106

Closed

ASLeonard mentioned this pull request Mar 25, 2025

Checkpoint "get" still only resolvable once per aggregate function #3475

Open

coderabbitai bot mentioned this pull request Nov 28, 2025

feat!: generic software deployment plugin support #3339

Open

2 tasks

Hocnonsense self-assigned this Mar 18, 2026

Conversation

Hocnonsense commented Mar 7, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

QC

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Mar 7, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Suggested reviewers

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

sonarqubecloud bot commented Mar 11, 2025

Quality Gate passed

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Hocnonsense commented Mar 7, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Mar 7, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)