Skip to content

feat: assume checkpoint safe temp files#3218

Open
mhulsman wants to merge 7 commits intosnakemake:mainfrom
mhulsman:option_checkpoint_safe_temp_files
Open

feat: assume checkpoint safe temp files#3218
mhulsman wants to merge 7 commits intosnakemake:mainfrom
mhulsman:option_checkpoint_safe_temp_files

Conversation

@mhulsman
Copy link
Copy Markdown
Contributor

@mhulsman mhulsman commented Nov 22, 2024

Snakemake can only ensure that temp files will not be needed anymore
after all checkpoint jobs have been executed (#609, #2732, fixed in #2737).
However, waiting for all checkpoint jobs to be completed can lead to
excessive storage use (#2982).

In most cases temp file usage is either unaffected by checkpoint jobs, or
can be explicitly managed by including the temp files as input to
checkpoint jobs or their downstream dependencies. This will ensure they
are kept on disk until they (might) be needed.

To address cases where pipelines using checkpoints become infeasible due
to high storage demands, this commit introduces a new command-line
option --assume-checkpoint-safe-temp-files.

This option lets Snakemake know that it can safely remove temporary files
not required by current jobs, even if checkpoint jobs are still pending.

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

  • New Features

    • Introduced a command-line flag to allow enhanced control over temporary file management in workflows that use checkpoint rules.
    • Added a new configuration option for handling temporary files in relation to checkpoint settings.
  • Documentation

    • Clarified behavior of temporary file retention with checkpoint rules and instructions for safely overriding it.
  • Tests

    • Added automated tests to verify the new temporary file management option works as expected.

Snakemake can only ensure that temp files will not be needed anymore
after all checkpoint jobs have been executed (snakemake#609, snakemake#2732, fixed in snakemake#2737).
However, waiting for all checkpoint jobs to be completed can lead to
excessive storage use (snakemake#2982).

In most cases temp file usage is either unaffected by checkpoint jobs, or
can be explicitly managed by including the temp files as input to
checkpoint jobs or their downstream dependencies. This will ensure they
are kept on disk until they (might) be needed.

To address cases where pipelines using checkpoints become infeasible due
to high storage demands, this commit introduces a new command-line
option. This option lets Snakemake know that it can  safely remove
temporary files not required by current jobs, even if checkpoint jobs
are still pending.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Nov 22, 2024

📝 Walkthrough

Walkthrough

This pull request introduces a new configuration for managing temporary files when using checkpoint rules. It adds clarifications in the documentation, a new command-line flag (--assume-checkpoint-safe-temp-files), an additional parameter in the workflow execution (reflected in tests and API), and updates within the DAG logic to conditionally handle temporary file deletion based on the new setting. A corresponding attribute is also added to the workflow settings.

Changes

File(s) Change Summary
docs/snakefiles/rules.rst Added a paragraph clarifying checkpoint rules, explaining that temporary files may be retained until all checkpoints complete and describing the --assume-checkpoint-safe-temp-files option.
tests/common.py, tests/tests.py Introduced new parameter assume_checkpoint_safe_temp_files in the run() function and added the test_assume_checkpoint_safe_temp_files test case (expected to fail) to validate the behavior.
src/snakemake/cli.py Added --assume-checkpoint-safe-temp-files flag to the argument parser and updated args_to_api to include this flag in the API call.
src/snakemake/dag.py, src/snakemake/settings/types.py Updated the DAG class to include a new instance variable for checkpoint temp file handling and modified the finish() method logic; added the assume_checkpoint_safe_temp_files attribute (default False) in DAGSettings.

Sequence Diagram(s)

sequenceDiagram
    participant U as User
    participant CP as CLI Parser (get_argument_parser)
    participant API as args_to_api / Snakemake API
    participant D as DAG (dag.py)

    U->>CP: Run snakemake with --assume-checkpoint-safe-temp-files flag
    CP->>API: Parse flag and pass it along
    API->>D: Initialize DAG with dag_settings (assume_checkpoint_safe_temp_files flag)
    D->>D: Use flag in finish() to decide on temporary file deletion
Loading

Suggested labels

enhancement

Suggested reviewers

  • johanneskoester

Tip

⚡💬 Agentic Chat (Pro Plan, General Availability)
  • We're introducing multi-step agentic chat in review comments and issue comments, within and outside of PR's. This feature enhances review and issue discussions with the CodeRabbit agentic chat by enabling advanced interactions, including the ability to create pull requests directly from comments and add commits to existing pull requests.

📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b50e92 and bc6a372.

📒 Files selected for processing (4)
  • src/snakemake/cli.py (3 hunks)
  • src/snakemake/dag.py (4 hunks)
  • tests/common.py (2 hunks)
  • tests/tests.py (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • tests/common.py
  • tests/tests.py
  • src/snakemake/dag.py
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • src/snakemake/cli.py
⏰ Context from checks skipped due to timeout of 90000ms (36)
  • GitHub Check: tests (10, macos-latest, py312)
  • GitHub Check: tests (10, windows-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (9, windows-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (8, windows-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (7, windows-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (6, windows-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (5, windows-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (4, macos-latest, py312)
  • GitHub Check: tests (4, windows-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (3, macos-latest, py312)
  • GitHub Check: tests (3, windows-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (2, macos-latest, py312)
  • GitHub Check: tests (2, windows-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (1, macos-latest, py312)
  • GitHub Check: tests (1, windows-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)
  • GitHub Check: apidocs
🔇 Additional comments (2)
src/snakemake/cli.py (2)

855-866: Well implemented command-line argument with clear documentation.

The new --assume-checkpoint-safe-temp-files flag is properly added to the execution argument group with a comprehensive help text that clearly explains:

  1. The flag's purpose (allowing temporary files to be deleted when there are checkpoint rules)
  2. The default behavior being overridden (temp files are normally kept until all checkpoints complete)
  3. The potential risks of using this flag (temporary files might be needed after checkpoint execution)
  4. Alternative approaches (explicitly declaring needed temp files as inputs)

This implementation provides users with a useful option while also ensuring they understand the implications of using it.


2113-2114: Correctly integrated flag value into DAG settings.

The boolean value from the new command-line flag is properly passed to the DAGSettings constructor, ensuring that the user's preference is correctly propagated to the DAG implementation where the temporary file handling logic resides.

✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (3)
snakemake/settings/types.py (1)

205-205: LGTM! Consider adding a docstring.

The new boolean flag is well-placed within the DAGSettings class and aligns with the PR's objective of optimizing temporary file handling in checkpoint jobs. However, adding a docstring would improve code maintainability.

Consider adding a docstring to document the flag's purpose:

 class DAGSettings(SettingsBase):
     targets: AnySet[str] = frozenset()
     target_jobs: AnySet[str] = frozenset()
     target_files_omit_workdir_adjustment: bool = False
     batch: Optional[Batch] = None
     forcetargets: bool = False
     forceall: bool = False
     forcerun: AnySet[str] = frozenset()
     until: AnySet[str] = frozenset()
     omit_from: AnySet[str] = frozenset()
     force_incomplete: bool = False
     allowed_rules: AnySet[str] = frozenset()
     rerun_triggers: AnySet[RerunTrigger] = RerunTrigger.all()
     max_inventory_wait_time: int = 20
-    assume_checkpoint_safe_temp_files: bool = False
+    assume_checkpoint_safe_temp_files: bool = False
+    """When True, allows Snakemake to remove temporary files that are not required by current jobs,
+    even if there are pending checkpoint jobs. This can significantly reduce storage requirements
+    in workflows with checkpoint rules."""
docs/snakefiles/rules.rst (1)

1659-1666: Documentation enhancement looks good, with a minor suggestion.

The added documentation clearly explains the behavior of temporary files with checkpoint rules and introduces the new --assume-checkpoint-safe-temp-files option. The warning about potential premature deletion is well-articulated.

Consider adding a small code example to illustrate the proper way to include temporary files as inputs to checkpoint jobs or their dependencies. This would make the documentation more actionable for users.

Example addition:

# Example of properly including temp files in checkpoint dependencies
rule create_temp:
    output:
        temp("temp_file.txt")
    shell:
        "..."

checkpoint process_data:
    input:
        "temp_file.txt"  # temp file properly included as input
    output:
        "checkpoint_output.txt"
    shell:
        "..."
snakemake/dag.py (1)

1979-1982: Implement Conditional Temp File Deletion and Address TODO

The condition now allows temp files to be deleted when assume_checkpoint_safe_temp_files is True, which is correct. There's a TODO comment suggesting a more accurate method to determine dependencies on temp files in relation to checkpoints. I can assist with this enhancement.

Would you like me to work on a solution or open a new GitHub issue to address this TODO?

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between a4f2e5c and 956fae4.

📒 Files selected for processing (6)
  • docs/snakefiles/rules.rst (1 hunks)
  • snakemake/cli.py (3 hunks)
  • snakemake/dag.py (4 hunks)
  • snakemake/settings/types.py (1 hunks)
  • tests/common.py (2 hunks)
  • tests/tests.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (5)
snakemake/cli.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

snakemake/dag.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

snakemake/settings/types.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

tests/common.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

tests/tests.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

🔇 Additional comments (14)
tests/common.py (2)

216-216: LGTM: Parameter addition follows existing patterns

The new parameter assume_checkpoint_safe_temp_files is well-placed and maintains backward compatibility with its default False value.


373-373: LGTM: Verify test coverage for the new parameter

The parameter is correctly passed to the DAG settings. However, let's verify that there are corresponding test cases that exercise this parameter.

tests/tests.py (2)

2290-2291: LGTM! Appropriate use of skip decorator.

The @skip_on_windows decorator is correctly used as this test is OS-agnostic and doesn't need to run on Windows.


2292-2297: LGTM! Test implementation validates error handling.

The test appropriately validates that the new assume_checkpoint_safe_temp_files flag handles error cases correctly by:

  1. Using an existing test case that exercises checkpoint functionality
  2. Enabling the new flag
  3. Expecting the test to fail via shouldfail=True
snakemake/cli.py (3)

Line range hint 850-857: LGTM: Shadow prefix argument is well implemented

The --shadow-prefix argument is properly implemented with clear documentation explaining its purpose and default behavior.


861-872: LGTM: Checkpoint-safe temp files flag is well designed

The --assume-checkpoint-safe-temp-files flag is well implemented with thorough documentation that clearly explains:

  • The default safer behavior of keeping temp files until checkpoints complete
  • The conditions under which it's safe to use this flag
  • The requirement to declare temp files as inputs if needed after checkpoints

2106-2106: LGTM: DAG settings integration is correct

The new flag is properly integrated into the DAG settings.

snakemake/dag.py (7)

115-116: Add Type Annotation for self.workflow

Adding the type annotation :_workflow.Workflow for self.workflow enhances code readability and maintainability.


118-118: Clarify self._dependencies with Comment

The added comment clearly describes the purpose of self._dependencies, improving code understanding.


121-121: Add Comment for self.depending Mapping

The comment provides a concise explanation of self.depending, aiding in code comprehension.


125-127: Initialize assume_checkpoint_safe_temp_files Attribute

Correctly initializing self.assume_checkpoint_safe_temp_files enhances temporary file management, especially in the context of checkpoint jobs.


196-197: Add Docstring for dependencies Property

The docstring effectively documents the dependencies property, clarifying its return value and purpose.


202-204: Provide Docstring for batch Property

Adding the docstring to the batch property improves documentation and explains its functionality clearly.


1002-1007: Document update Method with Docstring

The added docstring for the update method concisely describes its purpose and exceptions, enhancing code readability.

@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
docs/snakefiles/rules.rst (1)

1659-1666: Documentation enhancement looks good with a minor formatting suggestion.

The added documentation clearly explains:

  • The limitation that temporary files cannot be deleted until all checkpoint jobs complete
  • The new --assume-checkpoint-safe-temp-files option to override this behavior
  • Important safety considerations when using this option

Consider applying this formatting improvement to make the documentation more readable:

-When using checkpoint rules, Snakemake cannot determine whether temporary files are no longer needed until 
-all checkpoint jobs have been completed. This delay in removal may result in unexpectedly high storage usage.
-This behavior can be disabled by using the option ``--assume-checkpoint-safe-temp-files``. 
-However, if any temporary files might still be needed after future checkpoints, you must ensure they are not 
-deleted prematurely. To do so, include them as input of the checkpoint job (or any jobs dependent on the 
-checkpoint job)."
+When using checkpoint rules, Snakemake cannot determine whether temporary files are no longer needed until 
+all checkpoint jobs have been completed. This delay in removal may result in unexpectedly high storage usage.
+
+This behavior can be disabled by using the option ``--assume-checkpoint-safe-temp-files``. 
+However, if any temporary files might still be needed after future checkpoints, you must ensure they are not 
+deleted prematurely. To do so, include them as input of the checkpoint job (or any jobs dependent on the 
+checkpoint job).
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 956fae4 and aae688f.

📒 Files selected for processing (2)
  • docs/snakefiles/rules.rst (1 hunks)
  • snakemake/dag.py (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • snakemake/dag.py

@cmeesters
Copy link
Copy Markdown
Member

@mhulsman and @johanneskoester I just merged the main into this PR. And as I just have the usecase for this, I will check out this PR next week and will try to apply it in a complex scenario. Thanks for this contribution!

At quick glance, the review is not overwhelming. Can do, if you want me to, Johannes.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
docs/snakefiles/rules.rst (1)

1827-1833: Documentation Clarity and Formatting Improvement

The new paragraph effectively explains that when using checkpoint rules, temporary files cannot be removed until all checkpoint jobs have completed—which might lead to excessive storage usage—and that this behavior can be disabled with the new --assume-checkpoint-safe-temp-files option. However, there is a minor formatting issue: a stray closing double-quote at the end of the paragraph. Additionally, consider a slight rephrasing for enhanced clarity and consistency.

Suggested diff for polishing the text:

-When using checkpoint rules, Snakemake cannot determine whether temporary files are no longer needed until 
-all checkpoint jobs have been completed. This delay in removal may result in unexpectedly high storage usage.
-This behavior can be disabled by using the option ``--assume-checkpoint-safe-temp-files``. 
-However, if any temporary files might still be needed after future checkpoints, you must ensure they are not 
-deleted prematurely. To do so, include them as input of the checkpoint job (or any jobs dependent on the 
-checkpoint job)."
+When using checkpoint rules, Snakemake cannot determine whether temporary files are no longer needed until all checkpoint jobs have been completed, which may delay their removal and result in unexpectedly high storage usage. This behavior can be disabled by using the ``--assume-checkpoint-safe-temp-files`` option. However, if any temporary files might still be required after future checkpoints, ensure they are not deleted prematurely by including them as input to the checkpoint job (or to any jobs that depend on the checkpoint).
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between aae688f and 5bc7e07.

📒 Files selected for processing (5)
  • docs/snakefiles/rules.rst (1 hunks)
  • src/snakemake/cli.py (3 hunks)
  • src/snakemake/dag.py (4 hunks)
  • src/snakemake/settings/types.py (1 hunks)
  • tests/common.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • tests/common.py
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.py`: Do not try to improve formatting. Do not suggest type annotations for functions that are defined inside of functions or methods. Do not suggest type annotation of the `s...

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

  • src/snakemake/settings/types.py
  • src/snakemake/cli.py
  • src/snakemake/dag.py
🧬 Code Definitions (1)
src/snakemake/dag.py (1)
src/snakemake/workflow.py (1)
  • Workflow (140-2396)
⏰ Context from checks skipped due to timeout of 90000ms (35)
  • GitHub Check: tests (10, windows-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py312)
  • GitHub Check: tests (10, ubuntu-latest, py311)
  • GitHub Check: tests (9, windows-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py312)
  • GitHub Check: tests (9, ubuntu-latest, py311)
  • GitHub Check: tests (8, macos-latest, py312)
  • GitHub Check: tests (8, windows-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py312)
  • GitHub Check: tests (8, ubuntu-latest, py311)
  • GitHub Check: tests (7, macos-latest, py312)
  • GitHub Check: tests (7, windows-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py312)
  • GitHub Check: tests (7, ubuntu-latest, py311)
  • GitHub Check: tests (6, macos-latest, py312)
  • GitHub Check: tests (6, windows-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py312)
  • GitHub Check: tests (6, ubuntu-latest, py311)
  • GitHub Check: tests (5, macos-latest, py312)
  • GitHub Check: tests (5, windows-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py312)
  • GitHub Check: tests (5, ubuntu-latest, py311)
  • GitHub Check: tests (4, windows-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py312)
  • GitHub Check: tests (4, ubuntu-latest, py311)
  • GitHub Check: tests (3, windows-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py312)
  • GitHub Check: tests (3, ubuntu-latest, py311)
  • GitHub Check: tests (2, windows-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py312)
  • GitHub Check: tests (2, ubuntu-latest, py311)
  • GitHub Check: tests (1, macos-latest, py312)
  • GitHub Check: tests (1, windows-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py312)
  • GitHub Check: tests (1, ubuntu-latest, py311)
🔇 Additional comments (10)
src/snakemake/settings/types.py (1)

211-211: Well-implemented addition for temporary file management.

The new boolean attribute assume_checkpoint_safe_temp_files is well-named and clearly relates to the PR's objective. The default value of False is appropriate as it maintains backward compatibility, ensuring that users must explicitly opt-in to the new behavior.

src/snakemake/cli.py (2)

855-867: Well-documented new command-line option.

The new --assume-checkpoint-safe-temp-files option is clearly described with appropriate warnings about potential risks when using this flag. The documentation effectively explains the tradeoff between storage efficiency and ensuring necessary files are available for downstream jobs after checkpoints.


2116-2116: Correct integration with DAG settings.

The new command-line option is properly passed to the DAG settings, which will make it available to the appropriate components that manage temporary file cleanup.

src/snakemake/dag.py (7)

120-121: Type annotation added, improving code readability.

The explicit type annotation for self.workflow makes the code more maintainable by clearly indicating the expected type.


123-127: Improved documentation with descriptive comments.

Adding clear comments about the purpose of the dependency dictionaries enhances code readability and makes it easier for developers to understand the data structures.


130-132: Good implementation of new configuration flag.

The new instance variable self.assume_checkpoint_safe_temp_files is properly initialized from the workflow's DAG settings, enabling the new feature to control temporary file deletion behavior in the presence of checkpoint jobs.


202-204: Enhanced property documentation with docstring.

Adding a descriptive docstring to the dependencies property improves API documentation and makes the codebase more maintainable.


208-211: Added descriptive docstring to the batch property.

The docstring clearly explains the purpose of the batch property, making the API more self-documenting.


1072-1077: Improved method documentation with clarified docstring.

The updated docstring for the update method better explains its functionality and the exception it raises, making the code more accessible to contributors.


1159-1163: Implemented the core feature change for checkpoint-safe temp files.

The modified condition now allows temporary files to be deleted either when there are no checkpoint jobs remaining OR when the user has explicitly requested to assume checkpoint-safe temp files, implementing the feature described in the PR objectives.

The comments clearly explain that normally, temp files can't be safely deleted while checkpoint jobs exist, but the new flag allows overriding this behavior.

mhulsman added a commit to holstegelab/short_read_analyzing_pipeline_Snakemake that referenced this pull request Apr 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants