Skip to content

feat: allow to mark input files of rules as ancient via the API or command line interface (and thereby also via workflow specific profiles). Putting this into a workflow specific profile (or specifying as argument) allows to overrule rerun triggers caused by file modification dates where the user knows better.#3171

Merged
johanneskoester merged 15 commits intomainfrom
feat/ancient-cli
Oct 29, 2024

Conversation

@johanneskoester
Copy link
Copy Markdown
Contributor

@johanneskoester johanneskoester commented Oct 24, 2024

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

  • New Features

    • Introduced the --consider-ancient command-line argument for enhanced control over input item handling.
    • Added a new field in WorkflowSettings to store mappings for "ancient" items.
    • Implemented a new Snakemake workflow with rules for processing files.
    • Added a new FAQ section explaining how to prevent unnecessary rule re-execution using the --consider-ancient option.
  • Bug Fixes

    • Improved error handling for input item parsing, ensuring correct formats are enforced.
  • Enhancements

    • Updated input handling in the Rule class to support marking items as ancient.
    • Enhanced logging for clearer feedback on input and output processing.
    • Improved type hinting in WorkflowSettings for better data handling capabilities.
    • Added new test cases to validate the handling of ancient files in the CLI.

…mmand line interface (and thereby also via workflow specific profiles). Putting this into a workflow specific profile (or specifying as argument) allows to overrule rerun triggers caused by file modification dates where the user knows better.
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Oct 24, 2024

📝 Walkthrough
📝 Walkthrough

Walkthrough

This pull request introduces enhancements to the Snakemake command-line interface and the internal handling of input items within the Rule class. A new function, parse_consider_ancient, is added to manage the --consider-ancient argument, allowing users to specify input items as "ancient." The Rule class's set_input and _set_inoutput_item methods are modified to include a mark_ancient parameter, enabling more nuanced input processing. Additionally, the WorkflowSettings class is updated with a new field to store mappings related to the ancient input items.

Changes

File Change Summary
snakemake/cli.py - Added function parse_consider_ancient(args) to handle --consider-ancient command-line argument.
- Updated get_argument_parser to include new argument group for --consider-ancient.
snakemake/rules.py - Updated set_input method to include mark_ancient parameter.
- Modified _set_inoutput_item method to include mark_ancient parameter and adjust item processing accordingly.
snakemake/settings/types.py - Added field consider_ancient: Mapping[str, AnySet[Union[str, int]]] in WorkflowSettings class. Added type hints for better type safety.
tests/test_ancient_cli/Snakefile - Introduced a new Snakemake workflow with rules A and B for processing input files and generating outputs.
tests/test_ancient_cli/expected-results/1 - New empty file created.
tests/test_ancient_cli/expected-results/2 - Two lines added to update expected results for tests related to ancient CLI functionality.
tests/test_ancient_cli/expected-results/3 - New empty file created.
tests/test_ancient_cli/expected-results/4 - New empty file created.
tests/tests.py - Added new test function test_ancient_cli() to validate the behavior of the Snakemake workflow with the --consider-ancient flag.

Possibly related PRs


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (3)
snakemake/settings/types.py (1)

202-204: Add documentation for the new field.

The consider_ancient field lacks documentation explaining its purpose and expected values. Since this is a public API change, please add a docstring describing:

  • Purpose of the field
  • Format of the mapping (what the keys and values represent)
  • How it integrates with the ancient file marking feature
     consider_ancient: Mapping[str, AnySet[Union[str, int]]] = field(
         default_factory=dict
-    )
+    )
+    """Mapping of rule names to sets of input items that should be considered as ancient.
+    
+    Keys are rule names and values are sets containing either string paths or integer indices
+    of input items that should be marked as ancient, thereby preventing rerun triggers
+    based on modification times.
+    """
snakemake/rules.py (1)

408-408: Add type annotation for mark_ancient parameter.

Consider adding type annotation for better code clarity:

-    def _set_inoutput_item(self, item, output=False, name=None, mark_ancient=False):
+    def _set_inoutput_item(self, item, output: bool = False, name: str | None = None, mark_ancient: bool = False):
snakemake/cli.py (1)

760-771: Enhance help text with format examples.

The argument is well-structured and placed appropriately. Consider enhancing the help text with explicit format examples to make it more user-friendly.

     group_exec.add_argument(
         "--consider-ancient",
         metavar="RULE=INPUTITEMS",
         nargs="+",
         default=dict(),
         parse_func=parse_consider_ancient,
-        help="Consider given input items of given rules as ancient, i.e. not triggering "
-        "re-runs if they are newer than the output files. "
-        "Putting this into a workflow specific profile (or specifying as argument) "
-        "allows to overrule rerun triggers caused by file modification dates where the "
-        "user knows better.",
+        help="Consider given input items of given rules as ancient, i.e. not triggering "
+        "re-runs if they are newer than the output files. "
+        "Format: RULE=ITEM1,ITEM2,... where ITEM can be either an input name or "
+        "0-based index (e.g. 'myrule=input1,0,input2'). "
+        "Putting this into a workflow specific profile (or specifying as argument) "
+        "allows to overrule rerun triggers caused by file modification dates where the "
+        "user knows better.",
     )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between c98b2e7 and 84f4e79.

📒 Files selected for processing (3)
  • snakemake/cli.py (3 hunks)
  • snakemake/rules.py (5 hunks)
  • snakemake/settings/types.py (2 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
snakemake/cli.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

snakemake/rules.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

snakemake/settings/types.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

🪛 Ruff
snakemake/cli.py

120-120: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

snakemake/settings/types.py

6-6: typing.List imported but unused

Remove unused import: typing.List

(F401)

🔇 Additional comments (2)
snakemake/rules.py (2)

275-283: LGTM: Input handling properly integrates ancient file marking.

The implementation correctly retrieves ancient file settings from the workflow configuration and properly passes the mark_ancient flag to the underlying item processing.


490-491: LGTM: Ancient flag is properly applied to input items.

The implementation correctly applies the ancient flag to input items when specified, maintaining consistency with the existing codebase while adding the new functionality.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (1)
snakemake/settings/types.py (1)

127-129: Add documentation for the consider_ancient field.

The new field would benefit from documentation in the WorkflowSettings class docstring explaining:

  • Purpose: How it's used to mark input files as ancient
  • Key format: What the string keys represent (rule names?)
  • Value format: What the set of strings/integers represent
  • Usage: How it integrates with the API/CLI functionality

Example docstring addition:

"""
Parameters
----------
consider_ancient: Mapping[str, AnySet[Union[str, int]]]
    Mapping of rule names to sets of input items that should be considered as ancient.
    Ancient inputs do not trigger reruns based on their modification time.
"""
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 84f4e79 and 120a602.

📒 Files selected for processing (3)
  • snakemake/cli.py (4 hunks)
  • snakemake/rules.py (5 hunks)
  • snakemake/settings/types.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • snakemake/rules.py
🧰 Additional context used
📓 Path-based instructions (2)
snakemake/cli.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

snakemake/settings/types.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

🪛 Ruff
snakemake/cli.py

120-120: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

snakemake/settings/types.py

6-6: typing.List imported but unused

Remove unused import: typing.List

(F401)

🔇 Additional comments (3)
snakemake/settings/types.py (1)

6-6: Skipping comment about unused List import.

A previous review already correctly identified the unused List import issue.

🧰 Tools
🪛 Ruff

6-6: typing.List imported but unused

Remove unused import: typing.List

(F401)

snakemake/cli.py (2)

760-771: LGTM: CLI argument is well-defined.

The --consider-ancient argument is properly implemented with:

  • Clear help text explaining purpose and usage
  • Appropriate default value and parser function
  • Consistent style with other similar arguments

2032-2032: LGTM: Proper integration with workflow settings.

The consider_ancient parameter is correctly added to the WorkflowSettings constructor call.

johanneskoester and others added 3 commits October 25, 2024 09:31
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
@github-actions
Copy link
Copy Markdown
Contributor

Please format your code with black: black snakemake tests/*.py.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (2)
tests/test_ancient_cli/Snakefile (1)

10-17: Consider using named inputs for consistency.

While the rule works correctly, consider using named inputs (like in rule B) for consistency across the test file.

 rule A:
     input:
-        "1"
+        x="1"
     output:
         "2",
     shell:
-        "cat {input} > {output}"
+        "cat {input.x} > {output}"
tests/tests.py (1)

241-246: Add docstring and consider additional test cases.

The test implementation looks good but could be improved with:

  1. A docstring explaining the test's purpose and expected behavior
  2. Additional test cases covering edge cases like:
    • Invalid input format
    • Multiple ancient file specifications
    • Interaction with other CLI flags

Example docstring and additional test cases:

 def test_ancient_cli():
+    """Test the --consider-ancient CLI flag for marking input files as ancient.
+    
+    Validates that input files can be marked as ancient through the command line
+    interface, which prevents them from triggering reruns based on timestamps.
+    """
     run(
         dpath("test_ancient_cli"),
         shellcmd="snakemake --consider-ancient A=0 B=x",
     )

+def test_ancient_cli_invalid():
+    """Test error handling for invalid --consider-ancient arguments."""
+    run(
+        dpath("test_ancient_cli"),
+        shellcmd="snakemake --consider-ancient invalid_format",
+        shouldfail=True
+    )
+
+def test_ancient_cli_multiple():
+    """Test multiple --consider-ancient specifications."""
+    run(
+        dpath("test_ancient_cli"),
+        shellcmd="snakemake --consider-ancient A=0 --consider-ancient B=x",
+    )
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 120a602 and c97e319.

📒 Files selected for processing (9)
  • snakemake/cli.py (4 hunks)
  • snakemake/rules.py (5 hunks)
  • snakemake/settings/types.py (2 hunks)
  • tests/test_ancient_cli/Snakefile (1 hunks)
  • tests/test_ancient_cli/expected-results/1 (1 hunks)
  • tests/test_ancient_cli/expected-results/2 (1 hunks)
  • tests/test_ancient_cli/expected-results/3 (1 hunks)
  • tests/test_ancient_cli/expected-results/4 (1 hunks)
  • tests/tests.py (1 hunks)
✅ Files skipped from review due to trivial changes (4)
  • tests/test_ancient_cli/expected-results/1
  • tests/test_ancient_cli/expected-results/2
  • tests/test_ancient_cli/expected-results/3
  • tests/test_ancient_cli/expected-results/4
🚧 Files skipped from review as they are similar to previous changes (1)
  • snakemake/settings/types.py
🧰 Additional context used
📓 Path-based instructions (3)
snakemake/cli.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

snakemake/rules.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

tests/tests.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

🪛 Ruff
snakemake/cli.py

134-134: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (10)
tests/test_ancient_cli/Snakefile (4)

1-3: Clever test setup for validating ancient file behavior.

The initialization creates output files before input files with a time delay, which is an excellent test pattern for validating the --consider-ancient functionality. This ensures that input files (1,3) are newer than output files (2,4), which would typically trigger a rebuild without the ancient marking.


5-9: LGTM! Clear target definition.

The rule correctly specifies the final targets that correspond to the outputs of rules A and B.


18-24: LGTM! Well-structured rule definition.

The rule correctly uses named inputs and properly defines the file operation.


1-24: Verify test function implementation.

This Snakefile provides good coverage for the --consider-ancient feature, but let's ensure there's proper test function implementation.

✅ Verification successful

Test implementation is properly in place

The test function test_ancient_cli is implemented in tests/tests.py and correctly executes the Snakefile with the --consider-ancient flag, validating the functionality through the provided test case.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for corresponding test function
echo "Searching for test_ancient_cli function in test files..."
rg -l "def test_ancient_cli" tests/

Length of output: 160


Script:

#!/bin/bash
# Let's try a broader search to find test files and their content in the test_ancient_cli directory
echo "Searching for test files in test_ancient_cli directory..."
fd . tests/test_ancient_cli/ -t f -x cat {}

# Also search for any test functions that might be related to ancient functionality
echo -e "\nSearching for test functions related to ancient..."
rg "def test.*ancient" tests/

Length of output: 687


Script:

#!/bin/bash
# Let's check the parent directory structure and any test files that might contain the tests
echo "Checking parent directory structure..."
ls -la tests/test_ancient_cli/

# Let's also check for any pytest files in the parent directory
echo -e "\nChecking for test files in parent directory..."
ls -la tests/

# Search for any conftest.py that might contain the test implementation
echo -e "\nSearching for conftest.py files..."
fd conftest.py tests/

Length of output: 24879


Script:

#!/bin/bash
# Let's check the content of common.py and conftest.py as they might contain the test implementation
echo "Checking common.py for test implementation..."
cat tests/common.py

echo -e "\nChecking conftest.py for test implementation..."
cat tests/conftest.py

# Also check tests.py as it might contain the test implementation
echo -e "\nChecking tests.py for test implementation..."
rg "def test.*ancient" tests/tests.py

Length of output: 18997


Script:

#!/bin/bash
# Let's check the implementation of the ancient test functions in tests.py
echo "Checking test_ancient_cli implementation in tests.py..."
rg "def test_ancient_cli" tests/tests.py -A 20

Length of output: 594

snakemake/rules.py (4)

275-285: LGTM: Well-structured implementation of ancient file marking

The implementation correctly retrieves and applies the "consider_ancient" settings from workflow settings, handling both positional and keyword arguments appropriately.


410-410: LGTM: Clean implementation of ancient flag setting

The mark_ancient parameter is well integrated into the method, and the ancient flag is set at the appropriate point in the execution flow.

Also applies to: 492-493


473-475: LGTM: Proper handling of dependencies and temp files

The code correctly manages rule dependencies and implements conditional temp file marking based on workflow settings.

Also applies to: 478-480


275-285: Verify the ancient flag handling across the codebase

Let's verify the consistency of ancient flag handling and its usage across the codebase.

Also applies to: 410-410, 492-493

✅ Verification successful

Ancient flag handling is properly integrated and consistent across the codebase

The verification shows that the ancient flag implementation is well-integrated:

  • The mark_ancient parameter in _set_inoutput_item is used consistently for both positional and keyword inputs
  • Ancient flag handling is properly implemented in:
    • CLI parsing via --consider-ancient argument
    • Workflow settings via WorkflowSettings.consider_ancient
    • DAG execution logic in dag.py that correctly handles ancient files
    • IO system with proper is_ancient property checks
  • Documentation in docs/snakefiles/rules.rst accurately describes the behavior
  • Tests exist to verify the functionality (test_ancient, test_ancient_cli, test_ancient_dag)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage and handling of ancient flags

# Test 1: Check for other occurrences of ancient flag handling
echo "Checking for ancient flag handling..."
rg "ancient" -A 3 -B 3

# Test 2: Check for potential mark_ancient parameter usage
echo "Checking for mark_ancient parameter usage..."
ast-grep --pattern 'mark_ancient'

Length of output: 12876

snakemake/cli.py (2)

776-787: LGTM! Well-documented CLI argument.

The argument definition is clear, follows the established pattern, and includes comprehensive help text that explains both the functionality and its practical use case.


2048-2048: LGTM! Proper integration with workflow settings.

The consider_ancient parameter is correctly integrated into the WorkflowSettings constructor.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (2)
docs/project_info/faq.rst (2)

684-687: Enhance explanation with specific examples.

The explanation of consistency checks is comprehensive, but could be more helpful with specific examples of each case (e.g., what constitutes a "semantic level" change).

Consider adding examples like:

 This is based on file modification dates (input files may not be newer than output files of the same job), as well as execution metadata like the used software stack (e.g. conda env or container image), the non-file parameters, the set of input files, and the code of the rule.
+For example:
+- If you update a conda environment specification, Snakemake will rerun affected jobs
+- If you modify a rule's parameters, Snakemake will rerun that rule's jobs
+- If you change the set of input files, Snakemake will rerun the affected jobs

701-707: Add a note about wildcard support in CLI example.

The CLI usage example is clear, but it would be helpful to mention if wildcards are supported in the rule name or input specification.

Consider adding:

 This will mark the file ``inputfile.txt`` as ancient for the rule ``myrule``.
+Note that both rule names and input names support wildcards, e.g., ``snakemake --consider-ancient "align_*=*.fastq"``.
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 1ad5627 and 42fad41.

📒 Files selected for processing (2)
  • docs/project_info/faq.rst (1 hunks)
  • snakemake/profiles.py (1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
snakemake/profiles.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

🔇 Additional comments (5)
snakemake/profiles.py (1)

56-56: LGTM! The addition of "consider-ancient" is well-integrated.

The change correctly adds "consider-ancient" to the list of keys that receive one-level dictionary formatting, consistent with similar configuration keys like "set-threads" and "resources". This enables the feature to mark input files as ancient via workflow profiles.

Let's verify the consistent handling of this new configuration key across the codebase:

✅ Verification successful

The "consider-ancient" configuration is properly integrated across the codebase

The verification confirms that the feature is consistently implemented:

  • CLI: Properly defined with appropriate argument handling and error messages
  • Settings: Type-safe implementation using a mapping type in settings/types.py
  • Documentation: Included in FAQ with both CLI and profile configuration examples
  • Profile: Correctly added to the special keys list (from original review)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the integration of the "consider-ancient" configuration
# Expected: Should find corresponding handling code in CLI and workflow settings

# Check for CLI argument handling
echo "Checking CLI argument handling..."
rg -l "consider-ancient" "snakemake/cli.py"

# Check for settings/configuration handling
echo "Checking settings handling..."
rg -l "consider_ancient" "snakemake/settings"

# Check for documentation
echo "Checking documentation updates..."
rg -l "consider-ancient" "docs/"

Length of output: 410


Script:

#!/bin/bash
# Let's examine the actual implementation and documentation to ensure consistency

# Check CLI implementation
echo "=== CLI Implementation ==="
rg "consider-ancient" -B 2 -A 2 snakemake/cli.py

# Check settings type definition
echo -e "\n=== Settings Type Definition ==="
rg "consider_ancient" -B 2 -A 2 snakemake/settings/types.py

# Check documentation content
echo -e "\n=== Documentation Content ==="
rg "consider-ancient" -B 2 -A 2 docs/project_info/faq.rst

Length of output: 1175

docs/project_info/faq.rst (4)

679-682: LGTM: Section reference and title are properly formatted.

The RST section reference and title follow the standard documentation format with proper underline length.


691-700: LGTM: Example rule is clear and well-formatted.

The example rule effectively demonstrates the concept with a realistic use case.


708-713: LGTM: Profile configuration example is clear and accurate.

The YAML configuration example correctly demonstrates how to persist the setting in a workflow-specific profile.


677-713: Documentation aligns well with PR objectives.

The new FAQ section effectively documents the feature for marking input files as ancient, which directly addresses the PR's objective of allowing users to override rerun triggers via the API and command line interface.

Let's verify the documentation matches the implementation:

✅ Verification successful

Documentation accurately reflects the implementation

The documentation in the FAQ section correctly describes both the command-line interface and profile configuration for marking input files as ancient. This is verified by:

  • CLI implementation in snakemake/cli.py matches the documented syntax --consider-ancient myrule=foo
  • Profile configuration in snakemake/profiles.py supports the documented YAML format with consider-ancient key
  • Rule implementation in snakemake/rules.py properly handles the ancient marking via _set_inoutput_item
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check if the CLI option is properly implemented
rg --type py "consider.?ancient"

# Check if the profile configuration is properly implemented
rg --type py "consider.?ancient.*config"

Length of output: 1122


Script:

#!/bin/bash
# Let's verify the implementation details more thoroughly

# Check the CLI parser implementation
rg -A 10 "def parse_consider_ancient" snakemake/cli.py

# Check the profile loading implementation
rg -A 5 "consider-ancient" snakemake/profiles.py

# Check how ancient marking is handled in rules
rg -A 5 "_set_inoutput_item.*mark_ancient" snakemake/rules.py

Length of output: 1282

@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (1)
docs/project_info/faq.rst (1)

688-726: Consider enhancing the example with multiple input scenarios

While the documentation effectively explains how to mark input files as ancient both via CLI and workflow profiles, it could be more comprehensive by including:

  1. An example with multiple input files
  2. A note about the impact on downstream rules

Consider adding this additional example:

    rule myrule:
        input:
            foo="inputfile.txt",
+           bar="another.txt"
        output:
            "outputfile.txt"
        shell:
            "somecommand {input.foo} {input.bar} > {output}"

And the corresponding CLI example:

    $ snakemake --consider-ancient myrule=foo,bar

And profile example:

    consider-ancient:
-       myrule: foo
+       myrule: [foo, bar]
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 42fad41 and 9f2f4fb.

📒 Files selected for processing (2)
  • docs/project_info/faq.rst (1 hunks)
  • snakemake/cli.py (4 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
snakemake/cli.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

📓 Learnings (1)
snakemake/cli.py (1)
Learnt from: johanneskoester
PR: snakemake/snakemake#3171
File: snakemake/cli.py:106-145
Timestamp: 2024-10-29T09:26:26.636Z
Learning: In the `snakemake` codebase, exception chaining (using `raise ... from ...`) is avoided due to backward compatibility reasons. Do not suggest using exception chaining in `snakemake` codebase.
🪛 Ruff
snakemake/cli.py

134-134: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

🔇 Additional comments (6)
docs/project_info/faq.rst (3)

677-683: LGTM: Well-structured section header and placement

The new FAQ section is properly placed and follows the existing document structure with appropriate RST formatting.


684-687: LGTM: Clear explanation of Snakemake's consistency checks

The explanation accurately describes all factors that Snakemake considers when determining whether to rerun a rule:

  • File modification dates
  • Execution metadata
  • Software stack
  • Non-file parameters
  • Input file set
  • Rule code

679-680: LGTM: Proper RST reference label

The RST reference label consider_ancient is correctly formatted and placed, allowing cross-referencing from other documentation files.

snakemake/cli.py (3)

776-788: LGTM! Well documented CLI argument.

The implementation follows the established pattern for CLI arguments and provides clear documentation about the purpose and format of the --consider-ancient option.


2049-2049: LGTM! Proper integration with workflow settings.

The consider_ancient setting is correctly integrated into the WorkflowSettings class.


106-145: 🛠️ Refactor suggestion

Consider validating empty items in the comma-separated list.

The implementation is solid, but it could benefit from validating empty or whitespace-only items in the comma-separated list of input items. Currently, items.split(",") could produce empty strings that might cause issues.

Consider this improvement:

 def parse_consider_ancient(
     args: Optional[List[str]],
 ) -> Mapping[str, Set[Union[str, int]]]:
     """Parse command line arguments for marking input files as ancient.
 
     Args:
         args: List of RULE=INPUTITEMS pairs, where INPUTITEMS is a comma-separated list
               of input item names or indices (0-based).
 
     Returns:
         A mapping of rules to sets of their ancient input items.
 
     Raises:
         ValueError: If the format is invalid or values cannot be parsed.
     """
     errmsg = (
         "Invalid --consider-ancient definition: entries have to be defined as "
         "RULE=INPUTITEMS pairs, with INPUTITEMS being a list of input items of the "
         "rule (given as name or index (0-based)), separated by commas."
     )
 
     def parse_item(item: str) -> Union[str, int]:
         item = item.strip()
         if not item:
             raise ValueError(f"{errmsg} (Empty input item)")
         try:
             return int(item)
         except ValueError:
             if item.isidentifier():
                 return item
             else:
                 raise ValueError(f"{errmsg} (Unparsable value: {repr(item)})")
 
     consider_ancient = defaultdict(set)
 
     if args is not None:
         for entry in args:
             rule, items = parse_key_value_arg(entry, errmsg=errmsg, strip_quotes=True)
             if not rule.isidentifier():
                 raise ValueError(f"{errmsg} (Invalid rule name: {repr(rule)})")
-            items = items.split(",")
+            items = [item for item in items.split(",") if item.strip()]
+            if not items:
+                raise ValueError(f"{errmsg} (No valid input items provided)")
             consider_ancient[rule] = {parse_item(item) for item in items}
     return consider_ancient

Likely invalid or redundant comment.

🧰 Tools
🪛 Ruff

134-134: Within an except clause, raise exceptions with raise ... from err or raise ... from None to distinguish them from errors in exception handling

(B904)

@johanneskoester johanneskoester merged commit 6f3aed3 into main Oct 29, 2024
@johanneskoester johanneskoester deleted the feat/ancient-cli branch October 29, 2024 10:57
johanneskoester pushed a commit that referenced this pull request Oct 29, 2024
🤖 I have created a release *beep* *boop*
---


##
[8.25.0](v8.24.1...v8.25.0)
(2024-10-29)


### Features

* add first 5 rules to group name (used e.g. when naming cluster/cloud
jobs or logfiles)
([#3168](#3168))
([5657122](5657122))
* allow to mark input files of rules as ancient via the API or command
line interface (and thereby also via workflow specific profiles).
Putting this into a workflow specific profile (or specifying as
argument) allows to overrule rerun triggers caused by file modification
dates where the user knows better.
([#3171](#3171))
([6f3aed3](6f3aed3))


### Bug Fixes

* skip storage object when cloning flags for shadowed IO
([#3174](#3174))
([d733fed](d733fed))
* use permission safe copying when hidden conda files are already
present in a workdir. This avoids problems in case multiple people use
the same workdir and workflow.
([#3169](#3169))
([c98b2e7](c98b2e7))


### Documentation

* add tutorial references and small syntax fix
([#3172](#3172))
([6bee12a](6bee12a))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants