fix: ignore derived parameters when storing job metadata and inferring rerun necessity of jobs (this gets rid of spurious triggers caused by e.g. changed resources, threads, remote storage configuration) by johanneskoester · Pull Request #3154 · snakemake/snakemake

johanneskoester · 2024-10-18T07:20:39Z

QC

The PR contains a test case for the changes or the changes are already covered by an existing test case.
The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

New Features
- Enhanced parameter management in the Job class for better handling of non-derived parameters.
- Improved input function handling in the Rule class, allowing for differentiation between derived and non-derived parameters.
Bug Fixes
- Updated methods in the Persistence class to ensure type safety and clarity.
Documentation
- Added type hints to method signatures for improved code understanding.

…g rerun necessity of jobs (this gets rid of spurious triggers caused by e.g. changed resources, threads, remote storage configuration)

coderabbitai · 2024-10-18T07:20:45Z

Caution

Review failed

The head commit changed during the review from 1ab3bf0 to debc2ef.

Walkthrough

The pull request introduces enhancements to parameter management across several classes in the Snakemake framework. In snakemake/jobs.py, a new private attribute _non_derived_params is added to the Job class, along with a method _expand_params to manage parameters more effectively. The Persistence class in snakemake/persistence.py sees improvements in type hinting and method signatures, enhancing clarity and type safety. Lastly, the Rule class in snakemake/rules.py gains a new method to identify deriving functions and updates to handle non-derived items in input functions.

Changes

File	Change Summary
snakemake/jobs.py	- Added private attribute `_non_derived_params` in `Job` class. - Introduced method `_expand_params` in `Job` class. - Added property `non_derived_params`.
snakemake/persistence.py	- Updated method signatures with type hints for `_serialize_param_builtin`, `_serialize_param_pandas`, `_params`, and `record_format_version`.
snakemake/rules.py	- Added class method `_is_deriving_function` in `Rule` class. - Updated `apply_input_function` to include `non_derived_items` parameter. - Modified `expand_params` to return a tuple with non-derived parameters.

Possibly related PRs

feat: more robust parameter and code change detection and transparent reporting of detected changes #3140: This PR enhances the detection of parameter changes within the Snakemake framework, which is closely related to the changes made in the main PR that also focuses on parameter management in the Job class.
fix: fix bug in code change detection leading to spurious code change reporting when relying on older snakemake metadata #3144: This PR addresses a bug in code change detection that relates to how parameters are managed and reported, which connects to the main PR's updates to the Job class's parameter handling.
fix: more robust handling of input metadata that avoids storing user or typoe specific local paths, and properly handles pipe or service files #3150: This PR improves the handling of input metadata and updates the Persistence class, which is relevant to the changes in the main PR that also involve parameter management and storage.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 756dc70 and 972b1ca.

📒 Files selected for processing (3)

snakemake/jobs.py (3 hunks)
snakemake/persistence.py (4 hunks)
snakemake/rules.py (9 hunks)

🧰 Additional context used

📓 Path-based instructions (3)

snakemake/jobs.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

snakemake/persistence.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

snakemake/rules.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

🪛 Ruff

snakemake/persistence.py

15-15: functools.partial imported but unused

Remove unused import: functools.partial

(F401)

586-586: Use of functools.lru_cache or functools.cache on methods can lead to memory leaks

(B019)

🔇 Additional comments (17)

snakemake/persistence.py (1)
587-587: ⚠️ Potential issue

Avoid using functools.lru_cache on instance methods to prevent memory leaks.

Applying @lru_cache to instance methods can lead to memory leaks because the self instance is part of the cache key, and the cache can grow indefinitely with different instances. Consider refactoring _params to avoid using @lru_cache on an instance method.

Apply this diff to remove the decorator:
-    @lru_cache()
     def _params(self, job: Job):
If caching is necessary, consider using an instance-level caching mechanism or limiting the cache size.

Likely invalid or redundant comment.
snakemake/rules.py (12)
699-699: Correct Initialization of non_derived_items Parameter

The optional parameter non_derived_items: typing.List[typing.Any] = None is properly initialized to None, which is a good practice to avoid mutable default arguments.

709-709: Initialization of is_derived Variable

The variable is_derived is appropriately initialized to False, ensuring that it has a defined state before usage.

722-723: Conditional Check for Deriving Functions

The code correctly checks if non_derived_items is not None before determining if the item is a deriving function:
if non_derived_items is not None:
    is_derived = self._is_deriving_function(item)
746-751: Proper Use of zip for Unpacking

The use of zip to construct apply_results ensures that each list is correctly aligned, facilitating the unpacking process in subsequent iterations:
apply_results = zip(
    [None] * len(item),
    item,
    [_is_callable] * len(item),
    [is_derived] * len(item),
)
754-757: Building apply_results from Dictionary Items

When item is a dictionary, the comprehension accurately constructs apply_results by pairing keys and values along with the callable and derived status:
apply_results = [
    (name, item, _is_callable, is_derived)
    for name, item in item.items()
]
759-759: Handling Single Items in apply_results

For single items, apply_results is correctly formed as a list containing a single tuple:
apply_results = [(name, item, _is_callable, is_derived)]
761-761: Correct Unpacking in Loop

The for-loop accurately unpacks name, item, from_callable, and is_derived from apply_results, which is essential for processing each element:
for name, item, from_callable, is_derived in apply_results:
785-786: Appending Non-Derived Items

Non-derived items are properly appended to non_derived_items when they are identified as such:
if not is_derived and non_derived_items is not None:
    non_derived_items.append(concrete)
848-859: Addition of _is_deriving_function Class Method

The new class method _is_deriving_function effectively identifies deriving functions based on the presence of specific parameters, enhancing the classification of function types:
@classmethod
def _is_deriving_function(cls, func):
    if is_callable(func):
        func_params = get_function_params(func)
        return (
            "input" in func_params
            or "output" in func_params
            or "threads" in func_params
            or "resources" in func_params
        )
    return False
891-891: Initialization of non_derived_params List

The list non_derived_params is correctly initialized to collect non-derived parameters during parameter expansion:
non_derived_params = []
912-912: Passing non_derived_items to _apply_wildcards

The inclusion of non_derived_items=non_derived_params in the _apply_wildcards method call allows for tracking of non-derived parameters:
non_derived_items=non_derived_params,
925-925: Returning Non-Derived Parameters

The expand_params method now returns a tuple containing both params and non_derived_params, which enhances the method's functionality:
return params, non_derived_params
snakemake/jobs.py (4)

225-225: Addition of _non_derived_params to __slots__ and initialization

The attribute _non_derived_params is appropriately added to __slots__ and initialized to None in the constructor. This maintains consistency and optimizes memory usage in classes using __slots__.

Also applies to: 279-279

424-424: Lazy initialization of params using _expand_params

The modification to the params property to lazily initialize _params by calling _expand_params() enhances code maintainability and ensures that parameters are expanded only when needed.

427-432: Introduction of non_derived_params property for better encapsulation

The new non_derived_params property provides access to _non_derived_params with lazy initialization, following the same pattern as the params property. This improves code clarity and encapsulation.

433-437: ⚠️ Potential issue

Ensure Rule.expand_params returns both params and non_derived_params

The _expand_params method expects self.rule.expand_params() to return a tuple of (params, non_derived_params). Please verify that the expand_params method in the Rule class has been updated accordingly to return both values to prevent a TypeError.

To verify, check the implementation of expand_params in the Rule class to ensure it returns a tuple containing both params and non_derived_params.

snakemake/persistence.py

sonarqubecloud · 2024-10-18T07:45:59Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

🤖 I have created a release *beep* *boop* --- ## [8.23.2](v8.23.1...v8.23.2) (2024-10-18) ### Bug Fixes * ignore derived parameters when storing job metadata and inferring rerun necessity of jobs (this gets rid of spurious triggers caused by e.g. changed resources, threads, remote storage configuration) ([#3154](#3154)) ([73ce212](73ce212)) * more robust handling of input metadata that avoids storing user or typoe specific local paths, and properly handles pipe or service files ([#3150](#3150)) ([756dc70](756dc70)) ### Documentation * Fix missing spaces in CLI help text ([#3146](#3146)) ([6416c36](6416c36)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please). Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

johanneskoester added 4 commits October 18, 2024 06:48

fix: ignore derived parameters when storing job metadata and inferrin…

072c0f2

…g rerun necessity of jobs (this gets rid of spurious triggers caused by e.g. changed resources, threads, remote storage configuration)

improve code

253389d

fix method

85d2ee5

fix method

972b1ca

johanneskoester changed the title ~~Fix/non derived param change detection~~ fix: ignore derived parameters when storing job metadata and inferring rerun necessity of jobs (this gets rid of spurious triggers caused by e.g. changed resources, threads, remote storage configuration) Oct 18, 2024

coderabbitai bot reviewed Oct 18, 2024

View reviewed changes

snakemake/persistence.py Outdated Show resolved Hide resolved

snakemake/persistence.py Outdated Show resolved Hide resolved

fix

1ab3bf0

remove unused import

debc2ef

johanneskoester mentioned this pull request Oct 18, 2024

feat: add derived flag for marking params as being derived from e.g. input or output files. For such params, Snakemake does not trigger reruns upon change (since the original (e.g. input) value change is already enough) #3153

Closed

2 tasks

johanneskoester merged commit 73ce212 into main Oct 18, 2024

johanneskoester deleted the fix/non-derived-param-change-detection branch October 18, 2024 09:49

github-actions bot mentioned this pull request Oct 18, 2024

chore(main): release 8.23.2 #3149

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: ignore derived parameters when storing job metadata and inferring rerun necessity of jobs (this gets rid of spurious triggers caused by e.g. changed resources, threads, remote storage configuration)#3154

fix: ignore derived parameters when storing job metadata and inferring rerun necessity of jobs (this gets rid of spurious triggers caused by e.g. changed resources, threads, remote storage configuration)#3154
johanneskoester merged 6 commits intomainfrom
fix/non-derived-param-change-detection

johanneskoester commented Oct 18, 2024 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Oct 18, 2024 •

edited

Loading

Review failed

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

Uh oh!

Uh oh!

sonarqubecloud bot commented Oct 18, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

johanneskoester commented Oct 18, 2024 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

QC

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Oct 18, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Possibly related PRs

Chat

CodeRabbit Commands (Invoked using PR comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

sonarqubecloud bot commented Oct 18, 2024

Quality Gate passed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

johanneskoester commented Oct 18, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Oct 18, 2024 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)