Skip to content

fix: ignore derived parameters when storing job metadata and inferring rerun necessity of jobs (this gets rid of spurious triggers caused by e.g. changed resources, threads, remote storage configuration)#3154

Merged
johanneskoester merged 6 commits intomainfrom
fix/non-derived-param-change-detection
Oct 18, 2024

Conversation

@johanneskoester
Copy link
Copy Markdown
Contributor

@johanneskoester johanneskoester commented Oct 18, 2024

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

  • New Features

    • Enhanced parameter management in the Job class for better handling of non-derived parameters.
    • Improved input function handling in the Rule class, allowing for differentiation between derived and non-derived parameters.
  • Bug Fixes

    • Updated methods in the Persistence class to ensure type safety and clarity.
  • Documentation

    • Added type hints to method signatures for improved code understanding.

…g rerun necessity of jobs (this gets rid of spurious triggers caused by e.g. changed resources, threads, remote storage configuration)
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Oct 18, 2024

Caution

Review failed

The head commit changed during the review from 1ab3bf0 to debc2ef.

Walkthrough

The pull request introduces enhancements to parameter management across several classes in the Snakemake framework. In snakemake/jobs.py, a new private attribute _non_derived_params is added to the Job class, along with a method _expand_params to manage parameters more effectively. The Persistence class in snakemake/persistence.py sees improvements in type hinting and method signatures, enhancing clarity and type safety. Lastly, the Rule class in snakemake/rules.py gains a new method to identify deriving functions and updates to handle non-derived items in input functions.

Changes

File Change Summary
snakemake/jobs.py - Added private attribute _non_derived_params in Job class.
- Introduced method _expand_params in Job class.
- Added property non_derived_params.
snakemake/persistence.py - Updated method signatures with type hints for _serialize_param_builtin, _serialize_param_pandas, _params, and record_format_version.
snakemake/rules.py - Added class method _is_deriving_function in Rule class.
- Updated apply_input_function to include non_derived_items parameter.
- Modified expand_params to return a tuple with non-derived parameters.

Possibly related PRs


Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@johanneskoester johanneskoester changed the title Fix/non derived param change detection fix: ignore derived parameters when storing job metadata and inferring rerun necessity of jobs (this gets rid of spurious triggers caused by e.g. changed resources, threads, remote storage configuration) Oct 18, 2024
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 756dc70 and 972b1ca.

📒 Files selected for processing (3)
  • snakemake/jobs.py (3 hunks)
  • snakemake/persistence.py (4 hunks)
  • snakemake/rules.py (9 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
snakemake/jobs.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

snakemake/persistence.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

snakemake/rules.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

🪛 Ruff
snakemake/persistence.py

15-15: functools.partial imported but unused

Remove unused import: functools.partial

(F401)


586-586: Use of functools.lru_cache or functools.cache on methods can lead to memory leaks

(B019)

🔇 Additional comments (17)
snakemake/persistence.py (1)

587-587: ⚠️ Potential issue

Avoid using functools.lru_cache on instance methods to prevent memory leaks.

Applying @lru_cache to instance methods can lead to memory leaks because the self instance is part of the cache key, and the cache can grow indefinitely with different instances. Consider refactoring _params to avoid using @lru_cache on an instance method.

Apply this diff to remove the decorator:

-    @lru_cache()
     def _params(self, job: Job):

If caching is necessary, consider using an instance-level caching mechanism or limiting the cache size.

Likely invalid or redundant comment.

snakemake/rules.py (12)

699-699: Correct Initialization of non_derived_items Parameter

The optional parameter non_derived_items: typing.List[typing.Any] = None is properly initialized to None, which is a good practice to avoid mutable default arguments.


709-709: Initialization of is_derived Variable

The variable is_derived is appropriately initialized to False, ensuring that it has a defined state before usage.


722-723: Conditional Check for Deriving Functions

The code correctly checks if non_derived_items is not None before determining if the item is a deriving function:

if non_derived_items is not None:
    is_derived = self._is_deriving_function(item)

746-751: Proper Use of zip for Unpacking

The use of zip to construct apply_results ensures that each list is correctly aligned, facilitating the unpacking process in subsequent iterations:

apply_results = zip(
    [None] * len(item),
    item,
    [_is_callable] * len(item),
    [is_derived] * len(item),
)

754-757: Building apply_results from Dictionary Items

When item is a dictionary, the comprehension accurately constructs apply_results by pairing keys and values along with the callable and derived status:

apply_results = [
    (name, item, _is_callable, is_derived)
    for name, item in item.items()
]

759-759: Handling Single Items in apply_results

For single items, apply_results is correctly formed as a list containing a single tuple:

apply_results = [(name, item, _is_callable, is_derived)]

761-761: Correct Unpacking in Loop

The for-loop accurately unpacks name, item, from_callable, and is_derived from apply_results, which is essential for processing each element:

for name, item, from_callable, is_derived in apply_results:

785-786: Appending Non-Derived Items

Non-derived items are properly appended to non_derived_items when they are identified as such:

if not is_derived and non_derived_items is not None:
    non_derived_items.append(concrete)

848-859: Addition of _is_deriving_function Class Method

The new class method _is_deriving_function effectively identifies deriving functions based on the presence of specific parameters, enhancing the classification of function types:

@classmethod
def _is_deriving_function(cls, func):
    if is_callable(func):
        func_params = get_function_params(func)
        return (
            "input" in func_params
            or "output" in func_params
            or "threads" in func_params
            or "resources" in func_params
        )
    return False

891-891: Initialization of non_derived_params List

The list non_derived_params is correctly initialized to collect non-derived parameters during parameter expansion:

non_derived_params = []

912-912: Passing non_derived_items to _apply_wildcards

The inclusion of non_derived_items=non_derived_params in the _apply_wildcards method call allows for tracking of non-derived parameters:

non_derived_items=non_derived_params,

925-925: Returning Non-Derived Parameters

The expand_params method now returns a tuple containing both params and non_derived_params, which enhances the method's functionality:

return params, non_derived_params
snakemake/jobs.py (4)

225-225: Addition of _non_derived_params to __slots__ and initialization

The attribute _non_derived_params is appropriately added to __slots__ and initialized to None in the constructor. This maintains consistency and optimizes memory usage in classes using __slots__.

Also applies to: 279-279


424-424: Lazy initialization of params using _expand_params

The modification to the params property to lazily initialize _params by calling _expand_params() enhances code maintainability and ensures that parameters are expanded only when needed.


427-432: Introduction of non_derived_params property for better encapsulation

The new non_derived_params property provides access to _non_derived_params with lazy initialization, following the same pattern as the params property. This improves code clarity and encapsulation.


433-437: ⚠️ Potential issue

Ensure Rule.expand_params returns both params and non_derived_params

The _expand_params method expects self.rule.expand_params() to return a tuple of (params, non_derived_params). Please verify that the expand_params method in the Rule class has been updated accordingly to return both values to prevent a TypeError.

To verify, check the implementation of expand_params in the Rule class to ensure it returns a tuple containing both params and non_derived_params.

@sonarqubecloud
Copy link
Copy Markdown

@johanneskoester johanneskoester merged commit 73ce212 into main Oct 18, 2024
@johanneskoester johanneskoester deleted the fix/non-derived-param-change-detection branch October 18, 2024 09:49
johanneskoester pushed a commit that referenced this pull request Oct 18, 2024
🤖 I have created a release *beep* *boop*
---


##
[8.23.2](v8.23.1...v8.23.2)
(2024-10-18)


### Bug Fixes

* ignore derived parameters when storing job metadata and inferring
rerun necessity of jobs (this gets rid of spurious triggers caused by
e.g. changed resources, threads, remote storage configuration)
([#3154](#3154))
([73ce212](73ce212))
* more robust handling of input metadata that avoids storing user or
typoe specific local paths, and properly handles pipe or service files
([#3150](#3150))
([756dc70](756dc70))


### Documentation

* Fix missing spaces in CLI help text
([#3146](#3146))
([6416c36](6416c36))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant