Skip to content

feat: show info on missing metadata#3014

Merged
johanneskoester merged 12 commits intomainfrom
fix/notify-of-missing-provenance-info
Aug 14, 2024
Merged

feat: show info on missing metadata#3014
johanneskoester merged 12 commits intomainfrom
fix/notify-of-missing-provenance-info

Conversation

@johanneskoester
Copy link
Copy Markdown
Contributor

@johanneskoester johanneskoester commented Aug 13, 2024

In dryrun, if there is no metadata recorded for certain jobs, we now print an info.

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

  • New Features

    • Introduced a function to extract and sort rule names from job instances, enhancing reporting capabilities.
    • Added a method to verify the presence of metadata for job outputs, improving job handling robustness.
    • Implemented logging for jobs that lack recorded metadata or provenance, providing better visibility.
  • Bug Fixes

    • Enhanced control flow in job metadata evaluation to prevent unnecessary checks when metadata is absent.
  • Chores

    • Improved type safety and clarity through the addition of type hinting.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Aug 13, 2024

Walkthrough

The updates to Snakemake enhance job metadata handling and logging capabilities. A new function, jobs_to_rulenames, facilitates the extraction of unique rule names from job objects. The update_needrun function now includes checks for job metadata before further evaluations. Additionally, the Reason class incorporates a no_metadata attribute to manage the absence of metadata. These changes improve clarity, efficiency, and robustness in managing workflow executions.

Changes

Files Change Summary
snakemake/common/__init__.py Added jobs_to_rulenames function for extracting and sorting rule names from job objects.
snakemake/dag.py Modified update_needrun to check for job metadata before further evaluations.
snakemake/jobs.py Introduced no_metadata attribute in Reason class for handling job metadata absence; added jobs_to_rulenames function.
snakemake/persistence.py Added has_metadata method to check if all outputs of a job contain required metadata.
snakemake/workflow.py Enhanced execute function with logging for missing job metadata, including a new helper function.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between 7e2d075 and 74171fc.

Files selected for processing (5)
  • snakemake/common/init.py (2 hunks)
  • snakemake/dag.py (1 hunks)
  • snakemake/jobs.py (2 hunks)
  • snakemake/persistence.py (1 hunks)
  • snakemake/workflow.py (4 hunks)
Additional context used
Ruff
snakemake/common/__init__.py

354-354: Missing type annotation for function argument jobs

(ANN001)


355-355: Unnecessary generator (rewrite as a set comprehension)

Rewrite as a set comprehension

(C401)

snakemake/persistence.py

375-375: Missing return type annotation for public function has_metadata

(ANN201)


375-375: Missing type annotation for self in method

(ANN101)


375-375: Missing type annotation for function argument job

(ANN001)

snakemake/workflow.py

1147-1147: Missing return type annotation for private function log_missing_metadata_info

Add return type annotation: None

(ANN202)


1156-1156: Trailing comma missing

Add trailing comma

(COM812)


1247-1247: Trailing comma missing

Add trailing comma

(COM812)

snakemake/dag.py

1266-1266: Trailing comma missing

Add trailing comma

(COM812)


1270-1270: Trailing comma missing

Add trailing comma

(COM812)


1274-1277: Unnecessary list comprehension

Remove unnecessary comprehension

(C419)


1277-1277: Trailing comma missing

Add trailing comma

(COM812)


1279-1279: Trailing comma missing

Add trailing comma

(COM812)


1286-1286: Trailing comma missing

Add trailing comma

(COM812)


1288-1288: Trailing comma missing

Add trailing comma

(COM812)

Additional comments not posted (7)
snakemake/jobs.py (1)

Line range hint 1638-1659: LGTM! The addition of no_metadata is consistent.

The new attribute no_metadata in the Reason class is initialized to False. This change is consistent with the class's purpose and enhances its functionality.

Tools
Ruff

1641-1641: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


1641-1641: Missing type annotation for self in method

(ANN101)

snakemake/workflow.py (2)

1247-1247: Add a trailing comma in the logger message.

The logger message in log_provenance_info is missing a trailing comma for better formatting.

logger.info(
    "Rules with provenance triggered jobs: "
-   + " ".join(jobs_to_rulenames(provenance_triggered_jobs))
+   + " ".join(jobs_to_rulenames(provenance_triggered_jobs)),
)

Likely invalid or redundant comment.

Tools
Ruff

1247-1247: Trailing comma missing

Add trailing comma

(COM812)


1218-1219: Integration of log_missing_metadata_info looks good.

The call to log_missing_metadata_info is appropriately placed to log missing metadata when no jobs are triggered.

Ensure that this function is invoked correctly in all necessary scenarios.

Verification successful

Integration of log_missing_metadata_info is correct and localized.

The function log_missing_metadata_info is defined and used within snakemake/workflow.py, indicating proper integration without external dependencies.

  • File: snakemake/workflow.py
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the usage of `log_missing_metadata_info` in the codebase.

# Test: Search for the function usage. Expect: Only occurrences of the correct usage.
rg --type python $'log_missing_metadata_info'

Length of output: 75


Script:

#!/bin/bash
# Description: Verify the usage of `log_missing_metadata_info` in the codebase.

# Test: Search for the function usage. Expect: Only occurrences of the correct usage.
rg --type py $'log_missing_metadata_info'

Length of output: 177

Tools
Ruff

1219-1219: Do not implicitly return None in function able to return non-None value

Add explicit None return value

(RET502)

snakemake/dag.py (4)

1261-1262: Set reason.no_metadata earlier.

The reason.no_metadata flag is set immediately when metadata is absent, which is a good practice to avoid unnecessary checks. This change enhances the control flow and logic of the function.


1269-1270: Add a trailing comma in input_changed.

A trailing comma is missing in the list comprehension for input_changed. Adding it improves code consistency and readability.

- self.workflow.persistence.input_changed(job)
+ self.workflow.persistence.input_changed(job),

Likely invalid or redundant comment.

Tools
Ruff

1270-1270: Trailing comma missing

Add trailing comma

(COM812)


1285-1286: Add a trailing comma in conda_env_changed.

A trailing comma is missing in the list comprehension for conda_env_changed. Adding it improves code consistency and readability.

- self.workflow.persistence.conda_env_changed(job)
+ self.workflow.persistence.conda_env_changed(job),

Likely invalid or redundant comment.

Tools
Ruff

1286-1286: Trailing comma missing

Add trailing comma

(COM812)


1279-1279: Add a trailing comma in code_changed.

A trailing comma is missing in the list comprehension for code_changed. Adding it improves code consistency and readability.

- self.workflow.persistence.code_changed(job)
+ self.workflow.persistence.code_changed(job),

Likely invalid or redundant comment.

Tools
Ruff

1279-1279: Trailing comma missing

Add trailing comma

(COM812)

johanneskoester and others added 2 commits August 13, 2024 16:01
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between 74171fc and 23fa1bc.

Files selected for processing (1)
  • snakemake/workflow.py (4 hunks)
Additional context used
Ruff
snakemake/workflow.py

1147-1147: Missing return type annotation for private function log_missing_metadata_info

Add return type annotation: None

(ANN202)


1158-1158: Trailing comma missing

Add trailing comma

(COM812)


1249-1249: Trailing comma missing

Add trailing comma

(COM812)

Additional comments not posted (2)
snakemake/workflow.py (2)

1220-1221: Integration of log_missing_metadata_info in dry run logic looks good.

The addition of log_missing_metadata_info during a dry run enhances the visibility of missing metadata. The integration appears correct.

Tools
Ruff

1221-1221: Do not implicitly return None in function able to return non-None value

Add explicit None return value

(RET502)


1158-1159: Add a trailing comma for better formatting.

A trailing comma is missing in the logger message for better formatting and to prevent potential issues when extending the list.

- f"metadata: {' '.join(jobs_to_rulenames(no_metadata_jobs))}"
+ f"metadata: {' '.join(jobs_to_rulenames(no_metadata_jobs))}",

Likely invalid or redundant comment.

Tools
Ruff

1158-1158: Trailing comma missing

Add trailing comma

(COM812)

@github-actions
Copy link
Copy Markdown
Contributor

Please format your code with black: black snakemake tests/*.py.

@sonarqubecloud
Copy link
Copy Markdown

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: ASSERTIVE

Commits

Files that changed from the base of the PR and between 23fa1bc and 94124b4.

Files selected for processing (4)
  • snakemake/dag.py (1 hunks)
  • snakemake/jobs.py (4 hunks)
  • snakemake/persistence.py (2 hunks)
  • snakemake/workflow.py (4 hunks)
Additional context used
Path-based instructions (4)
snakemake/persistence.py (1)

Pattern **/*.py: Do not suggest to add trailing commas.
Do not suggest type annotations for functions that are defined inside of functions or methods.

snakemake/jobs.py (1)

Pattern **/*.py: Do not suggest to add trailing commas.
Do not suggest type annotations for functions that are defined inside of functions or methods.

snakemake/workflow.py (1)

Pattern **/*.py: Do not suggest to add trailing commas.
Do not suggest type annotations for functions that are defined inside of functions or methods.

snakemake/dag.py (1)

Pattern **/*.py: Do not suggest to add trailing commas.
Do not suggest type annotations for functions that are defined inside of functions or methods.

Learnings (2)
snakemake/workflow.py (2)
Learnt from: johanneskoester
PR: snakemake/snakemake#3014
File: snakemake/workflow.py:1250-1251
Timestamp: 2024-08-13T16:22:58.430Z
Learning: Avoid commenting on trailing commas in code reviews for the Snakemake repository as per the user's preference.
Learnt from: johanneskoester
PR: snakemake/snakemake#3014
File: snakemake/workflow.py:1147-1147
Timestamp: 2024-08-13T16:22:09.641Z
Learning: Avoid suggesting type annotations for functions that are inside methods in the Snakemake codebase.
snakemake/dag.py (3)
Learnt from: johanneskoester
PR: snakemake/snakemake#3014
File: snakemake/dag.py:1265-1266
Timestamp: 2024-08-13T16:09:09.726Z
Learning: For the Snakemake project, avoid suggesting trailing commas as they use Black as an autoformatter.
Learnt from: johanneskoester
PR: snakemake/snakemake#3014
File: snakemake/workflow.py:1250-1251
Timestamp: 2024-08-13T16:22:58.430Z
Learning: Avoid commenting on trailing commas in code reviews for the Snakemake repository as per the user's preference.
Learnt from: johanneskoester
PR: snakemake/snakemake#3014
File: snakemake/workflow.py:0-0
Timestamp: 2024-08-13T16:07:33.369Z
Learning: In the Snakemake project, trailing commas are not needed in logger messages if Black formatting is used, as it doesn't require them in such contexts.
Ruff
snakemake/persistence.py

375-375: Missing type annotation for self in method

(ANN101)

snakemake/workflow.py

1147-1147: Missing return type annotation for private function log_missing_metadata_info

Add return type annotation: None

(ANN202)


1159-1159: Trailing comma missing

Add trailing comma

(COM812)


1250-1250: Trailing comma missing

Add trailing comma

(COM812)

snakemake/dag.py

1266-1266: Trailing comma missing

Add trailing comma

(COM812)


1270-1270: Trailing comma missing

Add trailing comma

(COM812)


1277-1280: Unnecessary list comprehension

Remove unnecessary comprehension

(C419)


1280-1280: Trailing comma missing

Add trailing comma

(COM812)


1282-1282: Trailing comma missing

Add trailing comma

(COM812)


1289-1289: Trailing comma missing

Add trailing comma

(COM812)


1291-1291: Trailing comma missing

Add trailing comma

(COM812)

Additional comments not posted (11)
snakemake/persistence.py (1)

375-376: Add type annotations for has_metadata method.

The method has_metadata is missing type annotations for its self parameter. Consider adding these annotations for better clarity.

- def has_metadata(self, job: Job) -> bool:
+ def has_metadata(self: 'Persistence', job: Job) -> bool:
Tools
Ruff

375-375: Missing type annotation for self in method

(ANN101)

snakemake/jobs.py (2)

Line range hint 1638-1659: Initialize no_metadata attribute in Reason class.

The no_metadata attribute is initialized to False in the Reason class. This addition allows the class to manage scenarios where metadata is missing. Ensure that this attribute is used consistently across the codebase to handle such cases.

Tools
Ruff

1641-1641: Missing return type annotation for special method __init__

Add return type annotation: None

(ANN204)


1641-1641: Missing type annotation for self in method

(ANN101)


1805-1806: Function jobs_to_rulenames is well-defined.

The function jobs_to_rulenames correctly extracts and returns a sorted list of unique rule names from an iterable of Job objects. This utility function enhances job handling by organizing rule names effectively.

snakemake/workflow.py (3)

1221-1222: Integration of log_missing_metadata_info is appropriate.

The call to log_missing_metadata_info during a dry run effectively enhances the visibility of jobs missing metadata.

Tools
Ruff

1222-1222: Do not implicitly return None in function able to return non-None value

Add explicit None return value

(RET502)


1250-1251: LGTM! The function log_provenance_info is implemented correctly.

The use of jobs_to_rulenames enhances readability and aligns with the objective of improving logging clarity.

Tools
Ruff

1250-1250: Trailing comma missing

Add trailing comma

(COM812)


1147-1160: LGTM! Verify the function usage in the codebase.

The function log_missing_metadata_info is implemented correctly and aligns with the PR objectives to improve metadata visibility.

Ensure that this function is called in the appropriate contexts to achieve its intended purpose.

Verification successful

Let's correct the file type and re-run the search to verify the usage of the log_missing_metadata_info function.


Function log_missing_metadata_info is correctly utilized within the codebase.

The function is called within snakemake/workflow.py, ensuring that jobs without metadata are logged as intended.

  • File: snakemake/workflow.py
  • Function Call: Verified within the same file.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `log_missing_metadata_info`.

# Test: Search for the function usage. Expect: Occurrences of the function being called in appropriate contexts.
rg --type python -A 5 $'log_missing_metadata_info'

Length of output: 80


Script:

#!/bin/bash
# Description: Verify all function calls to `log_missing_metadata_info`.

# Test: Search for the function usage. Expect: Occurrences of the function being called in appropriate contexts.
rg --type py -A 5 $'log_missing_metadata_info'

Length of output: 799

Tools
Ruff

1147-1147: Missing return type annotation for private function log_missing_metadata_info

Add return type annotation: None

(ANN202)


1159-1159: Trailing comma missing

Add trailing comma

(COM812)

snakemake/dag.py (5)

1261-1262: LGTM: Metadata check added.

The addition of the metadata check before further evaluations is a logical improvement. This prevents unnecessary checks when metadata is absent.


1264-1267: LGTM: Conditional parameter change check.

The code correctly checks for parameter changes only if the corresponding rerun trigger is set. This ensures evaluations are performed only when necessary.

Tools
Ruff

1266-1266: Trailing comma missing

Add trailing comma

(COM812)


1269-1271: LGTM: Conditional input change check.

The code follows the pattern of checking for input changes only if the rerun trigger is set, maintaining consistency and efficiency.

Tools
Ruff

1270-1270: Trailing comma missing

Add trailing comma

(COM812)


1276-1283: LGTM: Code change check with list comprehension.

The list comprehension is necessary to collect async generator items before applying any(). This ensures proper evaluation of all items.

Tools
Ruff

1277-1280: Unnecessary list comprehension

Remove unnecessary comprehension

(C419)


1280-1280: Trailing comma missing

Add trailing comma

(COM812)


1282-1282: Trailing comma missing

Add trailing comma

(COM812)


1288-1292: LGTM: Conditional software environment change check.

The code correctly checks for changes in the software environment only if the rerun trigger is set, maintaining consistency with other checks.

Tools
Ruff

1289-1289: Trailing comma missing

Add trailing comma

(COM812)


1291-1291: Trailing comma missing

Add trailing comma

(COM812)

@johanneskoester johanneskoester merged commit e502312 into main Aug 14, 2024
@johanneskoester johanneskoester deleted the fix/notify-of-missing-provenance-info branch August 14, 2024 06:16
johanneskoester pushed a commit that referenced this pull request Aug 14, 2024
🤖 I have created a release *beep* *boop*
---


##
[8.18.0](v8.17.0...v8.18.0)
(2024-08-14)


### Features

* show info on missing metadata
([#3014](#3014))
([e502312](e502312))


### Performance Improvements

* cache mtime of scripts and notebooks
([#2965](#2965))
([405a056](405a056))

---
This PR was generated with [Release
Please](https://github.com/googleapis/release-please). See
[documentation](https://github.com/googleapis/release-please#release-please).

<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **New Features**
- Introduced detailed reporting for missing metadata during workflow
execution, enhancing visibility and debugging.
- **Performance Improvements**
- Implemented caching for script and notebook modification times,
improving workflow execution efficiency.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Comment on lines +1261 to +1263
if not self.workflow.persistence.has_metadata(job):
reason.no_metadata = True
else:
Copy link
Copy Markdown
Contributor

@laf070810 laf070810 Oct 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change causes a bug: when using --drop-metadata (for filesystem speed), the mtime of script/notebook no longer triggers the workflow. I think we should handle this case, or make no_metadata a choice of RerunTrigger.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Proposed a fix in #3148

johanneskoester added a commit that referenced this pull request Dec 21, 2024
…data (#3148)

<!--Add a description of your PR here-->

See
#3014 (review)

### QC
<!-- Make sure that you can tick the boxes below. -->

* [ ] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [ ] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Bug Fixes**
- Improved logic for determining job execution conditions, ensuring
accurate evaluations.
- **Refactor**
- Streamlined the `update_needrun` method for better readability and
maintainability.
- Consolidated checks for job metadata and conditions to eliminate
redundancy.
- **Style**
	- Minor adjustments to comments and formatting for enhanced clarity.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Anfeng Li <anfeng.li@cern.ch>
Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de>
kjohnsen pushed a commit to kjohnsen/snakemake that referenced this pull request Dec 15, 2025
…data (snakemake#3148)

<!--Add a description of your PR here-->

See
snakemake#3014 (review)

### QC
<!-- Make sure that you can tick the boxes below. -->

* [ ] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [ ] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->

## Summary by CodeRabbit

- **Bug Fixes**
- Improved logic for determining job execution conditions, ensuring
accurate evaluations.
- **Refactor**
- Streamlined the `update_needrun` method for better readability and
maintainability.
- Consolidated checks for job metadata and conditions to eliminate
redundancy.
- **Style**
	- Minor adjustments to comments and formatting for enhanced clarity.

<!-- end of auto-generated comment: release notes by coderabbit.ai -->

Co-authored-by: Anfeng Li <anfeng.li@cern.ch>
Co-authored-by: Johannes Köster <johannes.koester@tu-dortmund.de>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants