fix: unable to ignore code change for scripts by jasonho1308 · Pull Request #3706 · snakemake/snakemake

jasonho1308 · 2025-08-25T13:16:08Z

Fix the issue mentioned in #3456, which code change cannot be ignored when using script. The issue in the original code is the following snippets will return true even if rerun_triggers do not include RerunTrigger.CODE

reason.code_changed = any(
    [
            f
            async for f in job.outputs_older_than_script_or_notebook()
    ]
)
if not self.workflow.persistence.has_metadata(job):
...

My approch should be able to handle #3148 as well.

QC

The PR contains a test case for the changes or the changes are already covered by an existing test case.
The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

Bug Fixes
- Improved detection of when tasks need rerunning by refining metadata, code-change and output-age checks, reducing unnecessary or missed reruns in edge cases.
Style
- Clarified internal comments and cleaned up related log/message formatting for clearer, more maintainable behavior.

coderabbitai · 2025-08-25T13:16:17Z

📝 Walkthrough

Walkthrough

Reworks updated_input in src/snakemake/dag.py: separates has_metadata / has_outdated_metadata branches, relocates and changes computation/assignment of reason.code_changed, and adjusts branching for rerun triggers (PARAMS, INPUT, CODE, SOFTWARE_ENV). No public API signature changes.

Changes

Cohort / File(s)	Summary
Rerun trigger & updated_input `src/snakemake/dag.py`	Reorganized `updated_input` control flow: added explicit `has_outdated_metadata` branch; moved computation of `reason.code_changed` into the no-metadata path and added a metadata/age-based CODE path; replaced in-place OR (`
Metadata flags & branching `src/snakemake/dag.py`	Added `has_outdated_metadata(job)` handling to set `reason.outdated_metadata = True`; adjusted ordering of `has_metadata` / `no_metadata` branches and related flag assignments.
Minor refactors & clarity edits `src/snakemake/dag.py`	Small idiomatic membership-test rewrites, minor formatting/logging string consolidations, and slight conditional clarifications (e.g., target file checks); no behavioral changes beyond `updated_input` logic.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant S as Scheduler
  participant D as DAG.updated_input
  participant M as Metadata
  participant R as Reason

  Note over S,D: New flow (post-change)
  S->>D: Evaluate job inputs
  alt has_metadata
    D->>M: read metadata
    alt has_outdated_metadata
      D->>R: set outdated_metadata = true
    else metadata present and not outdated
      alt CODE trigger path
        D->>D: evaluate outputs_older_than_script_or_notebook
        D->>R: reason.code_changed = age_signal OR persisted_code_changed
      end
    end
  else no_metadata
    D->>D: compute code_changed from file/state
    D->>R: set reason.code_changed
  end
  D-->>S: return Reason (flags)

sequenceDiagram
  autonumber
  participant S as Scheduler
  participant D as DAG.updated_input
  participant M as Metadata
  participant R as Reason

  Note over S,D: Prior flow (pre-change)
  S->>D: Evaluate job inputs
  alt has_metadata
    D->>M: read metadata
    opt CODE-trigger
      D->>R: reason.code_changed |= persisted_code_changed (in-place OR)
    end
  else no_metadata
    D->>D: code_changed computed on alternate path
  end
  D-->>S: return Reason

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

fix: mtime of script or notebook not triggering workflow without metadata #3148: Modifies src/snakemake/dag.py rerun and metadata/mtime/code-change logic; directly related to the same updated_input/control-flow concerns.

Suggested reviewers

johanneskoester

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7e29695 and 8808ff4.

📒 Files selected for processing (1)

src/snakemake/dag.py (2 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

src/snakemake/dag.py

✨ Finishing Touches

📝 Generate Docstrings

🧪 Generate unit tests

Create PR with unit tests
Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>, please review it.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

Visit our Status Page to check the current availability of CodeRabbit.
Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

src/snakemake/dag.py (2)

383-385: Clarify assertion message: it checks registration, not pulling.

The assertion verifies that the image URL is present in self.container_imgs. The message suggests images must be “pulled”, but at this point we only require they have been registered via update_container_imgs(). Tighten the message to avoid misleading hints during debugging.

-                assert simg_url in self.container_imgs, (
-                    "bug: must first pull singularity images"
-                )
+                assert simg_url in self.container_imgs, (
+                    "bug: container image URL not registered; call update_container_imgs() before update_conda_envs()"
+                )

1450-1461: Consider gating CODE-related reason under rerun triggers even for no-metadata cases.

Functionally, reason.no_metadata = True already causes the job to run. Setting reason.code_changed here unconditionally can surface “code changed” in reasons even when RerunTrigger.CODE is not selected. If you want reasons to strictly reflect selected triggers (and to avoid confusion in logs), guard this as well.

-                            if not self.workflow.persistence.has_metadata(job):
-                                reason.no_metadata = True
-                                # The list comprehension is needed below in order to
-                                # collect all the async generator items before
-                                # applying any().
-                                reason.code_changed = any(
-                                    [
-                                        f
-                                        async for f in job.outputs_older_than_script_or_notebook()
-                                    ]
-                                )
+                            if not self.workflow.persistence.has_metadata(job):
+                                reason.no_metadata = True
+                                # Reflect CODE-related reasons only if enabled.
+                                if RerunTrigger.CODE in self.workflow.rerun_triggers:
+                                    # The list comprehension is needed to collect all async items before any().
+                                    reason.code_changed = any(
+                                        [
+                                            f
+                                            async for f in job.outputs_older_than_script_or_notebook()
+                                        ]
+                                    )

Would you like me to add a test asserting that, with --rerun-triggers not including CODE and no metadata present, reasons do not include “code changed”?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 26fcd38 and d13db0a.

📒 Files selected for processing (1)

src/snakemake/dag.py (8 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

src/snakemake/dag.py

🧠 Learnings (3)

📓 Common learnings

Learnt from: johanneskoester
PR: snakemake/snakemake#3140
File: snakemake/dag.py:1308-1308
Timestamp: 2024-10-14T09:42:11.571Z
Learning: In `snakemake/dag.py`, avoid flagging missing lines or indentation issues when there is no clear syntax or logical error to prevent false positives.

Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.

📚 Learning: 2024-10-14T09:42:11.571Z

Learnt from: johanneskoester
PR: snakemake/snakemake#3140
File: snakemake/dag.py:1308-1308
Timestamp: 2024-10-14T09:42:11.571Z
Learning: In `snakemake/dag.py`, avoid flagging missing lines or indentation issues when there is no clear syntax or logical error to prevent false positives.

Applied to files:

src/snakemake/dag.py

📚 Learning: 2024-11-12T12:08:20.342Z

Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.

Applied to files:

src/snakemake/dag.py

🧬 Code graph analysis (1)

src/snakemake/dag.py (2)

src/snakemake/persistence.py (1)

code_changed (462-464)

src/snakemake/jobs.py (1)

outputs_older_than_script_or_notebook (402-417)

🔇 Additional comments (6)

src/snakemake/dag.py (6)

1048-1052: Good safeguard: avoid pruning target files from local storage.

Adding and f not in self.targetfiles ensures we don't remove local copies of storage-backed outputs that are explicit targets. Prevents surprising “missing target” UX. LGTM.

1473-1481: CODE changes are now properly gated and comprehensive.

This restricts script/notebook-mtime-driven changes to when RerunTrigger.CODE is active and combines them with persisted code-change detection. Matches the PR goal and covers both sources of code-change signals. LGTM.

2486-2489: Minor traversal tweak reads well.

Using the explicit membership test on visited is standard and clear. LGTM.

2503-2506: Consistent, idiomatic membership test in level-order BFS.

Matches the style used elsewhere and keeps the traversal correct. LGTM.

2702-2711: Neater mermaid block construction.

Returning the dedented header and formatting the three sections improves readability without changing behavior. LGTM.

3016-3018: Minor log message cleanup is fine.

Single-line info message reads better and keeps the intent. LGTM.

coderabbitai

Actionable comments posted: 0

🧹 Nitpick comments (2)

src/snakemake/dag.py (2)

1452-1460: Gate code-changed reason behind RerunTrigger.CODE in the no-metadata path

Right now, reason.code_changed is computed even if CODE is not in rerun_triggers when metadata is missing. The job will already be needrun due to reason.no_metadata, but surfacing code_changed may confuse users who explicitly disabled CODE-triggered reruns.

Recommend setting reason.code_changed here only if RerunTrigger.CODE is enabled. This keeps the reason reporting consistent with rerun_triggers while preserving the behavior that missing metadata triggers execution.

Apply this focused change:

-                                # The list comprehension is needed below in order to
-                                # collect all the async generator items before
-                                # applying any().
-                                reason.code_changed = any(
-                                    [
-                                        f
-                                        async for f in job.outputs_older_than_script_or_notebook()
-                                    ]
-                                )
+                                if RerunTrigger.CODE in self.workflow.rerun_triggers:
+                                    # The list comprehension is needed below in order to
+                                    # collect all the async generator items before
+                                    # applying any().
+                                    reason.code_changed = any(
+                                        [
+                                            f
+                                            async for f in job.outputs_older_than_script_or_notebook()
+                                        ]
+                                    )

Verification ideas:

Ensure the new/updated test still passes when CODE is enabled.
Add/adjust a test asserting that with CODE disabled and missing metadata, reason.get_names() does not include "code_changed" (but still includes "no_metadata"). Want me to draft that test?

1473-1480: Combine age-based and persisted code-change signals — consider DRY + early-exit helper

The new assignment (not |=) correctly reflects the current evaluation in this pass and respects rerun_triggers. However, the async-generator-to-list pattern is duplicated here and above, and it forces evaluation of the entire generator.

Optional: factor an inner helper that short-circuits on first hit. This reduces duplication and avoids building a list.

Example (helper to add near the top of update_needrun(job) inner function):

# place inside async def update_needrun(job): near other local helpers
async def _has_outputs_older_than_script_or_notebook(job: Job) -> bool:
    async for _ in job.outputs_older_than_script_or_notebook():
        return True
    return False

Then this line can become:

-                                    reason.code_changed = any(
-                                        [
-                                            f
-                                            async for f in job.outputs_older_than_script_or_notebook()
-                                        ]
-                                    ) or any(
+                                    reason.code_changed = await _has_outputs_older_than_script_or_notebook(job) or any(
                                         self.workflow.persistence.code_changed(job)
                                     )

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

MCP integration is disabled by default for public repositories
Jira integration is disabled by default for public repositories
Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d13db0a and 7e29695.

📒 Files selected for processing (1)

src/snakemake/dag.py (6 hunks)

🧰 Additional context used

📓 Path-based instructions (1)

**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

src/snakemake/dag.py

🧠 Learnings (3)

📓 Common learnings

Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.

📚 Learning: 2024-11-12T12:08:20.342Z

Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.

Applied to files:

src/snakemake/dag.py

📚 Learning: 2024-10-14T09:42:11.571Z

Learnt from: johanneskoester
PR: snakemake/snakemake#3140
File: snakemake/dag.py:1308-1308
Timestamp: 2024-10-14T09:42:11.571Z
Learning: In `snakemake/dag.py`, avoid flagging missing lines or indentation issues when there is no clear syntax or logical error to prevent false positives.

Applied to files:

src/snakemake/dag.py

🧬 Code graph analysis (1)

src/snakemake/dag.py (2)

src/snakemake/persistence.py (1)

code_changed (462-464)

src/snakemake/jobs.py (1)

outputs_older_than_script_or_notebook (402-417)

🔇 Additional comments (4)

src/snakemake/dag.py (4)

1050-1050: Idiomatic membership test — semantics unchanged. LGTM

Switching to "f not in self.targetfiles" reads better and is functionally equivalent.

2486-2489: Visited-membership check now idiomatic. LGTM

Switch to "if job_ not in visited" improves readability, no behavioral change.

2503-2506: Another membership-test cleanup. LGTM

Same readability win here; behavior remains the same.

3020-3021: Single-line archive log message — clear and concise. LGTM

No functional impact; message reads better in logs.

jasonho1308 · 2025-08-26T07:05:41Z

#3707 got a better approach

This PR is inspired by #3706, resolves issues #3456 and #3474, and addresses the concern raised in #3148 (comment) In brief, commit e502312 made changes so that the modification time (mtime) of script and notebook files no longer triggers a rerun of the workflow, later, commit e8a0b83 reversed this, making mtime changes always trigger reruns -- even when RerunTrigger.CODE is not specified in `--rerun-triggers`. This created an undesirable situation: legitimate and common scenarios (e.g., copying scripts or pulling them from a Git repository) could unnecessarily trigger reruns — as reported in #3474. To make the behavior more intuitive, the `outputs_older_than_script_or_notebook` condition should take precedence over metadata checks. However, this condition should still be governed by the presence of `RerunTrigger.CODE`. ### QC  * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [ ] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).  ## Summary by CodeRabbit * **Bug Fixes** * Code-change reruns now only run when the CODE rerun trigger is enabled; code-change detection correctly accumulates signals from multiple sources to avoid missed or spurious reruns. * **Tests** * Added an MTIME-triggered rerun test validating that outputs retain timestamps when only input mtimes change. * **Chores** * Fixed a test import path and broadened connectivity error handling to include timeouts.

This PR is inspired by snakemake#3706, resolves issues snakemake#3456 and snakemake#3474, and addresses the concern raised in snakemake#3148 (comment) In brief, commit e502312 made changes so that the modification time (mtime) of script and notebook files no longer triggers a rerun of the workflow, later, commit e8a0b83 reversed this, making mtime changes always trigger reruns -- even when RerunTrigger.CODE is not specified in `--rerun-triggers`. This created an undesirable situation: legitimate and common scenarios (e.g., copying scripts or pulling them from a Git repository) could unnecessarily trigger reruns — as reported in snakemake#3474. To make the behavior more intuitive, the `outputs_older_than_script_or_notebook` condition should take precedence over metadata checks. However, this condition should still be governed by the presence of `RerunTrigger.CODE`. ### QC  * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [ ] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).  ## Summary by CodeRabbit * **Bug Fixes** * Code-change reruns now only run when the CODE rerun trigger is enabled; code-change detection correctly accumulates signals from multiple sources to avoid missed or spurious reruns. * **Tests** * Added an MTIME-triggered rerun test validating that outputs retain timestamps when only input mtimes change. * **Chores** * Fixed a test import path and broadened connectivity error handling to include timeouts.

jasonho1308 requested a review from johanneskoester as a code owner August 25, 2025 13:16

jasonho1308 mentioned this pull request Aug 25, 2025

Code rerun trigger cannot be ignored when using a script file. #3456

Open

Hocnonsense changed the title ~~Fix unable to ignore code change for scripts~~ fix: unable to ignore code change for scripts Aug 25, 2025

coderabbitai bot reviewed Aug 25, 2025

View reviewed changes

fix: unable to ignore code change for scripts

8808ff4

jasonho1308 force-pushed the main branch from 7e29695 to 8808ff4 Compare August 25, 2025 13:25

coderabbitai bot reviewed Aug 25, 2025

View reviewed changes

Hocnonsense mentioned this pull request Aug 25, 2025

fix: only trigger script with CODE label #3707

Merged

2 tasks

jasonho1308 closed this Aug 26, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: unable to ignore code change for scripts#3706

fix: unable to ignore code change for scripts#3706
jasonho1308 wants to merge 1 commit intosnakemake:mainfrom
jasonho1308:main

jasonho1308 commented Aug 25, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Aug 25, 2025 •

edited

Loading

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (`.coderabbit.yaml`)

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Uh oh!

coderabbitai bot left a comment

Uh oh!

jasonho1308 commented Aug 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

jasonho1308 commented Aug 25, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

QC

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Aug 25, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Chat

Support

CodeRabbit Commands (Invoked using PR/Issue comments)

Other keywords and placeholders

CodeRabbit Configuration File (.coderabbit.yaml)

Status, Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

jasonho1308 commented Aug 26, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

jasonho1308 commented Aug 25, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Aug 25, 2025 •

edited

Loading

CodeRabbit Configuration File (`.coderabbit.yaml`)