Skip to content

fix: unable to ignore code change for scripts#3706

Closed
jasonho1308 wants to merge 1 commit intosnakemake:mainfrom
jasonho1308:main
Closed

fix: unable to ignore code change for scripts#3706
jasonho1308 wants to merge 1 commit intosnakemake:mainfrom
jasonho1308:main

Conversation

@jasonho1308
Copy link
Copy Markdown

@jasonho1308 jasonho1308 commented Aug 25, 2025

Fix the issue mentioned in #3456, which code change cannot be ignored when using script. The issue in the original code is the following snippets will return true even if rerun_triggers do not include RerunTrigger.CODE

reason.code_changed = any(
    [
            f
            async for f in job.outputs_older_than_script_or_notebook()
    ]
)
if not self.workflow.persistence.has_metadata(job):
...

My approch should be able to handle #3148 as well.

QC

  • The PR contains a test case for the changes or the changes are already covered by an existing test case.
  • The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).

Summary by CodeRabbit

  • Bug Fixes

    • Improved detection of when tasks need rerunning by refining metadata, code-change and output-age checks, reducing unnecessary or missed reruns in edge cases.
  • Style

    • Clarified internal comments and cleaned up related log/message formatting for clearer, more maintainable behavior.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Aug 25, 2025

📝 Walkthrough

Walkthrough

Reworks updated_input in src/snakemake/dag.py: separates has_metadata / has_outdated_metadata branches, relocates and changes computation/assignment of reason.code_changed, and adjusts branching for rerun triggers (PARAMS, INPUT, CODE, SOFTWARE_ENV). No public API signature changes.

Changes

Cohort / File(s) Summary
Rerun trigger & updated_input
src/snakemake/dag.py
Reorganized updated_input control flow: added explicit has_outdated_metadata branch; moved computation of reason.code_changed into the no-metadata path and added a metadata/age-based CODE path; replaced in-place OR (`
Metadata flags & branching
src/snakemake/dag.py
Added has_outdated_metadata(job) handling to set reason.outdated_metadata = True; adjusted ordering of has_metadata / no_metadata branches and related flag assignments.
Minor refactors & clarity edits
src/snakemake/dag.py
Small idiomatic membership-test rewrites, minor formatting/logging string consolidations, and slight conditional clarifications (e.g., target file checks); no behavioral changes beyond updated_input logic.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant S as Scheduler
  participant D as DAG.updated_input
  participant M as Metadata
  participant R as Reason

  Note over S,D: New flow (post-change)
  S->>D: Evaluate job inputs
  alt has_metadata
    D->>M: read metadata
    alt has_outdated_metadata
      D->>R: set outdated_metadata = true
    else metadata present and not outdated
      alt CODE trigger path
        D->>D: evaluate outputs_older_than_script_or_notebook
        D->>R: reason.code_changed = age_signal OR persisted_code_changed
      end
    end
  else no_metadata
    D->>D: compute code_changed from file/state
    D->>R: set reason.code_changed
  end
  D-->>S: return Reason (flags)
Loading
sequenceDiagram
  autonumber
  participant S as Scheduler
  participant D as DAG.updated_input
  participant M as Metadata
  participant R as Reason

  Note over S,D: Prior flow (pre-change)
  S->>D: Evaluate job inputs
  alt has_metadata
    D->>M: read metadata
    opt CODE-trigger
      D->>R: reason.code_changed |= persisted_code_changed (in-place OR)
    end
  else no_metadata
    D->>D: code_changed computed on alternate path
  end
  D-->>S: return Reason
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested reviewers

  • johanneskoester

Tip

🔌 Remote MCP (Model Context Protocol) integration is now available!

Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats.


📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 7e29695 and 8808ff4.

📒 Files selected for processing (1)
  • src/snakemake/dag.py (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/snakemake/dag.py
✨ Finishing Touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@Hocnonsense Hocnonsense changed the title Fix unable to ignore code change for scripts fix: unable to ignore code change for scripts Aug 25, 2025
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/snakemake/dag.py (2)

383-385: Clarify assertion message: it checks registration, not pulling.

The assertion verifies that the image URL is present in self.container_imgs. The message suggests images must be “pulled”, but at this point we only require they have been registered via update_container_imgs(). Tighten the message to avoid misleading hints during debugging.

-                assert simg_url in self.container_imgs, (
-                    "bug: must first pull singularity images"
-                )
+                assert simg_url in self.container_imgs, (
+                    "bug: container image URL not registered; call update_container_imgs() before update_conda_envs()"
+                )

1450-1461: Consider gating CODE-related reason under rerun triggers even for no-metadata cases.

Functionally, reason.no_metadata = True already causes the job to run. Setting reason.code_changed here unconditionally can surface “code changed” in reasons even when RerunTrigger.CODE is not selected. If you want reasons to strictly reflect selected triggers (and to avoid confusion in logs), guard this as well.

-                            if not self.workflow.persistence.has_metadata(job):
-                                reason.no_metadata = True
-                                # The list comprehension is needed below in order to
-                                # collect all the async generator items before
-                                # applying any().
-                                reason.code_changed = any(
-                                    [
-                                        f
-                                        async for f in job.outputs_older_than_script_or_notebook()
-                                    ]
-                                )
+                            if not self.workflow.persistence.has_metadata(job):
+                                reason.no_metadata = True
+                                # Reflect CODE-related reasons only if enabled.
+                                if RerunTrigger.CODE in self.workflow.rerun_triggers:
+                                    # The list comprehension is needed to collect all async items before any().
+                                    reason.code_changed = any(
+                                        [
+                                            f
+                                            async for f in job.outputs_older_than_script_or_notebook()
+                                        ]
+                                    )

Would you like me to add a test asserting that, with --rerun-triggers not including CODE and no metadata present, reasons do not include “code changed”?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 26fcd38 and d13db0a.

📒 Files selected for processing (1)
  • src/snakemake/dag.py (8 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/dag.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: johanneskoester
PR: snakemake/snakemake#3140
File: snakemake/dag.py:1308-1308
Timestamp: 2024-10-14T09:42:11.571Z
Learning: In `snakemake/dag.py`, avoid flagging missing lines or indentation issues when there is no clear syntax or logical error to prevent false positives.
Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.
📚 Learning: 2024-10-14T09:42:11.571Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3140
File: snakemake/dag.py:1308-1308
Timestamp: 2024-10-14T09:42:11.571Z
Learning: In `snakemake/dag.py`, avoid flagging missing lines or indentation issues when there is no clear syntax or logical error to prevent false positives.

Applied to files:

  • src/snakemake/dag.py
📚 Learning: 2024-11-12T12:08:20.342Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.

Applied to files:

  • src/snakemake/dag.py
🧬 Code graph analysis (1)
src/snakemake/dag.py (2)
src/snakemake/persistence.py (1)
  • code_changed (462-464)
src/snakemake/jobs.py (1)
  • outputs_older_than_script_or_notebook (402-417)
🔇 Additional comments (6)
src/snakemake/dag.py (6)

1048-1052: Good safeguard: avoid pruning target files from local storage.

Adding and f not in self.targetfiles ensures we don't remove local copies of storage-backed outputs that are explicit targets. Prevents surprising “missing target” UX. LGTM.


1473-1481: CODE changes are now properly gated and comprehensive.

This restricts script/notebook-mtime-driven changes to when RerunTrigger.CODE is active and combines them with persisted code-change detection. Matches the PR goal and covers both sources of code-change signals. LGTM.


2486-2489: Minor traversal tweak reads well.

Using the explicit membership test on visited is standard and clear. LGTM.


2503-2506: Consistent, idiomatic membership test in level-order BFS.

Matches the style used elsewhere and keeps the traversal correct. LGTM.


2702-2711: Neater mermaid block construction.

Returning the dedented header and formatting the three sections improves readability without changing behavior. LGTM.


3016-3018: Minor log message cleanup is fine.

Single-line info message reads better and keeps the intent. LGTM.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
src/snakemake/dag.py (2)

1452-1460: Gate code-changed reason behind RerunTrigger.CODE in the no-metadata path

Right now, reason.code_changed is computed even if CODE is not in rerun_triggers when metadata is missing. The job will already be needrun due to reason.no_metadata, but surfacing code_changed may confuse users who explicitly disabled CODE-triggered reruns.

Recommend setting reason.code_changed here only if RerunTrigger.CODE is enabled. This keeps the reason reporting consistent with rerun_triggers while preserving the behavior that missing metadata triggers execution.

Apply this focused change:

-                                # The list comprehension is needed below in order to
-                                # collect all the async generator items before
-                                # applying any().
-                                reason.code_changed = any(
-                                    [
-                                        f
-                                        async for f in job.outputs_older_than_script_or_notebook()
-                                    ]
-                                )
+                                if RerunTrigger.CODE in self.workflow.rerun_triggers:
+                                    # The list comprehension is needed below in order to
+                                    # collect all the async generator items before
+                                    # applying any().
+                                    reason.code_changed = any(
+                                        [
+                                            f
+                                            async for f in job.outputs_older_than_script_or_notebook()
+                                        ]
+                                    )

Verification ideas:

  • Ensure the new/updated test still passes when CODE is enabled.
  • Add/adjust a test asserting that with CODE disabled and missing metadata, reason.get_names() does not include "code_changed" (but still includes "no_metadata"). Want me to draft that test?

1473-1480: Combine age-based and persisted code-change signals — consider DRY + early-exit helper

The new assignment (not |=) correctly reflects the current evaluation in this pass and respects rerun_triggers. However, the async-generator-to-list pattern is duplicated here and above, and it forces evaluation of the entire generator.

Optional: factor an inner helper that short-circuits on first hit. This reduces duplication and avoids building a list.

Example (helper to add near the top of update_needrun(job) inner function):

# place inside async def update_needrun(job): near other local helpers
async def _has_outputs_older_than_script_or_notebook(job: Job) -> bool:
    async for _ in job.outputs_older_than_script_or_notebook():
        return True
    return False

Then this line can become:

-                                    reason.code_changed = any(
-                                        [
-                                            f
-                                            async for f in job.outputs_older_than_script_or_notebook()
-                                        ]
-                                    ) or any(
+                                    reason.code_changed = await _has_outputs_older_than_script_or_notebook(job) or any(
                                         self.workflow.persistence.code_changed(job)
                                     )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between d13db0a and 7e29695.

📒 Files selected for processing (1)
  • src/snakemake/dag.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py

⚙️ CodeRabbit configuration file

**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

Files:

  • src/snakemake/dag.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.
📚 Learning: 2024-11-12T12:08:20.342Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.

Applied to files:

  • src/snakemake/dag.py
📚 Learning: 2024-10-14T09:42:11.571Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3140
File: snakemake/dag.py:1308-1308
Timestamp: 2024-10-14T09:42:11.571Z
Learning: In `snakemake/dag.py`, avoid flagging missing lines or indentation issues when there is no clear syntax or logical error to prevent false positives.

Applied to files:

  • src/snakemake/dag.py
🧬 Code graph analysis (1)
src/snakemake/dag.py (2)
src/snakemake/persistence.py (1)
  • code_changed (462-464)
src/snakemake/jobs.py (1)
  • outputs_older_than_script_or_notebook (402-417)
🔇 Additional comments (4)
src/snakemake/dag.py (4)

1050-1050: Idiomatic membership test — semantics unchanged. LGTM

Switching to "f not in self.targetfiles" reads better and is functionally equivalent.


2486-2489: Visited-membership check now idiomatic. LGTM

Switch to "if job_ not in visited" improves readability, no behavioral change.


2503-2506: Another membership-test cleanup. LGTM

Same readability win here; behavior remains the same.


3020-3021: Single-line archive log message — clear and concise. LGTM

No functional impact; message reads better in logs.

@jasonho1308
Copy link
Copy Markdown
Author

#3707 got a better approach

johanneskoester pushed a commit that referenced this pull request Sep 4, 2025
This PR is inspired by #3706, resolves issues #3456 and #3474, and
addresses the concern raised in
#3148 (comment)

In brief, commit e502312 made changes so that the modification time
(mtime) of script and notebook files no longer triggers a rerun of the
workflow, later, commit e8a0b83 reversed this, making mtime changes
always trigger reruns -- even when RerunTrigger.CODE is not specified in
`--rerun-triggers`.
This created an undesirable situation: legitimate and common scenarios
(e.g., copying scripts or pulling them from a Git repository) could
unnecessarily trigger reruns — as reported in #3474.

To make the behavior more intuitive, the
`outputs_older_than_script_or_notebook` condition should take precedence
over metadata checks. However, this condition should still be governed
by the presence of `RerunTrigger.CODE`.

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [ ] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Code-change reruns now only run when the CODE rerun trigger is
enabled; code-change detection correctly accumulates signals from
multiple sources to avoid missed or spurious reruns.

* **Tests**
* Added an MTIME-triggered rerun test validating that outputs retain
timestamps when only input mtimes change.

* **Chores**
* Fixed a test import path and broadened connectivity error handling to
include timeouts.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
kjohnsen pushed a commit to kjohnsen/snakemake that referenced this pull request Dec 15, 2025
This PR is inspired by snakemake#3706, resolves issues snakemake#3456 and snakemake#3474, and
addresses the concern raised in
snakemake#3148 (comment)

In brief, commit e502312 made changes so that the modification time
(mtime) of script and notebook files no longer triggers a rerun of the
workflow, later, commit e8a0b83 reversed this, making mtime changes
always trigger reruns -- even when RerunTrigger.CODE is not specified in
`--rerun-triggers`.
This created an undesirable situation: legitimate and common scenarios
(e.g., copying scripts or pulling them from a Git repository) could
unnecessarily trigger reruns — as reported in snakemake#3474.

To make the behavior more intuitive, the
`outputs_older_than_script_or_notebook` condition should take precedence
over metadata checks. However, this condition should still be governed
by the presence of `RerunTrigger.CODE`.

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] The PR contains a test case for the changes or the changes are
already covered by an existing test case.
* [ ] The documentation (`docs/`) is updated to reflect the changes or
this is not necessary (e.g. if the change does neither modify the
language nor the behavior or functionalities of Snakemake).


<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit

* **Bug Fixes**
* Code-change reruns now only run when the CODE rerun trigger is
enabled; code-change detection correctly accumulates signals from
multiple sources to avoid missed or spurious reruns.

* **Tests**
* Added an MTIME-triggered rerun test validating that outputs retain
timestamps when only input mtimes change.

* **Chores**
* Fixed a test import path and broadened connectivity error handling to
include timeouts.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant