fix: unable to ignore code change for scripts#3706
fix: unable to ignore code change for scripts#3706jasonho1308 wants to merge 1 commit intosnakemake:mainfrom
Conversation
📝 WalkthroughWalkthroughReworks updated_input in Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant S as Scheduler
participant D as DAG.updated_input
participant M as Metadata
participant R as Reason
Note over S,D: New flow (post-change)
S->>D: Evaluate job inputs
alt has_metadata
D->>M: read metadata
alt has_outdated_metadata
D->>R: set outdated_metadata = true
else metadata present and not outdated
alt CODE trigger path
D->>D: evaluate outputs_older_than_script_or_notebook
D->>R: reason.code_changed = age_signal OR persisted_code_changed
end
end
else no_metadata
D->>D: compute code_changed from file/state
D->>R: set reason.code_changed
end
D-->>S: return Reason (flags)
sequenceDiagram
autonumber
participant S as Scheduler
participant D as DAG.updated_input
participant M as Metadata
participant R as Reason
Note over S,D: Prior flow (pre-change)
S->>D: Evaluate job inputs
alt has_metadata
D->>M: read metadata
opt CODE-trigger
D->>R: reason.code_changed |= persisted_code_changed (in-place OR)
end
else no_metadata
D->>D: code_changed computed on alternate path
end
D-->>S: return Reason
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested reviewers
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. 📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/snakemake/dag.py (2)
383-385: Clarify assertion message: it checks registration, not pulling.The assertion verifies that the image URL is present in
self.container_imgs. The message suggests images must be “pulled”, but at this point we only require they have been registered viaupdate_container_imgs(). Tighten the message to avoid misleading hints during debugging.- assert simg_url in self.container_imgs, ( - "bug: must first pull singularity images" - ) + assert simg_url in self.container_imgs, ( + "bug: container image URL not registered; call update_container_imgs() before update_conda_envs()" + )
1450-1461: Consider gating CODE-related reason under rerun triggers even for no-metadata cases.Functionally,
reason.no_metadata = Truealready causes the job to run. Settingreason.code_changedhere unconditionally can surface “code changed” in reasons even whenRerunTrigger.CODEis not selected. If you want reasons to strictly reflect selected triggers (and to avoid confusion in logs), guard this as well.- if not self.workflow.persistence.has_metadata(job): - reason.no_metadata = True - # The list comprehension is needed below in order to - # collect all the async generator items before - # applying any(). - reason.code_changed = any( - [ - f - async for f in job.outputs_older_than_script_or_notebook() - ] - ) + if not self.workflow.persistence.has_metadata(job): + reason.no_metadata = True + # Reflect CODE-related reasons only if enabled. + if RerunTrigger.CODE in self.workflow.rerun_triggers: + # The list comprehension is needed to collect all async items before any(). + reason.code_changed = any( + [ + f + async for f in job.outputs_older_than_script_or_notebook() + ] + )Would you like me to add a test asserting that, with
--rerun-triggersnot including CODE and no metadata present, reasons do not include “code changed”?
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/snakemake/dag.py(8 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/dag.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: johanneskoester
PR: snakemake/snakemake#3140
File: snakemake/dag.py:1308-1308
Timestamp: 2024-10-14T09:42:11.571Z
Learning: In `snakemake/dag.py`, avoid flagging missing lines or indentation issues when there is no clear syntax or logical error to prevent false positives.
Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.
📚 Learning: 2024-10-14T09:42:11.571Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3140
File: snakemake/dag.py:1308-1308
Timestamp: 2024-10-14T09:42:11.571Z
Learning: In `snakemake/dag.py`, avoid flagging missing lines or indentation issues when there is no clear syntax or logical error to prevent false positives.
Applied to files:
src/snakemake/dag.py
📚 Learning: 2024-11-12T12:08:20.342Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.
Applied to files:
src/snakemake/dag.py
🧬 Code graph analysis (1)
src/snakemake/dag.py (2)
src/snakemake/persistence.py (1)
code_changed(462-464)src/snakemake/jobs.py (1)
outputs_older_than_script_or_notebook(402-417)
🔇 Additional comments (6)
src/snakemake/dag.py (6)
1048-1052: Good safeguard: avoid pruning target files from local storage.Adding
and f not in self.targetfilesensures we don't remove local copies of storage-backed outputs that are explicit targets. Prevents surprising “missing target” UX. LGTM.
1473-1481: CODE changes are now properly gated and comprehensive.This restricts script/notebook-mtime-driven changes to when
RerunTrigger.CODEis active and combines them with persisted code-change detection. Matches the PR goal and covers both sources of code-change signals. LGTM.
2486-2489: Minor traversal tweak reads well.Using the explicit membership test on
visitedis standard and clear. LGTM.
2503-2506: Consistent, idiomatic membership test in level-order BFS.Matches the style used elsewhere and keeps the traversal correct. LGTM.
2702-2711: Neater mermaid block construction.Returning the dedented header and formatting the three sections improves readability without changing behavior. LGTM.
3016-3018: Minor log message cleanup is fine.Single-line info message reads better and keeps the intent. LGTM.
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
src/snakemake/dag.py (2)
1452-1460: Gate code-changed reason behind RerunTrigger.CODE in the no-metadata pathRight now, reason.code_changed is computed even if CODE is not in rerun_triggers when metadata is missing. The job will already be needrun due to reason.no_metadata, but surfacing code_changed may confuse users who explicitly disabled CODE-triggered reruns.
Recommend setting reason.code_changed here only if RerunTrigger.CODE is enabled. This keeps the reason reporting consistent with rerun_triggers while preserving the behavior that missing metadata triggers execution.
Apply this focused change:
- # The list comprehension is needed below in order to - # collect all the async generator items before - # applying any(). - reason.code_changed = any( - [ - f - async for f in job.outputs_older_than_script_or_notebook() - ] - ) + if RerunTrigger.CODE in self.workflow.rerun_triggers: + # The list comprehension is needed below in order to + # collect all the async generator items before + # applying any(). + reason.code_changed = any( + [ + f + async for f in job.outputs_older_than_script_or_notebook() + ] + )Verification ideas:
- Ensure the new/updated test still passes when CODE is enabled.
- Add/adjust a test asserting that with CODE disabled and missing metadata, reason.get_names() does not include "code_changed" (but still includes "no_metadata"). Want me to draft that test?
1473-1480: Combine age-based and persisted code-change signals — consider DRY + early-exit helperThe new assignment (not |=) correctly reflects the current evaluation in this pass and respects rerun_triggers. However, the async-generator-to-list pattern is duplicated here and above, and it forces evaluation of the entire generator.
Optional: factor an inner helper that short-circuits on first hit. This reduces duplication and avoids building a list.
Example (helper to add near the top of update_needrun(job) inner function):
# place inside async def update_needrun(job): near other local helpers async def _has_outputs_older_than_script_or_notebook(job: Job) -> bool: async for _ in job.outputs_older_than_script_or_notebook(): return True return FalseThen this line can become:
- reason.code_changed = any( - [ - f - async for f in job.outputs_older_than_script_or_notebook() - ] - ) or any( + reason.code_changed = await _has_outputs_older_than_script_or_notebook(job) or any( self.workflow.persistence.code_changed(job) )
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/snakemake/dag.py(6 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit configuration file
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/dag.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.
📚 Learning: 2024-11-12T12:08:20.342Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In `snakemake/dag.py`, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.
Applied to files:
src/snakemake/dag.py
📚 Learning: 2024-10-14T09:42:11.571Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3140
File: snakemake/dag.py:1308-1308
Timestamp: 2024-10-14T09:42:11.571Z
Learning: In `snakemake/dag.py`, avoid flagging missing lines or indentation issues when there is no clear syntax or logical error to prevent false positives.
Applied to files:
src/snakemake/dag.py
🧬 Code graph analysis (1)
src/snakemake/dag.py (2)
src/snakemake/persistence.py (1)
code_changed(462-464)src/snakemake/jobs.py (1)
outputs_older_than_script_or_notebook(402-417)
🔇 Additional comments (4)
src/snakemake/dag.py (4)
1050-1050: Idiomatic membership test — semantics unchanged. LGTMSwitching to "f not in self.targetfiles" reads better and is functionally equivalent.
2486-2489: Visited-membership check now idiomatic. LGTMSwitch to "if job_ not in visited" improves readability, no behavioral change.
2503-2506: Another membership-test cleanup. LGTMSame readability win here; behavior remains the same.
3020-3021: Single-line archive log message — clear and concise. LGTMNo functional impact; message reads better in logs.
|
#3707 got a better approach |
This PR is inspired by #3706, resolves issues #3456 and #3474, and addresses the concern raised in #3148 (comment) In brief, commit e502312 made changes so that the modification time (mtime) of script and notebook files no longer triggers a rerun of the workflow, later, commit e8a0b83 reversed this, making mtime changes always trigger reruns -- even when RerunTrigger.CODE is not specified in `--rerun-triggers`. This created an undesirable situation: legitimate and common scenarios (e.g., copying scripts or pulling them from a Git repository) could unnecessarily trigger reruns — as reported in #3474. To make the behavior more intuitive, the `outputs_older_than_script_or_notebook` condition should take precedence over metadata checks. However, this condition should still be governed by the presence of `RerunTrigger.CODE`. ### QC <!-- Make sure that you can tick the boxes below. --> * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [ ] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Code-change reruns now only run when the CODE rerun trigger is enabled; code-change detection correctly accumulates signals from multiple sources to avoid missed or spurious reruns. * **Tests** * Added an MTIME-triggered rerun test validating that outputs retain timestamps when only input mtimes change. * **Chores** * Fixed a test import path and broadened connectivity error handling to include timeouts. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
This PR is inspired by snakemake#3706, resolves issues snakemake#3456 and snakemake#3474, and addresses the concern raised in snakemake#3148 (comment) In brief, commit e502312 made changes so that the modification time (mtime) of script and notebook files no longer triggers a rerun of the workflow, later, commit e8a0b83 reversed this, making mtime changes always trigger reruns -- even when RerunTrigger.CODE is not specified in `--rerun-triggers`. This created an undesirable situation: legitimate and common scenarios (e.g., copying scripts or pulling them from a Git repository) could unnecessarily trigger reruns — as reported in snakemake#3474. To make the behavior more intuitive, the `outputs_older_than_script_or_notebook` condition should take precedence over metadata checks. However, this condition should still be governed by the presence of `RerunTrigger.CODE`. ### QC <!-- Make sure that you can tick the boxes below. --> * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [ ] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **Bug Fixes** * Code-change reruns now only run when the CODE rerun trigger is enabled; code-change detection correctly accumulates signals from multiple sources to avoid missed or spurious reruns. * **Tests** * Added an MTIME-triggered rerun test validating that outputs retain timestamps when only input mtimes change. * **Chores** * Fixed a test import path and broadened connectivity error handling to include timeouts. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
Fix the issue mentioned in #3456, which code change cannot be ignored when using script. The issue in the original code is the following snippets will return
trueeven ifrerun_triggersdo not includeRerunTrigger.CODEMy approch should be able to handle #3148 as well.
QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
Bug Fixes
Style