feat: lambda functions for priority in rules#3253
feat: lambda functions for priority in rules#3253johanneskoester merged 11 commits intosnakemake:mainfrom
Conversation
|
Feedback: Very much looking forward to the new version! I want to use this feature to selectively fast-track the analysis of samples from tumor patients who are in a particularly bad shape and may miss their treatment option (=become too weak to be treated and succumb to disease) if the analysis and hence recommendation of a therapy takes too long. I have a dictionary with all sample names and their priorities and wanted to dynamically retrieve the priority for each given sample from the dictionary using lambda wildcards. Great this is already underway! |
5dd8ac3 to
5475036
Compare
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughRule priorities now accept numeric values or callables with signature callable(wildcards). Workflow validation allows callables; the DAG invokes callables per job to compute per-wildcard priorities. Documentation, tests, and a profile config were added to exercise this behavior. Changes
Sequence Diagram(s)sequenceDiagram
participant Scheduler
participant DAG
participant Rule
participant Job
Scheduler->>DAG: update_priority(needrun_job)
DAG->>Job: inspect job.rule.priority
alt priority is callable
DAG->>Rule: call rule.priority(job.wildcards)
Rule-->>DAG: return computed priority (int/float)
DAG-->>Scheduler: set job priority to computed value
else
DAG-->>Scheduler: set job priority to numeric value
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (7)
snakemake/dag.py (1)
1499-1503: Consider future extensibility for priority functions.The current implementation supports using wildcards in priority functions. As mentioned in the PR objectives, there is interest in utilizing other inputs (e.g., input resources). Consider designing the interface to be extensible for supporting additional inputs in the future.
Example future enhancement:
# Instead of just wildcards, consider passing a context object self._priority[job] = ( job.rule.priority(PriorityContext( wildcards=job.wildcards, resources=job.resources, # ... other future inputs )) if callable(job.rule.priority) else job.rule.priority )docs/snakefiles/rules.rst (1)
Line range hint
1058-1069: Documentation looks good but could be enhanced with examples.The documentation clearly explains the priority feature and its impact on scheduling. However, it would be more helpful to include:
- An example of using a callable priority function
- A practical example demonstrating the
--prioritizeflag usageHere's a suggested addition to the documentation:
rule: input: ... output: ... priority: 50 shell: ... +# Example with callable priority +def get_priority(wildcards): + return 100 if wildcards.sample.startswith("urgent_") else 0 + +rule: + input: "{sample}.in" + output: "{sample}.out" + priority: get_priority + shell: ... + +# Example of using --prioritize: +# snakemake --prioritize urgent_sample.out other_sample.out -ntests/test_priority/profiles/default/config.yaml (1)
1-1: Add newline at end of file.While the configuration is valid, it's recommended to add a newline at the end of YAML files to follow best practices and avoid potential issues with some tools.
cores: 1 +🧰 Tools
🪛 YAMLlint (1.35.1)
[error] 1-1: no new line character at the end of file
(new-line-at-end-of-file)
tests/test_priority/Snakefile (4)
1-4: Consider removing or improving the debug loop.The loop printing character indices appears to be debug code and may interfere with test output parsing. Additionally,
word.index()only returns the first occurrence of a character, which could mask issues with repeated characters.Consider either:
- Removing the debug loop, or
- Capturing the output for test verification:
-for i in word: - print(word.index(i) + 1) +expected_priorities = {} +for i, char in enumerate(word, 1): + expected_priorities[char] = i
9-12: Consider enhancing the priority function's robustness.While the lambda function demonstrates the basic functionality, there are a few potential improvements:
- Add input validation:
- priority: lambda wc: word.index(wc.i) + 1 if wc.i in word else 0 + priority: lambda wc: (word.index(wc.i) + 1 if wc.i in word else 0) if len(wc.i) == 1 else 0
- Consider using a more atomic approach for order.out to prevent potential race conditions:
- shell: 'echo {wildcards.i} > {output} && echo {wildcards.i} >> order.out' + shell: ''' + echo {wildcards.i} > {output} + echo "{wildcards.i}" | flock order.out -c 'cat >> order.out' + '''
14-17: Consider consistent atomic writes across rules.For consistency with the suggested improvements to the
echorule, consider using the same atomic write approach here.- shell: 'echo a > a.out && echo a >> order.out' + shell: ''' + echo a > a.out + echo "a" | flock order.out -c 'cat >> order.out' + '''
1-17: Consider adding test cases for medical priority scenarios.Based on the PR objectives and JohnAchenbach's use case, consider adding test cases that demonstrate priority assignment using a dictionary of sample priorities:
# Example additional test case SAMPLE_PRIORITIES = { 'critical_patient_1': 100, 'stable_patient_1': 50, 'routine_sample_1': 10 } rule process_sample: output: '{sample}.processed' priority: lambda wc: SAMPLE_PRIORITIES.get(wc.sample, 0) shell: 'echo "Processing {wildcards.sample}" > {output}'This would help validate the feature for real-world medical scenarios where sample processing order is critical.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (7)
tests/test_priority/expected-results/a.outis excluded by!**/*.outtests/test_priority/expected-results/e.outis excluded by!**/*.outtests/test_priority/expected-results/k.outis excluded by!**/*.outtests/test_priority/expected-results/l.outis excluded by!**/*.outtests/test_priority/expected-results/n.outis excluded by!**/*.outtests/test_priority/expected-results/order.outis excluded by!**/*.outtests/test_priority/expected-results/s.outis excluded by!**/*.out
📒 Files selected for processing (6)
docs/snakefiles/rules.rst(2 hunks)snakemake/dag.py(1 hunks)snakemake/workflow.py(1 hunks)tests/test_priority/Snakefile(1 hunks)tests/test_priority/profiles/default/config.yaml(1 hunks)tests/tests.py(1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
tests/tests.py (1)
Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.
snakemake/workflow.py (1)
Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.
snakemake/dag.py (1)
Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.
🪛 YAMLlint (1.35.1)
tests/test_priority/profiles/default/config.yaml
[error] 1-1: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (5)
tests/tests.py (1)
1967-1968: LGTM!The test function correctly verifies the priority functionality by running the workflow with a single core.
snakemake/workflow.py (1)
1744-1751: LGTM! Priority validation now supports callable functions.The validation logic has been correctly updated to allow three types of priority values:
- Integer
- Float
- Callable function
This change enables dynamic priority assignment based on runtime conditions.
snakemake/dag.py (2)
1499-1503: LGTM! The priority handling is correctly implemented.The changes properly handle both callable and non-callable priority values:
- If the priority is callable (lambda function), it's called with the job's wildcards.
- If the priority is not callable, the priority value is used directly.
This implementation aligns with the PR objectives of allowing lambda functions to specify priority based on input wildcards.
1499-1503: Verify priority function behavior.Let's verify that priority functions are called correctly and return valid values:
✅ Verification successful
Priority handling implementation is correct.
The implementation properly handles both static priority values and dynamic priority functions:
- For static priorities (e.g.,
priority: 50), it uses the value directly- For dynamic priorities (e.g.,
priority: lambda wc: ...), it calls the function with wildcards🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Find all rules with priority functions and verify their usage. # Find rules with priority functions echo "Searching for rules with priority functions..." rg -A 5 'priority\s*=\s*lambda' . # Find potential validation of priority values echo "Searching for priority value validation..." rg 'priority' . | rg -i 'valid|check|assert'Length of output: 278
Script:
#!/bin/bash # Search for Rule class definition and priority-related code echo "=== Searching for Rule class and priority handling ===" rg -A 5 "class.*Rule" . echo -e "\n=== Searching for priority-related tests ===" fd test_ . | xargs rg "priority" echo -e "\n=== Searching for DAG priority handling ===" rg -B 3 -A 3 "_priority\[" .Length of output: 84634
tests/test_priority/Snakefile (1)
6-7: LGTM! Well-structured aggregation rule.The rule correctly uses
expand()to generate the list of target outputs.
5475036 to
a723767
Compare
|
Rebased with main branch to make automatic merging possible. |
|
I have applied the changes Raphael committed to dag.py and workflow.py month ago across three different versions of snakemake (7, 8, 9) and it works flawlessly in a context like this: rule mapping_and_sorting: To us (Institut für Pathologie & Hämatopathologie Hamburg), this is a vitally important update, as it adds the possibility to dynamically prioritize NGS analyses where cancer patients urgently need results to inform their further treatment over those who can wait for another weekend. Overall, we analyse several thousand cancer samples annually. While this branch is not merged with the main branch, each re-install or update of snakemake breaks the functionality of the pipeline I am building until I manually re-add the changes. Once the pipeline goes productive, this is too much of a risk in terms of portability across multiple nodes. Since I see the finish line, I need to either know soon that I can rely on this function in the future or I'll have to recode everything. Your work is essentially done, it lives up to its purpose, I cannot spot any spelling mistakes, documentation update is intelligible and formatting looks good to me. Therefore, I kindly ask you to prioritize merging this branch with the main. Thanks a lot! Best regards, |
0a45d8e to
2e20f8a
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/test_priority/profiles/default/config.yaml (1)
1-1: Add a trailing newline to satisfy YAMLlint and POSIX toolsThe file lacks a terminating newline, which triggers YAMLlint’s
new-line-at-end-of-filewarning and can confuse some command-line utilities when concatenating files.cores: 1 +
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (7)
tests/test_priority/expected-results/a.outis excluded by!**/*.outtests/test_priority/expected-results/e.outis excluded by!**/*.outtests/test_priority/expected-results/k.outis excluded by!**/*.outtests/test_priority/expected-results/l.outis excluded by!**/*.outtests/test_priority/expected-results/n.outis excluded by!**/*.outtests/test_priority/expected-results/order.outis excluded by!**/*.outtests/test_priority/expected-results/s.outis excluded by!**/*.out
📒 Files selected for processing (6)
docs/snakefiles/rules.rst(2 hunks)src/snakemake/dag.py(1 hunks)src/snakemake/workflow.py(1 hunks)tests/test_priority/Snakefile(1 hunks)tests/test_priority/profiles/default/config.yaml(1 hunks)tests/tests.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
- src/snakemake/workflow.py
- src/snakemake/dag.py
- tests/tests.py
- tests/test_priority/Snakefile
- docs/snakefiles/rules.rst
🧰 Additional context used
🧠 Learnings (1)
📓 Common learnings
Learnt from: leoschwarz
PR: snakemake/snakemake#3176
File: tests/test_output_index.py:99-157
Timestamp: 2025-01-17T12:00:09.368Z
Learning: New test dependencies for Snakemake should be introduced in separate PRs rather than being added as part of feature or refactoring PRs.
Learnt from: johanneskoester
PR: snakemake/snakemake#3014
File: snakemake/workflow.py:1250-1251
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Avoid commenting on trailing commas in code reviews for the Snakemake repository as per the user's preference.
Learnt from: johanneskoester
PR: snakemake/snakemake#3014
File: snakemake/workflow.py:1250-1251
Timestamp: 2024-08-13T16:22:58.430Z
Learning: Avoid commenting on trailing commas in code reviews for the Snakemake repository as per the user's preference.
Learnt from: johanneskoester
PR: snakemake/snakemake#3117
File: tests/test_wrapper/Snakefile:11-11
Timestamp: 2024-10-06T14:09:54.370Z
Learning: Changes made within test cases, such as in `tests/test_wrapper/Snakefile`, are for testing purposes and do not require updates to the project documentation.
Learnt from: johanneskoester
PR: snakemake/snakemake#2927
File: snakemake/ioutils/evaluate.py:13-17
Timestamp: 2024-12-14T13:05:08.032Z
Learning: The Snakemake project supports Python 3.7, so code contributions must remain compatible with Python 3.7 and avoid features introduced in later Python versions.
Learnt from: johanneskoester
PR: snakemake/snakemake#2927
File: snakemake/ioutils/branch.py:44-51
Timestamp: 2024-12-14T13:13:09.273Z
Learning: Snakemake needs to maintain compatibility with Python 3.7; avoid using syntax or features introduced after Python 3.7.
Learnt from: johanneskoester
PR: snakemake/snakemake#2960
File: CHANGELOG.md:6-6
Timestamp: 2024-08-13T11:05:23.821Z
Learning: Do not review pull requests generated by Release Please for the Snakemake project.
Learnt from: johanneskoester
PR: snakemake/snakemake#2960
File: CHANGELOG.md:6-6
Timestamp: 2024-10-08T17:41:54.542Z
Learning: Do not review pull requests generated by Release Please for the Snakemake project.
🪛 YAMLlint (1.37.1)
tests/test_priority/profiles/default/config.yaml
[error] 1-1: no new line character at the end of file
(new-line-at-end-of-file)
|
I would also love to see this merged! Perhaps we can get eyes on this @johanneskoester? The code change is quite small but already opens up quite a bit of functionality and comes with tests + documentation changes. My use case If possible, it would be great to get advice about how to proceed (perhaps in a second pull request) on how to achieve also passing resources into a priority call such that one could prioritize jobs based on their expected. resource usage. |
Good idea! With the functionality you built, you could already achieve that - at least in a simple version. All it takes is a python function accepting e.g. config["rulename_memory"] and/or config["rulename_threads"] as arguments and returning an integer. Then make a lambda call of the function with the required arguments in the prio section and its done ;) |
|
I found this to be trickier in practice, since resources can be dynamically assigned, e.g. based on input file size, and therefore calculated quite often whereas the priorities for the DAG are computed less frequently, only on DAG updates. The functionality in this PR was straightforward to implement and perhaps we could get a version of what you are suggesting implemented in the future, maybe under simplifying assumptions, but I believe its currently beyond the scope of this PR. But perhaps we can start a draft and experiment once this one is merged? :-) |
Absolutely! I can help testing, but I would not know how to make the priority section "see" what's inside the resources and threads sections. However, if I were to implement a rule with dynamic resource assignment and resource-dependent priority right now, I'd do dynamic rule generation in a for-loop, do the math for calculating resources and priority there and just hand the results to the respective sections (maybe useful also for a unit test for however you think this can be developed). |
2e20f8a to
f0e7178
Compare
|
Rebased to make merging easier. |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
tests/test_priority/profiles/default/config.yaml (1)
1-1: Add a trailing newline to satisfy linters.YAMLlint flagged “no new line character at the end of file”. Append a newline at EOF.
tests/test_priority/Snakefile (1)
3-4: Remove debug prints and simplify priority computation.Top-level prints add noise to test output. Also, avoid repeated O(n) scans by precomputing positions and using dict lookup in the lambda.
Apply this diff:
-for i in word: - print(word.index(i) + 1) +pos = {c: i + 1 for i, c in enumerate(word)} @@ - priority: lambda wc: word.index(wc.i) + 1 if wc.i in word else 0 + priority: lambda wc: pos.get(wc.i, 0)Also applies to: 11-11
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (7)
tests/test_priority/expected-results/a.outis excluded by!**/*.outtests/test_priority/expected-results/e.outis excluded by!**/*.outtests/test_priority/expected-results/k.outis excluded by!**/*.outtests/test_priority/expected-results/l.outis excluded by!**/*.outtests/test_priority/expected-results/n.outis excluded by!**/*.outtests/test_priority/expected-results/order.outis excluded by!**/*.outtests/test_priority/expected-results/s.outis excluded by!**/*.out
📒 Files selected for processing (6)
docs/snakefiles/rules.rst(2 hunks)src/snakemake/dag.py(1 hunks)src/snakemake/workflow.py(1 hunks)tests/test_priority/Snakefile(1 hunks)tests/test_priority/profiles/default/config.yaml(1 hunks)tests/tests.py(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
- src/snakemake/dag.py
- src/snakemake/workflow.py
- docs/snakefiles/rules.rst
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit Configuration File
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
tests/tests.py
🧬 Code Graph Analysis (1)
tests/tests.py (2)
src/snakemake/workflow.py (2)
run(2289-2290)cores(583-593)tests/common.py (1)
run(151-495)
🪛 YAMLlint (1.37.1)
tests/test_priority/profiles/default/config.yaml
[error] 1-1: no new line character at the end of file
(new-line-at-end-of-file)
🔇 Additional comments (2)
tests/tests.py (1)
1882-1883: Verified expected-results for test_priority includes order.out
The directorytests/test_priority/expected-resultsexists and containsorder.outwith the expected deterministic ordering:a e k n s lNo further changes are needed here.
tests/test_priority/Snakefile (1)
9-17: Dynamic priority usage is correct and exercises the feature.Using lambda wildcards to compute priorities and a competing static rule (“a”) is a solid way to validate both callable priorities and disambiguation between rules.
f0e7178 to
6806a02
Compare
5c2dd1d to
5c0314f
Compare
5c0314f to
0219182
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Fix all issues with AI agents
In `@src/snakemake/dag.py`:
- Around line 1677-1681: The priority callable invocation (where
self._priority[job] is set from job.rule.priority with job.wildcards) should be
wrapped in a try/except to catch any exceptions raised by a user-supplied
priority function and re-raise a clearer error that includes the rule name and
wildcards; locate the block setting self._priority[job] and surround the
callable call (job.rule.priority(job.wildcards)) with a try: ... except
Exception as e: raise SomeRelevantError(f"Error evaluating priority for rule
'{job.rule.name}' with wildcards {job.wildcards}: {e}") from e so you preserve
original context via exception chaining (use the same exception-wrapping pattern
used for the checksum callable).
|
I have now fully implemented the functionality in #2994. It would be really great to get eyes on this. The docs fail to build but for reasons unrelated to the content added. Seems to be a versioning issue with sphinx? I get the same build errors on the main branch. |
e39d55a to
32baaa7
Compare
🤖 I have created a release *beep* *boop* --- ## [9.17.0](v9.16.3...v9.17.0) (2026-03-13) ### Features * Allow storing snakemake metadata in files or databases ([#4012](#4012)) ([dd75f31](dd75f31)) * Allow to specify comparison command per-unit test ([#3956](#3956)) ([b88171c](b88171c)) * job table orderd topological when run is started ([#4018](#4018)) ([75cf506](75cf506)) * lambda functions for priority in rules ([#3253](#3253)) ([d2aa226](d2aa226)) * Make on... directive of modules accessible ([#4050](#4050)) ([e9f2e1c](e9f2e1c)) ### Bug Fixes * adjust conda tests to not fail on apple silicon; fix [#4040](#4040) ([#4049](#4049)) ([f5b0142](f5b0142)) * allow "--containerize apptainer" to output apptainer format instead of dockerfile ([#4030](#4030)) ([f5cac30](f5cac30)) * apptainer command not recognized when singularity is absent ([#4010](#4010)) ([b8162e2](b8162e2)) * capture stderr when tests fail ([#3995](#3995)) ([97d74ba](97d74ba)) * **docs:** make Data-dependent conditional execution a complete example ([#4043](#4043)) ([3a1d7f2](3a1d7f2)) * don't build the DAG when running unlock. Fixes [#4000](#4000) and [#198](#198) ([#4007](#4007)) ([acf79fd](acf79fd)) * Ensure pixi tasks may be run as advertised ([#4046](#4046)) ([88253c2](88253c2)) * fix checkpoint handling corner cases ([#3870](#3870) and [#3559](#3559)) ([#4015](#4015)) ([63f4257](63f4257)) * issue 3642 ([#4054](#4054)) ([76e6fc2](76e6fc2)) * issue 3815 ([#4026](#4026)) ([b0eec96](b0eec96)) * logging None in shellcmd context causes error ([#4064](#4064)) ([d0652cd](d0652cd)) * lookup function returns default value for empty DataFrame queries ([#4056](#4056)) ([f71de97](f71de97)) * make `cache: omit-software` a rule specific property ([#4085](#4085)) ([034a9e7](034a9e7)) * reduce number of tests leaving temporary files behind ([#4033](#4033)) ([a3a1c97](a3a1c97)) * regression in dynamic resource handling ([#4038](#4038)) ([f2c554a](f2c554a)) * somewhat shorter announce message ([#4080](#4080)) ([57efc71](57efc71)) ### Performance Improvements * switch reretry with tenacity; decouple container classes (with Python 3.7 compat for old scripts) from rest of the codebase (enabling moving to newer python versions) ([#4032](#4032)) ([ffb19e7](ffb19e7)) ### Documentation * Add AI-assisted contributions policy to contributing guidelines ([#4051](#4051)) ([dd70526](dd70526)) * **codebase:** Update & simplify plugin architecture section ([#4052](#4052)) ([176cf63](176cf63)) * Correct workflow.source_path() description in documentation ([#4036](#4036)) ([45883c5](45883c5)) * fixed wrong code example for collect() function ([#4037](#4037)) ([5c85ed8](5c85ed8)) * Minor docs improvements ([#4089](#4089)) ([29ea226](29ea226)) * switch to sphinx_design for tabs ([#3976](#3976)) ([9674614](9674614)) * typo in the migration table breaking a pip install command ([#4024](#4024)) ([66f9dda](66f9dda)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).



Fixes #2994. Lambda functions can now be used to specify priority based on input wildcards and inputs. Happy to receive feedback.
I've added some tests to cover priorities as well and updated the docs.
QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit