feat: migrate to scheduler plugin interface and scheduler plugins#3676
feat: migrate to scheduler plugin interface and scheduler plugins#3676johanneskoester merged 19 commits intomainfrom
Conversation
📝 WalkthroughWalkthroughIntroduces a plugin-based scheduler system: adds MILP and greedy scheduler plugins, refactors JobScheduler to delegate selection to plugins with async input-size prefetch and validation, extends Workflow/API/CLI to accept scheduler plugins and settings, and adjusts jobs/DAG/I/O typing and small APIs to support scheduling needs. Changes
Sequence Diagram(s)sequenceDiagram
participant CLI
participant API
participant Registry
participant Workflow
participant SchedulerPlugin
participant JobScheduler
CLI->>API: args_to_api(args)
API->>Registry: get(scheduler_name)
Registry-->>API: SchedulerPlugin
API->>Workflow: execute(..., scheduler_plugin, scheduler_settings, greedy_scheduler_settings)
Workflow->>SchedulerPlugin: instantiate scheduler(dag, settings, logger)
Workflow->>JobScheduler: new JobScheduler(..., scheduler, greedy_scheduler_settings)
JobScheduler->>SchedulerPlugin: select_jobs(selectable, remaining, resources, input_sizes)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Possibly related PRs
Tip 🔌 Remote MCP (Model Context Protocol) integration is now available!Pro plan users can now connect to remote MCP servers from the Integrations page. Connect with popular remote MCPs such as Notion and Linear to add more context to your reviews and chats. ✨ Finishing Touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
|
Please format your code with black: |
There was a problem hiding this comment.
Actionable comments posted: 8
🔭 Outside diff range comments (1)
src/snakemake/io/__init__.py (1)
932-935: Fix naming inconsistency: method name suggests megabytes but returns bytes.The property name
temp_size_mbimplies the result is in megabytes, but it actually returns the sum ofsize_tempfileswhich is in bytes. This creates confusion and inconsistency with the naming convention.Apply this fix to align the name with the actual return value:
- def temp_size_mb(self): - return sum(self.size_tempfiles) + def temp_size(self): + return sum(self.size_tempfiles)Alternatively, if you want to keep the
mbsuffix, convert to megabytes:- def temp_size_mb(self): - return sum(self.size_tempfiles) + def temp_size_mb(self): + return sum(self.size_tempfiles) / (1024 * 1024)
🧹 Nitpick comments (3)
src/snakemake/settings/types.py (1)
307-314: Consider updating the docstring to match the new default scheduler valueThe docstring still mentions "default ilp" but the actual default value has been changed to "milp" on line 320.
- scheduler: - Select scheduling algorithm (default ilp, allowed: ilp, greedy or any scheduler plugin name). + scheduler: + Select scheduling algorithm (default milp, allowed: any scheduler plugin name).src/snakemake/scheduling/job_scheduler.py (1)
195-208: Good implementation of concurrent input size caching.The async implementation efficiently retrieves file sizes. Consider a more readable approach for cache eviction.
For better readability, consider:
if len(self._input_sizes) > 10000: - for path in list(filterfalse(paths.__contains__, self._input_sizes)): - del self._input_sizes[path] + # Remove paths not in current job inputs + for path in list(self._input_sizes.keys()): + if path not in paths: + del self._input_sizes[path]src/snakemake/jobs.py (1)
299-310: Good improvements to auxiliary resource management.The validation and cache invalidation logic is sound.
Minor style improvement:
- if name in self.resources.keys(): + if name in self.resources:
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
pyproject.tomlis excluded by!pyproject.toml
📒 Files selected for processing (11)
src/snakemake/api.py(5 hunks)src/snakemake/cli.py(7 hunks)src/snakemake/dag.py(7 hunks)src/snakemake/io/__init__.py(2 hunks)src/snakemake/jobs.py(12 hunks)src/snakemake/scheduling/greedy.py(1 hunks)src/snakemake/scheduling/job_scheduler.py(8 hunks)src/snakemake/scheduling/milp.py(1 hunks)src/snakemake/settings/types.py(2 hunks)src/snakemake/workflow.py(4 hunks)test.txt(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
Instructions used from:
Sources:
⚙️ CodeRabbit Configuration File
🧠 Learnings (10)
📓 Common learnings
Learnt from: leoschwarz
PR: snakemake/snakemake#3176
File: tests/test_output_index.py:99-157
Timestamp: 2025-01-17T12:00:09.368Z
Learning: New test dependencies for Snakemake should be introduced in separate PRs rather than being added as part of feature or refactoring PRs.
Learnt from: lczech
PR: snakemake/snakemake#3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In `snakemake/scheduler.py`, avoid suggesting the use of `asyncio.gather` in the `jobs_rewards` method due to overhead concerns and the need for immediate results.
Learnt from: cademirch
PR: snakemake/snakemake#3107
File: snakemake/logging.py:20-20
Timestamp: 2025-01-13T18:50:40.791Z
Learning: Dependencies for Snakemake are managed through test-environment.yml file, which includes snakemake-interface-logger-plugins (>= 0.1.0) and other required packages.
src/snakemake/io/__init__.py (1)
Learnt from: johanneskoester
PR: #3600
File: src/snakemake/jobs.py:960-964
Timestamp: 2025-05-23T09:40:24.474Z
Learning: In the cleanup method of the Job class in src/snakemake/jobs.py, files in the to_remove list should be formatted with fmt_iofile without specifying as_output=True or as_input=True parameters, as these files should be displayed as generic files rather than specifically as output files.
src/snakemake/workflow.py (5)
Learnt from: cademirch
PR: #3107
File: snakemake/logging.py:20-20
Timestamp: 2025-01-13T18:50:40.791Z
Learning: Dependencies for Snakemake are managed through test-environment.yml file, which includes snakemake-interface-logger-plugins (>= 0.1.0) and other required packages.
Learnt from: johanneskoester
PR: #3114
File: snakemake/cli.py:1707-1707
Timestamp: 2024-10-05T09:55:00.509Z
Learning: ReportPluginRegistry is a singleton; multiple instantiations are acceptable and do not need optimization in cli.py.
Learnt from: johanneskoester
PR: #3114
File: snakemake/cli.py:1707-1707
Timestamp: 2024-10-08T17:41:54.542Z
Learning: ReportPluginRegistry is a singleton; multiple instantiations are acceptable and do not need optimization in cli.py.
Learnt from: johanneskoester
PR: #3140
File: snakemake/dag.py:1308-1308
Timestamp: 2024-10-14T09:42:11.571Z
Learning: In snakemake/dag.py, avoid flagging missing lines or indentation issues when there is no clear syntax or logical error to prevent false positives.
Learnt from: lczech
PR: #3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In snakemake/scheduler.py, avoid suggesting the use of asyncio.gather in the jobs_rewards method due to overhead concerns and the need for immediate results.
src/snakemake/scheduling/greedy.py (1)
Learnt from: lczech
PR: #3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In snakemake/scheduler.py, avoid suggesting the use of asyncio.gather in the jobs_rewards method due to overhead concerns and the need for immediate results.
src/snakemake/dag.py (5)
Learnt from: johanneskoester
PR: #3140
File: snakemake/dag.py:1308-1308
Timestamp: 2024-10-14T09:42:11.571Z
Learning: In snakemake/dag.py, avoid flagging missing lines or indentation issues when there is no clear syntax or logical error to prevent false positives.
Learnt from: johanneskoester
PR: #3148
File: snakemake/dag.py:1332-1336
Timestamp: 2024-11-12T12:08:20.342Z
Learning: In snakemake/dag.py, when code is outdated and will disappear upon resolving merge conflicts, avoid making code review suggestions on that code.
Learnt from: lczech
PR: #3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In snakemake/scheduler.py, avoid suggesting the use of asyncio.gather in the jobs_rewards method due to overhead concerns and the need for immediate results.
Learnt from: johanneskoester
PR: #3114
File: snakemake/cli.py:708-708
Timestamp: 2024-10-13T14:10:37.796Z
Learning: In the snakemake/cli.py file, ExecutorPluginRegistry() is a singleton class, so multiple instantiations are acceptable and do not cause unnecessary overhead.
Learnt from: cademirch
PR: #3107
File: snakemake/logging.py:20-20
Timestamp: 2025-01-13T18:50:40.791Z
Learning: Dependencies for Snakemake are managed through test-environment.yml file, which includes snakemake-interface-logger-plugins (>= 0.1.0) and other required packages.
src/snakemake/scheduling/milp.py (1)
Learnt from: lczech
PR: #3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In snakemake/scheduler.py, avoid suggesting the use of asyncio.gather in the jobs_rewards method due to overhead concerns and the need for immediate results.
src/snakemake/cli.py (4)
Learnt from: johanneskoester
PR: #3114
File: snakemake/cli.py:708-708
Timestamp: 2024-10-13T14:10:37.796Z
Learning: In the snakemake/cli.py file, ExecutorPluginRegistry() is a singleton class, so multiple instantiations are acceptable and do not cause unnecessary overhead.
Learnt from: cademirch
PR: #3107
File: snakemake/logging.py:20-20
Timestamp: 2025-01-13T18:50:40.791Z
Learning: Dependencies for Snakemake are managed through test-environment.yml file, which includes snakemake-interface-logger-plugins (>= 0.1.0) and other required packages.
Learnt from: johanneskoester
PR: #3114
File: snakemake/cli.py:1707-1707
Timestamp: 2024-10-05T09:55:00.509Z
Learning: ReportPluginRegistry is a singleton; multiple instantiations are acceptable and do not need optimization in cli.py.
Learnt from: johanneskoester
PR: #3114
File: snakemake/cli.py:1707-1707
Timestamp: 2024-10-08T17:41:54.542Z
Learning: ReportPluginRegistry is a singleton; multiple instantiations are acceptable and do not need optimization in cli.py.
src/snakemake/scheduling/job_scheduler.py (2)
Learnt from: lczech
PR: #3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In snakemake/scheduler.py, avoid suggesting the use of asyncio.gather in the jobs_rewards method due to overhead concerns and the need for immediate results.
Learnt from: johanneskoester
PR: #3114
File: snakemake/cli.py:708-708
Timestamp: 2024-10-13T14:10:37.796Z
Learning: In the snakemake/cli.py file, ExecutorPluginRegistry() is a singleton class, so multiple instantiations are acceptable and do not cause unnecessary overhead.
src/snakemake/api.py (5)
Learnt from: lczech
PR: #3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In snakemake/scheduler.py, avoid suggesting the use of asyncio.gather in the jobs_rewards method due to overhead concerns and the need for immediate results.
Learnt from: johanneskoester
PR: #3114
File: snakemake/cli.py:708-708
Timestamp: 2024-10-13T14:10:37.796Z
Learning: In the snakemake/cli.py file, ExecutorPluginRegistry() is a singleton class, so multiple instantiations are acceptable and do not cause unnecessary overhead.
Learnt from: cademirch
PR: #3107
File: snakemake/logging.py:20-20
Timestamp: 2025-01-13T18:50:40.791Z
Learning: Dependencies for Snakemake are managed through test-environment.yml file, which includes snakemake-interface-logger-plugins (>= 0.1.0) and other required packages.
Learnt from: johanneskoester
PR: #3114
File: snakemake/cli.py:1707-1707
Timestamp: 2024-10-05T09:55:00.509Z
Learning: ReportPluginRegistry is a singleton; multiple instantiations are acceptable and do not need optimization in cli.py.
Learnt from: johanneskoester
PR: #3114
File: snakemake/cli.py:1707-1707
Timestamp: 2024-10-08T17:41:54.542Z
Learning: ReportPluginRegistry is a singleton; multiple instantiations are acceptable and do not need optimization in cli.py.
src/snakemake/jobs.py (3)
Learnt from: lczech
PR: #3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In snakemake/scheduler.py, avoid suggesting the use of asyncio.gather in the jobs_rewards method due to overhead concerns and the need for immediate results.
Learnt from: johanneskoester
PR: #3114
File: snakemake/cli.py:708-708
Timestamp: 2024-10-13T14:10:37.796Z
Learning: In the snakemake/cli.py file, ExecutorPluginRegistry() is a singleton class, so multiple instantiations are acceptable and do not cause unnecessary overhead.
Learnt from: cademirch
PR: #3107
File: snakemake/logging.py:20-20
Timestamp: 2025-01-13T18:50:40.791Z
Learning: Dependencies for Snakemake are managed through test-environment.yml file, which includes snakemake-interface-logger-plugins (>= 0.1.0) and other required packages.
🧬 Code Graph Analysis (3)
src/snakemake/io/__init__.py (1)
src/snakemake/common/__init__.py (1)
async_run(91-108)
src/snakemake/scheduling/greedy.py (5)
src/snakemake/jobs.py (10)
jobs(1362-1363)jobs(1366-1367)scheduler_resources(513-514)scheduler_resources(1525-1526)priority(592-593)priority(1281-1282)priority(1666-1667)input(337-338)input(341-342)input(1529-1534)src/snakemake/dag.py (2)
jobs(607-609)priority(637-639)src/snakemake/scheduling/milp.py (3)
SchedulerSettings(24-35)Scheduler(38-201)select_jobs(39-176)src/snakemake/scheduling/job_scheduler.py (1)
remaining_jobs(185-193)src/snakemake/io/__init__.py (2)
get(1873-1878)is_flagged(955-958)
src/snakemake/jobs.py (1)
src/snakemake/dag.py (2)
jobs(607-609)toposorted(3148-3211)
🪛 Ruff (0.12.2)
src/snakemake/scheduling/greedy.py
2-2: itertools.filterfalse imported but unused
Remove unused import: itertools.filterfalse
(F401)
3-3: typing.Iterable imported but unused
Remove unused import: typing.Iterable
(F401)
src/snakemake/scheduling/milp.py
1-1: collections.defaultdict imported but unused
Remove unused import: collections.defaultdict
(F401)
6-6: typing.Iterable imported but unused
Remove unused import: typing.Iterable
(F401)
7-7: snakemake.common.async_run imported but unused
Remove unused import: snakemake.common.async_run
(F401)
src/snakemake/scheduling/job_scheduler.py
8-8: collections.defaultdict imported but unused
Remove unused import: collections.defaultdict
(F401)
9-9: math imported but unused
Remove unused import: math
(F401)
10-10: os imported but unused
Remove unused import: os
(F401)
src/snakemake/jobs.py
304-304: Use key in dict instead of key in dict.keys()
Remove .keys()
(SIM118)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: tests (5, windows-latest, py312)
🔇 Additional comments (28)
src/snakemake/io/__init__.py (1)
901-906: LGTM! Well-implemented temporary file size calculation.The implementation correctly follows the established pattern, uses
async_runappropriately to handle the asynchronous size operations, and properly filters temporary files usingis_flagged(f, "temp").src/snakemake/workflow.py (5)
55-60: LGTM: Scheduler plugin interface imports added correctly.The new imports for
SchedulerSettingsBaseandSchedulerPluginare consistent with the existing plugin architecture pattern and necessary for the scheduler plugin interface migration.
62-62: LGTM: Greedy scheduler settings import is properly structured.The aliased import maintains clarity and follows proper naming conventions for the scheduler plugin architecture.
76-76: LGTM: JobScheduler import updated for plugin architecture.The import path change reflects the refactoring to support scheduler plugins and is consistent with the architectural migration.
1204-1206: LGTM: Method signature properly updated for scheduler plugins.The new parameters follow the established plugin architecture pattern with appropriate type hints. The
Optional[SchedulerSettingsBase]typing correctly indicates that scheduler settings may not always be required.
1320-1325: LGTM: JobScheduler instantiation properly updated for plugin architecture.The new instantiation pattern correctly delegates scheduler creation to the plugin while maintaining clean separation of concerns. The scheduler plugin creates the actual scheduler instance with appropriate parameters (DAG, settings, logger), which is consistent with the plugin architecture design.
src/snakemake/dag.py (7)
18-18: LGTM: Import additions support scheduler plugin interfaceThe new imports (
IterableandDAGSchedulerInterface) are correctly added to support the scheduler plugin interface implementation.Also applies to: 34-34
105-105: LGTM: DAG class implements scheduler plugin interfaceThe addition of
DAGSchedulerInterfaceto the class inheritance is correct and aligns with the PR objectives to introduce scheduler plugin support.
127-129: LGTM: Improved type annotation for dependenciesThe refined type annotation
Mapping[Job, Mapping[Job, Set[str]]]provides better type safety and clearly describes the nested mapping structure of job dependencies.
204-205: LGTM: Added return type annotation for dependenciesThe return type annotation
Mapping[Job, Mapping[Job, Set[str]]]is consistent with the refined type of the_dependenciesattribute.
207-210: LGTM: New job_dependencies method with appropriate type handlingThe new method correctly provides access to job dependencies through the mapping keys. The type ignore comment is justified as the type checker doesn't recognize that
keys()returns an iterable, which is a known limitation.
611-611: LGTM: Added return type annotation for needrun_jobsThe return type annotation
Iterable[Job]accurately describes what the method returns.
652-652: LGTM: Added return type annotation for finished methodThe return type annotation
boolcorrectly describes the method's return value.src/snakemake/api.py (2)
40-40: LGTM!The new imports for scheduler plugin interfaces are properly organized and necessary for the scheduler plugin integration.
Also applies to: 51-52
460-461: LGTM!The new scheduler settings parameters are properly typed and follow the existing parameter pattern.
src/snakemake/scheduling/job_scheduler.py (3)
60-66: LGTM!The constructor changes properly integrate the scheduler plugin architecture with appropriate caching mechanisms.
Also applies to: 81-82, 122-122, 149-157
542-583: Well-structured job selector refactoring.The refactored job_selector properly integrates scheduler plugins with validation, input size awareness, and fallback mechanisms.
592-611: Excellent job validation implementation.The validate_job method provides clear error messages and efficient caching. The error message correctly identifies common scenarios like pipe connections that require simultaneous execution.
src/snakemake/jobs.py (5)
31-35: LGTM!The scheduler interface additions properly integrate the job classes with the new scheduler plugin architecture.
Also applies to: 85-85, 183-187, 1323-1323
100-120: Good adaptation of resource handling for scheduler plugins.The method correctly returns a dictionary suitable for scheduler plugins while filtering out TBDString values that can't be evaluated yet.
513-514: LGTM!The return type annotation correctly reflects the dictionary return type for scheduler compatibility.
1354-1359: Good addition of type annotations.The type annotations improve code clarity and the assertion in
toposortedensures proper initialization.Also applies to: 1362-1363, 1374-1376, 1525-1525
195-220: Good refactoring of input size caching.Moving input size caching from job classes to the JobScheduler improves separation of concerns, as this data is only needed during job scheduling.
src/snakemake/cli.py (5)
11-11: Import changes look good.The addition of
Listto the typing imports and the newSchedulerPluginRegistryimport are appropriate for the scheduler plugin migration.Also applies to: 19-19
352-352: Good addition of return type annotation.The
Optional[Tuple[Path, Path]]return type correctly documents that this function returns either None or a tuple of profile and config file paths.
868-876: Scheduler argument properly migrated to plugin architecture.The changes correctly implement the plugin-based scheduler selection:
- Default "ilp" maintains backward compatibility
- Dynamic choices from the registry enable extensibility
- Help text clearly documents the builtin plugins (greedy and milp)
1490-1490: Clear deprecation notice added.The deprecation message properly guides users to the new plugin-specific argument
--scheduler-greedy-greediness.
1776-1776: Scheduler plugin CLI registration properly integrated.The scheduler plugin registry is correctly added alongside other plugin registries, enabling scheduler plugins to register their specific CLI arguments.
…e/snakemake into feat/scheduler-plugin-interface
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
src/snakemake/jobs.py (1)
299-310: Apply static analysis suggestion and improve resource cache management.The auxiliary resource management logic is sound, but there are opportunities for improvement:
- Use the more Pythonic
inoperator instead ofin dict.keys()- The cache reset logic correctly handles both
_resourcesand_scheduler_resourcesApply this diff to address the static analysis hint:
- if name in self.resources.keys(): + if name in self.resources: return # do not add aux_resources that are already in resources
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/snakemake/jobs.py(13 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit Configuration File
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/jobs.py
🧠 Learnings (2)
📓 Common learnings
Learnt from: leoschwarz
PR: snakemake/snakemake#3176
File: tests/test_output_index.py:99-157
Timestamp: 2025-01-17T12:00:09.368Z
Learning: New test dependencies for Snakemake should be introduced in separate PRs rather than being added as part of feature or refactoring PRs.
Learnt from: lczech
PR: snakemake/snakemake#3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In `snakemake/scheduler.py`, avoid suggesting the use of `asyncio.gather` in the `jobs_rewards` method due to overhead concerns and the need for immediate results.
Learnt from: cademirch
PR: snakemake/snakemake#3107
File: snakemake/logging.py:20-20
Timestamp: 2025-01-13T18:50:40.791Z
Learning: Dependencies for Snakemake are managed through test-environment.yml file, which includes snakemake-interface-logger-plugins (>= 0.1.0) and other required packages.
src/snakemake/jobs.py (4)
Learnt from: lczech
PR: #3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In snakemake/scheduler.py, avoid suggesting the use of asyncio.gather in the jobs_rewards method due to overhead concerns and the need for immediate results.
Learnt from: johanneskoester
PR: #3600
File: src/snakemake/jobs.py:960-964
Timestamp: 2025-05-23T09:40:24.474Z
Learning: In the cleanup method of the Job class in src/snakemake/jobs.py, files in the to_remove list should be formatted with fmt_iofile without specifying as_output=True or as_input=True parameters, as these files should be displayed as generic files rather than specifically as output files.
Learnt from: johanneskoester
PR: #3114
File: snakemake/cli.py:708-708
Timestamp: 2024-10-13T14:10:37.796Z
Learning: In the snakemake/cli.py file, ExecutorPluginRegistry() is a singleton class, so multiple instantiations are acceptable and do not cause unnecessary overhead.
Learnt from: cademirch
PR: #3107
File: snakemake/logging.py:20-20
Timestamp: 2025-01-13T18:50:40.791Z
Learning: Dependencies for Snakemake are managed through test-environment.yml file, which includes snakemake-interface-logger-plugins (>= 0.1.0) and other required packages.
🪛 Ruff (0.12.2)
src/snakemake/jobs.py
304-304: Use key in dict instead of key in dict.keys()
Remove .keys()
(SIM118)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (56)
- GitHub Check: tests (9, windows-latest, py313)
- GitHub Check: tests (10, ubuntu-latest, py313)
- GitHub Check: tests (9, ubuntu-latest, py313)
- GitHub Check: tests (10, windows-latest, py312)
- GitHub Check: tests (10, windows-latest, py313)
- GitHub Check: tests (10, ubuntu-latest, py311)
- GitHub Check: tests (10, ubuntu-latest, py312)
- GitHub Check: tests (9, windows-latest, py312)
- GitHub Check: tests (8, windows-latest, py312)
- GitHub Check: tests (8, ubuntu-latest, py312)
- GitHub Check: tests (8, ubuntu-latest, py311)
- GitHub Check: tests (9, ubuntu-latest, py311)
- GitHub Check: tests (8, windows-latest, py313)
- GitHub Check: tests (6, windows-latest, py312)
- GitHub Check: tests (9, ubuntu-latest, py312)
- GitHub Check: tests (7, ubuntu-latest, py311)
- GitHub Check: tests (7, windows-latest, py313)
- GitHub Check: tests (7, ubuntu-latest, py312)
- GitHub Check: tests (8, ubuntu-latest, py313)
- GitHub Check: tests (7, windows-latest, py312)
- GitHub Check: tests (6, ubuntu-latest, py311)
- GitHub Check: tests (7, ubuntu-latest, py313)
- GitHub Check: tests (6, windows-latest, py313)
- GitHub Check: tests (5, ubuntu-latest, py311)
- GitHub Check: tests (6, ubuntu-latest, py312)
- GitHub Check: tests (5, ubuntu-latest, py313)
- GitHub Check: tests (6, ubuntu-latest, py313)
- GitHub Check: tests (5, macos-latest, py313)
- GitHub Check: tests (5, windows-latest, py313)
- GitHub Check: tests (5, ubuntu-latest, py312)
- GitHub Check: tests (5, windows-latest, py312)
- GitHub Check: tests (4, ubuntu-latest, py312)
- GitHub Check: tests (4, macos-latest, py313)
- GitHub Check: tests (3, ubuntu-latest, py313)
- GitHub Check: tests (4, windows-latest, py312)
- GitHub Check: tests (4, windows-latest, py313)
- GitHub Check: tests (4, ubuntu-latest, py313)
- GitHub Check: tests (3, windows-latest, py312)
- GitHub Check: tests (4, ubuntu-latest, py311)
- GitHub Check: tests (3, windows-latest, py313)
- GitHub Check: tests (3, ubuntu-latest, py312)
- GitHub Check: tests (2, ubuntu-latest, py312)
- GitHub Check: tests (2, macos-latest, py313)
- GitHub Check: tests (3, ubuntu-latest, py311)
- GitHub Check: tests (1, windows-latest, py313)
- GitHub Check: tests (2, windows-latest, py313)
- GitHub Check: tests (2, ubuntu-latest, py313)
- GitHub Check: tests (2, ubuntu-latest, py311)
- GitHub Check: tests (1, macos-latest, py313)
- GitHub Check: tests (1, windows-latest, py312)
- GitHub Check: tests (2, windows-latest, py312)
- GitHub Check: tests (1, macos-latest, py312)
- GitHub Check: tests (1, ubuntu-latest, py311)
- GitHub Check: tests (1, ubuntu-latest, py312)
- GitHub Check: tests (1, ubuntu-latest, py313)
- GitHub Check: apidocs
🔇 Additional comments (9)
src/snakemake/jobs.py (9)
19-19: LGTM!The addition of
Dictto typing imports is necessary for the newscheduler_resourcesproperty return type annotations.
31-35: LGTM!The import of scheduler interface classes (
JobSchedulerInterface,SingleJobSchedulerInterface,GroupJobSchedulerInterface) is correctly added to support the new scheduler plugin architecture.
85-85: LGTM!The
AbstractJobclass now correctly inherits from bothJobExecutorInterfaceandJobSchedulerInterface, properly integrating with the scheduler plugin system.
100-119: LGTM!The
_get_scheduler_resourcesmethod correctly returns a plain dictionary instead of aResourcesobject, which aligns with the scheduler plugin interface requirements. The logic for filtering resources based on execution context (local vs remote) and excludingTBDStringvalues is appropriate.
183-188: LGTM!The
Jobclass inheritance is correctly updated to include both executor and scheduler interfaces (SingleJobExecutorInterface,SingleJobSchedulerInterface), maintaining compatibility with both systems.
513-514: LGTM!The
scheduler_resourcesproperty correctly delegates to_get_scheduler_resources()and has proper type annotation returningDict[str, Union[int, str]].
323-323: LGTM!The
GroupJobclass inheritance is correctly updated to include both executor and scheduler interfaces (GroupJobExecutorInterface,GroupJobSchedulerInterface).
354-406: LGTM!The property type annotations and the new
is_toposortedproperty enhance type safety and provide clear state checking capabilities. The logic correctly tracks whether the job group has been topologically sorted, which is important for proper execution order.
529-530: LGTM!The
GroupJob.scheduler_resourcesproperty correctly mirrors theJobimplementation, delegating to_get_scheduler_resources()with proper type annotation.
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
src/snakemake/scheduling/job_scheduler.py (1)
6-10: Unused import cleanup addressed — good.The previously flagged unused imports have been cleaned up or are now used (asyncio, deque, signal, sys, etc.). Nothing further to do here.
Also applies to: 13-16, 23-26, 31-31, 33-33
🧹 Nitpick comments (5)
src/snakemake/dag.py (2)
127-129: Fix type accuracy for _dependencies to avoid mutability/type-checker mismatches.
_dependenciesis mutated throughout the class, but it’s annotated as a read-onlyMapping. This is likely to trigger type-checker complaints (e.g., ondel, item assignment) and obscures the intent. Since the actual object is a nested defaultdict, annotate it accordingly.Apply these diffs:
- from typing import Iterable, List, Mapping, Optional, Set, Union, Dict + from typing import Iterable, List, Mapping, Optional, Set, Union, Dict, DefaultDict- self._dependencies: Mapping[Job, Mapping[Job, Set[str]]] = defaultdict( - partial(defaultdict, set) - ) + self._dependencies: DefaultDict[Job, DefaultDict[Job, Set[str]]] = defaultdict( + partial(defaultdict, set) + )This keeps the public-facing
dependenciesproperty asMapping[...]while accurately reflecting internal mutability and preventing false positives during static analysis.Also applies to: 18-18
209-212: Remove the type: ignore by avoiding the untyped default in .get().The current implementation relies on
type: ignorebecause{}defaults toDict[Any, Any]. Avoid the ignore and prevent accidental defaultdict creation:- def job_dependencies(self, job: Job) -> Iterable[Job]: - # keys() returns an iterable, but somehow the type checker does not recognize it - return self._dependencies.get(job, {}).keys() # type: ignore[return-value] + def job_dependencies(self, job: Job) -> Iterable[Job]: + # Avoid creating a new inner defaultdict for unknown jobs. + return self._dependencies[job].keys() if job in self._dependencies else ()This returns a KeysView when available (cheap and iterable) and an empty tuple otherwise, satisfying the Iterable contract without suppressing type checks.
src/snakemake/scheduling/job_scheduler.py (3)
197-206: Avoid double round-trips per path; make size fetching more robust.
await path.exists()followed byawait path.size()can incur two I/O calls per file and still race. Prefer a single call tosize()with exception handling to reduce overhead and handle transient errors cleanly.Apply this diff:
- async def update_input_sizes(self, jobs: Iterable[AbstractJob]): - async def get_size(path: _IOFile): - return path, await path.size() if await path.exists() else None + async def update_input_sizes(self, jobs: Iterable[AbstractJob]): + async def get_size(path: _IOFile): + try: + return path, await path.size() + except Exception: + # If the path does not exist (yet) or storage backends error out, keep None. + return path, None
207-210: Pruning strategy may be overly aggressive.When the cache exceeds 10k entries, this drops all entries not in the current selection’s paths. That can cause cache thrash if the ready set alternates. Consider pruning only as many as needed to get back to the cap, preferring to drop entries not in current paths first.
Apply this diff:
- if len(self._input_sizes) > 10000: - for path in list(filterfalse(paths.__contains__, self._input_sizes)): - del self._input_sizes[path] + max_entries = 10000 + if len(self._input_sizes) > max_entries: + # Drop entries not needed for current selection first, but only as many as necessary. + drop_candidates = [p for p in self._input_sizes.keys() if p not in paths] + n_to_drop = len(self._input_sizes) - max_entries + for p in drop_candidates[:n_to_drop]: + del self._input_sizes[p]
549-566: Don’t hold the scheduler lock while running plugin selection.
run_selectorholdsself._lockacross the potentially expensive plugin selector (e.g., MILP). This can block executor callbacks (_proceed, error handling) and delay frees/postprocessing. Capture the necessary snapshots under the lock, run selection without the lock, then re-acquire the lock to update resources.Apply this refactor:
- def run_selector(job_selector) -> Sequence[AbstractJob]: - with self._lock: - if self.resources["_cores"] == 0: - return [] - if len(jobs) == 1: - return self.job_selector_greedy( - jobs, self.remaining_jobs, self.resources, self._input_sizes - ) - selected = job_selector( - jobs, self.remaining_jobs, self.resources, self._input_sizes - ) - if selected is None: - selected = self.job_selector_greedy( - jobs, self.remaining_jobs, self.resources, self._input_sizes - ) - self.update_available_resources(selected) - return selected + def run_selector(job_selector) -> Sequence[AbstractJob]: + # Take snapshots under lock. + with self._lock: + if self.resources["_cores"] == 0: + return [] + remaining_snapshot = tuple(self.remaining_jobs) + resources_snapshot = dict(self.resources) + + # Run selection outside of the lock to avoid blocking callbacks. + if len(jobs) == 1: + selected = self.job_selector_greedy( + jobs, remaining_snapshot, resources_snapshot, self._input_sizes + ) + else: + selected = job_selector( + jobs, remaining_snapshot, resources_snapshot, self._input_sizes + ) + if selected is None: + selected = self.job_selector_greedy( + jobs, remaining_snapshot, resources_snapshot, self._input_sizes + ) + + # Apply resource updates under the lock. + with self._lock: + self.update_available_resources(selected) + return selected
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
src/snakemake/dag.py(8 hunks)src/snakemake/scheduling/job_scheduler.py(9 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit Configuration File
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/scheduling/job_scheduler.pysrc/snakemake/dag.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: johanneskoester
PR: snakemake/snakemake#3676
File: src/snakemake/cli.py:0-0
Timestamp: 2025-07-29T14:53:04.598Z
Learning: In the Snakemake repository, deprecated scheduler interfaces (like scheduler_ilp_solver, --scheduler-solver-path, --scheduler-greediness) should be kept for backward compatibility during the migration to the plugin-based scheduler system, rather than being removed immediately.
📚 Learning: 2025-07-29T14:53:04.598Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3676
File: src/snakemake/cli.py:0-0
Timestamp: 2025-07-29T14:53:04.598Z
Learning: In the Snakemake repository, deprecated scheduler interfaces (like scheduler_ilp_solver, --scheduler-solver-path, --scheduler-greediness) should be kept for backward compatibility during the migration to the plugin-based scheduler system, rather than being removed immediately.
Applied to files:
src/snakemake/scheduling/job_scheduler.pysrc/snakemake/dag.py
📚 Learning: 2024-10-04T16:12:18.927Z
Learnt from: lczech
PR: snakemake/snakemake#3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In `snakemake/scheduler.py`, avoid suggesting the use of `asyncio.gather` in the `jobs_rewards` method due to overhead concerns and the need for immediate results.
Applied to files:
src/snakemake/scheduling/job_scheduler.py
🧬 Code Graph Analysis (2)
src/snakemake/scheduling/job_scheduler.py (7)
src/snakemake/io/__init__.py (6)
size(172-173)size(623-633)size(1921-1922)update(1803-1809)items(1828-1830)get(1873-1878)src/snakemake/workflow.py (7)
scheduler(476-477)scheduler(480-481)dag(504-505)input(2042-2047)run(2301-2302)resources(2210-2215)name(2294-2299)src/snakemake/jobs.py (13)
jobs(1362-1363)jobs(1366-1367)AbstractJob(85-119)dag(1387-1388)input(337-338)input(341-342)input(1533-1538)resources(489-510)resources(1512-1526)name(1277-1278)name(1663-1667)scheduler_resources(513-514)scheduler_resources(1529-1530)src/snakemake/common/__init__.py (1)
async_run(91-108)src/snakemake/scheduling/greedy.py (2)
SchedulerSettings(10-34)select_jobs(42-113)src/snakemake/scheduling/milp.py (2)
SchedulerSettings(22-33)select_jobs(37-174)src/snakemake/settings/enums.py (1)
Quietness(24-29)
src/snakemake/dag.py (2)
src/snakemake/jobs.py (5)
dag(1387-1388)Job(183-1305)jobs(1362-1363)jobs(1366-1367)AbstractJob(85-119)src/snakemake/workflow.py (1)
dag(504-505)
🔇 Additional comments (9)
src/snakemake/dag.py (6)
34-34: Scheduler interface import looks correct.Importing DAGSchedulerInterface is aligned with the plugin-based architecture migration. No issues spotted here.
105-105: Adding DAGSchedulerInterface to the DAG MRO is appropriate.This cleanly exposes the DAG to scheduler plugins without disturbing executor/report interfaces. Good move.
206-206: Exposing dependencies as Mapping is a good public contract.Returning a read-only type at the interface boundary is sensible. Given performance considerations and existing usage, keeping it as a direct reference (without wrapping) is acceptable here.
609-615: Return type annotation for needrun_jobs is appropriate.The function returns iterables in both branches and the annotation matches usage in schedulers. LGTM.
650-652: Return type annotation for finished() is correct.The added boolean annotation aligns with scheduler expectations. No issues.
2200-2209: register_running signature tightening is fine and matches usage.Accepting AnySet[AbstractJob] works with set difference and existing call sites. No functional concerns here.
src/snakemake/scheduling/job_scheduler.py (3)
151-156: Greedy scheduler plugin bootstrap looks solid.Instantiating the built-in greedy plugin via the registry and wiring it with logger and settings is straightforward and aligns with the new plugin architecture.
158-160: Selector wiring is clear and forwards-compatible.Exposing both the user-provided scheduler’s selector and a greedy fallback is a good design for resilience.
326-327: Good: register_running now takes a set.This aligns the DAG update with the batch submission semantics introduced here.
| def validate_job(self, job): | ||
| if job in self._validated_jobs: | ||
| return | ||
| self._validated_jobs.add(job) | ||
|
|
||
| for name, available in self.global_resources.items(): | ||
| if isinstance(available, str): | ||
| continue | ||
| value = job.scheduler_resources.get(name, 0) | ||
| if value > available: | ||
| if name == "_cores": | ||
| name = "threads" | ||
| raise WorkflowError( | ||
| f"Job needs {name}={value} but only {name}={available} " | ||
| "are available. This is likely because two " | ||
| "jobs are connected via a pipe or a service output and have to run " | ||
| "simultaneously. Consider providing more " | ||
| "resources (e.g. via --cores)." | ||
| ) | ||
|
|
There was a problem hiding this comment.
Type-safety bug: comparing str-valued resources to ints can crash.
job.scheduler_resources may contain string-valued entries (Union[int, str]). Comparing a str value to an int available raises TypeError. Skip non-numeric per-job resource values during validation.
Apply this diff:
def validate_job(self, job):
if job in self._validated_jobs:
return
self._validated_jobs.add(job)
for name, available in self.global_resources.items():
if isinstance(available, str):
continue
- value = job.scheduler_resources.get(name, 0)
+ value = job.scheduler_resources.get(name, 0)
+ if isinstance(value, str):
+ # Non-numeric resources are not enforced by the scheduler capacity check.
+ continue
if value > available:
if name == "_cores":
name = "threads"
raise WorkflowError(
f"Job needs {name}={value} but only {name}={available} "
"are available. This is likely because two "
"jobs are connected via a pipe or a service output and have to run "
"simultaneously. Consider providing more "
"resources (e.g. via --cores)."
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def validate_job(self, job): | |
| if job in self._validated_jobs: | |
| return | |
| self._validated_jobs.add(job) | |
| for name, available in self.global_resources.items(): | |
| if isinstance(available, str): | |
| continue | |
| value = job.scheduler_resources.get(name, 0) | |
| if value > available: | |
| if name == "_cores": | |
| name = "threads" | |
| raise WorkflowError( | |
| f"Job needs {name}={value} but only {name}={available} " | |
| "are available. This is likely because two " | |
| "jobs are connected via a pipe or a service output and have to run " | |
| "simultaneously. Consider providing more " | |
| "resources (e.g. via --cores)." | |
| ) | |
| def validate_job(self, job): | |
| if job in self._validated_jobs: | |
| return | |
| self._validated_jobs.add(job) | |
| for name, available in self.global_resources.items(): | |
| if isinstance(available, str): | |
| continue | |
| value = job.scheduler_resources.get(name, 0) | |
| if isinstance(value, str): | |
| # Non-numeric resources are not enforced by the scheduler capacity check. | |
| continue | |
| if value > available: | |
| if name == "_cores": | |
| name = "threads" | |
| raise WorkflowError( | |
| f"Job needs {name}={value} but only {name}={available} " | |
| "are available. This is likely because two " | |
| "jobs are connected via a pipe or a service output and have to run " | |
| "simultaneously. Consider providing more " | |
| "resources (e.g. via --cores)." | |
| ) |
🤖 Prompt for AI Agents
In src/snakemake/scheduling/job_scheduler.py around lines 593 to 612, the
validation compares job.scheduler_resources values to integer global resources
but job.scheduler_resources can contain strings; add a guard to skip non-numeric
per-job resource values before the comparison (e.g., if isinstance(value, str):
continue) so only numeric values are compared and the TypeError is avoided.
There was a problem hiding this comment.
Actionable comments posted: 0
🔭 Outside diff range comments (1)
src/snakemake/scheduling/job_scheduler.py (1)
585-592: Only adjust numeric resources in update_available_resources.If any global resource is string-valued, subtracting integers from it will raise TypeError. Restrict accounting to numeric capacities.
Apply this diff:
- for name in self.global_resources: - # _job_count is updated per JobRateLimiter before scheduling - if name != "_job_count": - self.resources[name] -= sum( - [job.scheduler_resources.get(name, 0) for job in selected_jobs] - ) + for name, cap in self.global_resources.items(): + # _job_count is updated per JobRateLimiter before scheduling + if name == "_job_count" or isinstance(cap, str): + continue + dec = 0 + for job in selected_jobs: + v = job.scheduler_resources.get(name, 0) + if isinstance(v, int): + dec += v + self.resources[name] -= dec
♻️ Duplicate comments (1)
src/snakemake/scheduling/job_scheduler.py (1)
593-621: Fix numeric validation: current assert and str handling will misfire.
- The
assert isinstance(available, int)fires before the subsequentisinstance(available, str)check, making the str-guard unreachable.- Raising on str-valued per-job resources is too strict and contradicts previous guidance; skip non-numeric per-job resource values during validation.
Apply this diff:
- for name, available in self.global_resources.items(): - assert isinstance(available, int), ( - f"Global resource {name}={available} is not an integer. " - "This is likely a bug in Snakemake." - ) - if isinstance(available, str): - continue - value = job.scheduler_resources.get(name, 0) - if isinstance(value, str): - raise WorkflowError( - f"Resource {name}={value} of rule {job.rule.name} is not an " - "integer but global resource is defined as integer." - ) + for name, available in self.global_resources.items(): + # Only enforce numeric capacity constraints + if isinstance(available, str): + continue + value = job.scheduler_resources.get(name, 0) + if isinstance(value, str): + # Non-numeric per-job resources are not enforced by the scheduler capacity check. + continue if value > available: if name == "_cores": name = "threads" raise WorkflowError( f"Job needs {name}={value} but only {name}={available} " "are available. This is likely because two " "jobs are connected via a pipe or a service output and have to run " "simultaneously. Consider providing more " "resources (e.g. via --cores)." )
🧹 Nitpick comments (4)
src/snakemake/scheduling/job_scheduler.py (4)
151-155: Greedy plugin instantiation: OK, but consider fallback messaging.Loading the built-in "greedy" plugin directly is fine. If the plugin is missing/mis-registered, a clearer error would help debugging (optional).
197-210: Avoid double-await and minor robustness in input-size fetch.
get_size()callsawait path.exists()twice. Cache the result to cut I/O and event-loop churn.Apply this diff:
- async def get_size(path: _IOFile): - return path, await path.size() if await path.exists() else None + async def get_size(path: _IOFile): + exists = await path.exists() + if not exists: + return path, None + return path, await path.size()
544-566: Shrink the critical section around selector (optional).
run_selector()holds_lockwhile invoking the plugin selector. MILP/greedy selection can be slow, which will block callbacks (_proceed/error_callback) that also acquire_lock, increasing latency for error handling and completions.Optional refactor: snapshot state under lock (resources, running, remaining), release the lock to run the selector, then re-acquire to validate that selected jobs are still runnable and to update resources. This reduces contention without sacrificing correctness.
Would you like a concrete refactor sketch that keeps resource consistency with a two-phase snapshot/commit?
569-579: Consistency: return an empty list instead of an empty set when rate-limited.Elsewhere the selector returns a sequence; returning
set()here is type-inconsistent (though current call sites tolerate it). Prefer[]for consistency.Apply this diff:
- logger.info("Job rate limit reached, waiting for free slots.") - return set() + logger.info("Job rate limit reached, waiting for free slots.") + return []
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
src/snakemake/scheduling/job_scheduler.py(9 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
**/*.py
⚙️ CodeRabbit Configuration File
**/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of theselfargument of methods.
Do not suggest type annotation of theclsargument of classmethods.
Do not suggest return type annotation if a function or method does not contain areturnstatement.
Files:
src/snakemake/scheduling/job_scheduler.py
🧠 Learnings (3)
📓 Common learnings
Learnt from: johanneskoester
PR: snakemake/snakemake#3676
File: src/snakemake/cli.py:0-0
Timestamp: 2025-07-29T14:53:04.598Z
Learning: In the Snakemake repository, deprecated scheduler interfaces (like scheduler_ilp_solver, --scheduler-solver-path, --scheduler-greediness) should be kept for backward compatibility during the migration to the plugin-based scheduler system, rather than being removed immediately.
📚 Learning: 2025-07-29T14:53:04.598Z
Learnt from: johanneskoester
PR: snakemake/snakemake#3676
File: src/snakemake/cli.py:0-0
Timestamp: 2025-07-29T14:53:04.598Z
Learning: In the Snakemake repository, deprecated scheduler interfaces (like scheduler_ilp_solver, --scheduler-solver-path, --scheduler-greediness) should be kept for backward compatibility during the migration to the plugin-based scheduler system, rather than being removed immediately.
Applied to files:
src/snakemake/scheduling/job_scheduler.py
📚 Learning: 2024-10-04T16:12:18.927Z
Learnt from: lczech
PR: snakemake/snakemake#3113
File: snakemake/scheduler.py:912-914
Timestamp: 2024-10-04T16:12:18.927Z
Learning: In `snakemake/scheduler.py`, avoid suggesting the use of `asyncio.gather` in the `jobs_rewards` method due to overhead concerns and the need for immediate results.
Applied to files:
src/snakemake/scheduling/job_scheduler.py
🧬 Code Graph Analysis (1)
src/snakemake/scheduling/job_scheduler.py (4)
src/snakemake/io/__init__.py (8)
_IOFile(259-926)size(172-173)size(623-633)size(1921-1922)exists(491-495)update(1803-1809)items(1828-1830)get(1873-1878)src/snakemake/jobs.py (12)
jobs(1362-1363)jobs(1366-1367)dag(1387-1388)input(337-338)input(341-342)input(1533-1538)resources(489-510)resources(1512-1526)name(1277-1278)name(1663-1667)scheduler_resources(513-514)scheduler_resources(1529-1530)src/snakemake/common/__init__.py (1)
async_run(91-108)src/snakemake/scheduling/greedy.py (2)
SchedulerSettings(10-34)select_jobs(42-113)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (55)
- GitHub Check: tests (10, ubuntu-latest, py313)
- GitHub Check: tests (10, windows-latest, py313)
- GitHub Check: tests (9, ubuntu-latest, py311)
- GitHub Check: tests (10, ubuntu-latest, py311)
- GitHub Check: tests (10, windows-latest, py312)
- GitHub Check: tests (9, ubuntu-latest, py312)
- GitHub Check: tests (9, ubuntu-latest, py313)
- GitHub Check: tests (8, ubuntu-latest, py312)
- GitHub Check: tests (10, ubuntu-latest, py312)
- GitHub Check: tests (9, windows-latest, py313)
- GitHub Check: tests (8, windows-latest, py312)
- GitHub Check: tests (8, windows-latest, py313)
- GitHub Check: tests (8, ubuntu-latest, py311)
- GitHub Check: tests (9, windows-latest, py312)
- GitHub Check: tests (8, ubuntu-latest, py313)
- GitHub Check: tests (7, ubuntu-latest, py311)
- GitHub Check: tests (7, ubuntu-latest, py313)
- GitHub Check: tests (7, ubuntu-latest, py312)
- GitHub Check: tests (7, windows-latest, py312)
- GitHub Check: tests (7, windows-latest, py313)
- GitHub Check: tests (5, ubuntu-latest, py313)
- GitHub Check: tests (6, windows-latest, py313)
- GitHub Check: tests (6, ubuntu-latest, py312)
- GitHub Check: tests (6, ubuntu-latest, py313)
- GitHub Check: tests (4, ubuntu-latest, py311)
- GitHub Check: tests (6, ubuntu-latest, py311)
- GitHub Check: tests (5, ubuntu-latest, py312)
- GitHub Check: tests (5, windows-latest, py313)
- GitHub Check: tests (5, macos-latest, py312)
- GitHub Check: tests (5, windows-latest, py312)
- GitHub Check: tests (4, windows-latest, py312)
- GitHub Check: tests (5, ubuntu-latest, py311)
- GitHub Check: tests (6, windows-latest, py312)
- GitHub Check: tests (4, windows-latest, py313)
- GitHub Check: tests (4, ubuntu-latest, py312)
- GitHub Check: tests (4, ubuntu-latest, py313)
- GitHub Check: tests (3, windows-latest, py313)
- GitHub Check: tests (2, macos-latest, py313)
- GitHub Check: tests (3, windows-latest, py312)
- GitHub Check: tests (3, ubuntu-latest, py311)
- GitHub Check: tests (3, macos-latest, py312)
- GitHub Check: tests (3, ubuntu-latest, py312)
- GitHub Check: tests (2, ubuntu-latest, py313)
- GitHub Check: tests (3, ubuntu-latest, py313)
- GitHub Check: tests (2, windows-latest, py313)
- GitHub Check: tests (2, macos-latest, py312)
- GitHub Check: tests (1, windows-latest, py313)
- GitHub Check: tests (2, windows-latest, py312)
- GitHub Check: tests (2, ubuntu-latest, py311)
- GitHub Check: tests (1, ubuntu-latest, py312)
- GitHub Check: tests (2, ubuntu-latest, py312)
- GitHub Check: tests (1, ubuntu-latest, py311)
- GitHub Check: tests (1, ubuntu-latest, py313)
- GitHub Check: tests (1, macos-latest, py312)
- GitHub Check: tests (1, windows-latest, py312)
🔇 Additional comments (7)
src/snakemake/scheduling/job_scheduler.py (7)
6-16: LGTM: imports tidy-up and new dependencies are appropriate.Unused imports removed and new ones (asyncio, filterfalse, repeat, typing, scheduler/greedy settings) are justified by the new async input-size prefetch and plugin wiring.
Also applies to: 23-34
62-68: Constructor DI for scheduler and settings is a good move.Clear separation of concerns by injecting the scheduler and greedy settings. This makes the scheduler pluggable and testable.
83-83: Per-job validation cache is sensible.Memoizing validated jobs avoids quadratic behavior in repeated scheduling rounds.
124-124: Input-size cache added.The cache enables reusing input file sizes across rounds. See suggestion below to avoid duplicate awaits while populating it.
158-160: Selector wiring looks correct.Delegating to the injected scheduler with a greedy fallback is a clean abstraction.
327-327: Passing a set to register_running aligns with batch semantics.Good change; it avoids per-item overhead when registering multiple jobs.
485-489: Guard numeric types in_free_resourcesto avoid TypeError on string capacitiesOnly adjust integer resources when freeing them:
--- a/src/snakemake/scheduling/job_scheduler.py +++ b/src/snakemake/scheduling/job_scheduler.py @@ -485,4 +485,8 @@ def _free_resources(self, job): - for name, value in job.scheduler_resources.items(): - if name in self.resources and name != "_job_count": - self.resources[name] += value + for name, value in job.scheduler_resources.items(): + if name not in self.resources or name == "_job_count": + continue + # Only adjust numeric resources + if isinstance(self.resources[name], int) and isinstance(value, int): + self.resources[name] += value– File: src/snakemake/scheduling/job_scheduler.py (lines 485–489)
Please confirm whether any global resources are ever represented as strings in your workflows; if so, this guard is required to prevent runtime errors when jobs finish.
🤖 I have created a release *beep* *boop* --- ## [9.10.0](v9.9.0...v9.10.0) (2025-08-19) ### Features * migrate to scheduler plugin interface and scheduler plugins ([#3676](#3676)) ([26fcd38](26fcd38)) ### Bug Fixes * don't rate limit jobs when touching ([#3699](#3699)) ([9c499e5](9c499e5)) * raise an error when different rules produce identical (temp) output ([#3667](#3667)) ([f627176](f627176)) * silence messages on module load ([#3688](#3688)) ([b13e9c8](b13e9c8)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
…akemake#3676) <!--Add a description of your PR here--> ### QC <!-- Make sure that you can tick the boxes below. --> * [x] The PR contains a test case for the changes or the changes are already covered by an existing test case. * [x] The documentation (`docs/`) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake). <!-- This is an auto-generated comment: release notes by coderabbit.ai --> ## Summary by CodeRabbit * **New Features** * Plugin-based scheduler system with selectable scheduler plugins; new greedy and MILP schedulers and settings exposed. * Exposed total temporary-file size reporting for inputs. * **Improvements** * CLI now picks schedulers from available plugins; legacy solver options removed/deprecated and plugin settings integrated. * Asynchronous input-size prefetching and stricter per-job validation for scheduling; scheduler settings passed through execution API. * **Bug Fixes** * Fixed Python version error message formatting. * **Refactor** * Scheduling flow reworked to use external scheduler plugins with a greedy fallback. * **Chores** * Updated help text and deprecation notices. <!-- end of auto-generated comment: release notes by coderabbit.ai -->
🤖 I have created a release *beep* *boop* --- ## [9.10.0](snakemake/snakemake@v9.9.0...v9.10.0) (2025-08-19) ### Features * migrate to scheduler plugin interface and scheduler plugins ([snakemake#3676](snakemake#3676)) ([26fcd38](snakemake@26fcd38)) ### Bug Fixes * don't rate limit jobs when touching ([snakemake#3699](snakemake#3699)) ([9c499e5](snakemake@9c499e5)) * raise an error when different rules produce identical (temp) output ([snakemake#3667](snakemake#3667)) ([f627176](snakemake@f627176)) * silence messages on module load ([snakemake#3688](snakemake#3688)) ([b13e9c8](snakemake@b13e9c8)) --- This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).
QC
docs/) is updated to reflect the changes or this is not necessary (e.g. if the change does neither modify the language nor the behavior or functionalities of Snakemake).Summary by CodeRabbit
New Features
Improvements
Bug Fixes
Refactor
Chores