feat: switch from toposort to graphlib by fgvieira · Pull Request #3109 · snakemake/snakemake

fgvieira · 2024-10-03T07:56:50Z

Use graphlib instead of toposort for topological sorting (following #2134).
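For readers unfamiliar with the standard-library module, the following is a minimal sketch of how `graphlib.TopologicalSorter` (available since Python 3.9) can produce the same layered output that the `toposort` package yields. The helper name `toposort_layers` and the example graph are illustrative assumptions, not the code merged in this PR.

```python
from graphlib import TopologicalSorter


def toposort_layers(graph):
    """Yield sets of nodes; every node depends only on nodes in earlier sets.

    `graph` maps each node to the set of nodes it depends on.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph contains a cycle
    while sorter.is_active():
        ready = sorter.get_ready()  # all nodes whose dependencies are finished
        yield set(ready)
        sorter.done(*ready)  # mark the whole layer as processed


# Hypothetical dependency graph: "b" and "c" depend on "a", "d" on both.
graph = {"b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
layers = list(toposort_layers(graph))
print(layers)
```

Nodes returned together by `get_ready()` have no dependencies on each other, which is why each layer can be treated as a group of jobs that may run in parallel.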

QC

The PR contains a test case for the changes or the changes are already covered by an existing test case.
The documentation (docs/) is updated to reflect the changes or this is not necessary (e.g. if the change modifies neither the language nor the behavior or functionality of Snakemake).

Summary by CodeRabbit

New Features
- Enhanced topological sorting for job execution in workflows, improving efficiency and reliability.
- Introduced new tests to validate the functionality of the topological sorting feature.
Bug Fixes
- Removed the toposort dependency to streamline package requirements.
Chores
- Updated dependencies in the test environment for better compatibility and performance.
- Adjusted formatting and organization of the dependencies in configuration files for clarity.

coderabbitai · 2024-10-03T07:56:59Z

📥 Commits

Files that changed from the base of the PR and between 66e2510 and ac8b9c9.

Walkthrough

The changes in this pull request involve updates to the setup.cfg, snakemake/dag.py, and test-environment.yml files. The setup.cfg file has had the toposort dependency removed and formatting adjustments made to the install_requires section. In snakemake/dag.py, a new toposort function has been introduced, replacing the previous external library and refactoring the toposorted method to utilize Python's built-in graphlib.TopologicalSorter. The test-environment.yml file has seen several dependency modifications, including the removal of toposort and the addition of new dependencies.

Changes

| File | Change Summary |
| --- | --- |
| setup.cfg | Removed `toposort >=1.10,<2.0` from `install_requires`; adjusted formatting of `snakemake-interface-report-plugins` dependency. |
| snakemake/dag.py | Added `toposort(graph)` function; updated `toposorted(self, jobs=None, inherit_pipe_dependencies=False)` to use the new `toposort` function. |
| test-environment.yml | Added dependencies: `pulp >=2.3.1`, `crc32c`, `requests-mock`, `peppy`, `snakemake-interface-executor-plugins >=9.1.0`, `snakemake-executor-plugin-cluster-generic >=1.0.9`, `snakemake-storage-plugin-http`, `snakemake-storage-plugin-s3`, `snakemake-storage-plugin-fs >=1.0.3`; removed `toposort >=1.10`; updated `paramiko` to `>=3.4.1`. |
| tests/test_toposort/Snakefile | Introduced a new test for the `toposort` function validating job order in a directed acyclic graph (DAG). |
| tests/tests.py | Added `test_toposort()` function to validate the topological sorting functionality. |

Possibly related PRs

docs: update doc-environment.yml file and Documentation Setup documentation #3058: The removal of the toposort dependency in both the main PR and the retrieved PR indicates a direct relationship, as both changes involve modifications to dependencies related to the Snakemake project.

coderabbitai

Actionable comments posted: 2

🧹 Outside diff range and nitpick comments (1)

snakemake/dag.py (1)
2750-2757: Avoid shadowing built-in function sorted by renaming the variable

The variable sorted in your graphlib_toposort function shadows Python's built-in sorted() function, which can lead to confusion and potential errors. It's best practice to avoid using names of built-in functions for variable names.

Consider renaming the variable sorted to something like sorted_layers or topo_order.

Apply this diff:
-    sorted = list()
+    topo_order = list()
...
-        sorted.append(ready)
+        topo_order.append(ready)
...
-    return sorted
+    return topo_order

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between 19a2ab7 and b22e89d.

📒 Files selected for processing (3)

setup.cfg (0 hunks)
snakemake/dag.py (2 hunks)
test-environment.yml (0 hunks)

💤 Files with no reviewable changes (2)

setup.cfg
test-environment.yml

🧰 Additional context used

📓 Path-based instructions (1)

snakemake/dag.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

snakemake/dag.py

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

maarten-k · 2024-10-03T11:07:13Z

Can you move graphlib_toposort out of the function (I'm not too fond of inlining) and add a small test to ensure it returns the right format?
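A format check of the kind requested here could look roughly like the sketch below. It assumes the helper returns a list of sets (the shape the `toposort` package produced); the function names and the example graph are hypothetical and are not taken from the test added in this PR.

```python
from graphlib import CycleError, TopologicalSorter


def toposort(graph):
    """Return a list of sets, one set per layer of mutually independent nodes."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    layers = []
    while sorter.is_active():
        ready = sorter.get_ready()
        layers.append(set(ready))
        sorter.done(*ready)
    return layers


def check_toposort_format():
    # Hypothetical graph: 2 and 3 depend on 1, and 4 depends on 2 and 3.
    result = toposort({2: {1}, 3: {1}, 4: {2, 3}})
    assert isinstance(result, list)
    assert all(isinstance(layer, set) for layer in result)
    assert result == [{1}, {2, 3}, {4}]


def cycle_is_rejected():
    # graphlib reports cycles via CycleError when prepare() is called.
    try:
        toposort({1: {2}, 2: {1}})
    except CycleError:
        return True
    return False


check_toposort_format()
print(cycle_is_rejected())
```

Checking the container types separately from the expected order makes a format regression (for example, tuples instead of sets) easier to spot.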

coderabbitai

Actionable comments posted: 1

🧹 Outside diff range and nitpick comments (3)

tests/tests.py (1)
2270-2271: LGTM! Consider adding a brief docstring.

The new test function test_toposort() is well-structured and consistent with other tests in the file. It correctly uses the run() function with appropriate parameters for testing the topological sort functionality.

Consider adding a brief docstring to explain the purpose of this test, for example:
def test_toposort():
    """Test the topological sort functionality in dryrun mode."""
    run(dpath("test_toposort"), check_results=False, executor="dryrun")
snakemake/dag.py (2)
80-93: LGTM! Consider adding a type hint for clarity.

The toposort function is well-implemented and efficiently uses graphlib.TopologicalSorter. It correctly groups tasks that can be executed in parallel.

Consider adding a type hint for the graph parameter to improve clarity:
def toposort(graph: Dict[Any, Set[Any]]):
Line range hint 2793-2818: LGTM! Consider extracting the pipe job sorting logic for improved readability.

The updated toposorted method correctly uses the new toposort function and handles pipe dependencies well. The implementation for sorting pipe jobs within each layer is correct but complex.

To improve readability, consider extracting the pipe job sorting logic into a separate method:
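def sort_pipe_jobs(self, group):
    # Sort the jobs of one pipe group, keeping only in-group dependencies.
    # Relies on the module-level toposort() and on itertools.chain.
    return chain.from_iterable(
        toposort(
            {
                job: {dep for dep in self._dependencies[job] if dep in group}
                for job in group
            }
        )
    )

# Then in the main method:
sorted_layer.extend(self.sort_pipe_jobs(group))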
This refactoring would make the main method easier to read and understand.
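To make the idea concrete, here is a self-contained sketch of sorting the jobs of a single group by restricting the dependency graph to that group and flattening the resulting layers. The names `sort_within_group`, `dependencies`, and the job labels are illustrative assumptions; the real method operates on Snakemake job objects and `self._dependencies`.

```python
from graphlib import TopologicalSorter
from itertools import chain


def toposort(graph):
    """Return a list of sets, one per layer of mutually independent nodes."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    layers = []
    while sorter.is_active():
        ready = sorter.get_ready()
        layers.append(set(ready))
        sorter.done(*ready)
    return layers


def sort_within_group(dependencies, group):
    """Order the jobs in `group`, ignoring dependencies outside the group."""
    subgraph = {
        job: {dep for dep in dependencies[job] if dep in group}
        for job in group
    }
    # Flatten the layers into a single sequence of jobs.
    return list(chain.from_iterable(toposort(subgraph)))


# Hypothetical jobs: "consumer" reads from "producer", which needs "upstream".
dependencies = {
    "upstream": set(),
    "producer": {"upstream"},
    "consumer": {"producer"},
}
order = sort_within_group(dependencies, {"producer", "consumer"})
print(order)
```

Dropping out-of-group dependencies matters because `TopologicalSorter` would otherwise add the outside jobs as nodes of the subgraph and emit them in the result.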

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Files that changed from the base of the PR and between b22e89d and 66e2510.

📒 Files selected for processing (3)

snakemake/dag.py (3 hunks)
tests/test_toposort/Snakefile (1 hunks)
tests/tests.py (1 hunks)

🧰 Additional context used

📓 Path-based instructions (2)

snakemake/dag.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

tests/tests.py (1)

Pattern **/*.py: Do not try to improve formatting.
Do not suggest type annotations for functions that are defined inside of functions or methods.
Do not suggest type annotation of the self argument of methods.
Do not suggest type annotation of the cls argument of classmethods.
Do not suggest return type annotation if a function or method does not contain a return statement.

🔇 Additional comments (2)

tests/test_toposort/Snakefile (1)

1-3: LGTM: Import and graph definition look good.

The import statement and graph definition are clear and concise. The graph represents a simple but effective test case for the toposort function.

snakemake/dag.py (1)

Line range hint 1-2818: Overall, the changes look good and improve the DAG functionality.

The implementation of the new toposort function and the updates to the toposorted method are well-done. These changes enhance the topological sorting functionality and correctly handle pipe dependencies and parallel execution. The code is generally clean and efficient, with only minor suggestions for improvement in terms of type hinting and code organization.

tests/test_toposort/Snakefile

sonarqubecloud · 2024-10-11T12:31:06Z

Quality Gate passed

Issues
1 New issue
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarCloud

🤖 I have created a release *beep* *boop*

---

## [8.22.0](v8.21.0...v8.22.0) (2024-10-13)

### Features

* switch from toposort to graphlib ([#3109](#3109)) ([91e875d](91e875d))

### Bug Fixes

* configfile `group` and `group-components` were not being registered ([#3135](#3135)) ([4397c7d](4397c7d))
* remove paramiko dependency as issue has been fixed ([#3110](#3110)) ([1b43250](1b43250))

---

This PR was generated with [Release Please](https://github.com/googleapis/release-please). See [documentation](https://github.com/googleapis/release-please#release-please).

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

Switch from toposort to graphlib

7b9cf95

fgvieira requested a review from johanneskoester as a code owner October 3, 2024 07:56

Code format

b22e89d

coderabbitai bot reviewed Oct 3, 2024

fgvieira and others added 2 commits October 3, 2024 10:08

Switch update to add

2e59794

Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>

Fix typo

fcdb098

fgvieira added 2 commits October 3, 2024 14:28

Add test

b682dc4

Fix typo

66e2510

coderabbitai bot reviewed Oct 3, 2024

fgvieira added 4 commits October 3, 2024 14:36

Fix typo

3529d21

Add more tests

c4ff8dc

Merge branch 'main' into toposort_to_graphlib

ac8b9c9

Merge branch 'main' into toposort_to_graphlib

3157b24

johanneskoester approved these changes Oct 13, 2024

johanneskoester merged commit 91e875d into snakemake:main Oct 13, 2024

github-actions bot mentioned this pull request Oct 13, 2024

chore(main): release 8.22.0 #3139

Merged

fgvieira deleted the toposort_to_graphlib branch October 14, 2024 06:19

This was referenced Oct 15, 2024

feat: add two new schedulers #3113

Closed

feat: subsample jobs to speed-up scheduler #3112

Merged

coderabbitai bot mentioned this pull request Nov 4, 2024

refactor: Remove datrie dependency #3176

Merged

2 tasks

coderabbitai bot mentioned this pull request Mar 11, 2025

fix: in remote jobs, upload storage in topological order such that modification dates are preserved (e.g. in case of group jobs) #3377

Merged

2 tasks

johanneskoester merged 10 commits into snakemake:main from fgvieira:toposort_to_graphlib
