Skip to content

Allow custom sort functions for sort_values#8345

Merged
jsignell merged 6 commits intodask:mainfrom
charlesbluca:custom-sort-function
Nov 29, 2021
Merged

Allow custom sort functions for sort_values#8345
jsignell merged 6 commits intodask:mainfrom
charlesbluca:custom-sort-function

Conversation

@charlesbluca
Copy link
Copy Markdown
Member

This PR allows the sorting function called on each partition in last step of sort_values to be generalized, along with the kwargs that are supplied to it. This allows sort_values to be extended to support multi-column sorting and more complex ascending / null position handling.

The context for this PR is a desire to simplify the sorting algorithm used by dask-sql; since it only really differs from Dask's sorting algorithm in that it uses a custom sorting function, it seems like it would be easier to allow for that extension upstream rather than duplicate Dask code in dask-sql.

  • Closes #xxxx
  • Tests added / passed
  • Passes pre-commit run --all-files

Copy link
Copy Markdown
Member

@jsignell jsignell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add some tests for this change?

@ian-r-rose
Copy link
Copy Markdown
Collaborator

@charlesbluca just pinging to make sure you saw @jsignell 's review. Anything else other than that which needs to be done here?

@charlesbluca
Copy link
Copy Markdown
Member Author

Thanks for the ping @ian-r-rose - missed the follow up review, I can address that now - other than that, I don't have anything else that needs to be added

implementation of `sort_values`).
sort_function_kwargs: dict, optional
Additional keyword arguments to pass to the partition sorting function.
By default, `by`, `ascending`, and `na_position` are provided.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit, but for rst we need double backticks on all these. Otherwise it's just italicshttps://dask--8345.org.readthedocs.build/en/8345/generated/dask.dataframe.DataFrame.sort_values.html

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! Always forgetful of RST code formatting 😅

Copy link
Copy Markdown
Member

@jsignell jsignell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one last comment and this is good to go :)

@jsignell jsignell merged commit 73c52ec into dask:main Nov 29, 2021
rapids-bot bot pushed a commit to rapidsai/cudf that referenced this pull request Jan 14, 2022
Similar to dask/dask#8345, this PR allows the sorting function called on each partition in last step of dask-cudf's `sort_values` to be generalized, along with the kwargs that are supplied to it. This allows `sort_values` to be extended to support more complex ascending / null position handling.

The context for this PR is a desire to simplify the [sorting algorithm](https://github.com/dask-contrib/dask-sql/blob/main/dask_sql/physical/utils/sort.py) used by dask-sql; since it only really differs from dask-cudf's sorting algorithm in that it uses a custom sorting function, it seems like it would be easier to allow for that extension upstream rather than duplicate code in dask-sql.

Authors:
  - Charles Blackmon-Luca (https://github.com/charlesbluca)

Approvers:
  - GALI PREM SAGAR (https://github.com/galipremsagar)
  - Richard (Rick) Zamora (https://github.com/rjzamora)

URL: #9789
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants