Allow custom sort functions for sort_values#8345
Conversation
jsignell
left a comment
There was a problem hiding this comment.
Can you add some tests for this change?
|
@charlesbluca just pinging to make sure you saw @jsignell 's review. Anything else other than that which needs to be done here? |
|
Thanks for the ping @ian-r-rose - missed the follow up review, I can address that now - other than that, I don't have anything else that needs to be added |
dask/dataframe/core.py
Outdated
| implementation of `sort_values`). | ||
| sort_function_kwargs: dict, optional | ||
| Additional keyword arguments to pass to the partition sorting function. | ||
| By default, `by`, `ascending`, and `na_position` are provided. |
There was a problem hiding this comment.
nit, but for rst we need double backticks on all these. Otherwise it's just italicshttps://dask--8345.org.readthedocs.build/en/8345/generated/dask.dataframe.DataFrame.sort_values.html
There was a problem hiding this comment.
Thanks! Always forgetful of RST code formatting 😅
jsignell
left a comment
There was a problem hiding this comment.
Just one last comment and this is good to go :)
Similar to dask/dask#8345, this PR allows the sorting function called on each partition in last step of dask-cudf's `sort_values` to be generalized, along with the kwargs that are supplied to it. This allows `sort_values` to be extended to support more complex ascending / null position handling. The context for this PR is a desire to simplify the [sorting algorithm](https://github.com/dask-contrib/dask-sql/blob/main/dask_sql/physical/utils/sort.py) used by dask-sql; since it only really differs from dask-cudf's sorting algorithm in that it uses a custom sorting function, it seems like it would be easier to allow for that extension upstream rather than duplicate code in dask-sql. Authors: - Charles Blackmon-Luca (https://github.com/charlesbluca) Approvers: - GALI PREM SAGAR (https://github.com/galipremsagar) - Richard (Rick) Zamora (https://github.com/rjzamora) URL: #9789
This PR allows the sorting function called on each partition in last step of
sort_valuesto be generalized, along with the kwargs that are supplied to it. This allowssort_valuesto be extended to support multi-column sorting and more complex ascending / null position handling.The context for this PR is a desire to simplify the sorting algorithm used by dask-sql; since it only really differs from Dask's sorting algorithm in that it uses a custom sorting function, it seems like it would be easier to allow for that extension upstream rather than duplicate Dask code in dask-sql.
pre-commit run --all-files