Enable backend dispatching for Dask-DataFrame creation #11920
rapids-bot[bot] merged 27 commits into rapidsai:branch-22.12
Codecov Report
Base: 87.40% // Head: 88.13% // Increases project coverage by +0.72%
Additional details and impacted files
@@               Coverage Diff                @@
##           branch-22.12   #11920      +/-   ##
================================================
+ Coverage         87.40%   88.13%   +0.72%
================================================
  Files               133      133
  Lines             21833    21987     +154
================================================
+ Hits              19084    19379     +295
+ Misses             2749     2608     -141
galipremsagar left a comment
Overall looks good, Rick; a few comments before this is ready to merge.
```python
def from_dict(data, npartitions, orient="columns", **kwargs):
    if orient != "columns":
        raise ValueError(f"orient={orient} is not supported")
    return dd.from_pandas(
```
Is from_pandas required here because we don't support the cudf.DataFrame.from_dict API yet? If so, can we add a TODO here to change this after #11934 is resolved?
There was a problem hiding this comment.
> Is from_pandas required here because we don't support cudf.DataFrame.from_dict API yet?
Yes and no: we should certainly use cudf.DataFrame.from_dict once it is supported (I'll add a TODO). However, I'll change the dd.from_pandas code to dask_cudf.from_cudf for clarity (from_cudf is just a cudf-friendly alias for from_pandas).
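The reviewed from_dict rejects any orient other than "columns" and then returns the result of dd.from_pandas(...), which splits an in-memory frame into npartitions partitions. The sketch below is a pure-Python stand-in for that flow, assuming plain dicts of lists in place of pandas or cuDF frames. The name from_dict_sketch is hypothetical, and the even row split is an illustrative choice, not Dask's exact partitioning rule.

```python
from typing import Any, Dict, List, Sequence


def from_dict_sketch(
    data: Dict[str, Sequence[Any]], npartitions: int, orient: str = "columns"
) -> List[Dict[str, List[Any]]]:
    """Hypothetical stand-in for a backend from_dict: validate the input,
    then split its rows into npartitions contiguous partitions."""
    # Same check as the reviewed code: only column-oriented input is allowed.
    if orient != "columns":
        raise ValueError(f"orient={orient} is not supported")
    if npartitions < 1:
        raise ValueError("npartitions must be >= 1")

    # Every column must have the same number of rows.
    nrows = len(next(iter(data.values()), []))
    if any(len(col) != nrows for col in data.values()):
        raise ValueError("all columns must have the same length")

    # Spread rows as evenly as possible; earlier partitions get any remainder.
    base, extra = divmod(nrows, npartitions)
    partitions = []
    start = 0
    for i in range(npartitions):
        stop = start + base + (1 if i < extra else 0)
        partitions.append({name: list(col[start:stop]) for name, col in data.items()})
        start = stop
    return partitions
```

With five rows and npartitions=2, the first partition holds rows 0 to 2 and the second holds rows 3 and 4.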
galipremsagar left a comment
Thanks @rjzamora!
wence- left a comment
Does the entrypoint need to be added to setup.py/cfg or similar?
Correct - the entrypoint is defined in …
Ah, somehow I missed that this had already been done back in March.
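For context, Python packages usually register an entrypoint through the entry_points argument of setuptools.setup (or the equivalent [options.entry_points] section of setup.cfg). The group name dask.dataframe.backends and the target dask_cudf.backends:CudfBackendEntrypoint below are assumptions about how this PR is wired, not copied from its setup files. The sketch uses only the standard library: it parses such a specification and lists whatever is actually installed in that group.

```python
import importlib.metadata
from typing import Dict, Tuple

# Assumed entrypoint group for Dask DataFrame backends.
BACKEND_GROUP = "dask.dataframe.backends"

# Assumed registration, in the shape passed to setuptools.setup(entry_points=...).
ENTRY_POINTS = {
    BACKEND_GROUP: ["cudf = dask_cudf.backends:CudfBackendEntrypoint"],
}


def parse_spec(spec: str) -> Tuple[str, str, str]:
    """Split 'name = module:attr' into (name, module, attr)."""
    name, _, target = spec.partition("=")
    module, _, attr = target.strip().partition(":")
    return name.strip(), module, attr


def installed_backends(group: str = BACKEND_GROUP) -> Dict[str, str]:
    """Return {backend name: 'module:attr'} for entrypoints installed in group."""
    eps = importlib.metadata.entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=group)
    else:  # Python 3.8/3.9 return a dict keyed by group name
        selected = eps.get(group, [])
    return {ep.name: ep.value for ep in selected}
```

In an environment without dask_cudf, installed_backends() simply returns an empty dict.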
@gpucibot merge
Description
This PR depends on dask/dask#9475 (now merged).
After dask#9475, external libraries can implement (and expose) their own DataFrameBackendEntrypoint definitions to specify custom creation functions for DataFrame collections. This PR introduces the CudfBackendEntrypoint class to create dask_cudf.DataFrame collections using the dask.dataframe API. By installing dask_cudf with this entrypoint definition in place, you get the following behavior in dask.dataframe:
Note that the code snippet above does not require an explicit import of cudf or dask_cudf. The following creation functions will support backend dispatching after dask#9475:
- from_dict
- read_parquet
- read_json
- read_orc
- read_csv
- read_hdf
See also: dask/design-docs#1
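In Dask itself, a backend is selected with dask.config.set({"dataframe.backend": "cudf"}), after which calls such as dd.from_dict are served by the installed cudf entrypoint. The toy model below illustrates that dispatch pattern in plain Python. It is not Dask's implementation: BackendEntrypoint, PandasLikeBackend, CudfLikeBackend, and CreationDispatch are hypothetical names, and plain dicts stand in for both dask.config and the installed entrypoints.

```python
from typing import Any, Callable, Dict


class BackendEntrypoint:
    """Hypothetical base class: one method per creation function."""

    def from_dict(self, data, npartitions):
        raise NotImplementedError


class PandasLikeBackend(BackendEntrypoint):
    def from_dict(self, data, npartitions):
        return ("pandas-collection", data, npartitions)


class CudfLikeBackend(BackendEntrypoint):
    def from_dict(self, data, npartitions):
        return ("cudf-collection", data, npartitions)


# Stand-in for installed entrypoints: backend name -> class to instantiate.
# Instantiation is deferred until the backend is first requested.
ENTRYPOINTS: Dict[str, Callable[[], BackendEntrypoint]] = {
    "pandas": PandasLikeBackend,
    "cudf": CudfLikeBackend,
}

# Stand-in for dask.config; "pandas" is the default backend.
CONFIG: Dict[str, Any] = {"dataframe.backend": "pandas"}


class CreationDispatch:
    def __init__(self, entrypoints, config):
        self._entrypoints = entrypoints
        self._config = config
        self._cache: Dict[str, BackendEntrypoint] = {}

    def backend(self) -> BackendEntrypoint:
        # Look up the configured backend name on every call, so config
        # changes take effect immediately; cache the instantiated backend.
        name = self._config["dataframe.backend"]
        if name not in self._cache:
            try:
                loader = self._entrypoints[name]
            except KeyError:
                raise ValueError(f"No backend registered for {name!r}") from None
            self._cache[name] = loader()
        return self._cache[name]

    def dispatch(self, method: str) -> Callable[..., Any]:
        # Resolve the creation function on the currently configured backend.
        return getattr(self.backend(), method)


dispatcher = CreationDispatch(ENTRYPOINTS, CONFIG)


def from_dict(data, npartitions):
    """Public creation function: the caller never imports a backend module."""
    return dispatcher.dispatch("from_dict")(data, npartitions)
```

Changing CONFIG["dataframe.backend"] from "pandas" to "cudf" changes which class serves from_dict while the caller's code stays the same.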