Closed
Description
Is it possible to coerce a Pandas dataframe from an existing dask dataframe? e.g.
ddf.to_pandasdataframe(df, etc..)
I'm dumping to a CSV and reading it back in later, which is slow and silly.
The use case here is a single-node, many-core machine with data that fits in memory and a CPU-intensive process that is embarrassingly parallel -- so I'm using ddf.groupby(ddf.index).apply(func) to speed up the work. (This turns out to be an order of magnitude faster than multiprocessing, by the way.) The result of the groupby.apply is a dask dataframe, but I need to do work on it using a variety of pandas functions not currently available in dask.