
Coercing a Pandas Dataframe from a Dask Dataframe #1122

@RedPandaHat

Description


Is it possible to coerce a Pandas dataframe from an existing dask dataframe? e.g.

ddf.to_pandasdataframe(df, etc..)

Right now I'm dumping to a CSV and reading it back in later, which is slow and silly.

The use case here is a single-node, many-core machine, with data that fits in memory and a CPU-intensive process that is embarrassingly parallel -- so I'm using ddf.groupby(ddf.index).apply(func) to speed up the work. This turns out to be an order of magnitude faster than multiprocessing, btw. The result of the groupby.apply is a dask dataframe, but I need to do work on it using a variety of pandas functions not currently available in dask.
