
Coercing a Pandas Dataframe from a Dask Dataframe #1122

@RedPandaHat

Description


Is it possible to coerce a Pandas dataframe from an existing dask dataframe? e.g.

ddf.to_pandasdataframe(df, etc..)

Right now I'm dumping to a CSV and reading it back in later, which is slow and silly.

The use case here is a single-node, many-core machine, with data that fits in memory and a CPU-intensive process that is embarrassingly parallel -- so I'm using ddf.groupby(ddf.index).apply(func) to speed up the work. This turns out to be an order of magnitude faster than multiprocessing, btw. The result of the groupby.apply is a dask dataframe, but I need to do work on it using a variety of pandas functions not currently available in dask.
