
Identify performance opportunities from recent benchmark #6264

@mrocklin

This article seems reasonably well done. It ends with a nice notebook of simple workflows on which Dask DataFrame apparently didn't perform very well. I think it would be a useful exercise for someone to go through that notebook, produce a performance report for each section, and present them here for analysis. I suspect that we could find some opportunities for performance optimization.

https://towardsdatascience.com/beyond-pandas-spark-dask-vaex-and-other-big-data-technologies-battling-head-to-head-a453a1f8cc13
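For anyone picking this up, here is a minimal sketch of how one report per section could be captured. It assumes the distributed scheduler and uses the `performance_report` context manager from `dask.distributed`. The helper `profile_section`, the report file naming, and the synthetic `dask.datasets.timeseries()` workload are hypothetical stand-ins, not code from the article's notebook.

```python
import dask
from dask.distributed import Client, performance_report


def profile_section(name, compute_fn):
    """Run one workload under performance_report.

    `name` and the "<name>-report.html" naming are illustrative assumptions.
    Requires an active distributed Client. Returns (result, report_path).
    """
    path = f"{name}-report.html"
    # Everything computed inside this block is recorded in the HTML report.
    with performance_report(filename=path):
        result = compute_fn()
    return result, path


if __name__ == "__main__":
    # Small in-process cluster; swap in the real cluster used for benchmarking.
    client = Client(processes=False)

    # Synthetic stand-in for one notebook section (assumption, not the
    # article's dataset).
    df = dask.datasets.timeseries()
    result, path = profile_section(
        "groupby-mean",
        lambda: df.groupby("name").x.mean().compute(),
    )
    print(result.head())
    print("report written to", path)

    client.close()
```

Each call writes a standalone HTML file, so the reports for the individual sections can be attached to this issue one by one.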

Edit: lessons learned

Labels

dataframe
good second issue: Clearly described, educational, but less trivial than "good first issue".
