Plan for high level graph optimizations #5644

@mrocklin

Now that we have high level graphs in our collections, we can do some more complex optimizations. We've done a little bit of this with blockwise fusion, read_parquet and column selection, and root fusion, all of which have had really positive effects.
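As a rough illustration of what fusing column selection into a read means, here is a toy sketch in plain Python. The `ReadParquet` and `SelectColumns` classes and the `push_down_columns` function are hypothetical names made up for this example; they are not Dask's actual layer classes or optimization API, and the sketch assumes the selected columns exist in the file.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple, Union


# Hypothetical layer types for illustration only; these are not Dask classes.
@dataclass(frozen=True)
class ReadParquet:
    path: str
    columns: Optional[Tuple[str, ...]] = None  # None means "read every column"


@dataclass(frozen=True)
class SelectColumns:
    source: "Layer"
    columns: Tuple[str, ...]


Layer = Union[ReadParquet, SelectColumns]


def push_down_columns(layer: Layer) -> Layer:
    """Fuse a column selection into the read layer directly beneath it."""
    if isinstance(layer, SelectColumns):
        source = push_down_columns(layer.source)
        if isinstance(source, ReadParquet):
            # Assumption: the selection is a subset of what the read provides,
            # so the read can simply be narrowed to the selected columns.
            return replace(source, columns=layer.columns)
        return SelectColumns(source, layer.columns)
    return layer


# Before: read everything, then select two columns.
graph = SelectColumns(ReadParquet("data.parquet"), ("x", "y"))

# After: a single read layer that only loads the two columns.
optimized = push_down_columns(graph)
print(optimized)
```

The point is that the rewrite happens on a handful of high-level layers rather than on the many low-level tasks those layers expand into.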

There is likely more that we can do here. Some thoughts ...

  • Fuse column selection with other kinds of data access, like read_csv or ORC
  • Pass slicing through some blockwise operations in Dask array (a long-held request by @shoyer)
  • Optimize the subgraph callables in blockwise with Numba (or something else) for dask array to avoid memory copies and maybe reduce serialization time
  • Reorder joins, filters, and column access (for example, filter before joining)
  • ...
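To make the slicing idea above concrete, here is a minimal sketch using plain Python lists as stand-ins for array chunks. The helpers `map_then_slice` and `slice_then_map` are hypothetical and are not part of Dask. The sketch assumes a purely elementwise function, which is the case where the slice can safely move below the operation.

```python
from typing import Callable, List

Chunk = List[int]

calls = 0  # counts how many chunks the elementwise function touches


def add_one(chunk: Chunk) -> Chunk:
    """An elementwise operation applied to one chunk."""
    global calls
    calls += 1
    return [v + 1 for v in chunk]


def map_then_slice(func: Callable[[Chunk], Chunk], chunks: List[Chunk], stop: int) -> List[int]:
    """Naive order: apply func to every chunk, then keep the first `stop` elements."""
    flat = [v for chunk in chunks for v in func(chunk)]
    return flat[:stop]


def slice_then_map(func: Callable[[Chunk], Chunk], chunks: List[Chunk], stop: int) -> List[int]:
    """Optimized order: trim the input first, then apply func only where needed."""
    out: List[int] = []
    remaining = stop
    for chunk in chunks:
        if remaining <= 0:
            break  # later chunks are never computed
        out.extend(func(chunk[:remaining]))
        remaining -= len(chunk)
    return out


chunks = [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]

calls = 0
naive = map_then_slice(add_one, chunks, stop=4)
naive_calls = calls

calls = 0
pushed = slice_then_map(add_one, chunks, stop=4)
pushed_calls = calls

print(naive == pushed, naive_calls, pushed_calls)
```

Both orders give the same result, but the second one only processes the chunks that overlap the slice. The equivalence does not hold in general for operations that mix elements across positions, such as reductions or cumulative sums.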

It would be nice to have a current maintainer consider these options, estimate how long each would take, and decide what makes sense to do in the short term.


Labels: highlevelgraph (Issues relating to HighLevelGraphs.)
