Skip to content

Dask Persist #1344

@mrocklin

Description

@mrocklin

It would be convenient to load constituent dask.arrays into memory as dask.arrays rather than as numpy arrays. This would help with distributed computations where we want to load a large amount of data into distributed memory once and then iterate on the full xarray dataset repeatedly without reloading from disk every time.

We can probably solve this from either side:

  1. XArray could make a .persist method that replaced all of its dask.arrays with a persisted version of that array
import dask

dset.x, dset.y, dset.z = dask.persist(dset.x, dset.y, dset.z)
  1. We could look into the Dask duck type solution again Expected interface for dask objects dask/dask#1068

cc @shoyer @jcrist @rabernat @pwolfram

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions