Dask array rechunk should do intermediate sized rechunking when appropriate

Consider rechunking from chunks=(1, 100) to chunks=(100, 1) on an array of shape (100, 100).

A direct rechunk operation creates ~10,000 tasks corresponding to intermediate arrays of shape (1,1).

However, if we rechunk first to shape (10,10) (the same array size as all of our chunks) and then to our final chunk shape, there would be only ~2000 tasks, because the smallest intermediate arrays have 10 elements.

I don't know what the general rules are for how to optimize rechunkings but this makes a big difference in computational efficiency -- O(n^2) vs O(n^1.5) for square arrays with a side of length n. For n=10,000, this is a ~50x speed up (assuming that the task overhead dominates for small tasks).

Credit to @freeman-lab for this idea, which apparently works really nicely in Thunder. Jeremy, if you could share how you calculate intermediate chunks sizes that would be helpful!


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Dask array rechunk should do intermediate sized rechunking when appropriate #417

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Dask array rechunk should do intermediate sized rechunking when appropriate #417

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions