Merged
One of the performance problems identified in dask#5913 was that `Array.shape` gets called many times, and each time it is computed by summing the chunk sizes. By using `cached_cumsum`, the calculation should be fast after the first time. While a cumsum is overkill for computing the shape, it's likely that the cached result will be used again at some point.

An alternative would be to store the shape in the object itself (e.g. computed in the constructor). That would require all the bits of internal code that modify `_chunks` directly to be changed to update the shape as well.

Also change `cached_cumsum` to return a tuple instead of a numpy array, and use `toolz.accumulate` rather than `np.cumsum`. I originally needed this because `shape` was getting the wrong type (a tuple of numpy scalars instead of Python scalars), but it also makes the example in dask#5913 (comment) about 10% faster. While it's a little slower to compute the first time, the result of `cached_cumsum` is typically used for iteration or indexing in Python land, rather than with vectorised numpy operations, and those are a lot faster on a tuple. Similarly, the `bisect` module is much faster than `np.searchsorted` because the latter is dominated by C API overhead.

The one downside of using a tuple instead of a numpy array is that slicing it is O(n) instead of O(1). That means the `initial_zero=False` answer is now a separate cache entry instead of the `initial_zero=True` answer with the first element sliced off. There are also a few bits of code that don't need the final element and slice it with `[:-1]`, which will now make a copy. But I think it's still an improvement overall.
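A minimal sketch of the idea (not dask's actual implementation): cache the cumulative chunk sizes as a tuple of Python ints, with `initial_zero=True` and `initial_zero=False` as separate cache entries. It uses `itertools.accumulate` here in place of `toolz.accumulate` to stay dependency-free; the function and cache names are illustrative only.

```python
from itertools import accumulate
from operator import add

# Illustrative module-level cache; dask's real cache differs.
_cache = {}

def cached_cumsum(seq, initial_zero=False):
    """Cumulative sum of ``seq`` as a tuple of Python ints.

    Because slicing a tuple copies (O(n)), the initial_zero=False
    result is cached separately rather than derived from the
    initial_zero=True entry by slicing off the leading 0.
    """
    key = (seq, initial_zero)
    if key not in _cache:
        if initial_zero:
            _cache[key] = tuple(accumulate(seq, add, initial=0))
        else:
            _cache[key] = tuple(accumulate(seq, add))
    return _cache[key]
```

Repeated calls with the same chunk sizes then return the identical cached tuple, so `Array.shape` only pays the summation cost once.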
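To illustrate the `bisect` vs `np.searchsorted` point: for a single scalar lookup against a short sequence of chunk boundaries, `bisect` on a plain tuple avoids the per-call numpy C API overhead. The boundary values and helper name below are hypothetical.

```python
import bisect

# Hypothetical cumulative chunk boundaries (with a leading 0), as a tuple.
bounds = (0, 2, 5, 9)

def chunk_of(index):
    """Return the chunk number containing element ``index``."""
    # bisect_right finds the first boundary strictly greater than index;
    # subtracting 1 gives the chunk whose range contains it.
    return bisect.bisect_right(bounds, index) - 1
```

For example, element 4 lies in chunk 1 (the chunk covering indices 2–4), which `chunk_of(4)` returns without touching numpy.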
Member:

Thanks for this @bmerry. I went through the uses of cached_cumsum and I agree that this seems to be used mostly in python-scale cases. In general things here look good to me. I'm going to go ahead and merge.