
Cache result of Array.shape #5916

Merged
mrocklin merged 1 commit into dask:master from bmerry:cache-array-shape
Feb 18, 2020

Conversation

@bmerry (Contributor) commented Feb 18, 2020

One of the performance problems identified in #5913 was that
`Array.shape` gets called many times, and each time it is computed by
summing the chunk sizes. By using `cached_cumsum`, the calculation
should be fast after the first time. While a cumsum is overkill for
computing the shape, it's likely that the cached result will be used again
at some point.
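As a rough sketch of the idea (this simplified `cached_cumsum` memoizes with `functools.lru_cache` and the stdlib `itertools.accumulate`; dask's actual helper caches differently, so treat the names and caching strategy here as illustrative):

```python
from functools import lru_cache
from itertools import accumulate


@lru_cache(maxsize=None)  # simplification: dask's real cache is keyed differently
def cached_cumsum(seq, initial_zero=False):
    """Cumulative sums of chunk sizes, as a tuple of Python ints."""
    if initial_zero:
        return tuple(accumulate(seq, initial=0))
    return tuple(accumulate(seq))


chunks = (100, 100, 50)
# The axis length is just the final cumulative sum, so repeated
# shape lookups become cheap cache hits after the first call:
length = cached_cumsum(chunks)[-1]
```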

An alternative would be to store the shape in the object (e.g.
computed in the constructor). That would require all the bits of
internal code that modify `_chunks` directly to be updated to keep
the shape in sync as well.

Also change `cached_cumsum` to return a tuple instead of a numpy array,
and use `toolz.accumulate` rather than `np.cumsum`. I originally
needed this because `shape` was getting the wrong type (a tuple of numpy
scalars instead of Python scalars), but it also makes the example in
#5913 (comment) about
10% faster. While it's a little slower to compute the first time, the
result of `cached_cumsum` is typically used for iteration or indexing in
Python land rather than with vectorised numpy operations, and those are
a lot faster on a tuple. Similarly, the `bisect` module is much faster than
`np.searchsorted`, because the latter is dominated by C API overhead.
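For instance, finding which chunk holds a given global index is a single bisection over the cumulative chunk boundaries, where `bisect` on a plain tuple avoids the per-call numpy overhead (the names below are illustrative, not dask's own):

```python
import bisect
from itertools import accumulate

chunk_sizes = (100, 100, 50)
cumdims = tuple(accumulate(chunk_sizes))  # (100, 200, 250)


def locate(i):
    """Return (chunk_index, offset_within_chunk) for global index i."""
    # For a single lookup, bisect on a tuple beats np.searchsorted,
    # which pays C API overhead on every call.
    c = bisect.bisect(cumdims, i)
    offset = i - (cumdims[c - 1] if c else 0)
    return c, offset
```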

The one downside of using a tuple instead of a numpy array is that
slicing it is O(n) instead of O(1). That means the
`initial_zero=False` answer is now a separate cache entry, instead of the
`initial_zero=True` answer with the first element sliced off. There are
also a few bits of code that don't need the final element and slice it
off with `[:-1]`, which will now make a copy. But I think it's still an
improvement overall.
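The trade-off being described is a basic difference between the two containers, which can be seen directly (a small demonstration, not dask code):

```python
import numpy as np

cumdims = np.array([0, 100, 200, 250])
view = cumdims[1:]   # numpy basic slice: O(1) view sharing the original buffer
# view.base is cumdims, confirming no copy was made

t = (0, 100, 200, 250)
tail = t[1:]         # tuple slice: O(n) copy of the elements
```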

  • Tests added / passed
  • Passes black dask / flake8 dask

@bmerry bmerry requested a review from mrocklin February 18, 2020 12:48
@mrocklin (Member) commented:

Thanks for this @bmerry .

I went through the uses of `cached_cumsum` and I agree that this seems to be used mostly in Python-scale cases. In general things here look good to me.

I'm going to go ahead and merge.

@mrocklin mrocklin merged commit 632f68c into dask:master Feb 18, 2020
@bmerry bmerry deleted the cache-array-shape branch February 19, 2020 05:48