Speed up normalize_chunks for common case #10579

Merged: jrbourbeau merged 3 commits into dask:main from martindurant:array_norm_chunks_opt, Oct 25, 2023

Conversation

@martindurant (Member)

In certain xarray zarr/kerchunk reads with many variables, a considerable amount of time is spent in dask.array's normalize_chunks (see snakeviz output):

[Screenshot: snakeviz profile before the change, 2023-10-19 10:51]

For the case that a zarr-like chunksize was given, or a full chunks spec was already given, this is all unnecessary work. This PR skips it for such cases. The same part of the profile following the change looks like:

[Screenshot: snakeviz profile after the change, 2023-10-19 10:55]

@martindurant (Member, author)

bump @dask/dask-dev

@rsignell-usgs (Contributor)

@jrbourbeau this would really help speed up opening the 15TB coastal ocean dataset USGS has been working on. From 20s to 6s! It's headed for the AWS public dataset program!

@jrbourbeau (Member)

Nice, thanks @martindurant -- taking a look now

Comment on lines +3133 to +3134
if nonans or isinstance(sum(sum(_) for _ in chunks), int):
    return tuple(tuple(_) for _ in chunks)
Member

Non-blocking nit: The _ usage here is correct, but looks a little unusual. Any reason we can't do this, like we do elsewhere?

Suggested change:

-if nonans or isinstance(sum(sum(_) for _ in chunks), int):
-    return tuple(tuple(_) for _ in chunks)
+if nonans or isinstance(sum(sum(c) for c in chunks), int):
+    return tuple(tuple(c) for c in chunks)


nonans = None
if chunks and shape is not None:
    nonans = all(isinstance(c, int) for c in chunks)
Member

Is there a reason to use an int check instead of not math.isnan(c)? The nonans name seems slightly out of sync with what the code on this line actually checks.

Member Author

The c in this case can be tuples; we are checking specifically for individual ints.

Also, it's a bit faster, even:

In [9]: l = [1] * 100

In [10]: %timeit all(isinstance(x, int) for x in l)
3.41 µs ± 14.2 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [11]: %timeit all(not math.isnan(x) for x in l)
5.55 µs ± 8.38 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
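The type check also sidesteps a failure mode that the timing alone doesn't show. As a hypothetical illustration (not dask code), here is what happens when the chunks tuple mixes plain ints with per-dimension tuples, which is exactly the mixed input described above:

```python
import math

# chunks as passed in can mix plain ints with per-dimension tuples
chunks = (2, (3, 3, 3))

# the isinstance check simply reports False for the tuple entry
print(all(isinstance(c, int) for c in chunks))  # False

# math.isnan, by contrast, raises on a non-numeric entry
try:
    all(not math.isnan(c) for c in chunks)
except TypeError as exc:
    print("math.isnan rejects tuples:", exc)
```

So beyond being faster, isinstance degrades gracefully on mixed input where math.isnan would raise.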

"Got chunks=%s, shape=%s" % (chunks, shape)
)

if nonans or isinstance(sum(sum(_) for _ in chunks), int):
Member

What case is the or isinstance(sum(sum(_) for _ in chunks), int) part covering that's not already covered by the if nonans portion of this condition?

Member

I'm surprised this double summation isn't slow in the many data variables case

Member Author

It's the nonans check (now renamed allints) which is the very fast path.

However, the sum is still much faster. In the case where there are nans, it would be slightly slower.

In [16]: chunks = ((1, 2, 3, 4), ) * 10

In [18]: %timeit tuple(tuple(int(x) if not math.isnan(x) else np.nan for x in c) for c in chunks)
5.15 µs ± 14.7 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)

In [19]: %timeit isinstance(sum(sum(_) for _ in chunks), int);tuple(tuple(ch) for ch in chunks)
1.34 µs ± 1.1 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)

Member Author

> What case is the or isinstance(sum(sum(_) for _ in chunks), int) part covering

This is for chunks like ((2, 2, 2), (3, 3, 3)), i.e., tuple of tuple of only ints.

As you see in the original snakeviz, this function is called three times under from_array; we can speed up inputs like (2, 3) (an original zarr-like chunksize) and ((2, 2, 2), (3, 3, 3)) (already normalised). I haven't dug in to see if there's a way to avoid calling normalize at all, since the code here is enough to make it drop out of my profiling.
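To make the already-normalised fast path concrete, here is a minimal sketch (a hypothetical helper, not the actual dask source) of the early return being discussed. The trick is that summing a tuple-of-tuples of ints yields an int, while any float("nan") makes the sum a float:

```python
def maybe_skip_normalize(chunks):
    # Hypothetical helper illustrating the PR's early return, not dask's API.
    # Input is assumed to be a tuple of per-dimension tuples, e.g.
    # ((2, 2, 2), (3, 3, 3)).  If every entry is an int, the double sum
    # is an int and we can return the normalized form immediately;
    # a nan anywhere makes the sum a float and we fall through.
    if isinstance(sum(sum(c) for c in chunks), int):
        return tuple(tuple(c) for c in chunks)
    return None  # fall through to the full normalize_chunks logic

print(maybe_skip_normalize(((2, 2, 2), (3, 3, 3))))       # ((2, 2, 2), (3, 3, 3))
print(maybe_skip_normalize(((2, float("nan")), (3, 3))))  # None
```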

@jrbourbeau changed the title to "Speed up normalize_chunks for common case" Oct 24, 2023
@martindurant (Member, author)

@jrbourbeau, thanks for the review.

I implemented your suggestions and answered your questions.
There may be a better way to avoid the time spent in this one function, but I didn't want to make changes further up the stack where they might have more consequences. I don't really understand why we have nans here at all, or why they might be something other than np.nan. For the case where we aren't using a chunksize, I don't really know why running this function is necessary at all and wouldn't have thought about it without the profiling.

@martindurant (Member, author)

Removing my stuff and simply using

    return tuple(tuple(ch) for ch in chunks)

on the last line would be much better, but there are two tests specifically passing float("nan") to check the current behaviour.
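For context, a sketch (illustrative, not the actual dask test code) of the behaviour those nan tests pin down: a plain passthrough keeps float entries as floats, while the current path (the coercion expression timed earlier in this thread) converts every non-nan value to int:

```python
import math

chunks = ((2.0, float("nan")),)

# plain passthrough: floats stay floats
passthrough = tuple(tuple(c) for c in chunks)
print(passthrough)   # ((2.0, nan),)

# current behaviour: coerce non-nan values to int, keep the nans
coerced = tuple(
    tuple(int(x) if not math.isnan(x) else float("nan") for x in c)
    for c in chunks
)
print(coerced)       # ((2, nan),)
```

This type coercion is what the simple one-line return would give up, which is presumably what those two tests guard.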

@rsignell

@jrbourbeau are you satisfied with the approach and justifications for the approach or do you think it needs more work?

@jrbourbeau (Member) left a comment:

Thanks @martindurant! This is in.

@jrbourbeau jrbourbeau merged commit 009489f into dask:main Oct 25, 2023