Enh/take along axis by abel-bzz · Pull Request #11076 · dask/dask

abel-bzz · 2024-04-26T13:57:07Z

This PR adds a take_along_axis function that works similarly to NumPy's take_along_axis.
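For readers unfamiliar with the NumPy function being mirrored, here is a minimal sketch of its semantics on a tiny array. It uses only `np.take_along_axis` from NumPy itself, not the function added in this PR; the array values are made up for illustration.

```python
import numpy as np

a = np.array([[10, 30, 20],
              [60, 40, 50]])
idx = np.array([[2, 0],
                [1, 1]])

# For axis=1, each output element is out[i, j] == a[i, idx[i, j]].
out = np.take_along_axis(a, idx, axis=1)
print(out.tolist())  # [[20, 10], [40, 40]]
```

The index array has the same number of dimensions as the data, and its length along the chosen axis determines the length of the result along that axis.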

Closes Add NumPy's new take_along_axis #3663
Tests added / passed
Passes pre-commit run --all-files

Credit

@zklaus for providing a working solution in climix.

Example of use

import dask.array
from dask.array import take_along_axis

data: dask.array.Array  # any existing dask array
top10_indices = data.argtopk(k=10, axis=-1)
top10 = take_along_axis(data, top10_indices, axis=-1)

# Equivalent to 
top10 = data.topk(k=10, axis=-1)
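The equivalence claimed above can be checked with plain NumPy. This is only a sketch: NumPy has no `topk` or `argtopk`, so a descending `argsort` and a descending `sort` are used here as stand-ins (an assumption of this example, not part of the PR).

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((4, 100))
k = 10

# Stand-in for argtopk: indices of the k largest values, largest first.
topk_indices = np.argsort(-data, axis=-1)[..., :k]
topk_values = np.take_along_axis(data, topk_indices, axis=-1)

# Stand-in for topk: sort in descending order and keep the first k values.
expected = -np.sort(-data, axis=-1)[..., :k]

print(np.array_equal(topk_values, expected))
```

Gathering with the "arg" indices reproduces the values that the direct top-k selection returns, which is the relationship the usage example relies on.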

Performance

This is just a basic benchmark I ran on my laptop to compare how this implementation performs against NumPy's.

import numpy as np
from dask.array import from_array
from dask.array.slicing import take_along_axis


random_arr = np.random.rand(1000,1000,1000)
dask_random_arr = from_array(random_arr)
top50 = dask_random_arr.argtopk(k=50, axis=0)
top50_np = top50.compute()

# Compute both argtopk and take_along_axis
%timeit take_along_axis(dask_random_arr, top50, axis=0).compute()
# 6.74 s ± 9.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit np.take_along_axis(random_arr, top50_np, axis=0)
# 864 ms ± 11.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit take_along_axis(dask_random_arr, from_array(top50_np), axis=0).compute()
# 2.3 s ± 9.04 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
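How many chunks the benchmark arrays have depends on the `chunks` argument passed to `from_array` and on the array size, and that in turn affects how much of dask's machinery a timing exercises. A minimal sketch of how one might check this, using a small toy array rather than the 1000³ benchmark array (the variable names are illustrative):

```python
import numpy as np
import dask.array as da

x = np.arange(24.0).reshape(4, 6)

# chunks=-1 asks for a single chunk spanning the whole array.
single = da.from_array(x, chunks=-1)
# Explicit chunk shape: a 2 x 2 grid of (2, 3) blocks.
multi = da.from_array(x, chunks=(2, 3))

print(single.numblocks, single.npartitions)
print(multi.numblocks, multi.npartitions)
print(multi.chunks)
```

Inspecting `numblocks` or `chunks` before timing makes it clear whether a run measures a single-block computation or a genuinely chunked one.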

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

GPUtester · 2024-04-26T13:57:10Z

Can one of the admins verify this patch?

Admins can comment ok to test to allow this one PR to run or add to allowlist to allow all future PRs from the same author to run.

abel-bzz · 2024-04-26T14:06:42Z

edit: added a benchmark for the sparse implementation.
@zklaus, there are a few deviations from your original code in climix:

I removed sparse arrays, as sparse is not a dependency of dask.
Performance-wise, on my laptop, it looks better without sparse.

from dask_take_along_axis import dask_take_along_axis # copy-pasted module from climix

%timeit dask_take_along_axis(dask_random_arr, from_array(top50_np), axis=0).compute()
6.33 s ± 60.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

I removed the shape check because it was breaking a use case. In particular, with this assert the following test fails: dask/array/tests/test_slicing.py::test_take_along_axis__indexing_twice_same_1darray
I removed the meta argument passed to blockwise, as it was a sparse array. I suspect it is some sort of optimization that tells blockwise the expected shape, but it's not really documented.
I also renamed some variables and added types, docs, and integration tests.
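On the `meta` question: as far as I understand the dask documentation, `meta` is a zero-size array that tells dask the array type and dtype of the output blocks (for example NumPy versus sparse), rather than their shape. A small sketch under that reading, using `da.blockwise` with a NumPy `meta`; the `halve` function is hypothetical and exists only for this illustration.

```python
import numpy as np
import dask.array as da

x = da.ones((4, 4), chunks=(2, 2), dtype="float64")

def halve(block):
    # Hypothetical per-block function, used only for illustration.
    return (block / 2).astype("float32")

# meta: an empty array with the type and dtype each output block will have.
y = da.blockwise(
    halve, "ij", x, "ij",
    dtype="float32",
    meta=np.empty((0, 0), dtype="float32"),
)

print(y.dtype)   # known without computing anything
print(y.chunks)  # chunk sizes still come from the inputs, not from meta
result = y.compute()
```

If `meta` is omitted, dask tries to infer it, so dropping a sparse `meta` mainly changes what array type dask expects the blocks to be.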

github-actions · 2024-04-26T14:29:32Z

Unit Test Results

See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests.

15 files ±0 · 15 suites ±0 · 3h 29m 26s ⏱️ +7m 10s
13 124 tests +3: 12 185 ✅ +3, 931 💤 ±0, 8 ❌ ±0
162 507 runs +39: 142 400 ✅ +39, 20 099 💤 ±0, 8 ❌ ±0

For more details on these failures, see this check.

Results for commit 105ef10. ± Comparison against base commit dafb6ac.

abel-bzz · 2024-06-10T07:54:10Z

Is anyone available to review or comment on this PR?

zklaus · 2024-06-11T09:10:15Z

Sorry that I don't have the bandwidth to interact properly :/ One thing that concerns me a little is that the test/benchmark isn't really testing much dask because it is conducted with a dask array that consists of a single numpy array as its only chunk. In that case, performance would probably be expected to be slightly worse than numpy, but it's also not really representative.

Particularly, the point of the sparse array, iirc, was that there is one array for every chunk that covers all the chunks. If this is sparse, that's ok (or at least I thought so when writing it), but it is prohibitive if dense, so I don't think replacing the use of sparse with dense matrices is the way to go. If we want to avoid the use of sparse as a dependency, the aggregation step needs to be re-thought.

As a practical step forward, @bzah, could you do a larger test? Why not apply this to some larger climate data in at least a local cluster with several workers? I suspect that perhaps more than compute time, memory requirements without the use of sparse would balloon.
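To make the memory concern above concrete, here is a toy, NumPy-only model of a per-chunk gather followed by an aggregation step. It is not the code from this PR or from climix, only one plausible reading of the scheme described: every data chunk produces a partial result shaped like the full index array, and the partials are summed. The names `chunked_take_along_axis` and `chunk_size` are hypothetical, and non-negative indices are assumed.

```python
import numpy as np

def chunked_take_along_axis(data, indices, axis, chunk_size):
    """Toy model: one dense partial per chunk, then a sum over partials."""
    axis = axis % data.ndim
    partials = []
    for start in range(0, data.shape[axis], chunk_size):
        stop = min(start + chunk_size, data.shape[axis])
        # Slice one "chunk" of the data along the axis.
        sl = [slice(None)] * data.ndim
        sl[axis] = slice(start, stop)
        chunk = data[tuple(sl)]
        # Requested indices that fall inside this chunk.
        mask = (indices >= start) & (indices < stop)
        # Chunk-local indices; entries outside the chunk point at 0 and are masked out.
        local = np.where(mask, indices - start, 0)
        picked = np.take_along_axis(chunk, local, axis=axis)
        # Dense partial: same shape as `indices`, zero wherever this chunk has no say.
        partials.append(np.where(mask, picked, 0))
    return np.sum(partials, axis=0), partials

rng = np.random.default_rng(0)
data = rng.random((6, 4))
indices = np.argsort(data, axis=0)[:2]  # two smallest per column
result, partials = chunked_take_along_axis(data, indices, axis=0, chunk_size=2)
print(np.array_equal(result, np.take_along_axis(data, indices, axis=0)))
print(len(partials), partials[0].shape)
```

In this model the intermediate storage is the number of chunks along the axis times the size of the index array when the partials are dense, while a sparse representation would only need to store the entries selected by each mask. That is the trade-off the comment above is pointing at.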

mrocklin · 2024-06-11T12:28:33Z

@quasiben maybe this is something your team could review?

kgryte · 2024-06-27T09:34:17Z

Related to this PR is a proposal to add take_along_axis to the Array API Standard: data-apis/array-api#808.

If there are concerns regarding adding take_along_axis to Dask, it would be great to hear those concerns so that we can take those into account when formalizing expected behavior.

Abel Aoun and others added 3 commits April 26, 2024 14:48

ENH: Add take_along_axis function

d1e2075

Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>

ENH: expose take_along_axis on dask.array

06ae5c0

DOC: Add documentation for take_along_axis

105ef10

kgryte mentioned this pull request May 30, 2024

RFC: add take_along_axis to take values along a specified dimension data-apis/array-api#808

Closed

dcherian mentioned this pull request Jul 29, 2024

Add Array shuffle / general take using tasks #11257

Closed

github-actions bot added the "needs attention" label (It's been a while since this was pushed on. Needs attention from the owner or a maintainer.) Jul 14, 2025

abel-bzz wants to merge 3 commits into dask:main from abel-bzz:enh/take_along_axis

5 participants
