Conversation
Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>
|
Can one of the admins verify this patch? Admins can comment |
|
edit: added benchmark for sparse implementation.
from dask_take_along_axis import dask_take_along_axis # copy-pasted module from climix
%timeit dask_take_along_axis(dask_random_arr, from_array(top50_np), axis=0).compute()
6.33 s ± 60.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
|
Unit Test ResultsSee test report for an extended history of previous test failures. This is useful for diagnosing flaky tests. 15 files ± 0 15 suites ±0 3h 29m 26s ⏱️ + 7m 10s For more details on these failures, see this check. Results for commit 105ef10. ± Comparison against base commit dafb6ac. |
|
Is there someone available to review/comment on this PR ? |
|
Sorry that I don't have the bandwidth to interact properly :/ One thing that concerns me a little is that the test/benchmark isn't really testing much dask because it is conducted with a dask array that consists of a single numpy array as it's only chunk. In that case, performance would probably be expected to be slightly worse than numpy, but it's also not really representative. Particularly, the point of the sparse array, iirc, was that there is one array for every chunk that covers all the chunks. If this is sparse, that's ok (or at least I thought so when writing it), but it is prohibitive if dense, so I don't think replacing the use of sparse with dense matrices is the way to go. If we want to avoid the use of sparse as a dependency, the aggregation step needs to be re-thought. As a practical step forward, @bzah, could you do a larger test? Why not apply this to some larger climate data in at least a local cluster with several workers? I suspect that perhaps more than compute time, memory requirements without the use of sparse would balloon. |
|
@quasiben maybe this is something your team could review? |
|
Related to this PR is a proposal to add If there are concerns regarding adding |
This PR adds
take_along_axisthat works similarly to numpy's take_along_axis.pre-commit run --all-filesCredit
@zklaus for providing a working solution in climix.
Example of use
Performances
This is just a basic benchmark I ran on my laptop to compare how this implementation prerforms against numpy's.