HLG: get_dependencies() of single keys#6699
Conversation
@sofroniewn thanks for reporting the performance issue. Can I ask you to try this PR? It should fix the issue.
Hi @madsbk, I just tested this PR and the performance regressions have been fixed! I ran my test script:

```python
import dask.array as da
import numpy as np
import time

data = da.random.random(
    size=(100_000, 1000, 1000), chunks=(1, 1000, 1000)
)
idxs = [(0,), (50_000,), (99_999,)]

t0 = time.time()
reduced_data = np.min([np.min(data[idx]) for idx in idxs])
t1 = time.time()
print(t1 - t0)
```

and this is back at around ~0.19 ms, which matches the pre-regression timing. I'm not so familiar with dask testing practices or policies, but this seems as good a time and place as any to ask if the project has any timing-based tests? In napari we use the

Thanks again for the quick turnaround on the fix!!
jrbourbeau
left a comment
Thanks @madsbk! This is in
Just FYI, the benchmark at https://pandas.pydata.org/speed/dask/#array.Slicing.time_slice_int_tail (source) did pick up this regression. It looks like with this fix we're still about 2.5x slower on that benchmark compared to 2.30.0. I'm unsure if that's worth worrying about. Edit: I'm going to open a new issue for these; it's a bit easier to track.
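For context on the timing-tests question: the benchmark linked above is run with airspeed velocity (asv), which times any method whose name starts with `time_`. A minimal benchmark in that style might look like the sketch below — the class and method names here are illustrative only, not dask's actual benchmark source, and the array sizes are shrunk from the report above to keep setup fast.

```python
import dask.array as da


class Slicing:
    """asv-style benchmark: asv times each ``time_*`` method,
    calling ``setup`` before the timed runs."""

    def setup(self):
        # Many small chunks along the first axis, mirroring the
        # chunking pattern in the reported regression.
        self.data = da.random.random(
            size=(1_000, 100, 100), chunks=(1, 100, 100)
        )

    def time_slice_int_tail(self):
        # Graph construction is what regressed, so we only build
        # the sliced array and deliberately never call .compute().
        self.data[-1]
```

Running `asv run` over a range of commits would then flag graph-construction slowdowns like this one automatically.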
Updates `get_dependencies()` to work on single keys instead of all keys in the `Layer`.

Fixes #6694 by letting `get_dependencies()` work on single keys instead of all keys in the `Layer`.

- Passes `black dask` / `flake8 dask`
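To illustrate what per-key dependency lookup means, here is a self-contained toy sketch — the graph representation and helper below are simplified stand-ins, not dask's actual `HighLevelGraph` implementation. The point is that finding the dependencies of one key only requires inspecting that key's task, not scanning every key in the layer.

```python
from operator import add

# Toy task graph: each value is either literal data or a tuple
# (callable, *args) whose string arguments refer to other keys.
dsk = {
    "a": 1,
    "b": (add, "a", 10),
    "c": (add, "b", "a"),
}


def get_dependencies(dsk, key):
    """Return the set of keys that ``key``'s task refers to.

    Only the single task stored under ``key`` is inspected, so the
    cost is independent of how many keys the graph contains.
    """
    task = dsk[key]
    if not isinstance(task, tuple):
        return set()  # literal data has no dependencies
    return {arg for arg in task[1:] if isinstance(arg, str) and arg in dsk}


print(get_dependencies(dsk, "b"))  # -> {'a'}
print(get_dependencies(dsk, "c"))  # the set {'a', 'b'} (order may vary)
```

With the old all-keys behavior, answering "what does `c` depend on?" meant materializing dependencies for `a` and `b` as well, which is what made repeated single-key slicing expensive on graphs with many chunks.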