HLG: get_dependencies() of single keys by madsbk · Pull Request #6699 · dask/dask

madsbk · 2020-10-02T08:23:58Z

Fixes #6694 by letting get_dependencies() work on single keys instead of all keys in the Layer.

Tests added / passed
Passes black dask / flake8 dask
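
To illustrate the change described above, here is a minimal sketch of resolving dependencies one key at a time rather than for every key in a layer. It is a toy model, not dask's implementation: `ToyLayer`, its `get_dependencies(key, all_keys)` signature, the `lookups` counter, and `dependencies_of` are all hypothetical names chosen for illustration.

```python
from typing import Dict, Hashable, Iterable, Set


class ToyLayer:
    """Toy stand-in for a graph layer (NOT dask's actual Layer class).

    Each task is a tuple of the form (func, *args); an argument counts as a
    dependency when it is itself a key of the graph.
    """

    def __init__(self, tasks: Dict[Hashable, tuple]):
        self.tasks = tasks
        self.lookups = 0  # hypothetical counter, only here to show the cost

    def get_dependencies(self, key: Hashable, all_keys: Set[Hashable]) -> Set[Hashable]:
        """Return the dependencies of a single key."""
        self.lookups += 1
        return {arg for arg in self.tasks[key][1:] if arg in all_keys}


def dependencies_of(
    layer: ToyLayer, keys: Iterable[Hashable], all_keys: Set[Hashable]
) -> Dict[Hashable, Set[Hashable]]:
    """Walk backwards from `keys`, resolving only the keys actually reached."""
    deps: Dict[Hashable, Set[Hashable]] = {}
    stack = list(keys)
    while stack:
        key = stack.pop()
        if key in deps:
            continue  # already resolved
        deps[key] = layer.get_dependencies(key, all_keys)
        stack.extend(deps[key])
    return deps


# 1000 independent chunks, plus one task that reads a single chunk.
tasks = {("chunk", i): (len, "payload") for i in range(1000)}
tasks["result"] = (abs, ("chunk", 500))

layer = ToyLayer(tasks)
all_keys = set(tasks)
deps = dependencies_of(layer, ["result"], all_keys)
print(deps, layer.lookups)
```

In this toy graph only two lookups are needed for `"result"`, whereas resolving every key in the layer up front would take 1001.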

madsbk · 2020-10-02T09:38:02Z

@sofroniewn thanks for reporting the performance issue. Could you try this PR? It should fix the issue.

sofroniewn · 2020-10-02T16:02:02Z

Hi @madsbk, I just tested this PR and the performance regressions have been fixed!

I ran my test script

```python
import dask.array as da
import numpy as np
import time


data = da.random.random(
    size=(100_000, 1000, 1000), chunks=(1, 1000, 1000)
)

idxs = [(0,), (50_000,), (99_999,)]

t0 = time.time()
reduced_data = np.min([np.min(data[idx]) for idx in idxs])
t1 = time.time()
print(t1 - t0)
```

This is back to around 0.19 ms, which matches 2.27.0 and is much better than the 3.4 seconds it took on 2.28.0.

I'm not very familiar with dask's testing practices or policies, but this seems as good a time and place as any to ask: does the project have any timing-based tests? In napari we use the pytest.mark.timeout(n) decorator in a few places to catch major performance regressions (see this test). It was very helpful in identifying this regression: we saw it on our CI right away after the 2.28.0 release. I'm wondering whether some rough tests like that might be helpful for dask too.

Thanks again for the quick turn around on the fix!!
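
As a rough illustration of the timing-based test idea raised above, here is a sketch that uses only the standard library. Note that it is a different mechanism from `pytest.mark.timeout(n)`, which comes from the pytest-timeout plugin and interrupts a test that runs too long; this sketch measures the run afterwards and fails if it exceeded a budget. The helper name `assert_runs_within`, the `workload` placeholder, and the one-second budget are assumptions made for illustration.

```python
import time


def assert_runs_within(func, budget_seconds, repeats=3):
    """Fail if the best of `repeats` runs of `func` exceeds the budget.

    Taking the best run reduces noise from a busy CI machine.
    """
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    assert best <= budget_seconds, (
        f"best run took {best:.3f}s, budget was {budget_seconds:.3f}s"
    )
    return best


def workload():
    # Placeholder workload; a real test would run something like the
    # slicing script from this comment instead.
    sum(range(10_000))


elapsed = assert_runs_within(workload, budget_seconds=1.0)
print(f"best run: {elapsed:.6f}s")
```

A generous budget keeps such a test from flaking on slow machines while still catching a regression of several orders of magnitude like the one reported here.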

jrbourbeau

Thanks @madsbk! This is in

TomAugspurger · 2020-10-13T15:54:41Z

Just FYI, the benchmark at https://pandas.pydata.org/speed/dask/#array.Slicing.time_slice_int_tail (source) did pick up this regression.

It looks like with this fix we're still about 2.5x slower on that benchmark compared to 2.30.0. I'm unsure if that's worth worrying about.

Edit: I'm going to open a new issue for these. A bit easier to track.

Updates `get_dependencies()` to work on single keys instead of all keys in the `Layer`

madsbk added 3 commits October 2, 2020 09:00

HLG: updated doc

89a6c63

HLG: key_dependencies is allowed to be missing keys

a9ee343

get_dependencies() now works on a single key

73ad889

madsbk changed the title ~~Hlg deps per key~~ HLG: get_dependencies() of single keys Oct 2, 2020

madsbk mentioned this pull request Oct 2, 2020

Add ShuffleStage HLG Layer #6650

Merged

2 tasks

clean up

700aab2

madsbk mentioned this pull request Oct 2, 2020

Release 2.29.0 dask/community#98

Closed

sofroniewn approved these changes Oct 2, 2020

sofroniewn mentioned this pull request Oct 2, 2020

2.28.0 performance related issues #6694

Closed

madsbk mentioned this pull request Oct 8, 2020

Revert "Revert "Use HighLevelGraph layers everywhere in collections (… #6707

Merged

rjzamora mentioned this pull request Oct 8, 2020

Add optional IO-subgraph to Blockwise Layers #6715

Merged

6 tasks

jrbourbeau approved these changes Oct 8, 2020

jrbourbeau merged commit bd18820 into dask:master Oct 8, 2020

kumarprabhu1988 pushed a commit to kumarprabhu1988/dask that referenced this pull request Oct 29, 2020

HLG: get_dependencies() of single keys (dask#6699)

383ecc9

Updates `get_dependencies()` to work on single keys instead of all keys in the `Layer`

madsbk deleted the hlg_deps_per_key branch February 16, 2021 08:59

jrbourbeau merged 4 commits into dask:master from madsbk:hlg_deps_per_key
