Support slicing with out-of-order numpy arrays#3808

Closed
mrocklin wants to merge 1 commit into dask:master from mrocklin:slice-shuffle

Conversation

@mrocklin
Member

This rewrites slicing with a random, out-of-order numpy index array as a two-stage
slice, in a way that reduces overhead. We will now get, at worst, n-squared
behavior.

However, as currently implemented this also opens us up to a number of
corner cases. We'll probably have to either do a fair amount of work or else
narrow the scope of when this optimization gets applied.
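For reference, the core idea of the two-stage slice can be sketched in plain numpy (the names here are illustrative, not the PR's internals): first take with the sorted index so reads are contiguous, then apply the inverse permutation to restore the requested order.

```python
import numpy as np

x = np.arange(20) * 10
index = np.random.randint(0, len(x), size=len(x))

# Stage 1: take with the sorted index, so reads hit chunks in order.
sorter = np.argsort(index, kind="stable")
staged = x[index[sorter]]

# Stage 2: apply the inverse permutation to restore the requested order.
inverse = np.empty_like(sorter)
inverse[sorter] = np.arange(len(sorter))
result = staged[inverse]

assert (result == x[index]).all()
```

In dask the sorted take maps onto whole chunks, so the expensive shuffling happens once, in the middle rechunk step.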

  • Tests added / passed
  • Passes flake8 dask

cc @jakirkham

@mrocklin
Member Author

In [1]: import dask.array as da
In [2]: x = da.random.random((10000, 10000), chunks=(1000, 1000))
In [3]: import numpy as np
In [4]: index = np.random.randint(0, len(x), size=len(x))

In [5]: len(x[index].dask)
/home/mrocklin/workspace/dask/dask/array/core.py:1398: PerformanceWarning: Slicing with an out-of-order index is generating 10 times more chunks
  return self[index3].rechunk({axis: c})[index4]
Out[5]: 1300

In [6]: len(x.dask)
Out[6]: 100

@mrocklin
Member Author

Still costly, but feasible. This is also a case where we might consider auto-rechunking.

@jakirkham
Member

Thanks for working on this, Matt. Will give it a closer look soon.

Have one question ATM. What happens if you replace index with the following?

index = np.arange(10000)
np.random.shuffle(index)

@mrocklin
Member Author

I think it'd be about the same.
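One rough way to sanity-check that guess (a sketch, not part of the PR): the chunk blow-up is driven by how disordered the index is, and a random-with-replacement index and a shuffled permutation have roughly the same number of descending breaks.

```python
import numpy as np

np.random.seed(0)
n = 10000

idx_rand = np.random.randint(0, n, size=n)  # random, with replacement
idx_perm = np.arange(n)
np.random.shuffle(idx_perm)                 # a pure permutation

def descents(ind):
    # Count positions where the index decreases; both styles of index
    # are about equally disordered, so the slicing cost should be similar.
    return int((ind[1:] < ind[:-1]).sum())

print(descents(idx_rand), descents(idx_perm))  # both around n / 2
```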

(slice(0, 5), index),
(slice(0, None, 2), index),
# (None, slice(None, None), index),
# (None, index, slice(None, None)),
Member

Why are these commented out?

return out


def shuffle_convert(index, chunks, axis):
Member

This could use a docstring

locs = np.concatenate([[0], np.where(ind2[1:] < ind2[:-1])[0] + 1, [len(ind2)]])
c = tuple(np.diff(locs).tolist())
return self[index3].rechunk({axis: c})[index4]
Member

This whole block is quite complicated, and could use at least some block-level comments to explain what each operation is doing.
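For what it's worth, here is one hedged reading of the chunk-boundary computation, run on an illustrative `ind2` (not the PR's actual data): each monotone run in the staged index becomes one output chunk.

```python
import numpy as np

# ind2 stands in for the staged index: sorted within runs.
ind2 = np.array([2, 5, 7, 1, 3, 9, 0])

# A new run starts wherever the index decreases; prepend 0 and
# append len(ind2) to bracket the first and last runs.
locs = np.concatenate([[0], np.where(ind2[1:] < ind2[:-1])[0] + 1, [len(ind2)]])

# The run lengths become the chunk sizes used for the rechunk step.
c = tuple(np.diff(locs).tolist())

print(locs.tolist(), c)  # → [0, 3, 6, 7] (3, 3, 1)
```

The three runs here are `[2, 5, 7]`, `[1, 3, 9]`, and `[0]`, matching the chunk sizes `(3, 3, 1)`.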

# (None, index, slice(None, None)),
(index[::2],),
(slice(None, None), index[::2],),
(5, index)]:
Member

Would be nice to have some cases like the following (with a higher-dimensional array):

(*mix_of_nones_slices_and_integers1, index, *mix_of_nones_slices_and_integers1)

to check that prior and subsequent axes are being applied correctly.

Would also be good to explicitly check that the error from providing multiple array indices is still raised with this optimization.
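A numpy analogue of the suggested axis check (a dask-free sketch with assumed names): a two-stage take along one axis should leave the prior and subsequent axes untouched.

```python
import numpy as np

x = np.random.random((3, 8, 4))
index = np.random.randint(0, 8, size=8)
axis = 1

sorter = np.argsort(index, kind="stable")
inverse = np.empty_like(sorter)
inverse[sorter] = np.arange(len(sorter))

# Sorted take, then the inverse permutation, along the same axis.
staged = np.take(np.take(x, index[sorter], axis=axis), inverse, axis=axis)

# Must match the direct one-shot take on that axis.
assert np.array_equal(staged, np.take(x, index, axis=axis))
```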

@mrocklin mrocklin closed this Jan 3, 2019
@mrocklin mrocklin deleted the slice-shuffle branch January 3, 2019 17:45