Apply non-fancy indices first in vindex by jakirkham · Pull Request #3353 · dask/dask

jakirkham · 2018-03-29T15:51:11Z

Extracts the non-fancy indices and applies them first in vindex. This is important because these indices shrink the shape of the array, so later steps are easier to reason about once we know this has already happened. Further, these immediately cause Dask Array chunks to be dropped from consideration, as only the chunk(s) containing the index or slice are kept. This is a nice property to have before other subselections are applied.
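
To make the chunk-dropping point concrete, here is a minimal pure-Python sketch of how a scalar index along one axis falls into exactly one chunk. It is not dask's implementation; the helper name `chunk_containing` and the chunk sizes in the usage note are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right
from itertools import accumulate


def chunk_containing(chunks, index):
    """Return the position of the chunk that holds ``index`` along one axis.

    ``chunks`` is a tuple of chunk lengths along that axis, e.g. (3, 3, 4).
    Hypothetical helper for illustration; not part of dask.
    """
    size = sum(chunks)
    if index < 0:
        index += size  # normalize negative indices
    if not 0 <= index < size:
        raise IndexError("index out of bounds for axis")
    # Exclusive end offset of each chunk: (3, 3, 4) -> [3, 6, 10]
    ends = list(accumulate(chunks))
    # The first chunk whose end offset is greater than ``index`` holds it.
    return bisect_right(ends, index)
```

With chunks `(3, 3, 4)`, index `4` lands in chunk `1`, so under this model the other two chunks along that axis would not need to be considered by any later selection.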

jakirkham · 2018-03-29T16:35:33Z

If you have time to look at this @shoyer, would appreciate it. Would like to make sure that this is an ok thing to do in vindex. Planning on using this as part of the refactoring mentioned in PR ( #3210 ).

shoyer

generally looks good, just a few minor comments

shoyer · 2018-03-29T16:47:15Z

dask/array/core.py

+    partial_slices = []
+    reduced_indexes = []
+    for i, ind in enumerate(indexes):
+        if isinstance(ind, Number):


I would suggest checking for 0-dimensional arrays here, too, e.g., if isinstance(ind, Number) or getattr(ind, 'ndim', None) == 0

Yeah, was thinking about that too. Will update.

This might be tricky though if the object being checked is lazy. We can check for Dask Arrays pretty easily. Not sure if we should be checking for other things.
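
A minimal sketch of the check suggested above, written as a standalone helper. The name `is_scalar_index` and the `ZeroDimStandIn` class are hypothetical; the stand-in only mimics the `ndim` attribute of a 0-dimensional array so the example runs without NumPy. It does not address the question of lazy objects raised in the reply.

```python
from numbers import Number


def is_scalar_index(ind):
    """True for plain numbers and for 0-dimensional array-likes.

    Hypothetical helper; the condition mirrors the one suggested in review.
    """
    return isinstance(ind, Number) or getattr(ind, "ndim", None) == 0


class ZeroDimStandIn:
    """Minimal stand-in for a 0-d array (it only exposes ``ndim``)."""
    ndim = 0


class OneDimStandIn:
    """Minimal stand-in for a 1-d array, which is a fancy index instead."""
    ndim = 1
```

Using `getattr` with a default keeps the check safe for objects such as slices and lists, which have no `ndim` attribute at all.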

shoyer · 2018-03-29T16:49:08Z

dask/array/core.py

-    partial_slices = {i: ind for i, ind in enumerate(indexes)
-                      if isinstance(ind, slice) and ind != slice(None)}
+
+    num_indexes = []


nit: Maybe integer_indexes would be a more descriptive name than num_indexes? "num" sounds a little more likely to include numpy arrays.

Had int_indexes before. Also was debating scalar_indexes. No preferences amongst these personally. Please let me know what sounds best.

scalar_indexes is probably clearest, that definitely excludes ND arrays :).

Went with nonfancy_indexes given it now handles the scalar indices and partial slices. Happy to change the name if there are other thoughts.

shoyer · 2018-03-29T16:50:06Z

dask/array/core.py

-        key = tuple(partial_slices.get(i, slice(None))
-                    for i in range(len(indexes)))
-        x = x[key]
+        x = x[partial_slices]


I think you can safely do integer and slice indexing at the same time, though I don't know if that actually makes a difference as far as dask is concerned.

True.

I guess it means inserting another getitem call in the graph. Would expect it gets optimized out, but better just to avoid it altogether. So one less getitem call sounds like a good idea.

Though if we do that, we might want to adjust how we collect the indices at the beginning.

Now combined into one step.

Move all NumPy array slicing content into what used to be `_vindex_1d`, which is now `_vindex_array`. This will make it easier to extend `vindex` to handle other objects like Dask Arrays afterwards.

jakirkham · 2018-03-29T17:25:54Z

Pushed one more commit which moves the NumPy array indexing stuff into what was _vindex_1d and is now _vindex_array. If we would rather keep the functions separate, can just have _vindex_array call _vindex_1d unchanged. Please let me know what you think.

Extracts the scalar indices and partial slices and applies them first in `vindex`, then drops them from the remaining indices since they have been handled. Because scalar indices reduce the array's rank, later indices must account for the removed dimensions; doing this up front makes the later steps easier to reason about. Further, these immediately cause Dask Array chunks to be dropped from consideration, as only the chunks containing the index or partial slice are kept. This is a nice property to have before other subselections are applied.
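
The splitting step described above can be sketched in pure Python as follows. This is an illustration under stated assumptions, not the code merged in this PR: the function name `split_nonfancy` is hypothetical, plain lists stand in for fancy (array) indexes, and full slices are treated the same as partial ones for simplicity.

```python
from numbers import Number


def split_nonfancy(indexes):
    """Split a tuple of per-axis indexes into two parts.

    Returns ``(nonfancy_key, reduced_indexes)``: ``nonfancy_key`` can be
    applied in a single getitem, and ``reduced_indexes`` holds what is left
    for the axes that survive it. Hypothetical helper for illustration.
    """
    nonfancy_key = []
    reduced_indexes = []
    for ind in indexes:
        if isinstance(ind, Number) or getattr(ind, "ndim", None) == 0:
            # Scalar: applied now; the axis disappears, so nothing remains.
            nonfancy_key.append(ind)
        elif isinstance(ind, slice):
            # Slice: applied now; the axis survives with a full slice left.
            nonfancy_key.append(ind)
            reduced_indexes.append(slice(None))
        else:
            # Fancy index (e.g. list or array): defer it to the later step.
            nonfancy_key.append(slice(None))
            reduced_indexes.append(ind)
    return tuple(nonfancy_key), tuple(reduced_indexes)
```

For the indexes `(2, slice(1, 5), [0, 3])`, the first key is `(2, slice(1, 5), slice(None))` and the remaining indexes are `(slice(None), [0, 3])`: the scalar axis is gone, so the fancy index now sits at position 1 rather than 2.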

jakirkham · 2018-03-31T19:17:52Z

Any thoughts on the revised version?

shoyer · 2018-03-31T21:24:44Z

Looks good to me

jakirkham · 2018-04-02T13:41:59Z

Thanks. Will plan on merging tomorrow if no further comments.

jakirkham changed the title ~~Apply scalar indices first in vindex~~ WIP: Apply scalar indices first in vindex Mar 29, 2018

jakirkham force-pushed the ref_vindex branch 2 times, most recently from eb83dad to 645336d Compare March 29, 2018 16:28

jakirkham changed the title ~~WIP: Apply scalar indices first in vindex~~ Apply scalar indices first in vindex Mar 29, 2018

shoyer reviewed Mar 29, 2018

Handle vindex's NumPy Arrays in a special function

77623cd


jakirkham force-pushed the ref_vindex branch from 645336d to bdc6ead Compare March 29, 2018 17:23

jakirkham force-pushed the ref_vindex branch from bdc6ead to fa22d2e Compare March 30, 2018 04:51

jakirkham force-pushed the ref_vindex branch from fa22d2e to 5adad0b Compare March 30, 2018 04:54

jakirkham changed the title ~~Apply scalar indices first in vindex~~ Apply non-fancy indices first in vindex Mar 30, 2018

jakirkham merged commit 06b1c65 into dask:master Apr 4, 2018

jakirkham deleted the ref_vindex branch April 4, 2018 16:34

jakirkham mentioned this pull request Apr 5, 2018

WIP: Support using vindex with Dask Arrays #3210

Closed
