Skip to content

Apply non-fancy indices first in vindex#3353

Merged
jakirkham merged 2 commits intodask:masterfrom
jakirkham:ref_vindex
Apr 4, 2018
Merged

Apply non-fancy indices first in vindex#3353
jakirkham merged 2 commits intodask:masterfrom
jakirkham:ref_vindex

Conversation

@jakirkham
Copy link
Member

@jakirkham jakirkham commented Mar 29, 2018

Extracts the non-fancy indices and applies them first in vindex. This is important as these shrink the shape of the array. Thus it is easier to reason about later steps if we know this has already happened. Further these immediately result in Dask Array chunks being dropped from consideration as only the chunk(s) containing the index or slice are kept. This is a nice property to have before other subselections are applied.

@jakirkham jakirkham changed the title Apply scalar indices first in vindex WIP: Apply scalar indices first in vindex Mar 29, 2018
@jakirkham jakirkham force-pushed the ref_vindex branch 2 times, most recently from eb83dad to 645336d Compare March 29, 2018 16:28
@jakirkham jakirkham changed the title WIP: Apply scalar indices first in vindex Apply scalar indices first in vindex Mar 29, 2018
@jakirkham
Copy link
Member Author

If you have time to look at this @shoyer, would appreciate it. Would like to make sure that this is an ok thing to do in vindex. Planning on using this as part of the refactoring mentioned in PR ( #3210 ).

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

generally looks good, just a few minor comments

partial_slices = []
reduced_indexes = []
for i, ind in enumerate(indexes):
if isinstance(ind, Number):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest checking for 0-dimensional arrays here, too, e.g., if isinstance(ind, Number) or getattr(ind, 'ndim', None) == 0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, was thinking about that too. Will update.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might be tricky though if the object being checked is lazy. We can check for Dask Arrays pretty easily. Not sure if we should be checking for other things.

partial_slices = {i: ind for i, ind in enumerate(indexes)
if isinstance(ind, slice) and ind != slice(None)}

num_indexes = []
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Maybe integer_indexes would be a more descriptive name than num_indexes? "num" sounds a little more likely to include numpy arrays.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had int_indexes before. Also was debating scalar_indexes. No preferences amongst these personally. Please let me know what sounds best.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

scalar_indexes is probably clearest, that definitely excludes ND arrays :).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Went with nonfancy_indexes given it now handles the scalar indices and partial slices. Happy to change the name if there are other thoughts.

key = tuple(partial_slices.get(i, slice(None))
for i in range(len(indexes)))
x = x[key]
x = x[partial_slices]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can safely do integer and slice indexing at the same time, though I don't know if that actually makes a difference as far as dask is concerned.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True.

I guess it means inserting another getitem call in the graph. Would expect it gets optimized out, but better just to avoid it altogether. So one less getitem call sounds like a good idea.

Though if we do that, we might want to adjust how we collect the indices at the beginning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now combined into one step.

Move all NumPy array slicing content into what use to be `_vindex_1d`,
which is now `_vindex_array`. This will make it easier to extend
`vindex` to handle other objects like Dask Arrays afterwards.
@jakirkham
Copy link
Member Author

Pushed one more commit which moves the NumPy array indexing stuff into what was _vindex_1d and is now _vindex_array. If we would rather keep the functions separate, can just have _vindex_array call _vindex_1d unchanged. Please let me know what you think.

Extracts the scalar indices and partial slices. Then applies them first
in `vindex`. Also drops the scalar indices and partial slices from the
indices as they are handled.  Also as the scalar indices reduce the
array's rank, which needs to be taken into account for later slices.
Thus making it easier to reason about later steps as we know this has
already happened. Further these immediately result in Dask Array chunks
being dropped from consideration as only the chunk including the index
or partial slice are kept. This is a nice property to have before other
subselections are applied.
@jakirkham jakirkham changed the title Apply scalar indices first in vindex Apply non-fancy indices first in vindex Mar 30, 2018
@jakirkham
Copy link
Member Author

Any thoughts on the revised version?

@shoyer
Copy link
Member

shoyer commented Mar 31, 2018

Looks good to me

@jakirkham
Copy link
Member Author

Thanks. Will plan on merging tomorrow if no further comments.

@jakirkham jakirkham merged commit 06b1c65 into dask:master Apr 4, 2018
@jakirkham jakirkham deleted the ref_vindex branch April 4, 2018 16:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants