Apply non-fancy indices first in vindex#3353
Conversation
eb83dad to
645336d
Compare
shoyer
left a comment
There was a problem hiding this comment.
generally looks good, just a few minor comments
| partial_slices = [] | ||
| reduced_indexes = [] | ||
| for i, ind in enumerate(indexes): | ||
| if isinstance(ind, Number): |
There was a problem hiding this comment.
I would suggest checking for 0-dimensional arrays here, too, e.g., if isinstance(ind, Number) or getattr(ind, 'ndim', None) == 0
There was a problem hiding this comment.
Yeah, was thinking about that too. Will update.
There was a problem hiding this comment.
This might be tricky though if the object being checked is lazy. We can check for Dask Arrays pretty easily. Not sure if we should be checking for other things.
dask/array/core.py
Outdated
| partial_slices = {i: ind for i, ind in enumerate(indexes) | ||
| if isinstance(ind, slice) and ind != slice(None)} | ||
|
|
||
| num_indexes = [] |
There was a problem hiding this comment.
nit: Maybe integer_indexes would be a more descriptive name than num_indexes? "num" sounds a little more likely to include numpy arrays.
There was a problem hiding this comment.
Had int_indexes before. Also was debating scalar_indexes. No preferences amongst these personally. Please let me know what sounds best.
There was a problem hiding this comment.
scalar_indexes is probably clearest, that definitely excludes ND arrays :).
There was a problem hiding this comment.
Went with nonfancy_indexes given it now handles the scalar indices and partial slices. Happy to change the name if there are other thoughts.
dask/array/core.py
Outdated
| key = tuple(partial_slices.get(i, slice(None)) | ||
| for i in range(len(indexes))) | ||
| x = x[key] | ||
| x = x[partial_slices] |
There was a problem hiding this comment.
I think you can safely do integer and slice indexing at the same time, though I don't know if that actually makes a difference as far as dask is concerned.
There was a problem hiding this comment.
True.
I guess it means inserting another getitem call in the graph. Would expect it gets optimized out, but better just to avoid it altogether. So one less getitem call sounds like a good idea.
Though if we do that, we might want to adjust how we collect the indices at the beginning.
There was a problem hiding this comment.
Now combined into one step.
Move all NumPy array slicing content into what use to be `_vindex_1d`, which is now `_vindex_array`. This will make it easier to extend `vindex` to handle other objects like Dask Arrays afterwards.
|
Pushed one more commit which moves the NumPy array indexing stuff into what was |
Extracts the scalar indices and partial slices. Then applies them first in `vindex`. Also drops the scalar indices and partial slices from the indices as they are handled. Also as the scalar indices reduce the array's rank, which needs to be taken into account for later slices. Thus making it easier to reason about later steps as we know this has already happened. Further these immediately result in Dask Array chunks being dropped from consideration as only the chunk including the index or partial slice are kept. This is a nice property to have before other subselections are applied.
|
Any thoughts on the revised version? |
|
Looks good to me |
|
Thanks. Will plan on merging tomorrow if no further comments. |
Extracts the non-fancy indices and applies them first in
vindex. This is important as these shrink the shape of the array. Thus it is easier to reason about later steps if we know this has already happened. Further these immediately result in Dask Array chunks being dropped from consideration as only the chunk(s) containing the index or slice are kept. This is a nice property to have before other subselections are applied.