-
-
Notifications
You must be signed in to change notification settings - Fork 12k
DOC: major revision of NEP 21, advanced indexing #11414
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
My focus here has been on clarifying the motivation for these changes, and clearly defining indexing terminology. I've also made the proposal a bit more definitive about what changes *should* be made, and have added myself as a co-author.
|
|
||
| Possibly a mixin could be provided to help implementation. These improvements | ||
| are not essential to the initial implementation, so they are saved for | ||
| future work. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not disagree, but I have a bit of reservations.
- We have to figure out subclasses internally, mostly for masked arrays (I do not care about matrix really, it is horribly broken, and maybe OK to not to support them. The mmap subclass is pretty simple so also not a big deal).
- I am still not quite sure if ideas such as
__numpy_getitem__(or a kwarg to getitem itself) may be the best way.
Or should we just ignore it for MA. Probably I am just not doing enough with funny array-likes to have a quite clear plan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be fine not to immediately implement oindex/vindex for masked arrays, as long as we don’t break existing behavior. Somebody who cares about masked arrays will need to work on it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or a kwarg to getitem itself
Other alternatives I can think of for the meaning of arr.oindex[a, b, c]:
arr[np.core.orthogonal_index((a, b, c))]- will work fine by default on maskedarrays, and any subclasses that just forward the indexer to the base classarr[np.core.orthogonal_indexing, a, b, c](like C++ tag-dispatching)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, one thing to consider is, that if MA does not support it and we implement DeprecationWarnings, for example matplotlib might have no way to go forward with the warnings? Though thta might be a general problem, since backward compatibility and avoiding warnings is tricky. It would be good to find a solution for that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it would probably be quite easy to make masked array’s ‘getitem‘ use ‘legacy_index‘ internally. That’s about as far as I would be willing to go, personally.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair point, though it feels a bit half arsed to me :). With a bit of effort from someone not me we could have had that two years ago, hehe...
|
Reads well, nice changes! Not sure, but we could also list things like: #5749 frankly, I did not remember that Jaime was probably the first to suggest an indexing attribute :). |
| functions such as ``np.ogrid`` and ``np.ix_``, e.g., | ||
| ``x[np.ix_([0, 1], [0, 1])]``. However, there are no utilities for emulating | ||
| fully general/mixed outer indexing, which could unambiguously allow for slices, | ||
| integers, and 1D boolean and integer arrays. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting we do not allow higher dim bools in oindex? It seems rather useful to me, even if basically just a helper around nonzero.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add it to Alternatives ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The possibility of allowing booleans with more than one dimension in oindex never occurred to me. I will need to ponder it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I tried to add support for that to np.ix_ at some point in the past..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it definitely makes sense, unless you want to do something like allow a higher D boolean, but not in combination with others, which seems strange to me. It is also a nice thing about forcing the correct dimension for the index itself, I think that might reduce bugs for such cases slightly.
What I could imagine discussing is to not quite allow all such cases for the final state of plain indexing.
About np.ix_ I think adding it shouldn't be too hard (with the exception of 0d bools maybe, would have to think about it), but if this moves ahead...
| * when any integer index array is present (possibly additional for | ||
| more then one boolean index array). | ||
| * Boolean indices are not supported. All indices must be integers, | ||
| integer arrays or slices. |
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
This comment was marked as resolved.
Sorry, something went wrong.
| ``subclass.__getitem__ is ndarray.__getitem__``. If not, the | ||
| subclass has special handling for indexing and ``NotImplementedError`` | ||
| should be raised, requiring that the indexing attributes is also explicitly | ||
| overwritten. Likewise, implementations of ``__setitem__`` should check to see |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, here's where my comment about constructing special indexing objects should have gone. Just translate arr.oindex[ind] to arr[np.core.oindex_object(ind)]. As long as oindex_object does not attempt to duck-type tuple, then the subclass will just crash if it tried to actually inspect the index. For ma.array and memmap, the index is just passed through, so this would just work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like an ok suggestion for the implementation. Though such an object would have to explicitly not provide the sequence API or someone might still end up just using it... I doubt it is enough for MA, but it might make the MA implementation simple. Also this can provide help to the duck array, since this "indexing object" can provide extra information like "a copy should be made" or "here I converted the boolean arrays for you".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, a class like this would work:
class oindex_object:
value: tuple
I think it would work just fine for ma - ma.array.__getitem__ does nothing with idx other than data[idx] and mask[idx]
|
|
||
| We don't think that new functions are a good alternative, because indexing | ||
| notation ``[]`` offer some syntactic advantages in Python (i.e., direct | ||
| creation of slice objects) compared to functions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We already have hacks for this like np.s_[some:slice].
Is there a counter against np.oindex[arr, the:actual:index] or arr[np.oindex[the:actual:index]]?
The latter has the advantage of not taking up two more property names on subclasses.
If nothing else, I'd perhaps like to see these listed as rejected alternatives.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second suggestion seems OK to me as such, though I think np.oindex(arr)[...] which ends up doing the opposite (as API, using a special indexing object) is nicer. The first suggestion, I personally do not like much, it just mangles up the index and the indexed array which IMO makes the square bracket notation much less clear (and actually saves no characters compared to oindex(arr)[...]!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
However, if we implement a special index object, which would not be unreasonable, I am not sure the point below means too much, since it would not be much of a function call, it would just bind to arr and then call arr.__getitem__, it would not actually do anything with arr itself. arr could be in charge of doing some of the checks (such as that the dimension is good) inside its __getitem__. Or maybe easier, arr has to provide ndim or maybe even .shape but that would be the only thing.
|
One general remark - introducing these attributes will either break or not support recarray users with fields of the same name (depending on which takes precedence). I think this is just a design flaw in recarray that can't really be fixed, but it's worth noting as an incompatibility this change would introduce. |
|
I would like to merge (publish) this soon since some of this discussion might work better on the mailing list. Any last-minute changes to the document? |
|
Seems OK to me, we can always update it later, but I think @shoyer should say if he wants to do other changes. Or you can just merge yourself Stephan. |
I added a link in to Jaime's PR. I'm going to merge this and email the list. |
My focus here has been on clarifying the motivation for these changes,
and clearly defining indexing terminology. I've also made the proposal
a bit more definitive about what changes should be made, and have
added myself as a co-author.