Skip to content

Conversation

@shoyer
Copy link
Member

@shoyer shoyer commented Jun 24, 2018

My focus here has been on clarifying the motivation for these changes,
and clearly defining indexing terminology. I've also made the proposal
a bit more definitive about what changes should be made, and have
added myself as a co-author.

My focus here has been on clarifying the motivation for these changes,
and clearly defining indexing terminology. I've also made the proposal
a bit more definitive about what changes *should* be made, and have
added myself as a co-author.

Possibly a mixin could be provided to help implementation. These improvements
are not essential to the initial implementation, so they are saved for
future work.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not disagree, but I have a bit of reservations.

  1. We have to figure out subclasses internally, mostly for masked arrays (I do not care about matrix really, it is horribly broken, and maybe OK to not to support them. The mmap subclass is pretty simple so also not a big deal).
  2. I am still not quite sure if ideas such as __numpy_getitem__ (or a kwarg to getitem itself) may be the best way.

Or should we just ignore it for MA. Probably I am just not doing enough with funny array-likes to have a quite clear plan.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be fine not to immediately implement oindex/vindex for masked arrays, as long as we don’t break existing behavior. Somebody who cares about masked arrays will need to work on it.

Copy link
Member

@eric-wieser eric-wieser Jun 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or a kwarg to getitem itself

Other alternatives I can think of for the meaning of arr.oindex[a, b, c]:

  • arr[np.core.orthogonal_index((a, b, c))] - will work fine by default on maskedarrays, and any subclasses that just forward the indexer to the base class
  • arr[np.core.orthogonal_indexing, a, b, c] (like C++ tag-dispatching)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, one thing to consider is, that if MA does not support it and we implement DeprecationWarnings, for example matplotlib might have no way to go forward with the warnings? Though thta might be a general problem, since backward compatibility and avoiding warnings is tricky. It would be good to find a solution for that!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it would probably be quite easy to make masked array’s ‘getitem‘ use ‘legacy_index‘ internally. That’s about as far as I would be willing to go, personally.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair point, though it feels a bit half arsed to me :). With a bit of effort from someone not me we could have had that two years ago, hehe...

@seberg
Copy link
Member

seberg commented Jun 24, 2018

Reads well, nice changes! Not sure, but we could also list things like: #5749 frankly, I did not remember that Jaime was probably the first to suggest an indexing attribute :).
It is off topic, but I still think numpy could in principle provide some helpers for downstream for index preprocessing, I remember toying with something like it, probably still in one of my oindex backup branches...

functions such as ``np.ogrid`` and ``np.ix_``, e.g.,
``x[np.ix_([0, 1], [0, 1])]``. However, there are no utilities for emulating
fully general/mixed outer indexing, which could unambiguously allow for slices,
integers, and 1D boolean and integer arrays.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you suggesting we do not allow higher dim bools in oindex? It seems rather useful to me, even if basically just a helper around nonzero.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add it to Alternatives ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The possibility of allowing booleans with more than one dimension in oindex never occurred to me. I will need to ponder it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I tried to add support for that to np.ix_ at some point in the past..

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it definitely makes sense, unless you want to do something like allow a higher D boolean, but not in combination with others, which seems strange to me. It is also a nice thing about forcing the correct dimension for the index itself, I think that might reduce bugs for such cases slightly.

What I could imagine discussing is to not quite allow all such cases for the final state of plain indexing.

About np.ix_ I think adding it shouldn't be too hard (with the exception of 0d bools maybe, would have to think about it), but if this moves ahead...

* when any integer index array is present (possibly additional for
more then one boolean index array).
* Boolean indices are not supported. All indices must be integers,
integer arrays or slices.

This comment was marked as resolved.

This comment was marked as resolved.

``subclass.__getitem__ is ndarray.__getitem__``. If not, the
subclass has special handling for indexing and ``NotImplementedError``
should be raised, requiring that the indexing attributes is also explicitly
overwritten. Likewise, implementations of ``__setitem__`` should check to see
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, here's where my comment about constructing special indexing objects should have gone. Just translate arr.oindex[ind] to arr[np.core.oindex_object(ind)]. As long as oindex_object does not attempt to duck-type tuple, then the subclass will just crash if it tried to actually inspect the index. For ma.array and memmap, the index is just passed through, so this would just work.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like an ok suggestion for the implementation. Though such an object would have to explicitly not provide the sequence API or someone might still end up just using it... I doubt it is enough for MA, but it might make the MA implementation simple. Also this can provide help to the duck array, since this "indexing object" can provide extra information like "a copy should be made" or "here I converted the boolean arrays for you".

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, a class like this would work:

class oindex_object:
    value: tuple

I think it would work just fine for ma - ma.array.__getitem__ does nothing with idx other than data[idx] and mask[idx]


We don't think that new functions are a good alternative, because indexing
notation ``[]`` offer some syntactic advantages in Python (i.e., direct
creation of slice objects) compared to functions.
Copy link
Member

@eric-wieser eric-wieser Jun 25, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have hacks for this like np.s_[some:slice].

Is there a counter against np.oindex[arr, the:actual:index] or arr[np.oindex[the:actual:index]]?
The latter has the advantage of not taking up two more property names on subclasses.
If nothing else, I'd perhaps like to see these listed as rejected alternatives.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second suggestion seems OK to me as such, though I think np.oindex(arr)[...] which ends up doing the opposite (as API, using a special indexing object) is nicer. The first suggestion, I personally do not like much, it just mangles up the index and the indexed array which IMO makes the square bracket notation much less clear (and actually saves no characters compared to oindex(arr)[...]!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

However, if we implement a special index object, which would not be unreasonable, I am not sure the point below means too much, since it would not be much of a function call, it would just bind to arr and then call arr.__getitem__, it would not actually do anything with arr itself. arr could be in charge of doing some of the checks (such as that the dimension is good) inside its __getitem__. Or maybe easier, arr has to provide ndim or maybe even .shape but that would be the only thing.

@eric-wieser
Copy link
Member

eric-wieser commented Jun 25, 2018

One general remark - introducing these attributes will either break or not support recarray users with fields of the same name (depending on which takes precedence). I think this is just a design flaw in recarray that can't really be fixed, but it's worth noting as an incompatibility this change would introduce.

@mattip
Copy link
Member

mattip commented Jun 25, 2018

I would like to merge (publish) this soon since some of this discussion might work better on the mailing list. Any last-minute changes to the document?

@seberg
Copy link
Member

seberg commented Jun 25, 2018

Seems OK to me, we can always update it later, but I think @shoyer should say if he wants to do other changes. Or you can just merge yourself Stephan.

@shoyer
Copy link
Member Author

shoyer commented Jun 25, 2018

Reads well, nice changes! Not sure, but we could also list things like: #5749 frankly, I did not remember that Jaime was probably the first to suggest an indexing attribute :).

I added a link in to Jaime's PR.

I'm going to merge this and email the list.

@shoyer shoyer merged commit d262c78 into numpy:master Jun 25, 2018
@shoyer shoyer deleted the nep21-update branch June 25, 2018 21:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants