ENH: Add (put|take)_along_axis by eric-wieser · Pull Request #11105 · numpy/numpy

eric-wieser · 2018-05-16T08:16:41Z

This is a less ambitious version of #8714, as I proposed in this comment

Essentially, this limits the function to the obvious broadcasting cases when the indices and array have the same number of dimensions. This is sufficiently powerful that any other broadcasting behavior can be achieved by inserting extra dimensions, but keeps to:

In the face of ambiguity, refuse the temptation to guess.

An expansion of these functions is given in their docstring (using the format adopted by #9946), but the one-line summaries are:

put: out[ii + s_[j,] + kk] = a[ii + s_[:,] + kk][indices[ii + s_[j,] + kk]]
take: a[ii + s_[:,] + kk][indices[ii + s_[j,] + kk]] = values[ii + s_[j,] + kk]

Or perhaps more clearly in terms of basic indexing only:

put: out[ii + s_[j,] + kk] = a[ii + s_[indices[ii + s_[j,] + kk],] + kk]
take: a[ii + s_[indices[ii + s_[j,] + kk],] + kk] = values[ii + s_[j,] + kk]

mhvk

Overall, still looks very good and I think it makes sense to get this simpler version in first. Does need tests for put_along_axis (which I think has a problem as is for axis=None). Also, a suggestion for a clearer description.

mhvk · 2018-05-16T13:05:57Z

numpy/lib/shape_base.py

I'd try for a true one-liner, maybe just the one you have above: "Take elements by matching the array and the index arrays" (though perhaps in all places adding "along axis"?)

How about "Take values from the input array by matching 1d index and data slices."?

That's not bad. I'd take that or perhaps
"Take values by matching 1d index and data slices along axis."
(and then also update the one-liners in the other file)

Now looking at the full docstring, I think your one-liner is really fine as is. I would still change it also in the See-also sections.

mhvk · 2018-05-16T13:38:10Z

numpy/lib/shape_base.py

This selects a single element! Would need to be s_[:,]

Oops, good catch

mhvk · 2018-05-16T14:00:21Z

numpy/lib/shape_base.py

I find the description very hard to parse even though I implemented my own version of this! In part, it is because of using two other numpy routines which (I think) are rarely used. E.g., I had to look up what np.s_ was; I think one might as well omit that, and perhaps attempt to explain even more what ndindex does, as in:

This is equivalent to (but faster than) the following use of `ndindex` to iterate over the axes before and after the requested axis: Ni, M, Nk = a.shape[:axis], a.shape[axis], a.shape[axis+1:] J = indices.shape[axis] out = np.empty(indices.shape, dtype=a.dtype) for ii in ndindex(Ni): # ii is a tuple of indices into the axes before axis for kk in ndindex(Nk): # kk is a tuple of indices into the axes after axis a_1d = a[ii + (slice(0, M),) + kk] # values along axis for this ii, kk indices_1d = indices[ii + (slice(0, J),) + kk] # corresponding indices for j in range(J): out[ii + (j,) + kk] = a_1d[indices_1d[j]] Equivalently, eliminating the inner loop, the last two lines would be, out[ii + (slice(0, J),) + kk] = a_1d[indices_1d]

Looking at this again, it might even be an idea to add

out_1d = out[ii + (slice(0, J),) + kk] # corresponding view of output for j in range(J): out_1d[j] = a_1d[indices_1d[j]]

Note that this does at least link to ndindex and s_. I do like the extra temporaries you introduce here, but I would prefer to keep s_ as it makes the connection to : clear.

Yes, the np.s_[:,] is not as confusing as np.s_[j,] (though the comma looks very odd). One argument in favour of the slice as I wrote it is that it uses M and J and thus might help make clearer that those dimensions are not necessarily the same (which I think is useful). But of course one could just write np.s_[0:M,], etc.

On the flipside, using :M suggests that the dimension may not actually be of size M, which would also be confusing.

I've incorporated some of your ideas into the docstrings - perhaps best to create a new review thread on the updated one

What I'd really like here is to just be able to spell it out[*ii, j, *kk] - I tried patching python to allow this, but wasn't able to work out where to go after patching the grammar.

Yes, I tried the same!

I do think your update is a lot clearer already. I would argue to keep the definition of out = np.empty(...) since that helps to understand what the result shape should be and makes the code already runnable. And perhaps add a comment in the example after extracting J along the lines of # Note that J does not have to equal M=a.shape[axis]

mhvk · 2018-05-16T14:17:22Z

numpy/lib/shape_base.py

Move this up right after the np.asanyarray(indices), or, perhaps better, to _make_along_axis_idx (since that does other sanity checks as well that are shared with put_along_axis)

mhvk · 2018-05-16T14:21:29Z

numpy/lib/shape_base.py

See comments on take_along_axis

mhvk · 2018-05-16T14:25:12Z

numpy/lib/shape_base.py

This won't work if one cannot ravel the array without copying (in which case the assignment below would not affect the input array; do make a test case...). I think you'd need to use arr = arr.flat (for symmetry, might as well do the same for take_along_axis).

mhvk · 2018-05-16T14:26:40Z

numpy/lib/tests/test_shape_base.py

Needs tests for put_along_axis as well...

Will add these in a future commit. There's not any obvious way to test the argfunc/func equivalence like there was for take_along_axis, as far as I can tell.

Fine to do in a separate commit. One can test, e.g., adding a very large number and then ensuring that argmax gives back indices.

mhvk · 2018-05-16T14:28:16Z

numpy/lib/tests/test_shape_base.py

Total nitpick, but replace dict() by {} (and {kth: 2} further down)

This was a deliberate choice, because dict(kth=2) looks more like some_func(kth=2)

Definitely fine to leave as is.

shoyer · 2018-05-16T18:40:01Z

numpy/lib/shape_base.py

Is there a use-case for setting axis=None? Personally I find this auto-flattening behavior quite distasteful.

The only legitimate use-case I think of is making thing simpler for 1D arrays, but that would need a default axis. Then the default value axis=None would mean "axis=0 for 1D arrays, otherwise raise an error".

Without this behaviour, you can't use it to fully wrap argsort back into sort. I don't think argsort and sort should accept axis=None either, but given they do, we want to support it here too.

See the maskedarray change in this PR for a case where we sadly have to support None.

OK, that makes sense to me. Let's just make sure the behavior is documented, then.

shoyer · 2018-05-16T18:41:08Z

I like this. My only objection is the auto-flattening behavior with axis=None.

shoyer · 2018-05-16T18:42:43Z

numpy/lib/shape_base.py

This should mention the broadcasting behavior with size 1 dimensions.

eric-wieser · 2018-05-16T20:45:12Z

A case for axis=None:

def sort_by(func, x, axis):
    ind = func(x).argsort(axis=axis)
    return np.take_along_axis(x, ind, axis=axis)

This allows sort_by(np.abs, x, axis=None) to be used where np.sort(x, axis=None) was previously used. I don't think any of these functions should accept None, but they do.

Do we go with the status quo here, or make a deliberate effort to start dropping axis=None support from all new functions?

shoyer · 2018-05-16T20:56:23Z

Do we go with the status quo here, or make a deliberate effort to start dropping axis=None support from all new functions?

I agree that it's better to be consistent with argsort() and similar functions. In the broader scheme of things, flattening with axis=None is not so terrible.

eric-wieser · 2018-05-16T21:09:26Z

flattening with axis=None is not so terrible.

Well, unless axis=None is the default, and then it's pretty terrible.

eric-wieser · 2018-05-17T06:50:29Z

Updated with re-ordered and improved docstrings, and correct handling of axis=None.

Fixup commits have a description as their third line. Will squash once approved.

mhvk

Only a docstring copy & paste left, plus making the one-liners in see-also consistent.

mhvk · 2018-05-17T12:35:07Z

numpy/lib/shape_base.py

I had considered asking you to just pass on the shape, but thought it was too much trouble, but now I see it ended up being needed! Nicer.

I also considered adding .shape and .ndim to flatiter, but then I'd need to do it to MaskedArray.flat too, and it was a little unclear whether .shape and .ndim should refer to the flattened or original array.

mhvk · 2018-05-17T12:36:43Z

numpy/lib/shape_base.py

Oops, copy & paste error...

Fixed and reworded here and for the other function

mhvk · 2018-05-17T12:47:01Z

numpy/lib/shape_base.py

More as a note, really, but this one line will prevent me from actually using the routine in astropy's Time class (which is not an ndarray subclass, but does allow indexing like it). I guess right now there is little choice but to do it, and eventually this should become a check for sufficient duck-typing. Anyway, an additional argument to make _make_along_axis_idx public in a follow-up PR (which means thinking of a better name....).

I think a better follow-up would be to implement this as a gufunc

Perhaps I should just omit this? I don't see any point in allowing the caller to pass a list, and all I actually need here is .shape, .flat, and .__getitem__

Yes, why not indeed, since realistically this is not going to be used on lists, but on things that indices were taken from.

Do I leave the asanyarray on indices?

Might as well remove it too. These indices would be very hard to construct correctly as a nested list anyway.

eric-wieser · 2018-05-25T16:37:25Z

@mhvk: asanyarray removed, and squashed

mhvk · 2018-05-25T18:43:09Z

Darn, now your test with invalid index input raises a different error...

This is the reduced version that does not allow any insertion of extra dimensions

mhvk · 2018-05-29T00:42:36Z

Hadn't noticed this was now fixed - all looks OK, so will merge. Nice addition!

eric-wieser · 2018-05-29T02:32:16Z

Wonder if we should have run this by the mailing list, for the sake of name bike-shedding...

mhvk · 2018-05-29T02:38:19Z

Could still do it?

eric-wieser · 2018-05-29T03:07:54Z

Done

eric-wieser added 01 - Enhancement component: numpy.lib labels May 16, 2018

eric-wieser mentioned this pull request May 16, 2018

ENH: Implement take_along_axis as described in #8708 #8714

Closed

mhvk requested changes May 16, 2018

View reviewed changes

shoyer reviewed May 16, 2018

View reviewed changes

numpy/lib/shape_base.py Outdated

Copy link

Member

shoyer May 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should mention the broadcasting behavior with size 1 dimensions.

mhvk reviewed May 17, 2018

View reviewed changes

mhvk approved these changes May 17, 2018

View reviewed changes

mhvk reviewed May 17, 2018

View reviewed changes

eric-wieser mentioned this pull request May 19, 2018

ENH: Weighted quantile #9211

Closed

eric-wieser force-pushed the take_along_axis-strict branch from ad0ec85 to fcdced0 Compare May 25, 2018 16:36

eric-wieser added 2 commits May 25, 2018 22:55

ENH: Add (put|take)_along_axis as described in numpy#8708

905e906

This is the reduced version that does not allow any insertion of extra dimensions

MAINT: rewrite np.ma.(median|sort) to use take_along_axis

7a3c50a

eric-wieser force-pushed the take_along_axis-strict branch from fcdced0 to 7a3c50a Compare May 26, 2018 05:56

mhvk merged commit 23cb605 into numpy:master May 29, 2018

mhvk mentioned this pull request May 29, 2018

ENH: Add a function that performs the indexing needed to map argsort to sort #8708

Closed

eric-wieser added this to the 1.15.0 release milestone May 29, 2018

kgryte mentioned this pull request Jun 27, 2024

RFC: add take_along_axis to take values along a specified dimension data-apis/array-api#808

Closed

Uh oh!

Conversation

eric-wieser commented May 16, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mhvk left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-wieser May 16, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

shoyer commented May 16, 2018

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-wieser commented May 16, 2018

Uh oh!

shoyer commented May 16, 2018

Uh oh!

eric-wieser commented May 16, 2018

Uh oh!

eric-wieser commented May 17, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mhvk left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

eric-wieser commented May 16, 2018 •

edited

Loading

eric-wieser May 16, 2018 •

edited

Loading

eric-wieser commented May 17, 2018 •

edited

Loading