Conversation
mhvk
left a comment
There was a problem hiding this comment.
Overall, still looks very good and I think it makes sense to get this simpler version in first. Does need tests for put_along_axis (which I think has a problem as is for axis=None). Also, a suggestion for a clearer description.
numpy/lib/shape_base.py
Outdated
There was a problem hiding this comment.
I'd try for a true one-liner, maybe just the one you have above: "Take elements by matching the array and the index arrays" (though perhaps in all places adding "along axis"?)
There was a problem hiding this comment.
How about "Take values from the input array by matching 1d index and data slices."?
There was a problem hiding this comment.
That's not bad. I'd take that or perhaps
"Take values by matching 1d index and data slices along axis."
(and then also update the one-liners in the other file)
There was a problem hiding this comment.
Now looking at the full docstring, I think your one-liner is really fine as is. I would still change it also in the See-also sections.
numpy/lib/shape_base.py
Outdated
There was a problem hiding this comment.
This selects a single element! Would need to be s_[:,]
numpy/lib/shape_base.py
Outdated
There was a problem hiding this comment.
I find the description very hard to parse even though I implemented my own version of this! In part, it is because of using two other numpy routines which (I think) are rarely used. E.g., I had to look up what np.s_ was; I think one might as well omit that, and perhaps attempt to explain even more what ndindex does, as in:
This is equivalent to (but faster than) the following use of `ndindex` to iterate
over the axes before and after the requested axis:
Ni, M, Nk = a.shape[:axis], a.shape[axis], a.shape[axis+1:]
J = indices.shape[axis]
out = np.empty(indices.shape, dtype=a.dtype)
for ii in ndindex(Ni): # ii is a tuple of indices into the axes before axis
for kk in ndindex(Nk): # kk is a tuple of indices into the axes after axis
a_1d = a[ii + (slice(0, M),) + kk] # values along axis for this ii, kk
indices_1d = indices[ii + (slice(0, J),) + kk] # corresponding indices
for j in range(J):
out[ii + (j,) + kk] = a_1d[indices_1d[j]]
Equivalently, eliminating the inner loop, the last two lines would be,
out[ii + (slice(0, J),) + kk] = a_1d[indices_1d]
There was a problem hiding this comment.
Looking at this again, it might even be an idea to add
out_1d = out[ii + (slice(0, J),) + kk] # corresponding view of output
for j in range(J):
out_1d[j] = a_1d[indices_1d[j]]
There was a problem hiding this comment.
Note that this does at least link to ndindex and s_. I do like the extra temporaries you introduce here, but I would prefer to keep s_ as it makes the connection to : clear.
There was a problem hiding this comment.
Yes, the np.s_[:,] is not as confusing as np.s_[j,] (though the comma looks very odd). One argument in favour of the slice as I wrote it is that it uses M and J and thus might help make clearer that those dimensions are not necessarily the same (which I think is useful). But of course one could just write np.s_[0:M,], etc.
There was a problem hiding this comment.
On the flipside, using :M suggests that the dimension may not actually be of size M, which would also be confusing.
I've incorporated some of your ideas into the docstrings - perhaps best to create a new review thread on the updated one
There was a problem hiding this comment.
What I'd really like here is to just be able to spell it out[*ii, j, *kk] - I tried patching python to allow this, but wasn't able to work out where to go after patching the grammar.
There was a problem hiding this comment.
Yes, I tried the same!
I do think your update is a lot clearer already. I would argue to keep the definition of out = np.empty(...) since that helps to understand what the result shape should be and makes the code already runnable. And perhaps add a comment in the example after extracting J along the lines of # Note that J does not have to equal M=a.shape[axis]
numpy/lib/shape_base.py
Outdated
There was a problem hiding this comment.
Move this up right after the np.asanyarray(indices), or, perhaps better, to _make_along_axis_idx (since that does other sanity checks as well that are shared with put_along_axis)
numpy/lib/shape_base.py
Outdated
There was a problem hiding this comment.
See comments on take_along_axis
numpy/lib/shape_base.py
Outdated
There was a problem hiding this comment.
This won't work if one cannot ravel the array without copying (in which case the assignment below would not affect the input array; do make a test case...). I think you'd need to use arr = arr.flat (for symmetry, might as well do the same for take_along_axis).
numpy/lib/tests/test_shape_base.py
Outdated
There was a problem hiding this comment.
Needs tests for put_along_axis as well...
There was a problem hiding this comment.
Will add these in a future commit. There's not any obvious way to test the argfunc/func equivalence like there was for take_along_axis, as far as I can tell.
There was a problem hiding this comment.
Fine to do in a separate commit. One can test, e.g., adding a very large number and then ensuring that argmax gives back indices.
numpy/lib/tests/test_shape_base.py
Outdated
There was a problem hiding this comment.
Total nitpick, but replace dict() by {} (and {kth: 2} further down)
There was a problem hiding this comment.
This was a deliberate choice, because dict(kth=2) looks more like some_func(kth=2)
There was a problem hiding this comment.
Definitely fine to leave as is.
numpy/lib/shape_base.py
Outdated
There was a problem hiding this comment.
Is there a use-case for setting axis=None? Personally I find this auto-flattening behavior quite distasteful.
The only legitimate use-case I think of is making thing simpler for 1D arrays, but that would need a default axis. Then the default value axis=None would mean "axis=0 for 1D arrays, otherwise raise an error".
There was a problem hiding this comment.
Without this behaviour, you can't use it to fully wrap argsort back into sort. I don't think argsort and sort should accept axis=None either, but given they do, we want to support it here too.
See the maskedarray change in this PR for a case where we sadly have to support None.
There was a problem hiding this comment.
OK, that makes sense to me. Let's just make sure the behavior is documented, then.
|
I like this. My only objection is the auto-flattening behavior with |
numpy/lib/shape_base.py
Outdated
There was a problem hiding this comment.
This should mention the broadcasting behavior with size 1 dimensions.
|
A case for def sort_by(func, x, axis):
ind = func(x).argsort(axis=axis)
return np.take_along_axis(x, ind, axis=axis)This allows Do we go with the status quo here, or make a deliberate effort to start dropping |
I agree that it's better to be consistent with |
Well, unless |
|
Updated with re-ordered and improved docstrings, and correct handling of Fixup commits have a description as their third line. Will squash once approved. |
mhvk
left a comment
There was a problem hiding this comment.
Only a docstring copy & paste left, plus making the one-liners in see-also consistent.
numpy/lib/shape_base.py
Outdated
There was a problem hiding this comment.
I had considered asking you to just pass on the shape, but thought it was too much trouble, but now I see it ended up being needed! Nicer.
There was a problem hiding this comment.
I also considered adding .shape and .ndim to flatiter, but then I'd need to do it to MaskedArray.flat too, and it was a little unclear whether .shape and .ndim should refer to the flattened or original array.
numpy/lib/shape_base.py
Outdated
There was a problem hiding this comment.
Fixed and reworded here and for the other function
numpy/lib/shape_base.py
Outdated
There was a problem hiding this comment.
More as a note, really, but this one line will prevent me from actually using the routine in astropy's Time class (which is not an ndarray subclass, but does allow indexing like it). I guess right now there is little choice but to do it, and eventually this should become a check for sufficient duck-typing. Anyway, an additional argument to make _make_along_axis_idx public in a follow-up PR (which means thinking of a better name....).
There was a problem hiding this comment.
I think a better follow-up would be to implement this as a gufunc
There was a problem hiding this comment.
Perhaps I should just omit this? I don't see any point in allowing the caller to pass a list, and all I actually need here is .shape, .flat, and .__getitem__
There was a problem hiding this comment.
Yes, why not indeed, since realistically this is not going to be used on lists, but on things that indices were taken from.
There was a problem hiding this comment.
Do I leave the asanyarray on indices?
There was a problem hiding this comment.
Might as well remove it too. These indices would be very hard to construct correctly as a nested list anyway.
ad0ec85 to
fcdced0
Compare
|
@mhvk: |
|
Darn, now your test with invalid index input raises a different error... |
This is the reduced version that does not allow any insertion of extra dimensions
fcdced0 to
7a3c50a
Compare
|
Hadn't noticed this was now fixed - all looks OK, so will merge. Nice addition! |
|
Wonder if we should have run this by the mailing list, for the sake of name bike-shedding... |
|
Could still do it? |
|
Done |
This is a less ambitious version of #8714, as I proposed in this comment
Essentially, this limits the function to the obvious broadcasting cases when the indices and array have the same number of dimensions. This is sufficiently powerful that any other broadcasting behavior can be achieved by inserting extra dimensions, but keeps to:
An expansion of these functions is given in their docstring (using the format adopted by #9946), but the one-line summaries are:
out[ii + s_[j,] + kk] = a[ii + s_[:,] + kk][indices[ii + s_[j,] + kk]]a[ii + s_[:,] + kk][indices[ii + s_[j,] + kk]] = values[ii + s_[j,] + kk]Or perhaps more clearly in terms of basic indexing only:
out[ii + s_[j,] + kk] = a[ii + s_[indices[ii + s_[j,] + kk],] + kk]a[ii + s_[indices[ii + s_[j,] + kk],] + kk] = values[ii + s_[j,] + kk]