BUG: fix unique(arr, axis=n) when 0 in arr.shape#15565
BUG: fix unique(arr, axis=n) when 0 in arr.shape#15565huonw wants to merge 4 commits intonumpy:masterfrom
0 in arr.shape#15565Conversation
Previously, `unique` failed with two forms of errors if:
- the `axis` argument is used, not `None` (e.g. `axis=0`)
- the input array has a zero in one or more of its dimensions (e.g. `shape=(1, 0, 2)`)
If `axis` is one of the zero dimensions (`arr.shape[axis] == 0`), it failed in the first `reshape`
with:
ValueError: cannot reshape array of size 0 into shape (0,newaxis)
If `axis` is not one of the zero dimensions (`arr.shape[axis] != 0`), it failed at `consolidated =
ar.view(...)` with:
ValueError: When changing to a smaller dtype, its size must be a divisor of the size of original
dtype
This patch special cases empty arrays to skip over all of that reshaping logic, because it isn't
needed: when the array is empty, the result(s) are empty too, and that can be deduced without any
data reshaping or transformations.
Fixes: numpy#15559
|
The failures don't seem to be related to this patch? E.g. https://dev.azure.com/numpy/numpy/_build/results?buildId=8204&view=logs&j=f49a0588-9972-5862-0c63-6d346787e7be&t=6afda261-ea77-5891-8a3b-e57b484e96fc&l=23 |
eric-wieser
left a comment
There was a problem hiding this comment.
I'll come back for a more thorough review later, but this is not a correct generalization.
numpy/lib/tests/test_arraysetops.py
Outdated
|
|
||
| assert_equal(uniq.dtype, single_zero.dtype) | ||
| msg = "Unique with shape=(2, 0) and axis=0 failed" | ||
| assert_array_equal(uniq, np.empty(shape=(0, 0)), msg) |
There was a problem hiding this comment.
It's possible I'm thinking of axis=1 and not axis=0, but this result is wrong, and should be
| assert_array_equal(uniq, np.empty(shape=(0, 0)), msg) | |
| assert_array_equal(uniq, np.empty(shape=(1, 0)), msg) |
There is exactly one unique empty object, not zero of them.
There was a problem hiding this comment.
Ah, yeah, you're totally correct. I've rewritten it to not try to reuse the _unique1d code, and instead just compute the answers manually. It's a bit unfortunate in my mind that this edge case requires so much extra code, but I can't think of a better way to do it here.
I guess another approach would be to generalise ndarray.reshape and view to handle empty/zero-sized data better so that the code for unique doesn't have to change at all. Do you have thoughts on whether that would be a better way?
There was a problem hiding this comment.
Unfortunately the semantics of reshape and view are both ungeneralizable.
It's my opinion that reshape is just a bad API - 90% or more of it's use cases are to collapse adjacent axes together, yet it neither is able to check that that actually happens nor perform that action for 0-size arrays.
View would also need an API change, perhaps something like .view(dt, axis=None) to indicate that no axes may be resized (the default would be -1).
Both are interesting things to discuss further, but I wouldn't attempt to combine them with this patch.
There was a problem hiding this comment.
Ok, thanks for the info.
c35880a to
d40f08d
Compare
|
Hi @eric-wieser, just a gentle request to take a second look when you get a chance. Thanks! |
|
Will defer to @seberg here, awfully short on time at the moment. |
seberg
left a comment
There was a problem hiding this comment.
I dislike that we have to put this in such a big special branch... but I cannot think of a better way to do it. It seems fine with me, I may give it a pass when I have a bit more quiet, but just ping me if I forget to put it in please.
Thanks, the tests look pretty thorough!
numpy/lib/arraysetops.py
Outdated
| # there's no elements along this axis, so there's no unique | ||
| # elements: | ||
| num_empties = return_index + return_inverse + return_counts | ||
| output = (ar,) + (np.array([], dtype=np.intp),) * num_empties |
There was a problem hiding this comment.
I guess returning the same array should be OK for an empty one...
There was a problem hiding this comment.
I think the dtype and shape may be different between these values and ar.
The behaviour of "empty element" being an element is now implemented/tested
numpy/lib/tests/test_arraysetops.py
Outdated
| msg = "Unique with shape=(2, 0) and axis=0 failed" | ||
| assert_array_equal(uniq, np.empty(shape=(1, 0)), msg) |
There was a problem hiding this comment.
These messages don't seem to match the error messages.
Normally we don't bother with this type of message anyway, as its easy to infer from the stack trace.
There was a problem hiding this comment.
Nevermind, misread that. Still think that these messages are probably overkill, and make the test harder to follow
There was a problem hiding this comment.
Yeah, you can just skip the message. With pytest now, we get a huge amount of information already.
numpy/lib/arraysetops.py
Outdated
| # viewing doesn't work, but there's also no data, so those | ||
| # manipulations aren't needed to get right answer. | ||
|
|
||
| axis_len = ar.shape[axis] |
There was a problem hiding this comment.
This won't raise AxisError appropriately.
Can you put these special cases just before # Must reshape to a contiguous 2D instead?
There was a problem hiding this comment.
Ah, nice catch; putting it after the moveaxis makes this a little more awkward (because it has to then also do np.moveaxis back to the right place), but not too bad.
numpy/lib/arraysetops.py
Outdated
|
|
||
| extra_output = () | ||
| if return_index: | ||
| # first index is the first occurance of an empty element |
There was a problem hiding this comment.
I fixed the typo, but #15747 looks better.
…th 0 axis.
This code is from github user huonw, from this PR:
numpy#15565
|
Closing, this was done differently in gh-15747 |
Previously,
uniquefailed with two forms of errors if:axisargument is used, notNone(e.g.axis=0)shape=(1, 0, 2))If
axisis one of the zero dimensions (arr.shape[axis] == 0), it failed in the firstreshapewith:
If
axisis not one of the zero dimensions (arr.shape[axis] != 0), it failed atconsolidated = ar.view(...)with:This patch special cases empty arrays to skip over all of that reshaping logic, because it isn't
needed: when the array is empty, the result(s) are empty too, and that can be deduced without any
data reshaping or transformations.
Fixes: #15559