Skip to content

BUG: fix unique(arr, axis=n) when 0 in arr.shape#15565

Closed
huonw wants to merge 4 commits intonumpy:masterfrom
huonw:bugfix/15559-unique-axis-zeros
Closed

BUG: fix unique(arr, axis=n) when 0 in arr.shape#15565
huonw wants to merge 4 commits intonumpy:masterfrom
huonw:bugfix/15559-unique-axis-zeros

Conversation

@huonw
Copy link
Copy Markdown
Contributor

@huonw huonw commented Feb 13, 2020

Previously, unique failed with two forms of errors if:

  • the axis argument is used, not None (e.g. axis=0)
  • the input array has a zero in one or more of its dimensions (e.g. shape=(1, 0, 2))

If axis is one of the zero dimensions (arr.shape[axis] == 0), it failed in the first reshape
with:

ValueError: cannot reshape array of size 0 into shape (0,newaxis)

If axis is not one of the zero dimensions (arr.shape[axis] != 0), it failed at consolidated = ar.view(...) with:

ValueError: When changing to a smaller dtype, its size must be a divisor of the size of original
dtype

This patch special cases empty arrays to skip over all of that reshaping logic, because it isn't
needed: when the array is empty, the result(s) are empty too, and that can be deduced without any
data reshaping or transformations.

Fixes: #15559

Previously, `unique` failed with two forms of errors if:

- the `axis` argument is used, not `None` (e.g. `axis=0`)
- the input array has a zero in one or more of its dimensions (e.g. `shape=(1, 0, 2)`)

If `axis` is one of the zero dimensions (`arr.shape[axis] == 0`), it failed in the first `reshape`
with:

    ValueError: cannot reshape array of size 0 into shape (0,newaxis)

If `axis` is not one of the zero dimensions (`arr.shape[axis] != 0`), it failed at `consolidated =
ar.view(...)` with:

    ValueError: When changing to a smaller dtype, its size must be a divisor of the size of original
    dtype

This patch special cases empty arrays to skip over all of that reshaping logic, because it isn't
needed: when the array is empty, the result(s) are empty too, and that can be deduced without any
data reshaping or transformations.

Fixes: numpy#15559
@huonw huonw marked this pull request as ready for review February 13, 2020 22:19
@huonw
Copy link
Copy Markdown
Contributor Author

huonw commented Feb 13, 2020

The failures don't seem to be related to this patch? E.g. https://dev.azure.com/numpy/numpy/_build/results?buildId=8204&view=logs&j=f49a0588-9972-5862-0c63-6d346787e7be&t=6afda261-ea77-5891-8a3b-e57b484e96fc&l=23

ln: /usr/local/bin/gfortran: File exists
##[error]Bash exited with code '1'.

@eric-wieser eric-wieser self-requested a review February 14, 2020 06:57
@charris charris closed this Feb 15, 2020
@charris charris reopened this Feb 15, 2020
Copy link
Copy Markdown
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll come back for a more thorough review later, but this is not a correct generalization.


assert_equal(uniq.dtype, single_zero.dtype)
msg = "Unique with shape=(2, 0) and axis=0 failed"
assert_array_equal(uniq, np.empty(shape=(0, 0)), msg)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible I'm thinking of axis=1 and not axis=0, but this result is wrong, and should be

Suggested change
assert_array_equal(uniq, np.empty(shape=(0, 0)), msg)
assert_array_equal(uniq, np.empty(shape=(1, 0)), msg)

There is exactly one unique empty object, not zero of them.

Copy link
Copy Markdown
Contributor Author

@huonw huonw Feb 16, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, you're totally correct. I've rewritten it to not try to reuse the _unique1d code, and instead just compute the answers manually. It's a bit unfortunate in my mind that this edge case requires so much extra code, but I can't think of a better way to do it here.

I guess another approach would be to generalise ndarray.reshape and view to handle empty/zero-sized data better so that the code for unique doesn't have to change at all. Do you have thoughts on whether that would be a better way?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately the semantics of reshape and view are both ungeneralizable.

It's my opinion that reshape is just a bad API - 90% or more of it's use cases are to collapse adjacent axes together, yet it neither is able to check that that actually happens nor perform that action for 0-size arrays.

View would also need an API change, perhaps something like .view(dt, axis=None) to indicate that no axes may be resized (the default would be -1).

Both are interesting things to discuss further, but I wouldn't attempt to combine them with this patch.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, thanks for the info.

@huonw huonw force-pushed the bugfix/15559-unique-axis-zeros branch from c35880a to d40f08d Compare February 16, 2020 22:17
@huonw
Copy link
Copy Markdown
Contributor Author

huonw commented Feb 24, 2020

Hi @eric-wieser, just a gentle request to take a second look when you get a chance. Thanks!

@eric-wieser
Copy link
Copy Markdown
Member

Will defer to @seberg here, awfully short on time at the moment.

@seberg seberg self-requested a review February 24, 2020 03:01
Copy link
Copy Markdown
Member

@seberg seberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dislike that we have to put this in such a big special branch... but I cannot think of a better way to do it. It seems fine with me, I may give it a pass when I have a bit more quiet, but just ping me if I forget to put it in please.
Thanks, the tests look pretty thorough!

# there's no elements along this axis, so there's no unique
# elements:
num_empties = return_index + return_inverse + return_counts
output = (ar,) + (np.array([], dtype=np.intp),) * num_empties
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess returning the same array should be OK for an empty one...

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the dtype and shape may be different between these values and ar.

@seberg seberg dismissed eric-wieser’s stale review February 26, 2020 16:48

The behaviour of "empty element" being an element is now implemented/tested

Comment on lines +574 to +575
msg = "Unique with shape=(2, 0) and axis=0 failed"
assert_array_equal(uniq, np.empty(shape=(1, 0)), msg)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These messages don't seem to match the error messages.

Normally we don't bother with this type of message anyway, as its easy to infer from the stack trace.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind, misread that. Still think that these messages are probably overkill, and make the test harder to follow

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, you can just skip the message. With pytest now, we get a huge amount of information already.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, ok 👍

# viewing doesn't work, but there's also no data, so those
# manipulations aren't needed to get right answer.

axis_len = ar.shape[axis]
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't raise AxisError appropriately.

Can you put these special cases just before # Must reshape to a contiguous 2D instead?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, nice catch; putting it after the moveaxis makes this a little more awkward (because it has to then also do np.moveaxis back to the right place), but not too bad.


extra_output = ()
if return_index:
# first index is the first occurance of an empty element
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spelling: occurrence

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed the typo, but #15747 looks better.

@WarrenWeckesser
Copy link
Copy Markdown
Member

@seberg wrote

I dislike that we have to put this in such a big special branch... but I cannot think of a better way to do it.

I agree, and while the changes here look good, I was curious to see what it would take to avoid that big block of special-case code. The result is at #15747.

WarrenWeckesser pushed a commit to WarrenWeckesser/numpy that referenced this pull request Mar 15, 2020
…th 0 axis.

This code is from github user huonw, from this PR:
    numpy#15565
@mattip
Copy link
Copy Markdown
Member

mattip commented Mar 15, 2020

Closing, this was done differently in gh-15747

@seberg seberg closed this Mar 15, 2020
@huonw huonw deleted the bugfix/15559-unique-axis-zeros branch March 15, 2020 21:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

BUG: np.unique(arr, axis=...) fails for arr with a zero dimension

6 participants