BUG: Buttress handling of extreme values in randint#8846
charris merged 1 commit into numpy:master from gfyoung:randint-extreme-range
Conversation
numpy/random/tests/test_random.py
Would np.iinfo(np.bool_) be a reasonable thing for numpy to contain? (EDIT: #8849)
Perhaps, as it does not work at the moment:
>>> np.iinfo(np.bool_)
...
ValueError: Invalid integer data type.
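For context, a short sketch (my own illustration, not code from the thread) of the `np.iinfo` behavior in question: it reports the range of integer dtypes but rejects `np.bool_`, since bool is not an integer dtype.

```python
import numpy as np

# np.iinfo reports the representable range of integer dtypes.
print(np.iinfo(np.int64).max)   # 9223372036854775807
print(np.iinfo(np.uint64).max)  # 18446744073709551615

# np.bool_ is not an integer dtype, so iinfo rejects it.
try:
    np.iinfo(np.bool_)
except ValueError as exc:
    print("no iinfo for bool_:", exc)
```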
So, one other problem to address is generating
Of course you could just ask for a direct 64-bit integer, but it would be nice to cover the case where a user API wants to generate up to some variable without a special case. Thoughts?
@eric-wieser :
@gfyoung: Good catch on the typo. I meant generate numbers in
numpy/random/mtrand/mtrand.pyx
Indeed, a lot of "quirks" (i.e. bugs) arose when I started working on this PR in general.
Actually, I think I'd be happy with this patch if you add a TODO comment that this is a workaround for gh-8851.
I have no problems doing that.
@eric-wieser : Got around to looking at this again, and I'm trying to refresh myself on what exactly is blocking this PR from being merged. Let me respond to the suggestions you provided above:
Main blocker is that I want someone else to weigh in here, and that this raises a bunch of other issues*. I agree that this is in some senses a workaround - we should actually just fix the relational operators here, rather than add a workaround. Would that fix this bug?

*For instance: this fails because we already get an overflow before we make it to
Okay. Fair enough. Two points in response:
I like the idea of adding those comparison ufunc loops. Adding new ufunc loops is pretty straightforward. It might complicate someone's theoretical table of how dtype promotion works, but for the comparison operators, I'm okay with that. :-)
numpy/random/tests/test_random.py
This is a little weird - better to leave the raw exception so that the tests produce a useful stack trace, I think
I did this because I wanted to know which element in the for-loop, if any, failed. No stack-trace is going to tell you that AFAICT.
Maybe we need an assert_no_exception or some such.
I don't think assert_no_exception would be all that useful - wouldn't it end up on every single function call otherwise? Based on @gfyoung's reasoning, parameterized test cases would solve this.
I did this because I wanted to know which element in the for-loop, if any, failed.
In what way does this do a better job of doing that, given that dt is not used in the message?
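One way to get per-element failure information without swallowing the traceback (a hypothetical sketch, not the test code under review) is to put the loop variable into the assertion message:

```python
import numpy as np

# Attaching the loop variable to the assertion message reports which
# dtype failed, while still raising a normal AssertionError traceback.
for dt in (np.int8, np.int16, np.int32, np.int64):
    info = np.iinfo(dt)
    sample = np.random.randint(info.min, info.max, size=10, dtype=dt)
    assert sample.dtype == dt, "randint failed for dtype %s" % np.dtype(dt).name
```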
numpy/random/mtrand/mtrand.pyx
If we claim that everything should work if comparisons get fixed, then this line doesn't need to change - right?
That is correct. The following lines will not be needed:
ilow = int(low)
ihigh = int(high)
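For context, a small sketch (my own illustration, using `low`/`high` names as in mtrand.pyx) of why casting the bounds to Python ints is a safe workaround: Python integers are arbitrary-precision, so comparisons and arithmetic on them are exact regardless of the original dtypes.

```python
import numpy as np

low = np.int64(np.iinfo(np.int64).min)
high = np.uint64(np.iinfo(np.uint64).max)

# Python ints are arbitrary-precision: no dtype promotion, no overflow.
ilow, ihigh = int(low), int(high)
print(ilow < ihigh)  # True, computed exactly
print(ihigh - 1)     # 18446744073709551614, exact
```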
Let me be more clear: I think we should continue to pass low and high on this line - there is no reason to change to ilow and ihigh here.
That's not the case:
>>> uint64_max = np.uint64(np.iinfo(np.uint64).max)
>>> uint64_max - 1
1.8446744073709552e+19
This PR could not demonstrate any more clearly just how fragile numpy's operators are when handling large uint64 values.
Darn, you're right. In that case, the comment above saying "remove these lines" should also point out that we need them to work around subtraction as well. Or just a comment to that effect next to the subtraction.
Alternatively, np.subtract(high, 1, dtype=dtype) would probably be safe here too.
Fair enough. In fact, even your suggestion is not safe:
>>> uint64_max = np.uint64(np.iinfo(np.uint64).max)
>>>
>>> # correct but bad for randint
>>> np.subtract(uint64_max, 1, dtype=np.int64)
-2
>>>
>>> np.subtract(uint64_max, 1, dtype=None)  # oops
1.8446744073709552e+19
Not sure what you meant by # correct but bad for randint. You're in for a bad time if you ask for an upper bound that doesn't come close to fitting in the container type. What does passing dtype=None into randint do normally?
Actually, looking at those examples again, I realize that those aren't valid in the context of randint. The only time that we could do this subtraction is if dtype=np.uint64, in which case the subtraction actually works correctly.
So ignore everything I just said in the previous comment above. I would still prefer to use high - 1 (as the current workaround) over np.subtract, as that will make it clear to us later that we need to patch this once #8851 is fixed.
dtype=None is not valid (per the docs) and will error as an invalid dtype.
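To illustrate the one case where the explicit-dtype subtraction is valid (my own sketch, not code from the PR): forcing dtype=np.uint64 keeps the arithmetic inside uint64 and avoids any lossy promotion.

```python
import numpy as np

uint64_max = np.uint64(np.iinfo(np.uint64).max)

# Forcing the output dtype keeps the subtraction inside uint64,
# so the result is exact rather than a rounded float.
highm1 = np.subtract(uint64_max, 1, dtype=np.uint64)
print(highm1)        # 18446744073709551614
print(highm1.dtype)  # uint64
```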
np.subtract has the benefit of extending to broadcasting more naturally.
I'm happy with the current workaround, provided there's a comment indicating that the subtraction too is a workaround.
Comment has been appended to the TODO I wrote earlier.
What is the status of this?
IINM @eric-wieser has OK'ed the changes, but he wanted feedback from other maintainers before merging this PR.
@eric-wieser Seems reasonable to me. I'll merge this later if you don't beat me to it.
My main qualm with this patch is that we lock ourselves into behaviour that will be difficult to maintain in #6938.
@eric-wieser : What behavior are you referring to?
@eric-wieser Looks like it just fixes a bug in the current version. What am I missing?

That once we have broadcasting, casting the results to
I guess my concern comes down to the fact that at some level this is a workaround for an underlying bug (uint64/int64 operations), which leaves us two options in future:
True, this is a workaround in some respects, but the patch should not affect the numbers generated by this function. I don't see how correctly implementing
Worst case scenario, we decide that

Let's put this in, with the assumption that in the unlikely event supporting this becomes a burden, we can just go through a deprecation cycle.
Fair enough. If that does end up happening, I'd have no problems putting together the patches to deprecate, given the unlikelihood.
I suppose the other option is to raise an error for the bad combinations.
Thanks @gfyoung.
In randint, we perform integer comparisons to check whether the bounds are violated. With numpy signed integers, those comparisons can be prone to overflow bugs at the extremes. This PR makes the function robust against those bugs.

cc @eric-wieser
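A quick sketch of the behavior this PR hardens (assuming a NumPy build with the fix merged): randint accepting bounds right at the dtype limits.

```python
import numpy as np

# Full uint64 range: high is exclusive, so 2**64 covers everything
# up to and including np.iinfo(np.uint64).max.
a = np.random.randint(0, 2**64, size=4, dtype=np.uint64)
print(a.dtype)  # uint64

# Extreme signed bounds work as well.
b = np.random.randint(np.iinfo(np.int64).min, np.iinfo(np.int64).max,
                      size=4, dtype=np.int64)
print(b.dtype)  # int64
```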