BUG: Raise if histogram cannot create finite bin sizes #27148
ngoldbaum merged 1 commit into numpy:main
When many bins are requested in a small value region, it may not be possible to create enough distinct bin edges due to limited numeric precision. Until now, `histogram` returned identical consecutive bin edges in that case, which means a bin width of 0. These bins could also have counts associated with them. Instead of returning such illogical bin distributions, this PR raises a `ValueError` if the calculated bins do not all have a finite size. Closes numpy#27142.
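The idea behind the check can be sketched as follows. This is a minimal illustration, not the code from the PR: the helper name `checked_uniform_edges` is hypothetical, and it assumes the edges are produced with `np.linspace`; the error text mirrors the message quoted later in this thread.

```python
import numpy as np


def checked_uniform_edges(first, last, bins):
    """Hypothetical helper: build uniform bin edges and reject zero-width bins."""
    # bins + 1 edges delimit `bins` bins.
    edges = np.linspace(first, last, bins + 1)
    # If float64 cannot represent enough distinct values between first and
    # last, some neighbouring edges collapse onto the same number.
    widths = np.diff(edges)
    if np.any(widths <= 0):
        raise ValueError(
            f"Too many bins for data range. "
            f"Cannot create {bins} finite-sized bins."
        )
    return edges


# A comfortable range works as usual.
print(checked_uniform_edges(0.0, 1.0, 4))

# Only a handful of float64 values exist between 1.0 and 1.0 + 1e-15,
# so 100 distinct bins cannot be created.
try:
    checked_uniform_edges(1.0, 1.0 + 1e-15, 100)
except ValueError as exc:
    print(exc)
```

Checking the differences between neighbouring edges keeps the test independent of how the edges were computed, which is why the sketch uses `np.diff` instead of reasoning about the range directly.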
    # these should not crash
-   np.histogram([np.array(0.5) for i in range(10)] + [.500000000000001])
+   np.histogram([np.array(0.5) for i in range(10)] + [.500000000000002])
Note: This test was failing because one of the created bins had zero width. I assume this was not the intention of the test. It was added in #10268, which is about coercing values of object arrays. I've increased the value minimally to not run into the zero-width bin case.
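Why the old value produced a zero-width bin can be checked by counting how many float64 steps fit into the data range. The sketch below is an illustration only: `ulps_above` is a hypothetical helper, and the edges are assumed to come from `np.linspace` with the default of 10 bins (11 edges).

```python
import numpy as np


def ulps_above(base, value):
    """Hypothetical helper: float64 steps from `base` up to `value`.

    Assumes both numbers lie in the same power-of-two interval, so the
    spacing between neighbouring floats is constant across the range.
    """
    return (value - base) / np.spacing(base)


old_max = .500000000000001  # value used by the test before this PR
new_max = .500000000000002  # value used by the test after this PR

for label, upper in [("old", old_max), ("new", new_max)]:
    edges = np.linspace(0.5, upper, 11)  # 10 bins need 11 distinct edges
    has_zero_width = bool(np.any(np.diff(edges) == 0))
    print(label, ulps_above(0.5, upper), has_zero_width)
```

On IEEE 754 doubles the old upper value sits 9 steps above 0.5, so only 10 distinct values are available for 11 edges and at least two edges must coincide. The new value sits 18 steps above 0.5, which leaves enough room for 10 bins of non-zero width.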
Not sure if the pypy failure is real.
Can't tell. But even if it's real, it seems unrelated to the PR, because it's an assertion rewrite problem in
mattip left a comment
Makes sense to me, and tests (including the new one) are passing. This even caught an edge case in an existing test.
The pypy failure went away when I re-triggered it, so it must be a flake. Thanks @timhoffm!
Sorry for commenting on a merged PR, but I'm not sure this is issue-worthy. I'm having trouble consistently producing the error message added here. If I copy the test case, I get the error. If I increase the base number of the arrays, it works fine:

    In [31]: np.histogram_bin_edges(np.array([1.0, 1.0 + 2e-16] * 10))
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Cell In[31], line 1
    ----> 1 np.histogram_bin_edges(np.array([1.0, 1.0 + 2e-16] * 10))
    ...
    ValueError: Too many bins for data range. Cannot create 10 finite-sized bins.

    In [32]: np.histogram_bin_edges(np.array([2.0, 2.0 + 2e-16] * 10))
    Out[32]: array([1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5])

Why does starting the array with 1.0 give different bins than starting with 2.0? For context, I'm trying to reproduce one of my unit tests that has started failing with this new error, but copying and pasting the
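A likely explanation, which the thread does not state, is float64 spacing. The gap between neighbouring doubles is about 2.2e-16 just above 1.0 but about 4.4e-16 just above 2.0. Adding 2e-16 to 1.0 therefore rounds up to the next double, while adding it to 2.0 rounds back to 2.0. In the second case all data values are identical, and the edges in `Out[32]` are consistent with `histogram_bin_edges` widening a zero-width data range by 0.5 on each side. The sketch below checks these assumptions:

```python
import numpy as np

# Gap to the next representable float64 above each base value.
print(np.spacing(1.0))  # about 2.2e-16
print(np.spacing(2.0))  # about 4.4e-16

# 2e-16 is more than half the gap at 1.0, so the sum rounds up ...
print(1.0 + 2e-16 == 1.0)
# ... but less than half the gap at 2.0, so the sum rounds back down.
print(2.0 + 2e-16 == 2.0)

# With base 2.0 every element is exactly 2.0, so min == max and the
# range is widened to (1.5, 2.5) before the edges are computed.
data = np.array([2.0, 2.0 + 2e-16] * 10)
edges = np.histogram_bin_edges(data)
print(data.min() == data.max(), edges)
```

With base 1.0 the minimum and maximum differ by a single float64 step, so no widening happens and 10 distinct bins cannot fit between them.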
This is related to upgrading numpy versions, not 3.14 specifically. See numpy/numpy#27148 Pull Request resolved: pytorch#167681 Approved by: https://github.com/williamwen42 ghstack dependencies: pytorch#167619