BUG: Raise if histogram cannot create finite bin sizes #27148

Merged
ngoldbaum merged 1 commit into numpy:main from timhoffm:histogram-small-range
Aug 9, 2024

@timhoffm
Contributor

@timhoffm timhoffm commented Aug 8, 2024

When many bins are requested in a small value region, it may not be possible to create enough distinct bin edges due to limited numeric precision. Until now, `histogram` returned identical consecutive bin edges in that case, which means a bin width of 0. These zero-width bins could even have counts associated with them.

Instead of returning such illogical bin distributions, this PR raises a `ValueError` if the calculated bins do not all have a finite (nonzero) width.

Closes #27142.
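
A minimal sketch of the failure mode and of the kind of check described above. It is not the actual NumPy implementation: `check_finite_bins` is a hypothetical helper, and it assumes equal-width edges are built with `np.linspace`. The error text is the one quoted later in this conversation.

```python
import numpy as np


def check_finite_bins(first_edge, last_edge, n_bins):
    """Hypothetical helper: build equal-width edges and reject zero-width bins."""
    edges = np.linspace(first_edge, last_edge, n_bins + 1)
    # Identical consecutive edges mean that at least one bin has width 0.
    if np.any(np.diff(edges) <= 0):
        raise ValueError(
            f"Too many bins for data range. Cannot create {n_bins} finite-sized bins."
        )
    return edges


# A comfortably wide range yields 11 strictly increasing edges.
edges = check_finite_bins(0.0, 1.0, 10)

# 1.0 and 1.0 + 2e-16 are adjacent doubles, so 11 distinct edges cannot exist.
try:
    check_finite_bins(1.0, 1.0 + 2e-16, 10)
    raised = False
except ValueError:
    raised = True
```

The sketch checks the differences between consecutive edges rather than the overall range, because the range itself can be nonzero while individual bins still collapse to zero width.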

# these should not crash
np.histogram([np.array(0.5) for i in range(10)] + [.500000000000001])
np.histogram([np.array(0.5) for i in range(10)] + [.500000000000002])
Contributor Author

Note: This test was failing because one of the created bins had zero width. I assume this was not the intention of the test. It was added in #10268, which is about coercing values of object arrays. I've increased the value minimally to not run into the zero-width bin case.
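
To illustrate why the larger value avoids the zero-width case, the following sketch counts how many representable doubles separate 0.5 from each upper value and how many distinct edges `np.linspace` can produce for the default 10 bins. It assumes the edges are built the way `np.linspace` builds them; the `results` dictionary is only there for illustration.

```python
import math

import numpy as np

base = 0.5
ulp = math.ulp(base)  # spacing between adjacent doubles just above 0.5

results = {}
for upper in (0.500000000000001, 0.500000000000002):
    # Number of representable steps between base and upper.
    steps = round((upper - base) / ulp)
    # 10 bins need 11 edges.
    edges = np.linspace(base, upper, 11)
    distinct = len(np.unique(edges))
    results[upper] = (steps, distinct)
    print(f"{upper!r}: {steps} representable steps, {distinct} distinct edges")
```

Ten bins need at least ten representable steps across the data range. The original value leaves fewer than that, so some neighbouring edges must coincide, while the increased value leaves enough room for all edges to differ.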

@ngoldbaum
Member

Not sure if the pypy failure is real.

@timhoffm
Contributor Author

timhoffm commented Aug 8, 2024

> Not sure if the pypy failure is real.

Can't tell. But even if it's real, it seems unrelated to the PR, because it's an assertion rewrite problem in test_simd.py.

Member

@mattip mattip left a comment

Makes sense to me, and tests (including the new one) are passing. This even caught an edge case in an existing test.

@ngoldbaum ngoldbaum merged commit 251f7e1 into numpy:main Aug 9, 2024
@ngoldbaum
Member

The pypy failure went away when I re-triggered it, so it must be a flake.

Thanks @timhoffm!

@timhoffm timhoffm deleted the histogram-small-range branch August 9, 2024 13:13
@djhoese
Contributor

djhoese commented Aug 22, 2024

Sorry for commenting on a merged PR, but I'm not sure this is issue-worthy. I'm having trouble consistently reproducing the error message added here. If I copy the test case, I get the error. If I increase the base number of the arrays, it works fine:

In [31]: np.histogram_bin_edges(np.array([1.0, 1.0 + 2e-16] * 10))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[31], line 1
----> 1 np.histogram_bin_edges(np.array([1.0, 1.0 + 2e-16] * 10))
...
ValueError: Too many bins for data range. Cannot create 10 finite-sized bins.

In [32]: np.histogram_bin_edges(np.array([2.0, 2.0 + 2e-16] * 10))
Out[32]: array([1.5, 1.6, 1.7, 1.8, 1.9, 2. , 2.1, 2.2, 2.3, 2.4, 2.5])

Why does starting the array with 1.0 give different bins than starting with 2.0?

For context, I'm trying to reproduce one of my unit tests that has started failing with this new error, but copying and pasting the repr of the array (np.array([999.9999989758, 999.9999989758, 999.9999989758, 999.9999989758])) involved does not trigger the error interactively like it does during the test.

@timhoffm
Contributor Author

`np.ptp(np.array([2.0, 2.0 + 2e-16]))` is 0: at this magnitude, limited precision rounds both values to the same number. There is special handling for a histogram in which all values are identical: because such data has no intrinsic scale but is a valid use case, some "suitable" bins around the value are chosen. This expansion is not triggered if there are differences between the numbers, however tiny. Because of floating-point precision, that is the case for the test data, where 1.0 and 1.0 + 2e-16 remain distinct.
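
The difference between the two inputs can be checked directly. This is a small sketch using `math.ulp` (Python 3.9+) and `np.ptp`; the variable names are illustrative only. The call for the data near 1.0 is left out on purpose, because whether it raises depends on the installed NumPy version.

```python
import math

import numpy as np

# Spacing between adjacent doubles at 1.0 and at 2.0.
# 2e-16 is more than half the spacing at 1.0 but less than half the spacing at 2.0.
print(math.ulp(1.0), math.ulp(2.0))

near_one = np.array([1.0, 1.0 + 2e-16] * 10)
near_two = np.array([2.0, 2.0 + 2e-16] * 10)

range_one = np.ptp(near_one)  # tiny but nonzero: the two values stay distinct
range_two = np.ptp(near_two)  # exactly 0.0: both values round to 2.0

# All-equal data takes the special path with bins expanded around the value.
edges_two = np.histogram_bin_edges(near_two)
print(range_one, range_two, edges_two)
```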

rtimpe added a commit to pytorch/pytorch that referenced this pull request Nov 12, 2025
This is related to upgrading numpy versions, not 3.14 specifically.  See numpy/numpy#27148


ghstack-source-id: f4ce0a9
Pull-Request: #167681
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Nov 13, 2025
This is related to upgrading numpy versions, not 3.14 specifically.  See numpy/numpy#27148
Pull Request resolved: #167681
Approved by: https://github.com/williamwen42
ghstack dependencies: #167619
Successfully merging this pull request may close these issues.

BUG: zero-width histogram bins if the data values are in a small range close to numeric precision

4 participants