BUG: Histogramdd breaks on big arrays in Windows#22561
Conversation
Resolved the issue with line change from int to np.intp in numpy/numpy/lib/histograms.py
numpy/lib/histograms.py
Outdated
| N, D = sample.shape | ||
|
|
||
| nbin = np.empty(D, int) | ||
| #nbin = np.empty(D, int) |
There was a problem hiding this comment.
Yes, and committed.
Resolved the issue with line change from int to np.intp in numpy/numpy/lib/histograms.py
| edges = histogram_bin_edges(arr, bins='auto', range=(0, 1)) | ||
| assert_array_equal(edges, e) | ||
|
|
||
| def test_big_arrays(self): |
There was a problem hiding this comment.
Thanks. We need to mark this for large memory use, have a look for @requires_memory. I suspect it should also be marked with @pytest.mark.slow to skip on most runs.
|
There are two binary files that got accidentally commit, please delete them. |
|
@navpreetnp7 would be great to finish this up, since branching will be in a week or two probably. Could you add that CI/tests are currently failing because the test requires a huge amount of memory to run successfully. |
|
Yeah I just finished it up. Sorry for the delay |
numpy/lib/tests/test_histograms.py
Outdated
| edges = histogram_bin_edges(arr, bins='auto', range=(0, 1)) | ||
| assert_array_equal(edges, e) | ||
|
|
||
| @requires_memory(free_bytes=100000000 * 3) |
There was a problem hiding this comment.
Thanks. Lets refine this a little bit. If you check sample.nbytes, you will see that it is 8 times larger. Also the result array (with the current choices) has a significant size. You could just reduce that size by a lot, even then it probably makes sense to round up a little bit.
(This is a sanity check that we should be good after all. The decorator also catches MemoryError so we should be fine, but I would just round up generously.)
There was a problem hiding this comment.
Sure. That makes sense. I made it 100 times larger
There was a problem hiding this comment.
OK, if I do the math, this should be about double of what is needed, and 10 GiB are not forbidding these days, so this should be fine.
|
Thanks @navpreetnp7. |
* BUG: Histogramdd breaks on big arrays in Windows Resolved the issue with line change from int to np.intp in numpy/numpy/lib/histograms.py * BUG: Histogramdd breaks on big arrays in Windows Resolved the issue with line change from int to np.intp in numpy/numpy/lib/histograms.py * Removed the binary files * Update test_histograms.py * Update test_histograms.py * Update test_histograms.py
Resolved the issue with line change from int to np.intp in numpy/numpy/lib/histograms.py
New test function was added called test_big_arrays on numpy/lib/tests/test_histograms.py
Resolved #22288
Pydata Sprint NYC 2022