BUG: Histogramdd breaks on big arrays in Windows #22288

@scseeman

Describe the issue:

numpy.histogramdd fails on Windows 10 64-bit when the requested histogram is large. The likely cause is that the default integer dtype on 64-bit Windows is int32, so the product of the per-axis bin counts (nbin.prod() in the traceback below) overflows and becomes negative.

Reproduce the code example:

>>> import numpy as np
>>> sample = np.zeros([100000000, 3])
>>> xbins = 400
>>> ybins = 400
>>> zbins = np.arange(16000)
>>> hist = np.histogramdd(sample=sample, bins=(xbins, ybins, zbins))

Error message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<__array_function__ internals>", line 180, in histogramdd
  File "C:\Users\stephanies\AppData\Local\Continuum\miniconda3\envs\numpy_only\lib\site-packages\numpy\lib\histograms.py", line 1095, in histogramdd
    hist = np.bincount(xy, weights, minlength=nbin.prod())
  File "<__array_function__ internals>", line 180, in bincount
ValueError: 'minlength' must not be negative
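
The negative `minlength` in the traceback is consistent with a 32-bit wrap-around. A minimal sketch of the arithmetic follows. It assumes that `histogramdd` counts two extra outlier bins per axis (402, 402, and 16001 bins for the example above), and it forces the dtype explicitly so that it behaves the same on any platform:

```python
import numpy as np

# Assumed per-axis bin counts for the example in this report: the requested
# bins plus two outlier bins per axis (an assumption about histogramdd internals).
nbin = np.array([400 + 2, 400 + 2, 15999 + 2])

# Force int32 to mimic the default integer on 64-bit Windows in this NumPy version.
wrapped = nbin.astype(np.int32).prod(dtype=np.int32)

# The same product computed in int64, as on Linux.
exact = nbin.astype(np.int64).prod(dtype=np.int64)

print(wrapped)  # negative, because the true product exceeds 2**31 - 1
print(exact)
```

If the assumption about the outlier bins holds, the int32 product is negative, which would explain why `np.bincount` rejects it.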

NumPy/Python version information:

1.23.3 3.10.6 | packaged by conda-forge | (main, Aug 22 2022, 20:29:51) [MSC v.1929 64 bit (AMD64)]

Context for the issue:

This is a platform-specific issue that prevents generating large histograms. The same code runs without error on a Linux machine:

>>> import numpy as np
>>> sample = np.zeros([100000000, 3])
>>> xbins = 400
>>> ybins = 400
>>> zbins = np.arange(16000)
>>> hist = np.histogramdd(sample=sample, bins=(xbins, ybins, zbins))
>>> hist[0].shape
(400, 400, 15999)
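
One way to tell in advance whether a given binning is at risk is to compute the total bin count with Python integers, which cannot overflow, and compare it with the range of the platform's default NumPy integer. This is only a sketch: the helper name `total_bins_fit` is hypothetical, and the two extra outlier bins per axis are an assumption about how `histogramdd` sizes its flat count array.

```python
import math
import numpy as np

def total_bins_fit(bins_per_axis, int_dtype=np.int_):
    """Return True if the flat bin count fits in int_dtype (hypothetical helper)."""
    # Assumption: two outlier bins are added on every axis.
    total = math.prod(int(n) + 2 for n in bins_per_axis)
    return total <= np.iinfo(int_dtype).max

print(np.dtype(np.int_))                   # default integer dtype on this platform
print(total_bins_fit([400, 400, 15999]))   # result depends on the platform default
```

Passing `int_dtype=np.int32` explicitly reproduces the Windows case on any machine, which may help when testing from Linux.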

Labels: 00 - Bug, sprintable (issue fits the time-frame and setting of a sprint)
