-
-
Notifications
You must be signed in to change notification settings - Fork 26.9k
StandardScaler fit overflows on float16 #13007
Copy link
Copy link
Closed
Description
Description
When using StandardScaler on a large float16 numpy array the mean and std calculation overflows. I can convert the array to a larger precision but when working with a larger dataset the memory saved by using float16 on smaller numbers kind of matter. The error is mostly on numpy. Adding the dtype on the mean/std calculation does it but I'm not sure if that how people here would like to do it.
Steps/Code to Reproduce
from sklearn.preprocessing import StandardScaler
sample = np.full([10_000_000, 1], 10.0, dtype=np.float16)
StandardScaler().fit_transform(sample)Expected Results
The normalized array
Actual Results
/opt/conda/lib/python3.6/site-packages/numpy/core/_methods.py:36: RuntimeWarning: overflow encountered in reduce
return umr_sum(a, axis, dtype, out, keepdims, initial)
/opt/conda/lib/python3.6/site-packages/numpy/core/fromnumeric.py:86: RuntimeWarning: overflow encountered in reduce
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/opt/conda/lib/python3.6/site-packages/numpy/core/_methods.py:36: RuntimeWarning: overflow encountered in reduce
return umr_sum(a, axis, dtype, out, keepdims, initial)
/opt/conda/lib/python3.6/site-packages/sklearn/preprocessing/data.py:765: RuntimeWarning: invalid value encountered in true_divide
X /= self.scale_
array([[nan],
[nan],
[nan],
...,
[nan],
[nan],
[nan]], dtype=float16)
Versions
System:
python: 3.6.6 |Anaconda, Inc.| (default, Oct 9 2018, 12:34:16) [GCC 7.3.0]
executable: /opt/conda/bin/python
machine: Linux-4.9.0-5-amd64-x86_64-with-debian-9.4
BLAS:
macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
lib_dirs: /opt/conda/lib
cblas_libs: mkl_rt, pthread
Python deps:
pip: 18.1
setuptools: 39.1.0
sklearn: 0.20.2
numpy: 1.16.0
scipy: 1.1.0
Cython: 0.29.2
pandas: 0.23.4
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels