Skip to content

StandardScaler fit overflows on float16 #13007

@baluyotraf

Description

@baluyotraf

Description

When using StandardScaler on a large float16 numpy array the mean and std calculation overflows. I can convert the array to a larger precision but when working with a larger dataset the memory saved by using float16 on smaller numbers kind of matter. The error is mostly on numpy. Adding the dtype on the mean/std calculation does it but I'm not sure if that how people here would like to do it.

Steps/Code to Reproduce

from sklearn.preprocessing import StandardScaler

sample = np.full([10_000_000, 1], 10.0, dtype=np.float16)
StandardScaler().fit_transform(sample)

Expected Results

The normalized array

Actual Results

/opt/conda/lib/python3.6/site-packages/numpy/core/_methods.py:36: RuntimeWarning: overflow encountered in reduce
  return umr_sum(a, axis, dtype, out, keepdims, initial)
/opt/conda/lib/python3.6/site-packages/numpy/core/fromnumeric.py:86: RuntimeWarning: overflow encountered in reduce
  return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/opt/conda/lib/python3.6/site-packages/numpy/core/_methods.py:36: RuntimeWarning: overflow encountered in reduce
  return umr_sum(a, axis, dtype, out, keepdims, initial)
/opt/conda/lib/python3.6/site-packages/sklearn/preprocessing/data.py:765: RuntimeWarning: invalid value encountered in true_divide
  X /= self.scale_

array([[nan],
       [nan],
       [nan],
       ...,
       [nan],
       [nan],
       [nan]], dtype=float16)

Versions

System:
    python: 3.6.6 |Anaconda, Inc.| (default, Oct  9 2018, 12:34:16)  [GCC 7.3.0]
executable: /opt/conda/bin/python
   machine: Linux-4.9.0-5-amd64-x86_64-with-debian-9.4

BLAS:
    macros: SCIPY_MKL_H=None, HAVE_CBLAS=None
  lib_dirs: /opt/conda/lib
cblas_libs: mkl_rt, pthread

Python deps:
       pip: 18.1
setuptools: 39.1.0
   sklearn: 0.20.2
     numpy: 1.16.0
     scipy: 1.1.0
    Cython: 0.29.2
    pandas: 0.23.4

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions