MissingIndicator failed with non-numeric inputs #13035

@baluyotraf

Description

sklearn.impute.MissingIndicator fails on NumPy arrays with string (dtype=str) and object dtypes.

String Types

Steps/Code to Reproduce
import numpy as np
from sklearn.impute import MissingIndicator

a = np.array([[c] for c in 'abcdea'], dtype=str)

print(MissingIndicator().fit_transform(a))
print(MissingIndicator(missing_values='a').fit_transform(a))
Expected Results
[[False]
 [False]
 [False]
 [False]
 [False]
 [False]]
[[True]
 [False]
 [False]
 [False]
 [False]
 [True]]
Actual Results
C:\Users\snowt\Python\scikit-learn\env\Scripts\python.exe C:/Users/snowt/Python/scikit-learn/test.py
[[False]
 [False]
 [False]
 [False]
 [False]
 [False]]
C:\Users\snowt\Python\scikit-learn\sklearn\utils\validation.py:558: FutureWarning: Beginning in version 0.22, arrays of bytes/strings will be converted to decimal numbers if dtype='numeric'. It is recommended that you convert the array to a float dtype before using it in scikit-learn, for example by using your_array = your_array.astype(np.float64).
  FutureWarning)
C:\Users\snowt\Python\scikit-learn\sklearn\utils\validation.py:558: FutureWarning: Beginning in version 0.22, arrays of bytes/strings will be converted to decimal numbers if dtype='numeric'. It is recommended that you convert the array to a float dtype before using it in scikit-learn, for example by using your_array = your_array.astype(np.float64).
  FutureWarning)
C:\Users\snowt\Python\scikit-learn\sklearn\utils\validation.py:558: FutureWarning: Beginning in version 0.22, arrays of bytes/strings will be converted to decimal numbers if dtype='numeric'. It is recommended that you convert the array to a float dtype before using it in scikit-learn, for example by using your_array = your_array.astype(np.float64).
  FutureWarning)
Traceback (most recent call last):
  File "C:/Users/snowt/Python/scikit-learn/test.py", line 7, in <module>
    print(MissingIndicator(missing_values='a').fit_transform(a))
  File "C:\Users\snowt\Python\scikit-learn\sklearn\impute.py", line 634, in fit_transform
    return self.fit(X, y).transform(X)
  File "C:\Users\snowt\Python\scikit-learn\sklearn\impute.py", line 570, in fit
    if self.features == 'missing-only'
  File "C:\Users\snowt\Python\scikit-learn\sklearn\impute.py", line 528, in _get_missing_features_info
    imputer_mask = _get_mask(X, self.missing_values)
  File "C:\Users\snowt\Python\scikit-learn\sklearn\impute.py", line 52, in _get_mask
    return np.equal(X, value_to_mask)
TypeError: ufunc 'equal' did not contain a loop with signature matching types dtype('<U1') dtype('<U1') dtype('bool')

Process finished with exit code 1
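
For reference, the mask for missing_values='a' can be computed on the same string array without calling np.equal directly. This is only a sketch: equality_mask is a hypothetical helper written for this report, not sklearn's _get_mask, and casting to object dtype is an assumed workaround rather than the library's fix.

```python
import numpy as np


def equality_mask(X, value_to_mask):
    """Hypothetical helper: boolean mask of entries equal to value_to_mask."""
    X = np.asarray(X)
    if X.dtype.kind in ("U", "S"):
        # Assumed workaround: cast fixed-width string arrays to object so
        # the comparison falls back to element-wise Python equality.
        X = X.astype(object)
    return np.asarray(X == value_to_mask, dtype=bool)


a = np.array([[c] for c in "abcdea"], dtype=str)
mask = equality_mask(a, "a")
print(mask)
```

The mask keeps the input's (6, 1) shape and is True only in the rows that hold 'a'.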

Object Types

Steps/Code to Reproduce
import numpy as np
from sklearn.impute import MissingIndicator

a = np.array([[c] for c in 'abcdea'], dtype=object)

print(MissingIndicator().fit_transform(a))
print(MissingIndicator(missing_values='a').fit_transform(a))
Expected Results
[[False]
 [False]
 [False]
 [False]
 [False]
 [False]]
[[True]
 [False]
 [False]
 [False]
 [False]
 [True]]
Actual Results
C:\Users\snowt\Python\scikit-learn\env\Scripts\python.exe C:/Users/snowt/Python/scikit-learn/test.py
Traceback (most recent call last):
  File "C:/Users/snowt/Python/scikit-learn/test.py", line 6, in <module>
    print(MissingIndicator().fit_transform(a))
  File "C:\Users\snowt\Python\scikit-learn\sklearn\impute.py", line 634, in fit_transform
    return self.fit(X, y).transform(X)
  File "C:\Users\snowt\Python\scikit-learn\sklearn\impute.py", line 555, in fit
    force_all_finite=force_all_finite)
  File "C:\Users\snowt\Python\scikit-learn\sklearn\utils\validation.py", line 522, in check_array
    array = np.asarray(array, dtype=dtype, order=order)
  File "C:\Users\snowt\Python\scikit-learn\env\lib\site-packages\numpy\core\numeric.py", line 538, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: could not convert string to float: 'a'

Process finished with exit code 1
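
The traceback above ends in np.asarray being asked to convert the object array to a numeric dtype. The sketch below reproduces that step with NumPy alone and shows that the comparison itself works when the array stays as dtype=object. It assumes the float conversion is the only obstacle and does not use any sklearn code.

```python
import numpy as np

a = np.array([[c] for c in "abcdea"], dtype=object)

# Forcing a float dtype on string data raises the same ValueError
# that appears at the bottom of the traceback.
try:
    np.asarray(a, dtype=np.float64)
    conversion_error = None
except ValueError as exc:
    conversion_error = str(exc)
print(conversion_error)

# Leaving the array as dtype=object, element-wise equality works.
mask = np.asarray(a == "a", dtype=bool)
print(mask)
```

This suggests that input validation would need to skip the numeric conversion for non-numeric data before any mask can be computed.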

Versions

System:
    python: 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)]
executable: C:\Users\snowt\Python\scikit-learn\env\Scripts\python.exe
   machine: Windows-10-10.0.17763-SP0

BLAS:
    macros:
  lib_dirs:
cblas_libs: cblas

Python deps:
       pip: 18.1
setuptools: 40.6.3
   sklearn: 0.21.dev0
     numpy: 1.16.0
     scipy: 1.2.0
    Cython: 0.29.3
    pandas: None
