Skip to content

Confusing error message in OrdinalEncoder when passing single list of categories #13446

@jorisvandenbossche

Description

@jorisvandenbossche

Small example:

In [38]: from sklearn.preprocessing import OrdinalEncoder 

In [39]: X = np.array([['L', 'M', 'S', 'M', 'L']], dtype=object).T

In [40]: ohe = OrdinalEncoder(categories=['S', 'M', 'L'])

In [41]: ohe.fit(X)
...
ValueError: Shape mismatch: if n_values is an array, it has to be of shape (n_features,).

The error message is still using the old n_values, which makes it very confusing.

(another issue is that we might be able to actually detect this case)

Versions

System:
    python: 3.7.1 | packaged by conda-forge | (default, Feb 18 2019, 01:42:00)  [GCC 7.3.0]
executable: /home/joris/miniconda3/bin/python
   machine: Linux-4.4.0-142-generic-x86_64-with-debian-stretch-sid

BLAS:
    macros: HAVE_CBLAS=None
  lib_dirs: /home/joris/miniconda3/lib
cblas_libs: openblas, openblas

Python deps:
       pip: 19.0.2
setuptools: 40.8.0
   sklearn: 0.20.2
     numpy: 1.16.1
     scipy: 1.2.1
    Cython: None
    pandas: 0.23.4

Metadata

Metadata

Assignees

No one assigned

    Labels

    good first issueEasy with clear instructions to resolve

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions