Wrong infrequent categories and error in OrdinalEncoder #27088
Closed
Description
Describe the bug
When I pass a NumPy object array as categories to OrdinalEncoder, the fitted infrequent_categories_ attribute is wrong.
Calling fit_transform on the same data then raises an IndexError. See the code below.
Steps/Code to Reproduce
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# User-supplied categories, with NaN listed first.
categories = [np.array([np.nan, 'b', 'c', 'a'], dtype=object)]
# One feature column: nan, nan, 'b', 'b', 'a'.
X = np.array([[np.nan] * 2 + ['b'] * 2 + ['a']], dtype=object).T

ohe = OneHotEncoder(categories=categories, min_frequency=2)
ode = OrdinalEncoder(categories=categories, min_frequency=2)
ohe.fit(X)
ode.fit(X)
print('onehot', ohe.infrequent_categories_)
print('ordinal', ode.infrequent_categories_)
print(ohe.fit_transform(X))
print(ode.fit_transform(X))  # raises IndexError
Expected Results
onehot [array(['c', 'a'], dtype=object)]
ordinal [array(['c', 'a'], dtype=object)]
(0, 0) 1.0
(1, 0) 1.0
(2, 1) 1.0
(3, 1) 1.0
(4, 2) 1.0
[[nan]
 [nan]
 [ 0.]
 [ 0.]
 [ 1.]]
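The expected infrequent categories follow from counting each category in X against min_frequency=2. A minimal sketch of that arithmetic in plain Python, independent of scikit-learn; the helper names `is_nan` and `category_counts` are hypothetical and exist only for this illustration:

```python
import math

# Category order and data copied from the reproduction script.
categories = [float("nan"), "b", "c", "a"]
X_column = [float("nan"), float("nan"), "b", "b", "a"]
min_frequency = 2


def is_nan(value):
    """Return True only for float NaN; strings are never NaN."""
    return isinstance(value, float) and math.isnan(value)


def category_counts(values, cats):
    """Count occurrences of each category, treating all NaNs as equal."""
    counts = []
    for cat in cats:
        if is_nan(cat):
            counts.append(sum(1 for v in values if is_nan(v)))
        else:
            counts.append(sum(1 for v in values if not is_nan(v) and v == cat))
    return counts


counts = category_counts(X_column, categories)

# Categories seen fewer than min_frequency times.
infrequent = [c for c, n in zip(categories, counts) if n < min_frequency]

print(counts)      # [2, 2, 0, 1]
print(infrequent)  # ['c', 'a']
```

Both nan and 'b' reach the threshold of 2, so only 'c' (never seen) and 'a' (seen once) should be grouped as infrequent.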
Actual Results
onehot [array(['c', 'a'], dtype=object)]
ordinal [array(['b', 'c'], dtype=object)]
(0, 0) 1.0
(1, 0) 1.0
(2, 1) 1.0
(3, 1) 1.0
(4, 2) 1.0
Traceback (most recent call last):
  File "tt.py", line 17, in <module>
    print(ode.fit_transform(X))
  File "/Users/xxf/miniconda3/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 140, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/xxf/miniconda3/lib/python3.8/site-packages/sklearn/base.py", line 915, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/Users/xxf/miniconda3/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 140, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/Users/xxf/miniconda3/lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py", line 1573, in transform
    X_int, X_mask = self._transform(
  File "/Users/xxf/miniconda3/lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py", line 236, in _transform
    self._map_infrequent_categories(X_int, X_mask, ignore_category_indices)
  File "/Users/xxf/miniconda3/lib/python3.8/site-packages/sklearn/preprocessing/_encoders.py", line 437, in _map_infrequent_categories
    X_int[rows_to_update, i] = np.take(mapping, X_int[rows_to_update, i])
  File "<__array_function__ internals>", line 200, in take
  File "/Users/xxf/miniconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 190, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File "/Users/xxf/miniconda3/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 57, in _wrapfunc
    return bound(*args, **kwds)
IndexError: index 3 is out of bounds for axis 0 with size 3
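One possible reading of these results, not confirmed in this report: if the NaN entry is dropped from the per-category counts before the infrequent positions are computed, and those positions are then applied to the full category list, every position after NaN shifts by one. The NumPy-only sketch below is a hypothetical model of that mechanism, not scikit-learn's code, and all variable names are illustrative. Under that assumption it yields the same ['b', 'c'] and the same IndexError message as above.

```python
import numpy as np

categories = np.array([np.nan, "b", "c", "a"], dtype=object)
counts = np.array([2, 2, 0, 1])  # occurrences of nan, 'b', 'c', 'a' in X
min_frequency = 2
nan_index = 0  # position of NaN in the user-supplied categories

# Hypothetical step: drop the NaN count, then find infrequent positions.
counts_without_nan = np.delete(counts, nan_index)             # [2, 0, 1]
shifted = np.flatnonzero(counts_without_nan < min_frequency)  # [1, 2]

# Applying positions from the shortened array to the full list is off by one.
print(categories[shifted])  # ['b' 'c'] instead of ['c' 'a']

# A mapping sized for the three non-NaN categories cannot hold the
# original integer code of 'a', which is 3.
mapping = np.arange(3)
try:
    np.take(mapping, np.array([3]))
except IndexError as exc:
    error_message = str(exc)
    print(error_message)
```

If this reading is right, listing NaN last in categories would leave the positions unshifted; that is untested here.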
Versions
System:
    python: 3.8.16 (default, Jan 17 2023, 16:39:35) [Clang 14.0.6 ]
    executable: /Users/xxf/miniconda3/bin/python
    machine: macOS-13.5-arm64-arm-64bit

Python dependencies:
    sklearn: 1.3.0
    pip: 22.3.1
    setuptools: 65.6.3
    numpy: 1.24.2
    scipy: 1.10.1
    Cython: None
    pandas: 1.5.3
    matplotlib: 3.7.2
    joblib: 1.2.0
    threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
    user_api: openmp
    internal_api: openmp
    prefix: libomp
    filepath: /Users/xxf/miniconda3/lib/python3.8/site-packages/sklearn/.dylibs/libomp.dylib
    version: None
    num_threads: 8

    user_api: blas
    internal_api: openblas
    prefix: libopenblas
    filepath: /Users/xxf/miniconda3/lib/python3.8/site-packages/numpy/.dylibs/libopenblas64_.0.dylib
    version: 0.3.21
    threading_layer: pthreads
    architecture: armv8
    num_threads: 8

    user_api: blas
    internal_api: openblas
    prefix: libopenblas
    filepath: /Users/xxf/miniconda3/lib/python3.8/site-packages/scipy/.dylibs/libopenblas.0.dylib
    version: 0.3.18
    threading_layer: pthreads
    architecture: armv8
    num_threads: 8