LabelEncoder should handle new labels that were not seen during fit #8136
Closed
Description
I used LabelEncoder to transform a categorical feature into a numerical one. However, my test set contains labels that do not appear in the training set, so LabelEncoder raises a ValueError. I think LabelEncoder should be able to deal with unknown labels, perhaps by assigning them len(self.classes_)+1 and adding them to the fitted encoder's self.classes_?
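A minimal sketch of the proposed behaviour, assuming plain Python lists of comparable labels (e.g. strings). `ExtendingLabelEncoder` is a hypothetical class for illustration, not part of scikit-learn. One detail: with 0-based codes the next unused integer is `len(self.classes_)`, not `len(self.classes_)+1`, so the sketch uses the former.

```python
class ExtendingLabelEncoder:
    """Hypothetical encoder: labels unseen during fit are appended to
    classes_ and given the next unused integer code at transform time."""

    def fit(self, y):
        # Sorted unique labels, mirroring the sorted order of LabelEncoder.classes_
        self.classes_ = sorted(set(y))
        self._index = {label: code for code, label in enumerate(self.classes_)}
        return self

    def transform(self, y):
        codes = []
        for label in y:
            if label not in self._index:
                # Unseen label: codes are 0-based, so the next free code is len(classes_)
                self._index[label] = len(self.classes_)
                self.classes_.append(label)
            codes.append(self._index[label])
        return codes

    def fit_transform(self, y):
        return self.fit(y).transform(y)
```

For example, `fit_transform(['A', 'B', 'A'])` gives `[0, 1, 0]`, and a later `transform(['B', 'F', 'F'])` gives `[1, 2, 2]` and leaves `classes_` as `['A', 'B', 'F']`. Two trade-offs of this design: `transform` now mutates the encoder, so the codes depend on the order in which batches are transformed, and `classes_` is no longer guaranteed to be sorted. Also, a model trained on codes `0..k-1` has learned nothing about code `k`.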
Steps/Code to Reproduce
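from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
for i, f in enumerate(train_cat.columns):
    # Refit on each training column, then reuse that mapping for the matching test column
    train_cat[f] = le.fit_transform(train_cat[f])
    test_cat[f] = le.transform(test_cat[f])  # ValueError if test_cat[f] has labels absent from train_cat[f]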
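One workaround, when the test set is available before encoding, is to build each column's mapping from the union of its training and test values, so every label gets a code. The sketch below does this in plain Python with a hypothetical helper `encode_with_shared_classes`, assuming comparable labels such as strings. With scikit-learn, the analogue is calling `le.fit` on the concatenated train and test column before transforming each of them.

```python
def encode_with_shared_classes(train_values, test_values):
    """Hypothetical helper: derive sorted classes from the union of train and
    test values (like LabelEncoder.classes_), then encode both sides with them."""
    classes = sorted(set(train_values) | set(test_values))
    index = {label: code for code, label in enumerate(classes)}
    train_codes = [index[v] for v in train_values]
    test_codes = [index[v] for v in test_values]
    return classes, train_codes, test_codes
```

For example, `encode_with_shared_classes(['A', 'B', 'A'], ['B', 'F'])` returns `(['A', 'B', 'F'], [0, 1, 0], [1, 2])`. This avoids the ValueError, but a model fit only on the training rows still has no information about the code assigned to `'F'`.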
Expected Results
Actual Results
Traceback (most recent call last):
  File "insurancev3.py", line 110, in <module>
    test_cat[f] = le.transform(test_cat[f])
  File "/usr/local/python-3.4.4/lib/python3.4/site-packages/sklearn/preprocessing/label.py", line 149, in transform
    raise ValueError("y contains new labels: %s" % str(diff))
ValueError: y contains new labels: ['F']
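The error arises because `transform` only accepts labels that are in the fitted `classes_`. Here `'F'` occurs in the test column but not in the training column. To find every such label across columns before calling `transform`, you can take a per-column set difference. `report_unseen` is a hypothetical helper that takes mappings of column name to values. Because `DataFrame.items()` yields `(column name, Series)` pairs, it should also accept pandas DataFrames such as `train_cat` and `test_cat` directly.

```python
def report_unseen(train_cols, test_cols):
    """Hypothetical helper: for each column, list the test labels that never
    occur in the corresponding training column (sorted for readability)."""
    report = {}
    for name, train_values in train_cols.items():
        missing = sorted(set(test_cols[name]) - set(train_values))
        if missing:  # only report columns that would make transform fail
            report[name] = missing
    return report
```

For example, `report_unseen({'sex': ['M', 'M'], 'region': ['N', 'S']}, {'sex': ['M', 'F'], 'region': ['S']})` returns `{'sex': ['F']}`. The column names are made up for illustration.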
Versions
>>> import platform; print(platform.platform())
Windows-10-10.0.14393-SP0
>>> import sys; print("Python", sys.version)
Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.11.1
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 0.18.1
>>> import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.18.1