LabelEncoder should add flexibility to future new label #8136

@kaiwang0112006

Description

I used LabelEncoder to transform categorical features into numerical features. But my test set contains labels that were not seen when fitting on the training set, so LabelEncoder raises a ValueError. I think LabelEncoder should be able to deal with unknown labels, maybe by just assigning len(self.classes_)+1 to them and updating the current LabelEncoder's self.classes_?
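The behaviour requested above could look roughly like the following. This is only a sketch in plain Python: `GrowingLabelEncoder` is a hypothetical class, not part of scikit-learn, and it only mimics the `fit`/`transform`/`classes_` surface of LabelEncoder. It assumes codes start at 0, so the next free code for an unseen label is `len(self.classes_)` rather than `len(self.classes_)+1`.

```python
class GrowingLabelEncoder:
    """Hypothetical encoder (NOT a scikit-learn class) that gives unseen labels new codes."""

    def __init__(self):
        self.classes_ = []   # labels, stored in code order
        self._index = {}     # label -> integer code

    def fit(self, values):
        # Sorted unique labels get the codes 0..n-1
        self.classes_ = sorted(set(values))
        self._index = {label: code for code, label in enumerate(self.classes_)}
        return self

    def transform(self, values):
        codes = []
        for label in values:
            if label not in self._index:
                # Unseen label: append it and hand out the next free code
                self._index[label] = len(self.classes_)
                self.classes_.append(label)
            codes.append(self._index[label])
        return codes


enc = GrowingLabelEncoder().fit(["A", "B", "C", "A"])
train_codes = enc.transform(["A", "B", "C", "A"])  # [0, 1, 2, 0]
test_codes = enc.transform(["A", "F", "B", "F"])   # [0, 3, 1, 3]
```

One caveat with this design: a model trained on the original codes has never seen the new code, so it may be safer to map all unknown labels to one reserved value instead of growing the class list.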

Steps/Code to Reproduce

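from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
for f in train_cat.columns:  # train_cat and test_cat are DataFrames of categorical columns
    train_cat[f] = le.fit_transform(train_cat[f])
    test_cat[f] = le.transform(test_cat[f])  # raises ValueError if test_cat[f] has unseen labels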

Expected Results

Actual Results

Traceback (most recent call last):
  File "insurancev3.py", line 110, in <module>
    test_cat[f] = le.transform(test_cat[f])
  File "/usr/local/python-3.4.4/lib/python3.4/site-packages/sklearn/preprocessing/label.py", line 149, in transform
    raise ValueError("y contains new labels: %s" % str(diff))
ValueError: y contains new labels: ['F']

Versions

>>> import platform; print(platform.platform())
Windows-10-10.0.14393-SP0
>>> import sys; print("Python", sys.version)
Python 3.5.2 |Anaconda 4.2.0 (64-bit)| (default, Jul  5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
>>> import numpy; print("NumPy", numpy.__version__)
NumPy 1.11.1
>>> import scipy; print("SciPy", scipy.__version__)
SciPy 0.18.1
>>> import sklearn; print("Scikit-Learn", sklearn.__version__)
Scikit-Learn 0.18.1
