
Pipeline doesn't work with Label Encoder #3956

@johnny555

Description


I've found that I cannot use a pipeline together with LabelEncoder. In the following example, I want to build a pipeline that first label-encodes the values and then builds a one-hot encoding from those integer labels.

from sklearn.preprocessing import OneHotEncoder, LabelEncoder
from sklearn.pipeline import make_pipeline
import numpy as np

X = np.array(['cat', 'dog', 'cow', 'cat', 'cow', 'dog'])

enc = LabelEncoder()   # step 1: map each string to an integer label
hot = OneHotEncoder()  # step 2: one-hot encode the integer labels

pipe = make_pipeline(enc, hot)
pipe.fit_transform(X)  # raises the TypeError shown below

However, the following error is returned:

lib/python2.7/site-packages/sklearn/pipeline.pyc in _pre_transform(self, X, y, **fit_params)
    117         for name, transform in self.steps[:-1]:
    118             if hasattr(transform, "fit_transform"):
--> 119                 Xt = transform.fit_transform(Xt, y, **fit_params_steps[name])
    120             else:
    121                 Xt = transform.fit(Xt, y, **fit_params_steps[name]) \

TypeError: fit_transform() takes exactly 2 arguments (3 given)

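The problem seems to be that LabelEncoder's fit and fit_transform methods take only a single y argument, whereas the pipeline calls every intermediate step as fit_transform(X, y, **fit_params). Counting self, that is three arguments passed to a method that accepts two, hence "takes exactly 2 arguments (3 given)".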
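The signature mismatch can be reproduced without scikit-learn. The sketch below is a minimal illustration under stated assumptions: ToyLabelEncoder, ToyOneHotEncoder, toy_pipeline_fit_transform and XOnlyAdapter are hypothetical stand-ins written for this example, not scikit-learn's classes or API. The toy pipeline calls each step as fit_transform(X, y), the way the traceback shows Pipeline doing, and the adapter shows one possible shape of a workaround: accept (X, y=None) and forward only X to the y-only encoder.

```python
# Hypothetical pure-Python stand-ins; NOT scikit-learn's real classes.

class ToyLabelEncoder:
    """Mimics LabelEncoder's y-only signature: fit_transform(y)."""

    def fit_transform(self, y):
        # Sorted unique labels, so 'cat' -> 0, 'cow' -> 1, 'dog' -> 2.
        self.classes_ = sorted(set(y))
        index = {label: i for i, label in enumerate(self.classes_)}
        return [index[label] for label in y]


class ToyOneHotEncoder:
    """Uses the usual transformer signature: fit_transform(X, y=None).

    Simplification: takes a flat list of ints (the real OneHotEncoder
    expects a 2-D array).
    """

    def fit_transform(self, X, y=None):
        n_values = max(X) + 1
        return [[1 if j == value else 0 for j in range(n_values)] for value in X]


def toy_pipeline_fit_transform(steps, X, y=None):
    """Call every step as fit_transform(Xt, y), as the traceback shows."""
    Xt = X
    for step in steps:
        Xt = step.fit_transform(Xt, y)
    return Xt


class XOnlyAdapter:
    """Hypothetical adapter: accepts (X, y=None), forwards only X."""

    def __init__(self, wrapped):
        self.wrapped = wrapped

    def fit_transform(self, X, y=None):
        return self.wrapped.fit_transform(X)


X = ['cat', 'dog', 'cow', 'cat', 'cow', 'dog']

# Passing (X, y) to a y-only method fails just like in the report.
try:
    toy_pipeline_fit_transform([ToyLabelEncoder(), ToyOneHotEncoder()], X)
except TypeError as exc:
    print("TypeError:", exc)

# Wrapping the y-only step makes the two-step chain work.
result = toy_pipeline_fit_transform(
    [XOnlyAdapter(ToyLabelEncoder()), ToyOneHotEncoder()], X)
print(result)
```

With the adapter, the labels become [0, 2, 1, 0, 1, 2] and the final output is one row of three 0/1 columns per input value. In scikit-learn 0.20 and later, OneHotEncoder can also encode string categories directly from a 2-D input such as X.reshape(-1, 1), which avoids LabelEncoder for this particular use case.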
