Can't provide feature indices for OneHotEncoder in pipeline #8539

Let's say I want to apply a transformation, such as imputation or one-hot encoding, only to some features in a pipeline (scaling would be another candidate, but it currently doesn't support this).
I could provide the indices of the columns I want to transform. But any previous step in the pipeline might drop or rearrange features in some arbitrary way (like OneHotEncoder does), so those indices no longer point at the columns I meant.

Example
```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Imputer, OneHotEncoder

# assume the second feature is categorical and the third is continuous;
# the first feature is entirely missing
X = [[np.nan, np.nan, 5], [np.nan, 1, 3], [np.nan, 1, np.nan]]

# categorical_features=[1] is meant to select the second *input* feature
pipe = make_pipeline(
    Imputer(strategy='most_frequent'),
    OneHotEncoder(categorical_features=[1], sparse=False),
)

pipe.fit_transform(X)
```
>array([[ 0.,  1.,  1.],
       [ 1.,  0.,  1.],
       [ 1.,  0.,  1.]])

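The `Imputer` discards the first column because it contains only missing values, so by the time the data reaches `OneHotEncoder`, index 1 refers to the continuous feature, and that is what gets one-hot encoded.

Desired outcome: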
> array([[ 1.,  5.],
       [ 1.,  3.],
       [ 1.,  3.]])

Even if each output feature corresponded to exactly one input feature, and we knew which one, there would be no way to specify this in OneHotEncoder. This might look contrived, but it is a pretty obvious use case whenever you have per-column metadata.

The only solution I see is to carry a column index (or column names) along through the pipeline and allow passing that instead of positions.
Given my experience of ``.iloc`` vs ``.loc`` in pandas, I'm not entirely happy with the prospect.
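
To make the "carry a column index along" idea concrete, here is a minimal NumPy-only sketch. It is not an existing scikit-learn API: `drop_all_nan_columns` and `positions_of` are hypothetical helpers, and the column-dropping step merely stands in for what the imputer does in the example above.

```python
import numpy as np


def drop_all_nan_columns(X, col_ids):
    """Hypothetical step: drop columns that are entirely NaN.

    Returns the reduced array together with the original ids of the
    columns that survived, so later steps can still refer to them.
    """
    keep = ~np.all(np.isnan(X), axis=0)
    return X[:, keep], col_ids[keep]


def positions_of(col_ids, wanted):
    """Map original column ids to current positions, skipping dropped ids."""
    return [int(np.flatnonzero(col_ids == w)[0]) for w in wanted if w in col_ids]


X = np.array([[np.nan, np.nan, 5.0],
              [np.nan, 1.0, 3.0],
              [np.nan, 1.0, np.nan]])
col_ids = np.arange(X.shape[1])  # original ids: 0, 1, 2

X_kept, kept_ids = drop_all_nan_columns(X, col_ids)

# The categorical feature had original id 1; look up where it lives now.
categorical_now = positions_of(kept_ids, wanted=[1])
```

Here the user keeps referring to the categorical feature by its original id (1), and the lookup translates that into the current position (0) after the first column has been dropped. This is the label-based vs. position-based distinction that makes me think of ``.loc`` vs ``.iloc``.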

cc @mfeurer 

Conceptually somewhat related to #8480 and https://github.com/scikit-learn/enhancement_proposals/pull/5 as they deal with feature meta-data.

[and then I introduced hierarchical indices over columns into scikit-learn.... not]
