ColumnTransformer.transformers_ should store indices rather than a function #12097

@jnothman

When a column specification is given as a function, the function itself should not be stored in transformers_. Rather, the indices it computes during fit should be stored. The current approach risks the function returning a different set of indices at transform time than it returned at fit time.

>>> import numpy as np
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.compose import ColumnTransformer
>>> def get_all(X):
...     return np.arange(X.shape[1])
...
>>> trans = ColumnTransformer([('foobar', StandardScaler(), get_all)])
>>> trans.fit(np.array([[1., 2, 3]]))

Expected:

>>> trans.transformers_
[('foobar', StandardScaler(copy=True, with_mean=True, with_std=True), array([0, 1, 2]))]

Actual:

>>> trans.transformers_
[('foobar', StandardScaler(copy=True, with_mean=True, with_std=True), <function get_all at 0x1811fa3048>)]
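
To illustrate the risk described above, here is a minimal NumPy-only sketch. It is not scikit-learn's implementation: `resolve_columns` and `positive_mean_columns` are hypothetical names made up for this example. It shows a data-dependent selector returning different indices on two datasets, and how resolving the callable once at fit time would avoid that.

```python
import numpy as np


def resolve_columns(columns, X):
    """Hypothetical helper: turn a column spec into concrete indices."""
    if callable(columns):
        # Evaluate the callable once, against the data seen during fit.
        columns = columns(X)
    return np.asarray(columns)


def positive_mean_columns(X):
    # A data-dependent selector: its result can change between datasets.
    return np.flatnonzero(X.mean(axis=0) > 0)


X_fit = np.array([[1., -2., 3.], [2., -1., 4.]])
X_new = np.array([[-1., 2., 3.], [-2., 1., 4.]])

# Re-evaluating the callable on each dataset selects different columns.
print(positive_mean_columns(X_fit))
print(positive_mean_columns(X_new))

# Resolving once at fit time and reusing the stored indices avoids this.
fitted_columns = resolve_columns(positive_mean_columns, X_fit)
X_new_selected = X_new[:, fitted_columns]
```

If transformers_ held the resolved indices (the equivalent of `fitted_columns` here), transform would be guaranteed to use the same columns as fit, regardless of what the function would return on new data.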
