When the columns are specified as a callable, the callable itself should not be stored in transformers_. The column indices it computed during fit should be stored instead. The current approach risks fit and transform working with different sets of indices.
>>> import numpy as np
>>> from sklearn.preprocessing import StandardScaler
>>> from sklearn.compose import ColumnTransformer
>>> def get_all(X):
...     return np.arange(X.shape[1])
...
>>> trans = ColumnTransformer([('foobar', StandardScaler(), get_all)])
>>> trans.fit(np.array([[1., 2, 3]]))
Expected:
>>> trans.transformers_
[('foobar', StandardScaler(copy=True, with_mean=True, with_std=True), array([0, 1, 2]))]
Actual:
>>> trans.transformers_
[('foobar', StandardScaler(copy=True, with_mean=True, with_std=True), <function <lambda> at 0x1811fa3048>)]
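
To illustrate the risk described above, here is a minimal sketch in plain Python. It does not use scikit-learn's internals: resolve_columns and nonzero_first_row are hypothetical names introduced only for this example, and the data-dependent selector is an assumed worst case. It shows that a stored callable, if evaluated again on new data, can select different columns than it did at fit time, whereas stored indices cannot.

```python
def resolve_columns(columns, X):
    """Hypothetical helper: turn a column spec into concrete indices."""
    if callable(columns):
        # Evaluate the callable against the data it is given.
        columns = columns(X)
    return list(columns)


def nonzero_first_row(X):
    # Hypothetical data-dependent selector: picks the columns whose
    # first-row entry is nonzero.
    return [j for j, value in enumerate(X[0]) if value != 0]


X_fit = [[1.0, 0.0, 3.0]]
X_new = [[0.0, 2.0, 3.0]]

# Resolving once at fit time and storing the result keeps the selection fixed.
fitted_indices = resolve_columns(nonzero_first_row, X_fit)

# Evaluating the stored callable again on new data can pick other columns.
reevaluated_indices = resolve_columns(nonzero_first_row, X_new)

print(fitted_indices)       # [0, 2]
print(reevaluated_indices)  # [1, 2]
```

Storing fitted_indices rather than the callable would make the fitted state fully describe which columns each transformer was applied to.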