The validate=True default is inconvenient for quick function transformers #10648
Description
All the column extractor examples I have seen start from scratch using a TransformerMixin and implementing some smelly boilerplate (mainly a dummy fit method and a constructor that just "remembers" the parameters to be used by transform).
But it's simpler than that: FunctionTransformer(lambda X: X[col], validate=False). This generalizes easily to many real-life one-line lambda transformers: no need to implement a dummy fit, no need for the vintage one-method class trick.
One obvious problem is that many of these ultra-simple transformers expect dicts, DataFrames, or other non-array inputs rather than raw arrays, so having to specify validate=False every time is annoying for both the writer and the reader.
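To make the point concrete, here is a minimal sketch of the one-liner applied to a pandas DataFrame. The DataFrame contents and the column names ("age", "name") are made up for illustration. With validate=False the input is handed to the function untouched, so column lookup by name works.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

# Toy data, invented for this example.
df = pd.DataFrame({"age": [25, 32, 47], "name": ["Ann", "Bob", "Cy"]})

# validate=False skips the array conversion, so the lambda
# receives the DataFrame itself and can index it by column name.
age_extractor = FunctionTransformer(lambda X: X["age"], validate=False)

# fit is a no-op here; fit_transform simply applies the lambda.
ages = age_extractor.fit_transform(df)
print(list(ages))
```

With validation switched on, the input would first be converted to a plain numeric array, which is exactly what a column-by-name extractor does not want.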
Would you consider adding a simple make_transformer(...) interface that promotes sweet and short transformers so that people start writing:
make_transformer(lambda df: df['age'])
instead of:
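from sklearn.base import TransformerMixin

class ColumnExtractor(TransformerMixin):
    def __init__(self, col):
        self.col = col  # just "remembers" the column to extract

    def transform(self, X, y=None):
        return X[self.col]

    def fit(self, X, y=None):
        return self  # dummy fit: nothing to learn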
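For illustration, the requested helper could be as small as the sketch below. Note that make_transformer is the hypothetical name proposed in this issue, not an existing scikit-learn function; the sketch only assumes it would wrap FunctionTransformer and default validate to False. The sample DataFrame is invented.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

def make_transformer(func, **kwargs):
    """Hypothetical helper (not part of scikit-learn): wrap `func` in a
    FunctionTransformer with validate=False unless the caller overrides it."""
    kwargs.setdefault("validate", False)
    return FunctionTransformer(func, **kwargs)

# Toy data, invented for this example.
people = pd.DataFrame({"age": [18, 64], "city": ["Oslo", "Lima"]})

extract_age = make_transformer(lambda df: df["age"])
extracted = extract_age.fit_transform(people)
print(list(extracted))
```

One caveat that applies to any lambda-based transformer: lambdas cannot be pickled with the standard pickle module, so a named module-level function is the safer choice when the pipeline needs to be saved.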
Also, could you add an example (or modify the existing one) showing how to extract a column using a one-line transformer (with validate=False or not) instead of a giant class (including comments) that does the same thing?