The validate=True default is inconvenient for quick function transformers #10648

@memeplex

All the column extractor examples I have seen start from scratch using a TransformerMixin and implementing some smelly boilerplate (mainly a dummy fit method and a constructor that just "remembers" the parameters to be used by transform).

But it's simpler than that! FunctionTransformer(lambda X: X[col], validate=False). This easily generalizes to many real-life one-line lambda transformers: no need to implement a dummy fit, no need to do the vintage one-method class trick.

One obvious problem is that many of these ultra simple transformers are expecting dicts or dataframes or whatever, instead of raw arrays, so the need to specify validate=False every time is annoying both to the writer and to the reader.
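To make the one-liner above concrete, here is a minimal sketch, assuming pandas is available and using a made-up DataFrame (the column names and values are illustrative only). It passes validate=False explicitly so the DataFrame should reach the lambda untouched instead of being converted to a 2D array first.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer

# Illustrative data; the columns are assumptions made for this example.
df = pd.DataFrame({"age": [23, 35, 41], "name": ["ann", "bob", "cy"]})

# validate=False skips the array conversion, so the lambda receives
# the original DataFrame and can index it by column name.
age_extractor = FunctionTransformer(lambda X: X["age"], validate=False)

# fit is a no-op here; transform just applies the lambda.
ages = age_extractor.fit_transform(df)
```

Depending on the scikit-learn version, the default for validate may differ, so spelling it out keeps the example unambiguous.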

Would you consider adding a simple make_transformer(...) interface that promotes sweet and short transformers so that people start writing:

make_transformer(lambda df: df['age'])

instead of:

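from sklearn.base import TransformerMixin

class ColumnExtractor(TransformerMixin):

    def __init__(self, col):
        # Only remembers which column to extract.
        self.col = col

    def transform(self, X, y=None):
        return X[self.col]

    def fit(self, X, y=None):
        # Dummy fit: nothing to learn.
        return self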
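For illustration, the proposed helper could be as small as the sketch below. Note that make_transformer is hypothetical: it is the name suggested in this issue, not an existing scikit-learn function, and this is only one possible way to write it, as a thin wrapper around FunctionTransformer that defaults validate to False.

```python
import pandas as pd
from sklearn.preprocessing import FunctionTransformer


def make_transformer(func, **kwargs):
    """Hypothetical helper (not part of scikit-learn).

    Wraps func in a FunctionTransformer, defaulting validate to False
    so that DataFrames and other non-array inputs pass through unchanged.
    """
    kwargs.setdefault("validate", False)
    return FunctionTransformer(func, **kwargs)


# Illustrative data; the columns are assumptions made for this example.
df = pd.DataFrame({"age": [23, 35, 41], "height": [1.6, 1.8, 1.7]})

# The one-line transformer from the proposal above.
age = make_transformer(lambda df: df["age"]).fit_transform(df)
```

Callers who do want input validation could still pass validate=True explicitly, since the wrapper only sets a default.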

And also, could you add an example (or modify the existing one) showing how to extract a column using a one line transformer (with validate=False or not) instead of a giant class (including comments) to do the same?
