Confusing pretty print repr for nested Pipeline

Taking the examples from the docs (https://scikit-learn.org/dev/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py) that involves some nested pipelines in columntransformer in pipeline

```py
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.linear_model import LogisticRegression

numeric_features = ['age', 'fare']
numeric_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='median')),
    ('scaler', StandardScaler())])

categorical_features = ['embarked', 'sex', 'pclass']
categorical_transformer = Pipeline(steps=[
    ('imputer', SimpleImputer(strategy='constant', fill_value='missing')),
    ('onehot', OneHotEncoder(handle_unknown='ignore'))])

preprocessor = ColumnTransformer(
    transformers=[
        ('num', numeric_transformer, numeric_features),
        ('cat', categorical_transformer, categorical_features)])

clf = Pipeline(steps=[('preprocessor', preprocessor),
                      ('classifier', LogisticRegression(solver='lbfgs'))])
```

The repr that you get for this pipeline:

```py
In [8]: clf
Out[8]: 
Pipeline(memory=None,
         steps=[('preprocessor',
                 ColumnTransformer(n_jobs=None, remainder='drop',
                                   sparse_threshold=0.3,
                                   transformer_weights=None,
                                   transformers=[('num',
                                                  Pipe...cept_scaling=1,
                                    l1_ratio=None, max_iter=100,
                                    multi_class='warn', n_jobs=None,
                                    penalty='l2', random_state=None,
                                    solver='lbfgs', tol=0.0001, verbose=0,
                                    warm_start=False))])
```

which I found very confusing: the outer pipeline seems to have only 1 step (the 'preprocessor', as the 'classifier' disappeared in the `...`).

It's probably certainly not easy to get a good repr in all cases, and for sure the old behaviour was even worse (it would show the first 'imputer' step of the pipeline inside the column transformer as if it was the second step of the outer pipeline ..). But just opening this issue as a data point for possible improvements.

Without knowing how the current repr is determined: ideally I would expect that, if the full repr is too long, we first try to trim it step per step of the outer pipeline, so that the structure of that outer pipeline is still visible. But that is easier to write than to code .. :)

cc @NicolasHug 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Confusing pretty print repr for nested Pipeline #13372

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Confusing pretty print repr for nested Pipeline #13372

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions