[MRG +1] ColumnTransformer: store evaluated function column specifier during fit#12107
Conversation
|
@jnothman I think it would be nice to get this one in for 0.20 |
|
I agree it would be nice because that's kind of a bug :-/ |
|
It's more a design issue than bug I would say (there are many other places in sklearn that would also fail if your X changes between training and fitting), but since it is a nice feature to see the actual column selection and since it would be a change in behaviour later, certainly nice to get it in now. |
| continue | ||
|
|
||
| yield (name, trans, sub, get_weight(name)) | ||
| yield (name, trans, sub, get_weight(name), column) |
There was a problem hiding this comment.
to use _iter in _update_fitted_transformers I needed to add column in the output (as it is needed there), so therefore you see also changes in the diff for each place where _iter is called now, making the diff a bit bigger.
The reason to also use _iter there is to avoid repeating the code to interleave the validated column specifiers and to add the remainder.
There was a problem hiding this comment.
FWIW (cosmetics), I don't think we any longer need to have X passed into this method, or its subset passed
out. That feature is only used in one method (_fit_transform) where we could instead just call _get_columns.
There was a problem hiding this comment.
Good idea! Made the small change
| continue | ||
|
|
||
| yield (name, trans, sub, get_weight(name)) | ||
| yield (name, trans, sub, get_weight(name), column) |
There was a problem hiding this comment.
FWIW (cosmetics), I don't think we any longer need to have X passed into this method, or its subset passed
out. That feature is only used in one method (_fit_transform) where we could instead just call _get_columns.
We're not talking about X's dtypes or column names. You could have columns = |
|
@ogrisel or @glemaitre with a second look, this should be ready to merge |
|
LGTM as well, merged! |
|
@jnothman I let you update the backport PR accordingly. We need a backport bot :) |
* tag '0.20.0': (77 commits) ColumnTransformer generalization to work on empty lists (scikit-learn#12084) add sparse_threshold to make_columns_transformer (scikit-learn#12152) [MRG] Convert ColumnTransformer input list to numpy array (scikit-learn#12104) Change version to 0.20.0 BUG: check equality instead of identity in check_cv (scikit-learn#12155) [MRG] Fix FutureWarnings in logistic regression examples (scikit-learn#12114) [MRG] Update test_metaestimators to pass y parameter when calling score (scikit-learn#12089) DOC Removed duplicated doc in tree.rst (scikit-learn#11922) [MRG] DOC covariance doctest examples (scikit-learn#12124) typo and formatting fixes in 0.20 doc (scikit-learn#11963) DOC Replaced the deprecated early_stopping parameter with n_iter_no_change. (scikit-learn#12133) [MRG +1] ColumnTransformer: store evaluated function column specifier during fit (scikit-learn#12107) Fix typo (scikit-learn#12126) DOC Typo in OneHotEncoder DOC Update fit_transform docstring of OneHotEncoder (scikit-learn#12117) DOC Removing quotes from variant names. (scikit-learn#12113) DOC BaggingRegressor missing default value for oob_score in docstring (scikit-learn#12108) [MRG] MNT Re-enable PyPy CI (scikit-learn#12039) MNT Only checks warnings on latest depedendencies versions in CI (scikit-learn#12048) TST Ignore warnings in common test to avoid collection errors (scikit-learn#12093) ...
* releases: (77 commits) ColumnTransformer generalization to work on empty lists (scikit-learn#12084) add sparse_threshold to make_columns_transformer (scikit-learn#12152) [MRG] Convert ColumnTransformer input list to numpy array (scikit-learn#12104) Change version to 0.20.0 BUG: check equality instead of identity in check_cv (scikit-learn#12155) [MRG] Fix FutureWarnings in logistic regression examples (scikit-learn#12114) [MRG] Update test_metaestimators to pass y parameter when calling score (scikit-learn#12089) DOC Removed duplicated doc in tree.rst (scikit-learn#11922) [MRG] DOC covariance doctest examples (scikit-learn#12124) typo and formatting fixes in 0.20 doc (scikit-learn#11963) DOC Replaced the deprecated early_stopping parameter with n_iter_no_change. (scikit-learn#12133) [MRG +1] ColumnTransformer: store evaluated function column specifier during fit (scikit-learn#12107) Fix typo (scikit-learn#12126) DOC Typo in OneHotEncoder DOC Update fit_transform docstring of OneHotEncoder (scikit-learn#12117) DOC Removing quotes from variant names. (scikit-learn#12113) DOC BaggingRegressor missing default value for oob_score in docstring (scikit-learn#12108) [MRG] MNT Re-enable PyPy CI (scikit-learn#12039) MNT Only checks warnings on latest depedendencies versions in CI (scikit-learn#12048) TST Ignore warnings in common test to avoid collection errors (scikit-learn#12093) ...
* dfsg: (77 commits) ColumnTransformer generalization to work on empty lists (scikit-learn#12084) add sparse_threshold to make_columns_transformer (scikit-learn#12152) [MRG] Convert ColumnTransformer input list to numpy array (scikit-learn#12104) Change version to 0.20.0 BUG: check equality instead of identity in check_cv (scikit-learn#12155) [MRG] Fix FutureWarnings in logistic regression examples (scikit-learn#12114) [MRG] Update test_metaestimators to pass y parameter when calling score (scikit-learn#12089) DOC Removed duplicated doc in tree.rst (scikit-learn#11922) [MRG] DOC covariance doctest examples (scikit-learn#12124) typo and formatting fixes in 0.20 doc (scikit-learn#11963) DOC Replaced the deprecated early_stopping parameter with n_iter_no_change. (scikit-learn#12133) [MRG +1] ColumnTransformer: store evaluated function column specifier during fit (scikit-learn#12107) Fix typo (scikit-learn#12126) DOC Typo in OneHotEncoder DOC Update fit_transform docstring of OneHotEncoder (scikit-learn#12117) DOC Removing quotes from variant names. (scikit-learn#12113) DOC BaggingRegressor missing default value for oob_score in docstring (scikit-learn#12108) [MRG] MNT Re-enable PyPy CI (scikit-learn#12039) MNT Only checks warnings on latest depedendencies versions in CI (scikit-learn#12048) TST Ignore warnings in common test to avoid collection errors (scikit-learn#12093) ...
Fixes #12097