[MRG] ENH Add support for dataframe in PDP#14028
[MRG] ENH Add support for dataframe in PDP#14028thomasjpfan merged 110 commits intoscikit-learn:masterfrom
Conversation
|
Some questions regarding the expected behaviour:
|
|
If a pipeline is passed, we should be quantifying in terms of its input
features... If the user wants otherwise they should slice it.
|
After couple of iterations this morning is what I thought as well. This will make thing so difficult otherwise. |
|
Feel free to ping when you need reviews! (I'll look at #14035) |
|
I think this is good to be reviewed @thomasjpfan @NicolasHug I will probably add an example to illustrate to we can use the ColumnTransformer |
|
We would need to merge #15354 before then. |
|
I dropped the support and put a meaningful error message depending of the type of |
ogrisel
left a comment
There was a problem hiding this comment.
I started to review but I have now the following question:
ogrisel
left a comment
There was a problem hiding this comment.
LGTM.
I think the example https://scikit-learn.org/dev/auto_examples/inspection/plot_partial_dependence.html should be updated to use a data frame and to pass features names instead of feature indices to make it more readable. But this can be done in a later PR if your prefer. As you wish.
| features : list or array-like of int | ||
| features : array-like of {int, str} | ||
| The target features for which the partial dependency should be | ||
| computed. |
There was a problem hiding this comment.
Maybe make it explicit that it should be either a single feature or a pair of 2 features.
Also "target features" does not really mean anything.
May I suggest the following: "The feature or pair of interacting features for which the partial dependency should be computed."
|
I updated the examples. However, be aware that we still have to specify |
|
@NicolasHug Can we merge this one. |
|
@thomasjpfan as well :P |
|
Thanks @glemaitre ! |
build on:
This PR adds support for:
ColumnTransformeras a preprocessing within aPipeline.Additional features for future PRs:
PolynomialFeatures) and would be interested in the PDP of these new features. It seems that we will need support forget_feature_namesfor the pipeline to ease such use case.