[MRG] Fast PDPs for histogram-based GBDT#13769
[MRG] Fast PDPs for histogram-based GBDT#13769glemaitre merged 28 commits intoscikit-learn:masterfrom
Conversation
|
Ping me when this is ready to be reviewed ;-) |
…ikit-learn into fast_partial_dep_hist_gbdt
|
Ready @glemaitre ;) |
|
I have pushed an updated example to use the HistGradientBoosting regressor in the PDP example. I have also fixed the example to:
Here are the results: You can observe that the Neural Net PDP now agrees with the GBRT PDB for all the features. It was not the case with a weaker model. |
|
thanks!! |
|
The new plots match Figure 10.16 (p 374) of ESL II even better than the PDP plots of the non-histogram based GBRT. |
|
The |
ogrisel
left a comment
There was a problem hiding this comment.
Here a bunch of minor comments:
|
In the plots of the example above, I find it weird that the |
|
I found the cause of the shift: the offset in y is supposed to happen before the train / test split, otherwise it's either not taken into account or the r2 score cannot be interpreted easily :) |
This reverts commit f0f8641. Actually viridis is already good enough on recent matplotlib versions and we want to continue supporting older matplotlib versions.
|
Here is the rendering of the example: One can see that the links to the from sklearn.experimental import enable_hist_gradient_boosting # noqais present in The same problem appears in the API table of contents: https://58079-843222-gh.circle-artifacts.com/0/doc/modules/classes.html#module-sklearn.ensemble |
|
This is probably caused by #13824 that was merged to master concurrently. Let me try to merge master and fix in this PR. |
ogrisel
left a comment
There was a problem hiding this comment.
I addressed my own nitpicks. LGTM.
…st_partial_dep_hist_gbdt
…ikit-learn into fast_partial_dep_hist_gbdt
…st_partial_dep_hist_gbdt
|
I believe this PR is ready to merge. @glemaitre any further comment? |
|
I find the first plot a bit small: Would it be better to have 2 separate figures? Otherwise LGTM. |
|
They already are two images. It's sphinx-gallery that's displaying them side by side. But I agree this could be improved by having two code blocks: one for MLPRegressor and one for GBRT. Each figure would appear under the matching code block and that should improve readability. |
…st_partial_dep_hist_gbdt
|
made 2 code blocks, plots are now bigger |
|
Can you use tight layout or constraint layout to make the ylabels not overlap the plots? |
|
The new one looks slightly better, though it's hard to find something that works both for local and sphinx plots. |
|
The rendering is ok for now. Thanks @NicolasHug |


This PR implements fast partial dependence computation for the new histogram-based GBDTs.
Both
BaseGradientBoostingandBaseHistGradientBoostingnow have a_compute_partial_dependence_recursion()method.The cython code for computing PDPs of the histogram-based predictors is very similar to that of the regular trees.