[MRG+1] Cleaning for fast partial dependence computation#13738
[MRG+1] Cleaning for fast partial dependence computation#13738jnothman merged 3 commits intoscikit-learn:masterfrom
Conversation
|
I'll take a look from Monday ;) |
glemaitre
left a comment
There was a problem hiding this comment.
LGTM. I find it much more readable than the previous version.
|
Is it actually straightforward to expand it for random forest. I would have thought that combining the weight of each tree would be different than in the gradient boosting since this is not a sequential algorithm. What do you have in mind to do so? |
|
I'm thinking of introducing a The In any case, a few other changes will be required, in particular in order to handle classifiers: for now, the fast PDP helper is only relevant for regression trees. I'll need to think about it a little more, but I want to go incremental on this one, and maybe focus on regressors first. |
|
OK this seems a good plan. I think going incremental would be great. We always have the non-recursive for the default case. |
|
@ogrisel Would you have time to take a a look at this one. |
jnothman
left a comment
There was a problem hiding this comment.
I can't immediately see substantial differences between the implementations, so I think this is good.
sklearn/tree/_tree.pyx
Outdated
| return arr | ||
|
|
||
|
|
||
| def _partial_dependence(self, DTYPE_t[:, ::1] X, |
There was a problem hiding this comment.
Should this be a public method of a private class?
There was a problem hiding this comment.
TBH I don't know what the convention is here.
Some methods of the Tree class are public like predict or apply, some of them are private like _add_node.
There was a problem hiding this comment.
Is _add_node ever called from outside tree building?
There was a problem hiding this comment.
oh OK that's the convention. I'll make it public then
|
Yes the code is pretty much the same, I mostly avoided the use of some weird patterns like |
Partial dependence computation can be optimized for trees.
Currently, only GradientBoostingClassifier and GradientBoostingRegressor support the optimized method. There's no reason for this, and I'm planning to expand fast PD computation to the other tree estimators.
This is a preliminary PR that:
_partial_dependence_treehelper from the GBDT code to a method of theTreeclass.Ping @thomasjpfan @glemaitre in case you'd be interested in reviewing this ;)