[MRG+1] Cleaning for fast partial dependence computation by NicolasHug · Pull Request #13738 · scikit-learn/scikit-learn

NicolasHug · 2019-04-27T19:57:41Z

Partial dependence computation can be optimized for trees.
Currently, only GradientBoostingClassifier and GradientBoostingRegressor support the optimized method. There's no reason for this, and I'm planning to expand fast PD computation to the other tree estimators.

This is a preliminary PR that:

moves the _partial_dependence_tree helper from the GBDT code to a method of the Tree class.
cleans up the fast PD computation code to make it simpler and more maintainable.
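
For readers unfamiliar with the fast method, the idea can be sketched in pure Python as a weighted tree traversal. This is only an illustration under stated assumptions, not the Cython code touched by this PR: the tree is a plain dict whose keys mirror the array attributes of the Tree class (children_left, children_right, feature, threshold, value, weighted_n_node_samples), and the function name tree_partial_dependence and the toy tree are hypothetical.

```python
TREE_LEAF = -1  # marker for "no child" in the toy tree below


def tree_partial_dependence(tree, grid, target_features):
    """Partial dependence of a single regression tree at each grid point.

    Splits on a target feature follow the branch chosen by the grid value.
    Splits on any other feature follow both branches, each weighted by the
    fraction of training samples that went to that child.
    """
    out = []
    for point in grid:
        stack = [(0, 1.0)]  # (node id, weight) pairs still to visit
        total = 0.0
        while stack:
            node, weight = stack.pop()
            left = tree["children_left"][node]
            right = tree["children_right"][node]
            if left == TREE_LEAF:
                # Leaf: accumulate its value, scaled by the path weight.
                total += weight * tree["value"][node]
                continue
            feat = tree["feature"][node]
            if feat in target_features:
                pos = target_features.index(feat)
                go_left = point[pos] <= tree["threshold"][node]
                stack.append((left if go_left else right, weight))
            else:
                n_node = tree["weighted_n_node_samples"][node]
                for child in (left, right):
                    frac = tree["weighted_n_node_samples"][child] / n_node
                    stack.append((child, weight * frac))
        out.append(total)
    return out


# Hypothetical toy tree: the root splits on feature 0 at 0.5, and its right
# child splits on feature 1 at 0.0. Nodes 1, 3 and 4 are leaves.
toy_tree = {
    "children_left": [1, -1, 3, -1, -1],
    "children_right": [2, -1, 4, -1, -1],
    "feature": [0, -2, 1, -2, -2],
    "threshold": [0.5, -2.0, 0.0, -2.0, -2.0],
    "value": [2.8, 1.0, 4.0, 2.0, 5.0],
    "weighted_n_node_samples": [100.0, 40.0, 60.0, 20.0, 40.0],
}

pd_values = tree_partial_dependence(toy_tree, [[0.0], [1.0]], [0])
```

For the grid value 0.0 the traversal ends in the left leaf (value 1.0). For 1.0 it reaches the split on feature 1, which is not a target feature, so both leaves contribute with weights 20/60 and 40/60.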

Ping @thomasjpfan @glemaitre in case you'd be interested in reviewing this ;)

glemaitre · 2019-04-27T20:51:50Z

I'll take a look from Monday ;)

glemaitre

LGTM. I find it much more readable than the previous version.

glemaitre · 2019-04-29T13:19:12Z

Is it actually straightforward to expand it to random forests? I would have thought that combining the weights of each tree would be different than in gradient boosting, since this is not a sequential algorithm.

What do you have in mind to do so?

NicolasHug · 2019-04-29T13:28:37Z

I'm thinking of introducing a _compute_partial_dependence method for any tree-based estimator.

The _compute_partial_dependence of GBDTs would be the current _partial_dependence_recursion() function, and random forest would have a different behaviour.

In any case, a few other changes will be required, in particular in order to handle classifiers: for now, the fast PDP helper is only relevant for regression trees.

I'll need to think about it a little more, but I want to go incremental on this one, and maybe focus on regressors first.
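
To make the difference between the two ensembles concrete, here is a rough sketch of how per-tree partial dependence values might be combined. It is an assumption about the eventual design, not code from this PR: combine_boosting and combine_forest are hypothetical helpers, the boosting version ignores any initial estimator, and the forest version assumes a plain average over trees.

```python
def combine_boosting(per_tree_pd, learning_rate):
    """Sum the per-tree values at each grid point, scaled by the learning rate.

    Assumption: the contribution of the initial estimator is ignored.
    """
    n_points = len(per_tree_pd[0])
    return [
        learning_rate * sum(tree_pd[i] for tree_pd in per_tree_pd)
        for i in range(n_points)
    ]


def combine_forest(per_tree_pd):
    """Average the per-tree values at each grid point (assumed forest rule)."""
    n_trees = len(per_tree_pd)
    n_points = len(per_tree_pd[0])
    return [
        sum(tree_pd[i] for tree_pd in per_tree_pd) / n_trees
        for i in range(n_points)
    ]


# Two hypothetical trees evaluated on a grid of two points.
per_tree_pd = [[1.0, 4.0], [3.0, 2.0]]
boosted = combine_boosting(per_tree_pd, learning_rate=0.1)
averaged = combine_forest(per_tree_pd)
```

Only the aggregation step differs here. The per-tree traversal would stay the same in both cases.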

glemaitre · 2019-04-29T13:54:38Z

OK, this seems like a good plan. I think going incremental would be great. We always have the non-recursive method as the default case.
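
For context, the non-recursive fallback mentioned here can be sketched as a brute-force loop that works with any model: overwrite the target column with each grid value, predict, and average. The names below (brute_partial_dependence, the predict callable, the toy data) are hypothetical and only illustrate the idea.

```python
def brute_partial_dependence(predict, X, feature, grid_values):
    """Average prediction over X with column `feature` forced to each grid value.

    `predict` is any callable mapping a list of rows to a list of predictions.
    """
    averages = []
    for value in grid_values:
        # Copy every row, replacing only the target feature.
        X_mod = [row[:feature] + [value] + row[feature + 1:] for row in X]
        preds = predict(X_mod)
        averages.append(sum(preds) / len(preds))
    return averages


def toy_predict(rows):
    # Hypothetical model: y = 2 * x0 + x1.
    return [2 * row[0] + row[1] for row in rows]


X_toy = [[0.0, 1.0], [1.0, 3.0]]
brute_values = brute_partial_dependence(toy_predict, X_toy, 0, [0.0, 1.0])
```

This costs one full prediction pass over the data per grid point, which is the cost the tree-specific traversal avoids.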

glemaitre · 2019-05-02T12:54:48Z

@ogrisel Would you have time to take a look at this one?

jnothman

I can't immediately see substantial differences between the implementations, so I think this is good.

jnothman · 2019-05-02T21:50:19Z

sklearn/tree/_tree.pyx

        return arr
+
+
+    def _partial_dependence(self, DTYPE_t[:, ::1] X,


Should this be a public method of a private class?

TBH I don't know what the convention is here.
Some methods of the Tree class are public like predict or apply, some of them are private like _add_node.

Is _add_node ever called from outside tree building?

oh OK that's the convention. I'll make it public then

NicolasHug · 2019-05-02T22:00:49Z

Yes, the code is pretty much the same. I mostly avoided the use of some weird patterns like value[current_node - root_node].

fast partial dep cleaning

290669d

minor pep8

cd1e4a3

glemaitre self-requested a review April 29, 2019 11:53

glemaitre approved these changes Apr 29, 2019

glemaitre changed the title ~~[MRG] Cleaning for fast partial dependence computation~~ [MRG+1] Cleaning for fast partial dependence computation Apr 29, 2019

glemaitre added the Waiting for Reviewer label May 2, 2019

NicolasHug mentioned this pull request May 2, 2019

[MRG] Fast PDPs for histogram-based GBDT #13769

Merged

jnothman reviewed May 2, 2019

jnothman approved these changes May 2, 2019

made method public

74e19c1

jnothman merged commit 4de404d into scikit-learn:master May 2, 2019

NicolasHug removed the Waiting for Reviewer label May 2, 2019

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request May 6, 2019

MNT Cleaning for fast partial dependence computation (scikit-learn#13738)

8a99f1d

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

MNT Cleaning for fast partial dependence computation (scikit-learn#13738)

d7e5794

jnothman merged 3 commits into scikit-learn:master from NicolasHug:fast_partial_dep
