[WIP] GradientBoostingClassifierCV without early stopping #8226
raghavrv wants to merge 7 commits into scikit-learn:master
Conversation
|         Sample groups for the cross-validation splitter.
|         """
|         if isinstance(self.cv_n_estimators, (numbers.Integral, np.integer)):
|             print('heee')
Arghh sorry forgot to remove the scaffold :@
jnothman left a comment
I think this is probably useful in practice. However, I think adding a use_warm_start parameter to GridSearchCV would automatically handle this case, as well as RandomForests, SGD, etc., without defining a new API. WDYT?
Otherwise, please add this to the list in doc/modules/grid_search.rst, and to appropriate "see also"s.
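To make the use_warm_start idea concrete, here is a minimal hand-rolled sketch of what it would do under the hood. use_warm_start itself is only a proposal, so this emulates it with the estimator's existing warm_start parameter; the grid values and dataset are arbitrary:

```python
# Emulating the proposed use_warm_start behaviour by hand: grow one
# ensemble per fold instead of refitting from scratch per candidate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, random_state=0)
n_estimators_range = [10, 50, 100]                # candidate grid (assumed values)
scores = np.zeros((3, len(n_estimators_range)))

for i, (train, test) in enumerate(KFold(n_splits=3).split(X)):
    est = GradientBoostingClassifier(warm_start=True, random_state=0)
    for j, n in enumerate(n_estimators_range):
        est.set_params(n_estimators=n)            # adds stages, keeps old trees
        est.fit(X[train], y[train])
        scores[i, j] = est.score(X[test], y[test])

print(scores.mean(axis=0))                        # mean CV score per candidate
```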
| @@ -0,0 +1,77 @@
| """
I'm not convinced this example is worth having. It's a nice benchmark, but users don't gain from playing with it; as much can be said ("will always improve performance over GridSearchCV when searching over n_estimators") in the narrative docs and what's new.
Ok, I'll add it as a gist snippet in the PR description...
| """ | ||
| if isinstance(self.cv_n_estimators, (numbers.Integral, np.integer)): | ||
| print('heee') | ||
| cv_n_estimators = np.array([self.cv_n_estimators, ], dtype=np.int) |
perhaps this case should be interpreted as range(1, cv_n_estimators + 1)
Ah yeah, that would be more useful...
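A small sketch of that suggested coercion. The helper name `_check_cv_n_estimators` is hypothetical, not from the PR:

```python
import numbers
import numpy as np

def _check_cv_n_estimators(cv_n_estimators):
    """Hypothetical helper: a single int means 'evaluate every stage up to n'."""
    if isinstance(cv_n_estimators, (numbers.Integral, np.integer)):
        return np.arange(1, cv_n_estimators + 1)
    return np.asarray(cv_n_estimators, dtype=int)

print(_check_cv_n_estimators(5))            # [1 2 3 4 5]
print(_check_cv_n_estimators([10, 50]))     # [10 50]
```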
|         learning rate shrinks the contribution of each tree by `learning_rate`.
|         There is a trade-off between learning_rate and n_estimators.
|
|     cv_n_estimators : int or array-like of shape (n_cv_stages), (default=100)
Do we use this cv_ prefix elsewhere for *CV objects? We use Cs, alphas, etc. That convention is hard to adopt here. learning_curve uses param_range, and I think n_estimators_range would be okay here.
+1 for n_estimators_range... Thx!
|         estimator.set_params(n_estimators=n_estimators)
|         estimator.fit(X_train, y_train, sample_weight=weight_train)
|         all_stage_scores[i] = scorer(estimator, X_test, y_test,
|                                      sample_weight=weight_test)
This use of sample_weight differs from GridSearchCV. Might be best to leave it out for now, or else note it prominently.
But eventually we would want to support sample_weight in GridSearchCV too, no? I'll push a commit which documents this... Let me know if you still want it removed... (Moreover, GradientBoostingClassifier.fit supports sample_weight, and users might expect a similar interface, maybe?)
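For reference, a self-contained sketch of the weighted per-stage evaluation under discussion, with weights going to both fit and the metric (variable names follow the diff; the dataset and weights here are synthetic):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)
sample_weight = np.random.RandomState(0).rand(len(y))  # synthetic weights

X_train, X_test, y_train, y_test, weight_train, weight_test = \
    train_test_split(X, y, sample_weight, random_state=0)

est = GradientBoostingClassifier(warm_start=True, random_state=0)
all_stage_scores = {}
for n_estimators in (25, 50, 100):
    est.set_params(n_estimators=n_estimators)
    est.fit(X_train, y_train, sample_weight=weight_train)        # weights in fit
    all_stage_scores[n_estimators] = accuracy_score(
        y_test, est.predict(X_test), sample_weight=weight_test)  # and in scoring

print(all_stage_scores)
```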
Closing in favor of #8230.
@jnothman From recent discussions, I think this is simpler and also easier to use...
I'm reopening this now. I'll address your comments on this soon...
We support sample weights in grid search that are passed on to fit but not to score.
I've removed the example and addressed your comments. Will add tests, clean up the docs and ping you back!
Closing again in favor of the updated #8230 ;)
Hahaha, you can reopen to implement early stopping...?
:p I was wondering about it... What would the API be? Do we perform CV or specify a validation set?
Also ping @agramfort! :) We discussed this and decided to split it into two different problems: the GradientBoostingCV part and the early stopping part... Now with #8230, using
Spin-off from #7071, without the complications of the early stopping API.
This PR tries to implement just GradientBoostingClassifierCV (and I intend to restrict it to GBCCV / GBRCV alone, without early stopping support). It takes advantage of the incremental boosting stages and, for the same performance, is much faster than GridSearchCV.
Results
Code for the plot - https://gist.github.com/raghavrv/21d59453de5c6890c89e9f907bcd4044
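For contrast, this is the plain GridSearchCV baseline the benchmark compares against: every candidate n_estimators value refits the whole ensemble from scratch, so the largest candidate dominates the total cost (toy data, arbitrary grid):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)

# Each of the 4 candidates is fit independently per fold: roughly
# 10 + 50 + 100 + 500 trees per fold, versus 500 with warm-started stages.
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid={'n_estimators': [10, 50, 100, 500]},
                      cv=3)
search.fit(X, y)
print(search.best_params_)
```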
Thanks @agramfort for IRL discussions leading to this simpler PR!!
Also ping @amueller, @jnothman, @vighneshbirodkar, @ogrisel and @pprett
TODO
- Polish example's doc / Remove example
- GradientBoostingRegressorCV (GBRCV)
- GridSearchCV