[MRG] Gradient Boosting Classifier CV #5689
vighneshbirodkar wants to merge 7 commits into scikit-learn:master
Conversation
Thanks for working on this! This is indeed a very welcome addition :)
ping us when you want us to start reviewing
@glouppe @arjoly @pprett @amueller
I don't think this line should be modified.
You don't seem to use the `warm_start` option, which is crucial to get good speed performance.
Should we set this to True by default?
I'm surprised this didn't fail the tests with kwargs
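For context, here is a minimal sketch (an illustration, not this PR's code) of why `warm_start` matters here: with `warm_start=True`, raising `n_estimators` and refitting only trains the new trees on top of the already-fitted ensemble, instead of rebuilding the whole model at every step.

```python
# Illustrative sketch: growing a gradient boosting ensemble one tree at a
# time with warm_start=True, so each fit() call only adds the new trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)

gb = GradientBoostingClassifier(warm_start=True, n_estimators=1, random_state=0)
scores = []
for n in range(1, 11):
    gb.set_params(n_estimators=n)  # only trees n_prev+1..n get fitted
    gb.fit(X, y)
    scores.append(gb.score(X, y))

print(len(gb.estimators_))  # 10 trees total, fitted incrementally
```

Without `warm_start=True`, each `fit()` in the loop above would refit all `n` trees from scratch, making the loop quadratic in the number of boosting rounds.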
force-pushed from 42f0fd7 to 3620f90
I have absolutely no idea why these tests are failing on Python 3.5. I haven't removed or changed any part of the code, just added. Moreover, I ran the tests locally in a Python 3.5 conda environment and all tests passed. Any ideas @MechCoder @amueller ?
force-pushed from 8366e44 to 5509e46
I'm gonna give a shot at reviewing this!
@rvraghav93 Please, be my guest.

> I think separate classes would be the way to go.

I fail to understand how. Could you explain a bit?
        cross-validation folds
    * ``cv_validation_scores``, the list of scores for each fold

    best_score_tuple_ : named tuple
Why is this different from GridSearchCV? (not best_score_)

Because apart from the score, I thought of adding attributes telling the user how their score varies with the train/test split. For example, if it varies significantly, something might be wrong.
@rvraghav93

Ah okay, that makes sense now... BTW, are you in the middle of exams? If so I could ping you later.
@rvraghav93 If I understand correctly.

@rvraghav93 I am not sure, because the two classes use significantly different numbers of estimators.
    Parameters
    ----------
    n_stop_rounds : int, optional, default=10
        If the score on the test set rounded off to `score_precision` decimal
I would rather use "validation set" instead of "test set" throughout this class. This class is about cross-validation for model selection. We are not allowed to use the "final" test set here, as our interest is not model evaluation per se but model selection.
Also, could you please add GradientBoostingRegressorCV? I see no reason to include only classification and not regression in this PR.
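To make the stopping rule under discussion concrete, here is a hedged sketch of early stopping on a validation set. The parameter names `n_stop_rounds` and `score_precision` follow the docstring excerpt above, but the loop itself is an illustration, not the PR's implementation: training stops once the validation score, rounded to `score_precision` decimals, has not improved for `n_stop_rounds` consecutive rounds.

```python
# Illustrative early-stopping loop: grow the ensemble with warm_start and
# stop when the rounded validation score stalls for n_stop_rounds rounds.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

n_stop_rounds, score_precision = 10, 3
gb = GradientBoostingClassifier(warm_start=True, random_state=0)

best, stale = -np.inf, 0
for n in range(1, 201):
    gb.set_params(n_estimators=n)
    gb.fit(X_train, y_train)  # warm_start: only the new tree is fitted
    score = round(gb.score(X_val, y_val), score_precision)
    if score > best:
        best, stale = score, 0  # improvement: reset the stall counter
    else:
        stale += 1
    if stale >= n_stop_rounds:
        break  # no improvement for n_stop_rounds rounds

print(gb.n_estimators, best)
```

Rounding before the comparison is what gives `score_precision` its meaning: tiny fluctuations below the chosen precision do not count as improvement, so they cannot keep postponing the stop.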
@vighneshbirodkar Are you planning to continue this anytime soon? Or if it's okay by you I could lend a hand. (Either cherry-picking your commits or push access to this branch?). I have a couple of weeks to work on this.
@raghavrv I don't think I will be working on this anytime soon. Which of these 2 ways is more convenient for you? Most of Olivier's comments are minor refactoring changes. About the benchmarking for the threading backend, do you plan to do it now?
> If it's okay by you I can cherry-pick into a new PR?

Yes...
@jnothman Should we also use a dict of np (ma) arrays here like we did at
Also ping @amueller regarding this...
    params['warm_start'] = True
    gb = estimator(**params)
    scorer = check_scoring(estimator, scoring=scoring)
    scores = np.ones((stop_rounds,))
Shouldn't this be np.full((stop_rounds,), -np.inf)?
wonderful!!!
replaced by #7071

A reincarnation of #1036 with early stopping. I will add the documentation and an example soon.
The early stopping API is inspired by that of xgboost.