[MRG] Gradient Boosting Classifier CV #5689
vighneshbirodkar wants to merge 7 commits into scikit-learn:master
Conversation
Thanks for working on this! This is indeed a very welcome addition :)
ping us when you want us to start reviewing
@glouppe @arjoly @pprett @amueller
I don't think this line should be modified.
You don't seem to use the `warm_start` option, which is crucial to get good speed performance.
Should we set this to True by default?
I'm surprised this didn't fail the tests with kwargs
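For context, here is a minimal sketch (an illustration, not this PR's code) of why `warm_start` matters here: with `warm_start=True`, raising `n_estimators` and refitting only trains the new trees on top of the already-fitted ensemble, instead of rebuilding the whole model at every step.

```python
# Illustrative sketch: growing a gradient boosting ensemble one tree at a
# time with warm_start=True, so each fit() call only adds the new trees.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=200, random_state=0)

gb = GradientBoostingClassifier(warm_start=True, n_estimators=1, random_state=0)
scores = []
for n in range(1, 11):
    gb.set_params(n_estimators=n)  # only trees n_prev+1..n get fitted
    gb.fit(X, y)
    scores.append(gb.score(X, y))

print(len(gb.estimators_))  # 10 trees total, fitted incrementally
```

Without `warm_start=True`, each `fit()` in the loop above would refit all `n` trees from scratch, making the loop quadratic in the number of boosting rounds.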
force-pushed from 42f0fd7 to 3620f90
I have absolutely no idea why these tests are failing on Python 3.5. I haven't removed or changed any part of the code, just added. Moreover, I ran the tests locally in a Python 3.5 conda environment and all tests passed. Any ideas @MechCoder @amueller ?
force-pushed from 8366e44 to 5509e46
I'm gonna give a shot at reviewing this!
@rvraghav93 Please, be my guest.

> I think separate classes would be the way to go.

I fail to understand how. Could you explain a bit?
        cross-validation folds
    * ``cv_validation_scores``, the list of scores for each fold

    best_score_tuple_ : named tuple
Why is this different from GridSearchCV? (not best_score_)

Because apart from the score, I thought of adding attributes telling the user how their score varies with the train/test split. For example, if it varies significantly, something might be wrong.
@rvraghav93

Ah okay, that makes sense now... BTW, are you in the middle of exams? If so I could ping you later.
@rvraghav93 If I understand correctly.

@rvraghav93 I am not sure, because the two classes use significantly different numbers of estimators.
    Parameters
    ----------
    n_stop_rounds : int, optional, default=10
        If the score on the test set rounded off to `score_precision` decimal
I would rather use "validation set" instead of "test set" throughout this class. This class is about cross-validation for model selection. We are not allowed to use the "final" test set here, as our interest is not model evaluation per se but model selection.
Also, could you please add GradientBoostingRegressorCV? I see no reason to include only classification and not regression in this PR.
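To make the stopping rule under discussion concrete, here is a hedged sketch of early stopping on a validation set. The parameter names `n_stop_rounds` and `score_precision` follow the docstring excerpt above, but the loop itself is an illustration, not the PR's implementation: training stops once the validation score, rounded to `score_precision` decimals, has not improved for `n_stop_rounds` consecutive rounds.

```python
# Illustrative early-stopping loop: grow the ensemble with warm_start and
# stop when the rounded validation score stalls for n_stop_rounds rounds.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

n_stop_rounds, score_precision = 10, 3
gb = GradientBoostingClassifier(warm_start=True, random_state=0)

best, stale = -np.inf, 0
for n in range(1, 201):
    gb.set_params(n_estimators=n)
    gb.fit(X_train, y_train)  # warm_start: only the new tree is fitted
    score = round(gb.score(X_val, y_val), score_precision)
    if score > best:
        best, stale = score, 0  # improvement: reset the stall counter
    else:
        stale += 1
    if stale >= n_stop_rounds:
        break  # no improvement for n_stop_rounds rounds

print(gb.n_estimators, best)
```

Rounding before the comparison is what gives `score_precision` its meaning: tiny fluctuations below the chosen precision do not count as improvement, so they cannot keep postponing the stop.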
@vighneshbirodkar Are you planning to continue this anytime soon? Or if it's okay by you I could lend a hand. (Either cherry-picking your commits or push access to this branch?). I have a couple of weeks to work on this.
@raghavrv I don't think I will be working on this anytime soon. Which of these 2 ways is more convenient for you? Most of Olivier's comments are minor refactoring changes. About the benchmarking for the threading backend, do you plan to do it now?
> If it's okay by you I can cherry-pick into a new PR?

Yes...
@jnothman Should we also use a dict of np (ma) arrays here like we did at
Also ping @amueller regarding this...
    params['warm_start'] = True
    gb = estimator(**params)
    scorer = check_scoring(estimator, scoring=scoring)
    scores = np.ones((stop_rounds,))
Shouldn't this be np.full((stop_rounds,), -np.inf)?
wonderful!!!
replaced by #7071

A reincarnation of #1036 with early stopping. I will add the documentation and an example soon.
The early stopping API is inspired by that of xgboost.