[WIP] GradientBoostingClassifierCV without early stopping #8226
raghavrv wants to merge 7 commits into scikit-learn:master
Conversation
|         Sample groups for the cross-validation splitter.
|         """
|         if isinstance(self.cv_n_estimators, (numbers.Integral, np.integer)):
|             print('heee')
Arghh sorry forgot to remove the scaffold :@
jnothman left a comment
I think this is probably useful in practice. However, I think adding a use_warm_start parameter to GridSearchCV would automatically handle this case, as well as RandomForests, SGD, etc., without defining a new API. WDYT?
Otherwise, please add this to the list in doc/modules/grid_search.rst, and to appropriate "see also"s.
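To make the use_warm_start idea concrete, here is a minimal hand-rolled sketch of what it would do under the hood. use_warm_start itself is only a proposal, so this emulates it with the estimator's existing warm_start parameter; the grid values and dataset are arbitrary:

```python
# Emulating the proposed use_warm_start behaviour by hand: grow one
# ensemble per fold instead of refitting from scratch per candidate.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=500, random_state=0)
n_estimators_range = [10, 50, 100]                # candidate grid (assumed values)
scores = np.zeros((3, len(n_estimators_range)))

for i, (train, test) in enumerate(KFold(n_splits=3).split(X)):
    est = GradientBoostingClassifier(warm_start=True, random_state=0)
    for j, n in enumerate(n_estimators_range):
        est.set_params(n_estimators=n)            # adds stages, keeps old trees
        est.fit(X[train], y[train])
        scores[i, j] = est.score(X[test], y[test])

print(scores.mean(axis=0))                        # mean CV score per candidate
```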
| @@ -0,0 +1,77 @@
| """
I'm not convinced this example is worth having. It's a nice benchmark, but users don't gain from playing with it; as much can be said ("will always improve performance over GridSearchCV when searching over n_estimators") in the narrative docs and what's new.
Ok, I'll add it as a gist snippet in the PR description...
| """ | ||
| if isinstance(self.cv_n_estimators, (numbers.Integral, np.integer)): | ||
| print('heee') | ||
| cv_n_estimators = np.array([self.cv_n_estimators, ], dtype=np.int) |
perhaps this case should be interpreted as range(1, cv_n_estimators + 1)
Ah yeah, that would be more useful...
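A small sketch of that suggested coercion. The helper name `_check_cv_n_estimators` is hypothetical, not from the PR:

```python
import numbers
import numpy as np

def _check_cv_n_estimators(cv_n_estimators):
    """Hypothetical helper: a single int means 'evaluate every stage up to n'."""
    if isinstance(cv_n_estimators, (numbers.Integral, np.integer)):
        return np.arange(1, cv_n_estimators + 1)
    return np.asarray(cv_n_estimators, dtype=int)

print(_check_cv_n_estimators(5))            # [1 2 3 4 5]
print(_check_cv_n_estimators([10, 50]))     # [10 50]
```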
|         learning rate shrinks the contribution of each tree by `learning_rate`.
|         There is a trade-off between learning_rate and n_estimators.
|
|     cv_n_estimators : int or array-like of shape (n_cv_stages), (default=100)
Do we use this cv_ prefix elsewhere for *CV objects? We use Cs, alphas, etc. That convention is hard to adopt here. learning_curve uses param_range, and I think n_estimators_range would be okay here.
+1 for n_estimators_range... Thx!
|         estimator.set_params(n_estimators=n_estimators)
|         estimator.fit(X_train, y_train, sample_weight=weight_train)
|         all_stage_scores[i] = scorer(estimator, X_test, y_test,
|                                      sample_weight=weight_test)
This use of sample_weight differs from GridSearchCV. Might be best to leave it out for now, or else note it prominently.
But eventually we would want to support sample_weight in GridSearchCV too, no? I'll push a commit which documents this... Let me know if you still want it removed... (Moreover, GradientBoostingClassifier.fit supports sample_weight, and users might expect a similar interface, maybe?)
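For reference, a self-contained sketch of the weighted per-stage evaluation under discussion, with weights going to both fit and the metric (variable names follow the diff; the dataset and weights here are synthetic):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)
sample_weight = np.random.RandomState(0).rand(len(y))  # synthetic weights

X_train, X_test, y_train, y_test, weight_train, weight_test = \
    train_test_split(X, y, sample_weight, random_state=0)

est = GradientBoostingClassifier(warm_start=True, random_state=0)
all_stage_scores = {}
for n_estimators in (25, 50, 100):
    est.set_params(n_estimators=n_estimators)
    est.fit(X_train, y_train, sample_weight=weight_train)        # weights in fit
    all_stage_scores[n_estimators] = accuracy_score(
        y_test, est.predict(X_test), sample_weight=weight_test)  # and in scoring

print(all_stage_scores)
```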
Closing in favor of #8230.
@jnothman From recent discussions, I think this is simpler and also easier to use...
I'm reopening this now. I'll address your comments on this soon...
We support sample weights in grid search that are passed on to fit but not to score.
I've removed the example and addressed your comments. Will add tests, clean up the docs and ping you back!
Closing again in favor of the updated #8230 ;)
Hahaha, you can reopen to implement early stopping...?
:p I was wondering about it... What would the API be? Do we perform CV or specify a validation set?
Also ping @agramfort! :) We discussed this and decided to split it into two different problems: the GradientBoostingCV part and the early stopping part... Now with #8230, using
Spin-off from #7071, without the complications of the early stopping API.
This PR tries to implement just GradientBoostingClassifierCV (and I intend to restrict it to GBCCV / GBRCV alone, without early stopping support). It takes advantage of the incremental boosting stages and, for the same performance, is much faster than GridSearchCV.
Results
Code for the plot - https://gist.github.com/raghavrv/21d59453de5c6890c89e9f907bcd4044
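For contrast, this is the plain GridSearchCV baseline the benchmark compares against: every candidate n_estimators value refits the whole ensemble from scratch, so the largest candidate dominates the total cost (toy data, arbitrary grid):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, random_state=0)

# Each of the 4 candidates is fit independently per fold: roughly
# 10 + 50 + 100 + 500 trees per fold, versus 500 with warm-started stages.
search = GridSearchCV(GradientBoostingClassifier(random_state=0),
                      param_grid={'n_estimators': [10, 50, 100, 500]},
                      cv=3)
search.fit(X, y)
print(search.best_params_)
```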
Thanks @agramfort for IRL discussions leading to this simpler PR!!
Also ping @amueller, @jnothman, @vighneshbirodkar, @ogrisel and @pprett
TODO
- Polish example's doc / Remove example
- GradientBoostingRegressorCV (GBRCV)
- GridSearchCV