WIP GBRT with built-in cross-validation #1036

pprett wants to merge 7 commits into scikit-learn:master
Conversation
I wonder whether the sub-model instantiation should be deferred to the fit call and the attribute storing the sub-estimator renamed to self.model_. That would require storing the model's init kwargs as an attribute, though.
Actually that would break the `__setattr__` and `__getattr__` override, so please ignore my previous comment.
Nice, this sounds like a great feature for Kaggle competitors :)
@ogrisel I am not entirely sure what you want to check. Just that GridSearchCV works?
Indeed, maybe a custom test for the GBRT*CV models then. The params for the grid are model-specific anyway.
It may be nice to support tuning other parameters than
Thanks for the feedback! @mblondel I'm not a fan of supporting tuning other parameters; if computational resources are an issue, I'd use
@pprett Would be great to wrap this up in a "Parameter selection tips" section in the narrative docs of the GBRT models.
I agree, but I would certainly end up copying Greg Ridgeway's definitive guide to parameter selection in GBM (it is linked in the docs but deserves more promotion): http://cran.r-project.org/web/packages/gbm/gbm.pdf
So, to summarize, you would use a sane default learning_rate and optimize n_estimators only in most cases?
I don't understand that. For me
Sorry, I meant I'd use
@pprett yes :)
@mblondel please ignore my first response to your comment - I was thinking about this issue yesterday and it does make sense to wrap ** (also described in this thread on the ML: http://www.mail-archive.com/scikit-learn-general@lists.sourceforge.net/msg03395.html)

PS: I promise next time I'll think before I write
No worries. To tune other parameters than … One thing that worries me about using … Supporting other parameters in …
thanks!

Peter Prettenhofer
That's one reason why we need to be able to have cross-validation like G
Did someone say API design? If we were able to pass the CV-like object the test split of the grid search, we'd be fine, right?
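The idea of handing the model an explicit held-out split can be sketched with today's scikit-learn API (which this PR predates): fit once with a generous `n_estimators`, then let the staged predictions on the held-out split pick the best stage. A minimal sketch, not the PR's actual code:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=100, random_state=0)
clf.fit(X_tr, y_tr)

# one deviance (log-loss) value per boosting stage on the held-out split
val_loss = [log_loss(y_val, proba)
            for proba in clf.staged_predict_proba(X_val)]
best_n = int(np.argmin(val_loss)) + 1  # stages are 1-based
```

A grid search is then unnecessary for `n_estimators`: a single fit evaluates every value up to the maximum.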
Revived by @vighneshbirodkar in #5689
Fixed in #7071 |
Two new classes, `GradientBoostingClassifierCV` and `GradientBoostingRegressorCV`, which pick `n_estimators` based on cross-validation. `GradientBoostingClassifierCV` fits a `GradientBoostingClassifier` with `max_estimators` for each fold; it picks `n_estimators` based on the minimum deviance averaged over all test sets. Finally, it trains the model on the whole training set using the chosen `n_estimators`.

`GradientBoostingClassifierCV` is implemented as a `GradientBoostingClassifier` decorator. It solely implements `fit`; otherwise it delegates to `GradientBoostingClassifier` (see `__getattr__` and `__setattr__`). The current implementation might pose some problems if the client uses `isinstance` rather than duck typing: a `GradientBoostingClassifierCV` instance is not an instance of `GradientBoostingClassifier`. I would really appreciate any remarks/feedback on this issue.

I tried to adhere to the interface of `RidgeCV`. Additionally, I refactored the prediction routines in order to remove code duplication.
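The decorator pattern described above (only `fit` lives on the wrapper; everything else is forwarded) can be sketched in plain Python. Class and attribute names here are illustrative, not scikit-learn's:

```python
class EstimatorCV:
    """Wrapper that implements only fit and forwards all other
    attribute reads/writes to the wrapped estimator."""

    def __init__(self, estimator):
        # bypass our own __setattr__, which forwards to the wrapped object
        object.__setattr__(self, "_estimator", estimator)

    def fit(self, X, y):
        # a real implementation would run CV here to pick n_estimators
        return self._estimator.fit(X, y)

    def __getattr__(self, name):
        # only called when normal lookup fails, so fit() stays ours
        return getattr(object.__getattribute__(self, "_estimator"), name)

    def __setattr__(self, name, value):
        setattr(object.__getattribute__(self, "_estimator"), name, value)


class ToyEstimator:
    def __init__(self):
        self.n_estimators = 100

    def fit(self, X, y):
        return self


wrapped = EstimatorCV(ToyEstimator())
print(wrapped.n_estimators)               # 100 -- read forwarded
wrapped.n_estimators = 50                 # write forwarded
print(isinstance(wrapped, ToyEstimator))  # False -- the isinstance caveat
```

The last line illustrates the caveat from the description: the wrapper quacks like the wrapped estimator but fails `isinstance` checks.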
`staged_predict` and `staged_predict_proba` have been added to `GradientBoostingClassifier`.

Limitations:
- picks `n_estimators` based on deviance (no support for custom loss functions yet) - is this needed?
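The selection rule described above (minimum deviance averaged over the folds' test sets) can be sketched as follows; the deviance curves are made up for illustration:

```python
import numpy as np

def pick_n_estimators(fold_deviances):
    """fold_deviances: shape (n_folds, max_estimators); entry [i, j] is
    the test-set deviance of fold i after stage j + 1."""
    mean_dev = fold_deviances.mean(axis=0)  # average curves over folds
    return int(np.argmin(mean_dev)) + 1     # stages are 1-based

# made-up deviance curves for three folds, five boosting stages
curves = np.array([
    [1.0, 0.6, 0.40, 0.45, 0.50],
    [1.1, 0.7, 0.50, 0.42, 0.55],
    [0.9, 0.5, 0.30, 0.35, 0.60],
])
print(pick_n_estimators(curves))  # mean deviance is lowest at stage 3
```

A final model is then refit on the full training set with the chosen `n_estimators`, as the description says.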