[MRG] DOC add examples to GradientBoostingClassifier and GradientBoostingRegressor #15151
Conversation
NicolasHug
left a comment
Thanks for the PR @pspachtholz .
I think we usually have much simpler examples in the docstrings (see e.g. the other GBDT examples). Yours looks like something we would generally put in the user guide.
>>> X, y = make_classification(n_samples=20, random_state=22)
>>> X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=5,
...     random_state=22)
>>> clf = GradientBoostingClassifier(n_estimators=2, max_depth=1,
Why not just use the default n_estimators and max_depth? 2 and 1 aren't good values in general.
I chose these values mainly to demonstrate warm starting and staged prediction. With more estimators, the output of staged_predict would have been very confusing.
If we keep the example simpler (which I will do based on your suggestion), then of course these values make no sense.
By the way, is there an example about warm starting in general? I found this, but it's quite hidden in there. If not, it may be worth adding one.
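To illustrate why small values were chosen here: with only a couple of estimators, staged_predict yields one short prediction array per boosting stage. A minimal sketch, reusing the values quoted from the diff above (the random_state values are those from the example, not mandated):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20, random_state=22)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=5, random_state=22)

clf = GradientBoostingClassifier(n_estimators=2, max_depth=1,
                                 random_state=0)
clf.fit(X_train, y_train)

# staged_predict yields the test-set predictions after each stage,
# so here we get exactly two arrays (one per estimator).
stages = list(clf.staged_predict(X_test))
print(len(stages))  # 2
```

With the default n_estimators=100, this loop would print one array per tree, which is why the small values kept the docstring output readable.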
There is one in the UG, but I would agree it's quite hard to find: https://scikit-learn.org/dev/modules/ensemble.html#fitting-additional-weak-learners
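For reference, the warm-start pattern from that user guide section can be sketched like this (the dataset and estimator counts below are illustrative, not from the PR):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=100, random_state=0)

# Fit an initial small ensemble with warm_start enabled.
clf = GradientBoostingClassifier(n_estimators=10, warm_start=True,
                                 random_state=0)
clf.fit(X, y)

# Raise n_estimators and refit: only the 10 additional trees are
# trained; the first 10 are kept as-is.
clf.set_params(n_estimators=20)
clf.fit(X, y)
print(clf.n_estimators_)  # 20
```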
NicolasHug
left a comment
Some comments, but looks good.
GradientBoostingClassifier()
>>> clf.predict(X_test)
array([...])
>>> clf.score(X_test, y_test)  # doctest: +SKIP
Why skip the doctest? Maybe pass a random seed to train_test_split so the scores are deterministic.
Thank you for your feedback. Made both examples deterministic.
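The suggested fix amounts to seeding every stochastic step so the doctest output is reproducible and `# doctest: +SKIP` is no longer needed. A minimal sketch (the seed value 0 is an arbitrary choice for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# random_state is passed to the data generator, the split, and the
# estimator, so every source of randomness is fixed.
X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0)

clf = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)
score = clf.score(X_test, y_test)
print(score)
```

Note that, as the CI discussion below shows, seeding makes the score reproducible on a given platform but does not by itself guarantee bit-identical results across all environments.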
@NicolasHug Tests are failing only on pylatest. Do you have an idea why the output score is non-deterministic there despite setting all random_states, while it is deterministic in the other environments?
I'm quite puzzled here :/ @thomasjpfan you're the CI king now, any idea about the randomness? (Results are consistently the same on my laptop.)
@adrinjalali this is still open. There is also #15197, likewise for GradientBoostingRegressor, which was closed because this one was still open.
Reference Issues/PRs
towards #3846
What does this implement/fix? Explain your changes.
Adds docstring examples to GradientBoostingClassifier and GradientBoostingRegressor, also demonstrating warm starting.
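For the regressor side of the PR, a similarly simple docstring-style example might look like the following sketch (the dataset parameters and seeds are illustrative, not taken from the merged PR):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split

# Seeded throughout so the example is deterministic, mirroring the
# review feedback on the classifier example.
X, y = make_regression(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, random_state=0)

reg = GradientBoostingRegressor(random_state=0).fit(X_train, y_train)
print(reg.predict(X_test[:2]).shape)  # (2,)
```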