[MRG] accelerate plot_gradient_boosting_regularization.py example #21598 #21611

Merged
adrinjalali merged 4 commits into scikit-learn:main from
sply88:accelerate-plot_gradient_boosting_regularization.py
Nov 29, 2021

@sply88
Contributor

@sply88 sply88 commented Nov 9, 2021

Speeds up ../examples/ensemble/plot_gradient_boosting_regularization.py (Issue #21598) by

  • reducing number of samples in train and test datasets from 2000 to 1500
  • reducing n_estimators from 1000 to 600

Reduction of n_estimators is compensated by increasing the learning rate from 0.1 to 0.2 (for models with shrinkage).
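
The trade-off described above can be sketched roughly as follows. This is not the example's actual code: it fits one model with a smaller learning rate and more trees, and one with a doubled learning rate and half as many trees, then records the test deviance after each boosting stage. The dataset size, the tree counts, the random seeds, and the helper name deviance_curve are all assumptions chosen so that the sketch runs in a few seconds.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import log_loss
from sklearn.model_selection import train_test_split

# Small synthetic problem; the sizes here are illustrative assumptions.
X, y = datasets.make_hastie_10_2(n_samples=2000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0
)


def deviance_curve(learning_rate, n_estimators):
    """Fit a boosted model and return the test deviance after each stage."""
    clf = GradientBoostingClassifier(
        learning_rate=learning_rate,
        n_estimators=n_estimators,
        random_state=2,
    )
    clf.fit(X_train, y_train)
    # staged_predict_proba yields class probabilities after every iteration;
    # column 1 is the probability of the positive class.
    return [
        2 * log_loss(y_test, proba[:, 1])
        for proba in clf.staged_predict_proba(X_test)
    ]


# More trees with a small step versus fewer trees with a larger step.
slow = deviance_curve(learning_rate=0.1, n_estimators=200)
fast = deviance_curve(learning_rate=0.2, n_estimators=100)

print("final deviance, lr=0.1, 200 trees:", slow[-1])
print("final deviance, lr=0.2, 100 trees:", fast[-1])
```

Plotting both curves against the iteration index is one way to check whether the shorter run still tells the same story before committing to the smaller n_estimators.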

For me the example now runs in about 13 s (previously more than 30 s).

The main message of the final figure does not change:
image

@sply88 sply88 changed the title from "accelerate plot_gradient_boosting_regularization.py example #21598" to "[MRG] accelerate plot_gradient_boosting_regularization.py example #21598" Nov 9, 2021
@adrinjalali adrinjalali mentioned this pull request Nov 10, 2021
41 tasks
@adrinjalali
Member

If the final output hasn't changed, we may be able to push further and speed up the example even more. Thanks for the work @sply88

@sply88
Contributor Author

sply88 commented Nov 11, 2021

Original figure in example looks like this:
image

I could speed it up a bit more, to around 9 s, by using only 400 boosting iterations. The x-axis of the figure in my original PR comment would then end at 400, and the yellow and blue lines would no longer cross. I don't think this would be a big issue, because it would still be obvious that shrinkage is good and no shrinkage (the blue and yellow lines) is bad.
What do you think @adrinjalali?

Comment on lines +41 to +42
X_train, X_test = X[:1500], X[1500:]
y_train, y_test = y[:1500], y[1500:]
Member

You could also try reducing the number of samples in the make_hastie_10_2 line above.
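
To illustrate why the generator call matters: slicing only caps the training set, and every remaining row lands in the test set. A small sketch, assuming the n_samples=12000 call shown in the diff further down this page; split_sizes is a hypothetical helper, not part of the example.

```python
from sklearn import datasets


def split_sizes(n_samples, n_train):
    """Return (train rows, test rows) when slicing at n_train."""
    X, y = datasets.make_hastie_10_2(n_samples=n_samples, random_state=1)
    X_train, X_test = X[:n_train], X[n_train:]
    return X_train.shape[0], X_test.shape[0]


# Slicing at 1500 with 12000 generated rows leaves a large test set.
print(split_sizes(12000, 1500))  # (1500, 10500)

# Generating fewer rows shrinks the test set as well.
print(split_sizes(3000, 1500))  # (1500, 1500)
```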

Contributor Author

I have reduced both the number of samples and the number of estimators to get down to 5 s.
Output below. The main message is still obvious, I think.
image

Member

@thomasjpfan thomasjpfan left a comment

Thank you for the PR @sply88 !



- X, y = datasets.make_hastie_10_2(n_samples=12000, random_state=1)
+ X, y = datasets.make_hastie_10_2(n_samples=3000, random_state=1)
Member

What do you think of using the following settings?

from sklearn.model_selection import train_test_split

X, y = datasets.make_hastie_10_2(n_samples=4000, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=0)

original_params = {
    "n_estimators": 400,
    ...
}

It looks like it keeps a very similar message as the original:

Figure_1
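
For reference, a self-contained sketch of what the suggested split produces. Only n_samples=4000, test_size=0.8, the two random_state values in the data calls, and n_estimators=400 come from the suggestion above; the elided parameters are not reproduced, so the classifier otherwise uses scikit-learn's defaults, and its random_state=2 is an assumption added for repeatability.

```python
from sklearn import datasets
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = datasets.make_hastie_10_2(n_samples=4000, random_state=1)

# test_size=0.8 keeps 800 rows for training and holds out 3200 for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.8, random_state=0
)
print(X_train.shape, X_test.shape)  # (800, 10) (3200, 10)

# Only n_estimators is taken from the suggestion; everything else is a default
# (random_state=2 is an assumption so that repeated runs agree).
clf = GradientBoostingClassifier(n_estimators=400, random_state=2)
clf.fit(X_train, y_train)

accuracy = clf.score(X_test, y_test)
print("test accuracy:", accuracy)
```

Training on the small portion and testing on the large one keeps fitting cheap while leaving enough held-out data for a smooth test curve.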

Member

@thomasjpfan thomasjpfan left a comment

Thank you for the update @sply88 !

LGTM

@adrinjalali adrinjalali merged commit f19bf4c into scikit-learn:main Nov 29, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
…t-learn#21598 (scikit-learn#21611)

* accelerate plot_gradient_boosting_regularization.py example scikit-learn#21598

* speed up by less samples and less trees

* use train_test_split instead of slicing
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021
…t-learn#21598 (scikit-learn#21611)

* accelerate plot_gradient_boosting_regularization.py example scikit-learn#21598

* speed up by less samples and less trees

* use train_test_split instead of slicing
glemaitre pushed a commit that referenced this pull request Dec 25, 2021
#21611)

* accelerate plot_gradient_boosting_regularization.py example #21598

* speed up by less samples and less trees

* use train_test_split instead of slicing