DOC plot gradient boosting regression changed to diabetes dataset by maikia · Pull Request #16400 · scikit-learn/scikit-learn

maikia · 2020-02-06T11:15:18Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Exchanged the Boston dataset for the Breast cancer dataset (although it is a classification dataset, it does not matter in this example)

Improved the comments and the layout of the example

The figure
Before:

After:

Any other comments?

ogrisel

Thanks for the contribution @maikia. Overall LGTM. Here are a few suggestions.

One could also add the permutation feature importance to the plot to address the last remaining point of #14528. Feel free to do this as part of this PR or keep it for a later PR (as you prefer).
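As a rough illustration of that suggestion, the sketch below compares impurity-based and permutation feature importances for a gradient boosting regressor. It assumes the diabetes dataset that the PR eventually settled on; the hyperparameters, split, and `n_repeats` value are illustrative choices, not the ones used in the example file.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Assumption: diabetes dataset and an arbitrary 90/10 split.
diabetes = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(
    diabetes.data, diabetes.target, test_size=0.1, random_state=13
)

# Illustrative hyperparameters, not taken from the PR.
reg = GradientBoostingRegressor(
    n_estimators=200, max_depth=4, learning_rate=0.05, random_state=0
)
reg.fit(X_train, y_train)

# Impurity-based importances are computed from the training process.
impurity_importance = reg.feature_importances_

# Permutation importances are measured on held-out data by shuffling
# one feature at a time and recording the drop in the score.
result = permutation_importance(
    reg, X_test, y_test, n_repeats=10, random_state=42
)

for name, imp, perm in zip(
    diabetes.feature_names, impurity_importance, result.importances_mean
):
    print(f"{name:>4}: impurity={imp:.3f}  permutation={perm:.3f}")
```

The two rankings need not agree: impurity-based importances reflect how the trees were built on the training set, while permutation importances reflect the effect on the test score.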

examples/ensemble/plot_gradient_boosting_regression.py

maikia · 2020-02-17T11:32:00Z

> Thanks for the contribution @maikia. Overall LGTM. Here are a few suggestions.
>
> One could also add the permutation feature importance to the plot to address the last remaining point of #14528. Feel free to do this as part of this PR or keep it for a later PR (as you prefer).

Thanks @ogrisel
I will keep the permutation feature importance for the next PR once this one is accepted :-)

ogrisel

LGTM! Thanks @maikia.

ogrisel · 2020-02-19T10:37:34Z

For the next PR it would be interesting to:

- add a short paragraph suggesting the use of HistGradientBoostingClassifier instead of GradientBoostingClassifier for larger datasets (e.g. more than 10000 observations);
- compare the results of impurity-based feature importance, permutation feature importance and shap-based feature importance, with shap.summary_plot(shap_values_test, X_test, plot_type="bar").

This requires an external dependency on shap but I think this is fine in examples.

ogrisel

Actually we have a problem:

The Breast cancer dataset is a binary classification task. Target is 0-1. So instead it would make more sense to use a GradientBoostingClassifier.

The problem is that we would need to change the title to "Gradient Boosting Regression Trees for classification" or something similar. The filename (plot_gradient_boosting_regression.py) would then be misleading, but changing it would also change the URL of the example and break some links to our documentation...

I am not sure what to do.

ogrisel · 2020-02-19T10:44:53Z

examples/ensemble/plot_gradient_boosting_regression.py

+
 mse = mean_squared_error(y_test, clf.predict(X_test))
-print("MSE: %.4f" % mse)
+print("The mean squared error (MSE) on test set: {:.4f}".format(mse))


Computing the MSE loss on a classification task is misleading. We should rather use accuracy or ROC AUC. The data is approximately balanced: 62% for the positive class.
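To make this point concrete, here is a small pure-Python sketch on made-up binary labels (the data and the helper names `mse_score`, `accuracy_score_` and `roc_auc` are hypothetical, written only for this illustration). With hard 0/1 predictions, the MSE is just the error rate, so it adds nothing over accuracy, whereas ROC AUC evaluates how well the predicted scores rank positives above negatives.

```python
# Made-up binary targets, hard predictions, and predicted scores.
y_true = [1, 0, 1, 1, 0, 1, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.7, 0.6, 0.95]


def mse_score(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def accuracy_score_(y_true, y_pred):
    """Fraction of predictions that match the target exactly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def roc_auc(y_true, y_score):
    """ROC AUC as the fraction of (positive, negative) pairs in which
    the positive sample gets the higher score; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg
    )
    return wins / (len(pos) * len(neg))


mse = mse_score(y_true, y_pred)
acc = accuracy_score_(y_true, y_pred)
auc = roc_auc(y_true, y_score)

# For 0/1 targets and 0/1 predictions, MSE equals 1 - accuracy.
print(f"MSE: {mse:.4f}, accuracy: {acc:.4f}, ROC AUC: {auc:.4f}")
```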

maikia · 2020-02-19T10:46:01Z

> Actually we have a problem:
>
> The Breast cancer dataset is a binary classification task. Target is 0-1. So instead it would make more sense to use a GradientBoostingClassifier.
>
> The problem is that we would need to change the title to "Gradient Boosting Regression Trees for classification" or something similar. The filename (plot_gradient_boosting_regression.py) would then be misleading, but changing it would also change the URL of the example and break some links to our documentation...
>
> I am not sure what to do.

How about taking yet another dataset instead of the Cancer dataset? Would Diabetes or Ames be OK? I chose Cancer simply to vary things a bit rather than choosing Diabetes each time.

maikia · 2020-02-24T13:49:59Z

Here is how it looks now with the diabetes dataset:

thomasjpfan

LGTM

nilichen · 2020-03-19T07:43:12Z

hmm. I'm working on #16023 and plan to update all relevant examples to favor permutation_importance, including this one. May I ask why it is still not merged?

ogrisel · 2020-03-19T08:27:54Z

Merged! Thank you @maikia!

@nilichen feel free to go ahead with #16023.

maikia added 9 commits February 5, 2020 17:10

exchanged boston for diabetes

70cd0e1

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn …

fea2769

…into boston_plot_gradient_boosting_regression

changed heading

9502dd7

added infor about data preprocessing

c92a0de

added message on fit regression model section

7bd396b

changed message on plot training deviance

70f48ea

removed diabetes dataset and changed the message on the second plot

bdc7605

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn …

c87a0f1

…into boston_plot_gradient_boosting_regression

flake8

0902b8d

ogrisel reviewed Feb 9, 2020

maikia and others added 13 commits February 17, 2020 11:03

Merge branch 'master' of https://github.com/scikit-learn/scikit-learn …

7bd01f5

…into boston_plot_gradient_boosting_regression

Update examples/ensemble/plot_gradient_boosting_regression.py

f0fcc9e

Co-Authored-By: Olivier Grisel <olivier.grisel@ensta.org>

Update examples/ensemble/plot_gradient_boosting_regression.py

9054453

Co-Authored-By: Olivier Grisel <olivier.grisel@ensta.org>

changing the phrasing

cb09eb1

phrasing

a13c281

Update examples/ensemble/plot_gradient_boosting_regression.py

d3f2e34

Co-Authored-By: Olivier Grisel <olivier.grisel@ensta.org>

dropping second dataset

a043f04

Merge branch 'boston_plot_gradient_boosting_regression' of https://gi…

36d2812

…thub.com/maikia/scikit-learn into boston_plot_gradient_boosting_regression

Update examples/ensemble/plot_gradient_boosting_regression.py

e3721c1

Co-Authored-By: Olivier Grisel <olivier.grisel@ensta.org>

cleaning up

2fcbd1f

Merge branch 'boston_plot_gradient_boosting_regression' of https://gi…

a898a91

…thub.com/maikia/scikit-learn into boston_plot_gradient_boosting_regression

cleaning up

93421c8

flake8

762bceb

maikia requested a review from thomasjpfan February 19, 2020 09:44

ogrisel approved these changes Feb 19, 2020

ogrisel added the Waiting for Reviewer label Feb 19, 2020

ogrisel reviewed Feb 19, 2020

exchanged cancer dataset for diabetes dataset

5a4e00e

thomasjpfan approved these changes Feb 24, 2020

thomasjpfan changed the title ~~plot gradient boosting regression changed to breast cancer dataset~~ DOC plot gradient boosting regression changed to breast cancer dataset Feb 24, 2020

maikia changed the title ~~DOC plot gradient boosting regression changed to breast cancer dataset~~ DOC plot gradient boosting regression changed to diabetes dataset Mar 4, 2020

ogrisel merged commit b65c53d into scikit-learn:master Mar 19, 2020

lucyleeow mentioned this pull request Apr 8, 2020

DOC Replace boston in ensemble.rst #16876

Merged

gio8tisu pushed a commit to gio8tisu/scikit-learn that referenced this pull request May 15, 2020

DOC plot gradient boosting regression changed to diabetes dataset (sc…

a8ad172

…ikit-learn#16400)
