DOC plot gradient boosting regression changed to diabetes dataset #16400
…into boston_plot_gradient_boosting_regression
Co-Authored-By: Olivier Grisel <olivier.grisel@ensta.org>
…thub.com/maikia/scikit-learn into boston_plot_gradient_boosting_regression
Thanks @ogrisel
For the next PR it would be interesting to:
This requires an external dependency on shap, but I think this is fine in examples.
ogrisel left a comment
Actually we have a problem:
The Breast cancer dataset is a binary classification task. Target is 0-1. So instead it would make more sense to use a GradientBoostingClassifier.
The problem is that we would need to change the title to "Gradient Boosting Regression Trees for classification" or something similar. The filename (plot_gradient_boosting_regression.py) would then be misleading, but renaming it would also change the URL of the example and break some links to our documentation...
I am not sure what to do.
mse = mean_squared_error(y_test, clf.predict(X_test))
print("MSE: %.4f" % mse)
print("The mean squared error (MSE) on test set: {:.4f}".format(mse))
Computing the MSE loss on a classification task is misleading. We should rather use accuracy or ROC AUC. The data is approximately balanced: 62% of the samples belong to the positive class.
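To make the suggestion concrete, here is a minimal sketch of what evaluating with accuracy and ROC AUC could look like if the example kept the Breast cancer dataset and switched to GradientBoostingClassifier. The hyperparameters, split size, and random seeds are assumptions chosen for illustration, not values taken from this PR.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

# Binary classification data: the target is 0 or 1.
X, y = load_breast_cancer(return_X_y=True)

# Fraction of samples in the positive class (roughly 0.62-0.63).
positive_fraction = y.mean()

# Split size and seeds are illustrative assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=13, stratify=y
)

clf = GradientBoostingClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Accuracy uses hard class predictions; ROC AUC uses the predicted
# probability of the positive class.
accuracy = accuracy_score(y_test, clf.predict(X_test))
roc_auc = roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1])

print("Positive class fraction: {:.2f}".format(positive_fraction))
print("Accuracy on test set: {:.4f}".format(accuracy))
print("ROC AUC on test set: {:.4f}".format(roc_auc))
```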
How about we use yet another dataset instead of Cancer? Would Diabetes or Ames be OK? I chose Cancer simply to vary things a bit and not pick Diabetes every time.
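For reference, a sketch of how the example might look on the Diabetes dataset, where the target is continuous and reporting MSE is appropriate. The hyperparameters, split size, and seeds below are illustrative assumptions and may differ from what the merged example uses.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Regression data: the target is a continuous measure of disease progression.
X, y = load_diabetes(return_X_y=True)

# Split size and seed are illustrative assumptions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=13
)

# Hyperparameters are assumptions chosen for a small, quick fit.
reg = GradientBoostingRegressor(
    n_estimators=500,
    max_depth=4,
    min_samples_split=5,
    learning_rate=0.01,
    random_state=0,
)
reg.fit(X_train, y_train)

mse = mean_squared_error(y_test, reg.predict(X_test))
print("The mean squared error (MSE) on test set: {:.4f}".format(mse))
```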
Hmm. I'm working on #16023 and plan to update all relevant examples to favor permutation_importance, including this one. May I ask why it is still not merged?
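As a rough illustration of what favoring permutation_importance could mean for this example, the sketch below computes permutation importances on a held-out set with `sklearn.inspection.permutation_importance`. The dataset choice (Diabetes), the model settings, and `n_repeats=5` are assumptions for illustration, not the content of #16023.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
feature_names = diabetes.feature_names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=13
)

# Model settings are illustrative assumptions.
reg = GradientBoostingRegressor(n_estimators=200, random_state=0)
reg.fit(X_train, y_train)

# Shuffle each feature n_repeats times on the test set and record the
# resulting drop in the model's score.
result = permutation_importance(
    reg, X_test, y_test, n_repeats=5, random_state=42
)

# Print features from most to least important (by mean score drop).
sorted_idx = result.importances_mean.argsort()[::-1]
for idx in sorted_idx:
    print(
        "{:<5} {:.4f} +/- {:.4f}".format(
            feature_names[idx],
            result.importances_mean[idx],
            result.importances_std[idx],
        )
    )
```

Unlike the impurity-based `feature_importances_` attribute, this is computed on held-out data, so it reflects how much each feature contributes to test-set performance.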

Reference Issues/PRs
Towards #16155
What does this implement/fix? Explain your changes.
Exchanged the Boston dataset for the Breast cancer dataset (although it is a classification dataset, this does not matter in this example)
Improved the comments and the layout of the example
The figure

Before:
After:

Any other comments?