MNT Speedup example plot_select_from_model_diabetes.py #21738
glemaitre merged 5 commits into scikit-learn:main
Conversation
From a pedagogical point of view, I find it weird to do an example on automated feature selection on a dataset that was prepared with manual feature selection. However, since the example does not really rely on the choice of the Lasso penalty, maybe we could just use a faster base linear model, for instance `RidgeCV`.
I agree with @ogrisel. In the original example, the chosen penalty lets through only a single feature. So using `RidgeCV` instead makes sense.
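As a quick illustration of the point above (this is a hypothetical sketch, not the example's actual code): an L1-penalized model like `LassoCV` produces sparse coefficients, so depending on the alpha selected by cross-validation, very few features may end up with a non-zero weight to rank by.

```python
# Illustrative sketch (assumed setup): count how many coefficients LassoCV
# keeps non-zero on the diabetes dataset. With a strong enough penalty,
# ranking features by |coef_| leaves very few candidates.
import numpy as np

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV

X, y = load_diabetes(return_X_y=True)
lasso = LassoCV().fit(X, y)
print("non-zero coefficients:", np.sum(lasso.coef_ != 0), "of", X.shape[1])
```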
Using `RidgeCV`, the results are much faster on the original feature set. Should I move forward and update the file? |
Yes this is expected. Can you make the change? |
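A minimal sketch of why the swap helps (assumed, not the exact benchmark from the thread): `RidgeCV` with the default alpha grid reuses a single closed-form fit via efficient leave-one-out CV, while `LassoCV` runs iterative coordinate descent across an alpha path for every CV fold, so fitting the former is typically much cheaper.

```python
# Hypothetical timing comparison of the two candidate base estimators on
# the diabetes dataset. Exact timings vary by machine; the point is the
# relative cost, not the absolute numbers.
from time import perf_counter

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LassoCV, RidgeCV

X, y = load_diabetes(return_X_y=True)

for Estimator in (LassoCV, RidgeCV):
    start = perf_counter()
    model = Estimator().fit(X, y)
    print(f"{Estimator.__name__}: fit in {perf_counter() - start:.4f}s")
```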
The test failure is unrelated. |
ogrisel left a comment:
Great, this is a very nice speed improvement with the same pedagogical message!
Just 2 things to fix:
```diff
@@ -46,10 +46,10 @@
# :ref:`sphx_glr_auto_examples_inspection_plot_linear_model_coefficient_interpretation.py`.
```
The comment above needs to be updated to replace LassoCV by RidgeCV.
```diff
  #
  # We also note that the features selected by SFS differ from those selected by
- # feature importance: SFS selects `bmi` instead of `s1`. This does sound
+ # feature importance: SFS selects `bmi` instead of `s1`. This does sounds
```
The original sentence was grammatically correct.
Suggested change:
```diff
- # feature importance: SFS selects `bmi` instead of `s1`. This does sounds
+ # feature importance: SFS selects `bmi` instead of `s1`. This does sound
```
Oops, I missed 'does' in the sentence, I have updated the documentation!
Thanks @ArthDh! Merging since the failure in the CI is unrelated to your changes. |

Reference Issues/PRs
#21598
What does this implement/fix? Explain your changes.
Speeds up plot_select_from_model_diabetes.py by using a subset of features while preserving the original message of the example. The major bottleneck in this case was the backward SFS, which can be sped up by starting from fewer features.
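To see why backward SFS dominates the runtime (an assumed sketch, not the PR's diff): backward selection starts from all features and refits the estimator once per remaining feature at every elimination step, so the number of fits grows roughly quadratically with the number of starting features.

```python
# Sketch of the backward SFS step the example exercises, using RidgeCV as
# the base estimator as discussed in this thread. Fewer starting features
# means far fewer model refits during elimination.
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import RidgeCV

X, y = load_diabetes(return_X_y=True)

sfs_backward = SequentialFeatureSelector(
    RidgeCV(), n_features_to_select=2, direction="backward"
).fit(X, y)
print("selected feature mask:", sfs_backward.get_support())
```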
Any other comments?