
MNT Speedup example plot_select_from_model_diabetes.py #21738

Merged
glemaitre merged 5 commits into scikit-learn:main from ArthDh:plot_select_new
Nov 25, 2021
Conversation

@ArthDh
Contributor

@ArthDh ArthDh commented Nov 21, 2021

Reference Issues/PRs

#21598

What does this implement/fix? Explain your changes.

Speeds up plot_select_from_model_diabetes.py by using a subset of features while preserving the original message of the example. The major bottleneck in this case was the backward sequential feature selection (SFS), which could be sped up by using fewer features.
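For context, the steps being timed can be reproduced with a minimal sketch along these lines (not the exact example code; the threshold choice here is an assumption made for this sketch so that two features are kept):

```python
from time import time

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel, SequentialFeatureSelector
from sklearn.linear_model import LassoCV

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
feature_names = np.array(diabetes.feature_names)

# Rank features by the absolute value of the cross-validated Lasso coefficients
lasso = LassoCV().fit(X, y)
importance = np.abs(lasso.coef_)

# Threshold just above the third-largest coefficient keeps the top two features
tic = time()
threshold = np.sort(importance)[-3] + 0.01
sfm = SelectFromModel(lasso, threshold=threshold).fit(X, y)
print(f"Features selected by SelectFromModel: {feature_names[sfm.get_support()]}")
print(f"Done in {time() - tic:.3f}s")

# Backward SFS starts from all 10 features and greedily drops them one at a
# time, refitting a cross-validated LassoCV at every step -- the bottleneck
tic = time()
sfs_backward = SequentialFeatureSelector(
    lasso, n_features_to_select=2, direction="backward"
).fit(X, y)
print(
    "Features selected by backward sequential selection: "
    f"{feature_names[sfs_backward.get_support()]}"
)
print(f"Done in {time() - tic:.3f}s")
```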

Features selected by SelectFromModel: ['s1' 's5']
Done in 0.046s
Features selected by forward sequential selection: ['bmi' 's5']
Done in 2.907s
Features selected by backward sequential selection: ['bmi' 's5']
Done in 8.544s
real 15.43
user 13.27
sys 0.96

(figure: original_diabetes)


Features selected by SelectFromModel: ['bmi' 's5']
Done in 0.033s
Features selected by forward sequential selection: ['bmi' 's5']
Done in 1.407s
Features selected by backward sequential selection: ['bmi' 's5']
Done in 1.848s
real 6.87
user 5.11
sys 0.87

(figure: augmented_diabetes)

Any other comments?

@ogrisel
Member

ogrisel commented Nov 22, 2021

From a pedagogical point of view, I find it weird to build an example about automated feature selection on a dataset that was prepared with manual feature selection.

However, since the example does not really rely on the choice of the Lasso penalty, maybe we could just use a faster base linear model, for instance RidgeCV(alphas=np.logspace(-6, 6, num=5)), on the original feature set.
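The suggested swap could look like the sketch below. Only the estimator changes; the importance-based threshold mirrors the approach in the example (the exact threshold value is an assumption of this sketch):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import RidgeCV

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
feature_names = np.array(diabetes.feature_names)

# RidgeCV over a small alpha grid uses an efficient closed-form
# leave-one-out CV by default, so it is much cheaper to fit than LassoCV
ridge = RidgeCV(alphas=np.logspace(-6, 6, num=5)).fit(X, y)
importance = np.abs(ridge.coef_)

# Threshold just above the third-largest coefficient keeps two features
threshold = np.sort(importance)[-3] + 0.01
sfm = SelectFromModel(ridge, threshold=threshold).fit(X, y)
print(f"Features selected by SelectFromModel: {feature_names[sfm.get_support()]}")
```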

@glemaitre
Member

However, since the example does not really rely on the choice of the Lasso penalty, maybe we could just use a faster base linear model, for instance RidgeCV(alphas=np.logspace(-6, 6, num=5)), on the original feature set.

I agree with @ogrisel. In the original example, the chosen penalty lets through only a single feature. So using RidgeCV will be much cheaper and keep the description more or less the same. I think that what is really expensive here is indeed the SequentialFeatureSelector. We could reduce to cv=3 and check whether that is enough (perhaps with an additional note mentioning that, in practice, one should consider increasing the number of folds).
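Combining both suggestions, a minimal sketch (the cv=3 value follows the comment above; which two features come out may depend on the folds):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import RidgeCV

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
feature_names = np.array(diabetes.feature_names)

ridge = RidgeCV(alphas=np.logspace(-6, 6, num=5))

# cv=3 keeps the example fast; in practice, consider more folds for a
# less noisy estimate of each candidate subset's score
sfs_forward = SequentialFeatureSelector(
    ridge, n_features_to_select=2, direction="forward", cv=3
).fit(X, y)
print(
    "Features selected by forward sequential selection: "
    f"{feature_names[sfs_forward.get_support()]}"
)
```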

@ArthDh
Contributor Author

ArthDh commented Nov 22, 2021

Using RidgeCV, the results on the original feature set are much faster:

Features selected by SelectFromModel: ['s1' 's5']
Done in 0.001s
Features selected by forward sequential selection: ['bmi' 's5']
Done in 0.116s
Features selected by backward sequential selection: ['bmi' 's5']
Done in 0.305s
real 6.60
user 2.20
sys 0.88

Graph generated: (figure: ridge)

Should I move forward and update the file?

@adrinjalali adrinjalali changed the title from "[MRG] Speedup examples/feature_selection/plot_select_model_diabetes.py" to "[MRG] Speedup example plot_select_from_model_diabetes.py" Nov 22, 2021
@adrinjalali adrinjalali mentioned this pull request Nov 22, 2021
@glemaitre
Member

Using RidgeCV, the results on the original feature set are much faster:

Yes, this is expected. Can you make the change?

@glemaitre glemaitre changed the title from "[MRG] Speedup example plot_select_from_model_diabetes.py" to "MNT Speedup example plot_select_from_model_diabetes.py" Nov 23, 2021
@ogrisel
Member

ogrisel commented Nov 24, 2021

The test failure is unrelated.

Member

@ogrisel ogrisel left a comment


Great, this is a very nice speed improvement with the same pedagogical message!

Just two things to fix:

@@ -46,10 +46,10 @@
# :ref:`sphx_glr_auto_examples_inspection_plot_linear_model_coefficient_interpretation.py`.
Member


The comment above needs to be updated to replace LassoCV with RidgeCV.

#
# We also note that the features selected by SFS differ from those selected by
# feature importance: SFS selects `bmi` instead of `s1`. This does sound
# feature importance: SFS selects `bmi` instead of `s1`. This does sounds
Member

@ogrisel ogrisel Nov 24, 2021


The original sentence was grammatically correct.

Suggested change
# feature importance: SFS selects `bmi` instead of `s1`. This does sounds
# feature importance: SFS selects `bmi` instead of `s1`. This does sound

Contributor Author


Oops, I missed 'does' in the sentence; I have updated the documentation!

@ArthDh ArthDh requested a review from ogrisel November 24, 2021 16:30
Member

@glemaitre glemaitre left a comment


LGTM

@glemaitre glemaitre merged commit 974cd77 into scikit-learn:main Nov 25, 2021
@glemaitre
Member

Thanks @ArthDh. Merging since the failure in the CI is unrelated to your changes.

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Nov 29, 2021
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021