FIX params validation in SelectFromModel with prefit=True by glemaitre · Pull Request #23271 · scikit-learn/scikit-learn

glemaitre · 2022-05-03T18:25:47Z

This PR introduces:

- The possibility to call fit so that the parameters are always validated.
- A requirement to call fit when max_features is a callable, to ensure it is validated. There is no clean or straightforward way to validate max_features at transform.

In a future PR, I think that we should deprecate the possibility to call transform without calling fit. This PR still keeps backward compatibility.
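The two points above can be illustrated with a toy sketch. This is not scikit-learn's implementation: `ToySelector`, its `importances` argument, and the `RuntimeError` are hypothetical stand-ins, chosen only to show why a callable `max_features` cannot be resolved at transform time without a prior fit, while a plain integer still works for backward compatibility.

```python
class ToySelector:
    """Hypothetical stand-in for a selector that wraps an already fitted model."""

    def __init__(self, importances, max_features=None):
        self.importances = importances    # plays the role of the prefit estimator
        self.max_features = max_features  # int, callable(X) -> int, or None

    def fit(self, X):
        # All parameter validation lives in fit.
        n_features = len(X[0])
        max_features = self.max_features
        if callable(max_features):
            max_features = max_features(X)  # resolve the callable against X
        if max_features is not None and not 0 <= max_features <= n_features:
            raise ValueError(f"max_features must be in [0, {n_features}]")
        self.max_features_ = max_features   # validated value
        return self

    def transform(self, X):
        # Fall back to the raw parameter when fit was never called.
        max_features = getattr(self, "max_features_", self.max_features)
        if callable(max_features):
            # Nothing validated the callable, so refuse to guess.
            raise RuntimeError("call fit before transform when max_features is callable")
        ranked = sorted(range(len(self.importances)),
                        key=lambda j: -self.importances[j])
        keep = sorted(ranked[:max_features])  # a None slice bound keeps every feature
        return [[row[j] for j in keep] for row in X]


X = [[1, 2, 3], [4, 5, 6]]
selector = ToySelector([0.1, 0.7, 0.2], max_features=lambda X: 2)
selected = selector.fit(X).transform(X)
```

In this sketch, calling `transform` on a fresh `ToySelector` with a callable `max_features` raises, whereas an integer `max_features` can still be used without `fit`.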

thomasjpfan

Thank you for the PR!

thomasjpfan · 2022-05-03T18:48:48Z

sklearn/feature_selection/_from_model.py

+                    "When `prefit=True`, `estimator` is expected to be a fitted "
+                    "estimator."
+                ) from exc
+            self.estimator_ = deepcopy(self.estimator)


We have not been consistent about this. I'm guessing you want to always deepcopy the estimator when using prefit?

Yep. We do that in stacking. In calibration, we fit a calibrated estimator so we don't need to make a deep copy. I think that this is the behaviour that we want.

It's not done in stacking yet:

scikit-learn/sklearn/ensemble/_stacking.py

Line 184 in 9258e32

self.estimators_.append(estimator)

I'm fine with changing stacking to deep copy to be consistent.

Uhm you are right. The deep copy is on the CV object.

SelectFromModel will support partial_fit. It seems then important to take a deep copy to not modify the passed estimator.

In stacking, I don't think that anything would alter the estimators. So enforcing the deep copy there might be unnecessary, but it would be safer. We should work on SLEP017 :)

Actually, I am remembering why I liked to have it in the first place. If we set the attribute of the external estimator, it will impact the internal estimator as well. The deep copy prevents this.
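The aliasing concern can be shown with a small stdlib-only sketch. `ToyEstimator` and `ToyWrapper` are hypothetical classes, not scikit-learn code; the `copy` flag exists only to contrast the two behaviours side by side.

```python
from copy import deepcopy


class ToyEstimator:
    """Hypothetical fitted estimator holding a mutable learned attribute."""

    def __init__(self, coef):
        self.coef_ = list(coef)


class ToyWrapper:
    """Hypothetical meta-estimator that stores a prefit estimator at fit time."""

    def __init__(self, estimator, copy=True):
        self.estimator = estimator
        self.copy = copy

    def fit(self):
        # With copy=True the internal estimator is decoupled from the one passed in.
        self.estimator_ = deepcopy(self.estimator) if self.copy else self.estimator
        return self


external = ToyEstimator([0.5, 1.5])
copied = ToyWrapper(external, copy=True).fit()
shared = ToyWrapper(external, copy=False).fit()

# Mutating the external estimator after fit only leaks into the shared wrapper.
external.coef_[0] = 99.0
```

After the mutation, `copied.estimator_.coef_` is unchanged while `shared.estimator_.coef_` reflects the new value, which is the side effect the deep copy is meant to prevent.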

doc/whats_new/v1.1.rst

thomasjpfan

Otherwise LGTM

sklearn/feature_selection/tests/test_from_model.py

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

sklearn/feature_selection/_from_model.py

jeremiedbb

Just the docstring simplification above. Otherwise LGTM.

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>

…rn#23271) Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>

FIX params validation in SelectFromModel with prefit=True

d1c7ff4

github-actions bot added the module:feature_selection label May 3, 2022

glemaitre added the No Changelog Needed label May 3, 2022

thomasjpfan reviewed May 3, 2022

View reviewed changes

add pr number

17ef353

jeremiedbb added the To backport PR merged in master that need a backport to a release branch defined based on the milestone. label May 3, 2022

thomasjpfan approved these changes May 3, 2022

View reviewed changes

sklearn/feature_selection/tests/test_from_model.py (outdated, resolved)

sklearn/feature_selection/tests/test_from_model.py (outdated, resolved)

glemaitre and others added 2 commits May 3, 2022 22:52

Update sklearn/feature_selection/tests/test_from_model.py

b9220b8

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

Update sklearn/feature_selection/tests/test_from_model.py

17ab0e5

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

jeremiedbb reviewed May 4, 2022

View reviewed changes

sklearn/feature_selection/_from_model.py (outdated, resolved)

sklearn/feature_selection/_from_model.py (resolved)

glemaitre added 5 commits May 4, 2022 11:42

TST enforce the behaviour of fit and partial_fit with prefit

8214d92

DOC add more details regarding behaviour

56d2e2e

TST add check for max_features=None

cf457bc

DOC update whats new

47d536f

doc

e224ef5

jeremiedbb reviewed May 4, 2022

View reviewed changes

sklearn/feature_selection/_from_model.py (outdated, resolved)

jeremiedbb mentioned this pull request May 4, 2022

partial_fit from SelectFromModel doesn't validate the parameters #23277

Closed

jeremiedbb approved these changes May 4, 2022

View reviewed changes

Update sklearn/feature_selection/_from_model.py

ce19768

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>

glemaitre added this to the 1.1 milestone May 4, 2022

glemaitre merged commit 6bbb3cb into scikit-learn:main May 4, 2022

ogrisel mentioned this pull request May 17, 2022

KeyError raised when using pandas DataFrame in SelectFromModel.fit() #23393

Closed

glemaitre merged 10 commits into scikit-learn:main from glemaitre:is/23267

3 participants
