Skip to content

FIX params validation in SelectFromModel with prefit=True#23271

Merged
glemaitre merged 10 commits intoscikit-learn:mainfrom
glemaitre:is/23267
May 4, 2022
Merged

FIX params validation in SelectFromModel with prefit=True#23271
glemaitre merged 10 commits intoscikit-learn:mainfrom
glemaitre:is/23267

Conversation

@glemaitre
Copy link
Copy Markdown
Member

closes #23267

This PR introduces:

  • The possibility to call fit to always validate attributes
  • Force calling fit when max_features is callable to ensure it is validated. There are no clean nor straightforward solutions to validate max_features at transform.

In a future PR, I think that we should deprecate the possibility to call transform without calling fit. This PR still keep the backward compatibility.

Copy link
Copy Markdown
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR!

"When `prefit=True`, `estimator` is expected to be a fitted "
"estimator."
) from exc
self.estimator_ = deepcopy(self.estimator)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have not been consistent about this. I'm guessing you want to always deepcopy when an estimator when using prefit?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. We do that in stacking. In calibration, we fit a calibrated estimator so we don't need to make a deep copy. I think that this the behaviour that we want.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not done in stacking yet:

self.estimators_.append(estimator)

I'm fine with changing stacking to deep copy to be consistent.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Uhm you are right. The deep copy is on the CV object.

SelectFromModel will support partial_fit. It seems then important to take a deep copy to not modify the passed estimator.

In stacking, I don't think that anything would alter the estimators. So it might be unnecessary to enforce the deep copy there but it might be safer. We should work on SLEP017 :)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I am remembering why I liked to have it in the first place. If we set the attribute of the external estimator, it will impact the internal estimator as well. The deep copy prevents this.

@jeremiedbb jeremiedbb added the To backport PR merged in master that need a backport to a release branch defined based on the milestone. label May 3, 2022
Copy link
Copy Markdown
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Otherwise LGTM

glemaitre and others added 2 commits May 3, 2022 22:52
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Copy link
Copy Markdown
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the docstring simplification above. Otherwise LGTM.

Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
@glemaitre glemaitre added this to the 1.1 milestone May 4, 2022
@glemaitre glemaitre merged commit 6bbb3cb into scikit-learn:main May 4, 2022
jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request May 10, 2022
…rn#23271)

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:feature_selection No Changelog Needed To backport PR merged in master that need a backport to a release branch defined based on the milestone.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Regression in SelectFromModel where max_features_ does not exist with prefit=True.

3 participants