FIX params validation in SelectFromModel with prefit=True#23271
FIX params validation in SelectFromModel with prefit=True#23271glemaitre merged 10 commits intoscikit-learn:mainfrom
Conversation
| "When `prefit=True`, `estimator` is expected to be a fitted " | ||
| "estimator." | ||
| ) from exc | ||
| self.estimator_ = deepcopy(self.estimator) |
There was a problem hiding this comment.
We have not been consistent about this. I'm guessing you want to always deepcopy when an estimator when using prefit?
There was a problem hiding this comment.
Yep. We do that in stacking. In calibration, we fit a calibrated estimator so we don't need to make a deep copy. I think that this the behaviour that we want.
There was a problem hiding this comment.
It's not done in stacking yet:
scikit-learn/sklearn/ensemble/_stacking.py
Line 184 in 9258e32
I'm fine with changing stacking to deep copy to be consistent.
There was a problem hiding this comment.
Uhm you are right. The deep copy is on the CV object.
SelectFromModel will support partial_fit. It seems then important to take a deep copy to not modify the passed estimator.
In stacking, I don't think that anything would alter the estimators. So it might be unnecessary to enforce the deep copy there but it might be safer. We should work on SLEP017 :)
There was a problem hiding this comment.
Actually, I am remembering why I liked to have it in the first place. If we set the attribute of the external estimator, it will impact the internal estimator as well. The deep copy prevents this.
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
jeremiedbb
left a comment
There was a problem hiding this comment.
Just the docstring simplification above. Otherwise LGTM.
Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
…rn#23271) Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Jérémie du Boisberranger <34657725+jeremiedbb@users.noreply.github.com>
closes #23267
This PR introduces:
fitto always validate attributesfitwhenmax_featuresis callable to ensure it is validated. There are no clean nor straightforward solutions to validatemax_featuresattransform.In a future PR, I think that we should deprecate the possibility to call
transformwithout callingfit. This PR still keep the backward compatibility.