[MRG] simplify check_is_fitted to use any fitted attributes by amueller · Pull Request #14545 · scikit-learn/scikit-learn

amueller · 2019-08-01T17:22:48Z

This simplifies check_is_fitted to error if no fitted attribute is found.
This clearly is less strict than what we had before, but I did not need to change any tests, so according to our tests (i.e. the guaranteed functionality), this implementation is as good as the previous one.

The main motivation for this change is to allow us to reduce boiler-plate in the future. If we introduce a validation method as in #13603, we could now include the check_is_fitted there.
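To make the rule above concrete, here is a minimal sketch of an "any fitted attribute" check. It is not the code in this PR: `TinyEstimator`, `looks_fitted` and `check_fitted_sketch` are hypothetical names, and a plain `RuntimeError` stands in for `NotFittedError` so the snippet has no dependencies.

```python
class TinyEstimator:
    """Hypothetical estimator following the trailing-underscore convention."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # hyperparameter: set in __init__, no underscore

    def fit(self, X):
        self.mean_ = sum(X) / len(X)  # fitted attribute: only set by fit
        return self


def looks_fitted(estimator):
    """Return True if any instance attribute name ends with an underscore."""
    return any(name.endswith("_") for name in vars(estimator))


def check_fitted_sketch(estimator):
    """Raise if no fitted attribute is found (RuntimeError is a stand-in)."""
    if not looks_fitted(estimator):
        raise RuntimeError(
            "This %s instance is not fitted yet." % type(estimator).__name__)


est = TinyEstimator()
print(looks_fitted(est))                  # False
print(looks_fitted(est.fit([1.0, 3.0])))  # True
```

The caller no longer has to name the attributes to look for, which is what would allow a shared validation method to run the check on behalf of every estimator.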

NicolasHug · 2019-08-01T21:59:27Z

sklearn/utils/validation.py



-def check_is_fitted(estimator, attributes, msg=None, all_or_any=all):
+def check_is_fitted(estimator, *, msg=None):


aren't you changing the signature here? did you mean

check_is_fitted(estimator, *args, msg=None): to preserve backward compatibility?

I'm changing the signature. We could have args here if we consider this public, which maybe we should?

OK, actually using *args doesn't work unless I also add **kwargs. So if we want backward compatibility we just need to do a usual deprecation cycle.

Yes I think we consider the validation utils public. But I'm happy to see them go private.
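A small sketch of why swallowing extra positional arguments alone is not enough for backward compatibility. Both function names are hypothetical and neither is the signature adopted in this PR. The point is that existing callers may pass `attributes` by keyword.

```python
def check_positional_only(estimator, *args, msg=None):
    """Hypothetical signature: swallows extra positional arguments only."""
    return "checked"


def check_permissive(estimator, *args, msg=None, **kwargs):
    """Hypothetical signature: also swallows unknown keyword arguments."""
    return "checked"


est = object()

# Old-style positional calls are accepted by both variants.
check_positional_only(est, ["coef_"])
check_permissive(est, ["coef_"])

# Old-style keyword calls break unless **kwargs is present.
try:
    check_positional_only(est, attributes=["coef_"])
except TypeError as exc:
    print("TypeError:", exc)

check_permissive(est, attributes=["coef_"], all_or_any=any)
```

Accepting arbitrary keyword arguments would also hide genuine typos in calls, which is one argument for an explicit deprecation cycle instead.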

jnothman · 2019-08-02T02:26:33Z

sklearn/utils/validation.py


-    if not isinstance(attributes, (list, tuple)):
-        attributes = [attributes]
+    attrs = [v for v in vars(estimator) if v.endswith("_")


I think NearestNeighbors has stored only _fit_X

It has this:

scikit-learn/sklearn/neighbors/base.py

Lines 166 to 169 in 7c60ead

if self.metric_params is None:
    self.effective_metric_params_ = {}
else:
    self.effective_metric_params_ = self.metric_params.copy()

and it was very recently documented.

Common tests pass so it must work ;)

amueller · 2019-08-02T19:24:32Z

Hm, the vectorizers were not caught by the common tests, of course :-/ TfidfTransformer only has _idf_diag
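To illustrate the problem with estimators whose only fitted state is private, here is a sketch comparing two candidate rules. `PrivateStateTransformer` and both helper functions are hypothetical. The second rule is one possible way to "also allow private fitted attributes" and is shown as an assumption, not as the exact expression used in the PR.

```python
class PrivateStateTransformer:
    """Hypothetical transformer whose only fitted state is private."""

    def fit(self, X):
        self._scale = max(X)  # leading underscore, no trailing underscore
        return self


def public_fitted_attrs(estimator):
    """Names ending with '_' (the public fitted-attribute convention)."""
    return [v for v in vars(estimator)
            if v.endswith("_") and not v.startswith("__")]


def public_or_private_fitted_attrs(estimator):
    """Also accept names starting with a single '_'."""
    return [v for v in vars(estimator)
            if (v.endswith("_") or v.startswith("_"))
            and not v.startswith("__")]


est = PrivateStateTransformer().fit([1, 2, 3])
print(public_fitted_attrs(est))             # []
print(public_or_private_fitted_attrs(est))  # ['_scale']
```

One trade-off of the relaxed rule: an estimator that sets a private attribute in `__init__` would look fitted before `fit` is ever called.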

amueller · 2019-08-02T19:27:08Z

And CountVectorizer is misbehaving....

amueller · 2019-08-02T19:54:40Z

See #14559, but should be passing now. This is not the cleanest work-around but that's mostly because CountVectorizer doesn't adhere to conventions. This is a minimum change (that also fixes a bug where you had to call transform before being able to call inverse_transform).

glemaitre

This is actually handy. I will need it in #14028.
Could you add an entry in the what's new?

I wonder if there is an issue with some of the meta-estimators that are not tested with the common tests. I'll check that.

amueller · 2019-08-05T15:42:12Z

I'm ambivalent about adding a whatsnew but I can do it if you think it's worth it. Probably should add a versionchanged?

glemaitre · 2019-08-05T15:49:49Z

I'm ambivalent about adding a whatsnew but I can do it if you think it's worth it. Probably should add a versionchanged?

I would say that a note in a versionchanged could be nice. In fact, I think only a versionchanged would be needed. This is handy for third-party libraries to know what is going on when their tests are failing :)

glemaitre · 2019-08-05T15:51:42Z

sklearn/utils/validation.py



-def check_is_fitted(estimator, attributes, msg=None, all_or_any=all):
+def check_is_fitted(estimator, attributes='deprecated', msg=None):


The behavior is already not backward-compatible. Since the documentation mentions that these utils can change from one version to another, I would not bother with a deprecation warning for the attributes parameter, knowing that there can be side effects with all_or_any.

How is it not backward compatible? Oh I could deprecate all_or_any as well?

If somebody is using all_or_any now, nothing would happen and this is not an attribute of the function as well. But as I mentioned, we clearly state in the documentation that utils are not following the deprecation cycle and can change: https://scikit-learn.org/stable/developers/utilities.html

True, only deprecating one doesn't make sense. But also see the discussion at #6616. Basically, the docs say that but people ignore it and it might not be good if we enforce it and should make things private instead.

OK I see. I am sure I was one of those people who complained at least once (then @lesteve showed me the red box :))
I really feel that having the utils private could help us move quickly sometimes and help third-party projects (at the cost of potential breakage if they use them). So deprecation it is :)

so should I add one for all_or_any then?
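For reference, a sketch of what deprecating both legacy arguments with a `'deprecated'` sentinel could look like. The function name `check_is_fitted_sketch` is hypothetical, `RuntimeError` stands in for `NotFittedError`, and the warning text follows the message used in the test suggested later in this review. The final implementation may differ.

```python
import warnings
from types import SimpleNamespace


def check_is_fitted_sketch(estimator, attributes="deprecated", msg=None,
                           all_or_any="deprecated"):
    """Hypothetical stand-in: warn when a legacy argument is passed."""
    legacy = (("attributes", attributes), ("all_or_any", all_or_any))
    for name, value in legacy:
        if value != "deprecated":
            warnings.warn(
                "Passing {} to check_is_fitted is deprecated".format(name),
                DeprecationWarning)
    # The legacy values are ignored; any trailing-underscore attribute counts.
    if not any(v.endswith("_") for v in vars(estimator)):
        raise RuntimeError(msg or "This instance is not fitted yet.")


fitted = SimpleNamespace(coef_=[1.0])  # stand-in for a fitted estimator

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    check_is_fitted_sketch(fitted, attributes=["coef_"], all_or_any=any)

print([str(w.message) for w in caught])
```

Old call sites keep working during the deprecation period, and the warnings tell third-party maintainers which arguments to drop.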

sklearn/utils/validation.py

glemaitre · 2019-08-05T16:21:43Z

What is the behaviour expected on Pipeline?
This will fail:

from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.utils.validation import check_is_fitted

X, y = load_iris(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X, y)
check_is_fitted(pipe)

---------------------------------------------------------------------------
NotFittedError                            Traceback (most recent call last)
/tmp/tmp.py in <module>
      8 pipe = make_pipeline(StandardScaler(), LogisticRegression())
      9 pipe.fit(X, y)
---> 10 check_is_fitted(pipe)

~/Documents/code/toolbox/scikit-learn/sklearn/utils/validation.py in check_is_fitted(estimator, msg)
    910 
    911     if not len(attrs):
--> 912         raise NotFittedError(msg % {'name': type(estimator).__name__})
    913 
    914 

NotFittedError: This Pipeline instance is not fitted yet. Call 'fit' with appropriate arguments before using this method.

glemaitre · 2019-08-05T16:23:25Z

we should probably make a recursive call on each element of the Pipeline instance then?
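A sketch of what such a recursive check might look like, using stand-in classes so the snippet runs without scikit-learn. `Step`, `MiniPipeline` and `is_fitted_recursive` are hypothetical names. The only assumption about `Pipeline` is that it exposes `steps` as a list of `(name, estimator)` pairs. Special step values such as `'passthrough'` are not handled here.

```python
class Step:
    """Hypothetical leaf estimator."""

    def fit(self, X):
        self.n_seen_ = len(X)  # fitted attribute
        return self


class MiniPipeline:
    """Hypothetical container that sets no fitted attribute of its own."""

    def __init__(self, steps):
        self.steps = steps  # list of (name, estimator) pairs

    def fit(self, X):
        for _, est in self.steps:
            est.fit(X)
        return self


def is_fitted_recursive(estimator):
    """True if the estimator, or every one of its steps, looks fitted."""
    if any(v.endswith("_") for v in vars(estimator)):
        return True
    steps = getattr(estimator, "steps", None)
    if steps:
        return all(is_fitted_recursive(est) for _, est in steps)
    return False


pipe = MiniPipeline([("a", Step()), ("b", Step())])
print(is_fitted_recursive(pipe))               # False
print(is_fitted_recursive(pipe.fit([1, 2])))   # True
```

Requiring every step to be fitted (`all`) rather than any one of them is a design choice in this sketch, and a partially fitted pipeline is reported as not fitted.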

sklearn/utils/validation.py

glemaitre · 2019-08-08T10:09:28Z

Do you still want to make a deprecation?

thomasjpfan · 2019-08-08T20:40:40Z

Since we clearly state that the utils are not guaranteed to be stable, I would prefer not to go through a deprecation cycle.

amueller · 2019-08-09T19:17:48Z

@thomasjpfan I would say that remark's there mostly to limit liability ;) see my remarks above. I think I'll edit to also deprecate the all_or_any and then we can merge?

glemaitre

If you could add at least the suggestion in the docstring. It could be easier to find it for removal.
Otherwise LGTM

sklearn/utils/validation.py

glemaitre · 2019-08-12T16:26:09Z

sklearn/utils/tests/test_validation.py

+    assert check_is_fitted(ard) is None
+    assert check_is_fitted(svr) is None
+
+    assert_warns_message(


@pytest.mark.parametrize("params", [{'attributes': ['coefs_']},
                                    {'all_or_any': any}])
def test_check_is_fitted_deprecation(params):
    # FIXME: to be removed in 0.23
    # `ard` is assumed to be a fitted estimator available in this scope
    warn_msg = 'Passing {} to check_is_fitted is deprecated'.format(
        list(params.keys())[0])
    with pytest.warns(DeprecationWarning, match=warn_msg):
        check_is_fitted(ard, **params)

It could be handy to have a separate test function that can be removed in the next version.
We might use pytest (but the test will be removed anyway).

Is it easier to remove a test than to remove the asserts? A comment might be nice but it will also just fail and so we won't forget ;)

glemaitre · 2019-08-13T20:09:13Z

Good to go

amueller · 2019-08-14T19:45:11Z

yay thanks for the reviews :)

amueller added 4 commits August 1, 2019 13:02

make check_is_fitted not take attributes

18fba6c

cleanup, remove any_or_all

e034ed8

fix LOF, birch, mixtures

1dc9258

remove unused method

d6034ea

NicolasHug reviewed Aug 1, 2019

jnothman reviewed Aug 2, 2019

amueller added 2 commits August 2, 2019 14:43

fix partial dependence function

3cb95ac

make change backward-compatible

4d3a8b4

also allow private fitted attributes

1181982

amueller added 2 commits August 2, 2019 15:48

slight refactoring in CountVectorizer to mess less with the vocabulary

7ed876d

added regression test for not being able to call inverse_transform before transform

8701cc0

amueller added 2 commits August 2, 2019 16:16

add special check for classes

be4a90f

more functions to fix

7e33027

glemaitre reviewed Aug 5, 2019

glemaitre mentioned this pull request Aug 5, 2019

[MRG] ENH Add support for dataframe in PDP #14028

Merged

glemaitre reviewed Aug 5, 2019

amueller and others added 3 commits August 6, 2019 17:50

Update sklearn/utils/validation.py

09e4192

Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>

fix whitespace, keyword args

40af13e

Merge branch 'anything_fitted' of github.com:amueller/scikit-learn into anything_fitted

ed957e3

remove extra blank line

86aebe7

fix CI hopefully

ec25b3c

amueller added 3 commits August 9, 2019 15:20

deprecate all_or_any in check_is_fittec

9038c62

fix typo, add test for deprecation

9862529

Merge branch 'master' into anything_fitted

da382b8

glemaitre approved these changes Aug 12, 2019

amueller and others added 4 commits August 12, 2019 15:48

add comment on 0.23 removal of deprecated arguments to check_is_fitted

e958e62

Apply suggestions from code review

0538f91

Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>

Update sklearn/utils/validation.py

11995c8

Co-Authored-By: Guillaume Lemaitre <g.lemaitre58@gmail.com>

pep8

7463363

glemaitre merged commit 92af3da into scikit-learn:master Aug 13, 2019

arpanchowdhry added a commit to arpanchowdhry/scikit-learn that referenced this pull request Aug 15, 2019

Merged changes from scikit-learn#14545 and further improvements

47d24af

alegonz mentioned this pull request Dec 9, 2019

check_is_fitted has false positives on custom subclasses with private attributes #15845

Closed

bellet mentioned this pull request Dec 12, 2019

Fix failing tests in last build scikit-learn-contrib/metric-learn#270

Merged

This was referenced Dec 14, 2019

WIP FIX Revert changes in check_is_fitted #15885

Closed

RFC Support for Some Developer Utilities #15801

Open

bellet mentioned this pull request Jan 13, 2020

Revert changes in #270 due to revert decision in sklearn scikit-learn-contrib/metric-learn#273

Merged


