[MRG+1] MAINT Replace manual checks with `check_is_fitted` by agamemnonc · Pull Request #13013 · scikit-learn/scikit-learn

agamemnonc · 2019-01-18T10:56:58Z

Reference Issues/PRs

Fixes #12991.

What does this implement/fix? Explain your changes.

Replaces manual checks with check_is_fitted utility function in various places.
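As a rough illustration of the kind of replacement involved, here is a minimal sketch. It is not the code touched by this PR: `NotFittedError` and `check_is_fitted` below are simplified stand-ins for the scikit-learn names, and `ToyMeanRegressor` is a hypothetical estimator invented for the example.

```python
class NotFittedError(ValueError, AttributeError):
    """Simplified stand-in for sklearn.exceptions.NotFittedError."""


def check_is_fitted(estimator, attributes):
    """Simplified stand-in: raise if any listed attribute is missing."""
    if isinstance(attributes, str):
        attributes = [attributes]
    if not all(hasattr(estimator, attr) for attr in attributes):
        raise NotFittedError(
            "This {} instance is not fitted yet. Call 'fit' with appropriate "
            "arguments before using this method.".format(
                type(estimator).__name__))


class ToyMeanRegressor:
    """Hypothetical estimator, used only for illustration."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict_manual(self, X):
        # Before: a hand-written check with an ad hoc exception and message.
        if not hasattr(self, "mean_"):
            raise ValueError("Call fit before predict.")
        return [self.mean_ for _ in X]

    def predict(self, X):
        # After: one shared helper, one exception type, one message.
        check_is_fitted(self, "mean_")
        return [self.mean_ for _ in X]
```

Both variants reject an unfitted instance; the second does so with a uniform exception type and message that callers and tests can rely on.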

Any other comments?

All modified files have been checked with flake8 and autopep8 and any formatting issues have been addressed.

sklearn/decomposition/online_lda.py

adrinjalali · 2019-01-18T11:12:07Z

Although there are quite a few PEP8-related changes that are not directly related to this PR, LGTM if tests pass.

agamemnonc · 2019-01-21T07:43:00Z

> Although there are quite a few PEP8-related changes that are not directly related to this PR, LGTM if tests pass.

Thanks. Indeed, as per the description above, I addressed all formatting issues in the modified files so that autopep8/flake8 checks passed before submitting the PR.

jnothman

The core parts of this PR seem okay, when I can find them

sklearn/ensemble/forest.py

sklearn/exceptions.py

agamemnonc · 2019-01-21T10:34:36Z

> The core parts of this PR seem okay, when I can find them

Thanks for reviewing,

OK, I could revert the formatting changes if that would be preferred (your first two comments above)?
Regarding the modification in exceptions.py please see my response above.

glemaitre · 2019-01-28T17:36:37Z

@agamemnonc Could you revert the style changes? I can make a review then.

agamemnonc · 2019-01-29T15:53:41Z

> @agamemnonc Could you revert the style changes? I can make a review then.

OK, done; @glemaitre please review.

Of course, there are now some flake8 warnings on the modified files (mostly due to long lines).

glemaitre · 2019-01-29T18:40:18Z

~~I think that we have some missing occurrences:~~
EDIT: sorry, I had not checked out the right PR.

I also think that we should change the common test:

scikit-learn/sklearn/utils/estimator_checks.py

Lines 1586 to 1616 in fdf2f38

    
def check_estimators_unfitted(name, estimator_orig):
    """Check that predict raises an exception in an unfitted estimator.

    Unfitted estimators should raise either AttributeError or ValueError.
    The specific exception type NotFittedError inherits from both and can
    therefore be adequately raised for that purpose.
    """

    # Common test for Regressors, Classifiers and Outlier detection estimators
    X, y = _boston_subset()

    estimator = clone(estimator_orig)

    msg = "fit"

    if hasattr(estimator, 'predict'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.predict, X)

    if hasattr(estimator, 'decision_function'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.decision_function, X)

    if hasattr(estimator, 'predict_proba'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.predict_proba, X)

    if hasattr(estimator, 'predict_log_proba'):
        assert_raise_message((AttributeError, ValueError), msg,
                             estimator.predict_log_proba, X)

with something like:

@ignore_warnings
def check_estimators_unfitted(name, estimator_orig):
    """Check that predict raises an exception in an unfitted estimator.

    Unfitted estimators should raise a NotFittedError.
    """

    # Common test for Regressors, Classifiers and Outlier detection estimators
    X, y = _boston_subset()

    estimator = clone(estimator_orig)

    msg = ("{} instance is not fitted yet. Call 'fit' with appropriate "
           "arguments".format(estimator.__class__.__name__))
    for method in ('decision_function', 'predict', 'predict_proba',
                   'predict_log_proba'):
        if getattr(estimator, method, None) is not None:
            assert_raises_regex(NotFittedError, msg,
                                getattr(estimator, method), X)

glemaitre · 2019-01-29T18:41:05Z

@jnothman Do you know why we were checking AttributeError and ValueError instead of directly NotFittedError?
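A small sketch of what separates the two checks in practice. The class below is a stand-in that only mirrors the inheritance described in the docstring quoted above (NotFittedError deriving from both ValueError and AttributeError); `raises` and the two `unfitted_*` functions are hypothetical helpers for illustration, not scikit-learn code.

```python
class NotFittedError(ValueError, AttributeError):
    """Stand-in mirroring the inheritance of sklearn's NotFittedError."""


def raises(expected, func):
    """Hypothetical helper: True if calling func raises one of `expected`."""
    try:
        func()
    except expected:
        return True
    except Exception:
        return False
    return False


def unfitted_new_style():
    raise NotFittedError("This Foo instance is not fitted yet. Call 'fit' "
                         "with appropriate arguments before using this "
                         "method.")


def unfitted_old_style():
    raise ValueError("Call fit first")


loose = (AttributeError, ValueError)

# The loose check accepts both styles, because NotFittedError is a subclass
# of both exception types.
loose_results = (raises(loose, unfitted_new_style),
                 raises(loose, unfitted_old_style))

# The strict check only accepts the dedicated exception type.
strict_results = (raises(NotFittedError, unfitted_new_style),
                  raises(NotFittedError, unfitted_old_style))
```

Under these assumptions, switching the common test to NotFittedError would start failing for any estimator that still raises a plain ValueError or AttributeError, which is why the change affects third-party estimators as well.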

glemaitre

I will check if we don't have redundant tests but you can already address those comments.

sklearn/cluster/birch.py

sklearn/decomposition/online_lda.py

sklearn/decomposition/tests/test_online_lda.py

sklearn/exceptions.py

sklearn/linear_model/logistic.py

glemaitre · 2019-01-29T19:14:19Z

Regarding the tests, I would propose to change the following:

scikit-learn/sklearn/ensemble/tests/test_forest.py

Lines 371 to 378 in fdf2f38

    
def check_unfitted_feature_importances(name):
    assert_raises(ValueError, getattr, FOREST_ESTIMATORS[name](random_state=0),
                  "feature_importances_")


@pytest.mark.parametrize('name', FOREST_ESTIMATORS)
def test_unfitted_feature_importances(name):
    check_unfitted_feature_importances(name)

to something like:

@pytest.mark.parametrize('name', FOREST_ESTIMATORS)
def test_unfitted_feature_importances(name):
    err_msg = ("This {} instance is not fitted yet. Call 'fit' with "
               "appropriate arguments before using this method.".format(name))
    with pytest.raises(NotFittedError, match=err_msg):
        getattr(FOREST_ESTIMATORS[name](), 'feature_importances_')

glemaitre · 2019-01-29T19:15:28Z

Even if we don't touch a public API, I would add an entry in the what's new since we modified the tests and error message.

agamemnonc · 2019-01-30T12:05:01Z

The common test is now updated as suggested, thanks!

glemaitre · 2019-02-07T17:38:42Z

You can use `getattr(estimator, method, None) is not None` so that a missing method defaults to None.
It avoids the try ... except.
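To make the two styles concrete, here is a minimal sketch; `OnlyPredict` and both helper functions are hypothetical names used only for this comparison.

```python
METHODS = ('decision_function', 'predict', 'predict_proba',
           'predict_log_proba')


class OnlyPredict:
    """Hypothetical estimator exposing a single prediction method."""

    def predict(self, X):
        return [0 for _ in X]


def available_try_except(estimator):
    # Verbose variant: probe each attribute and catch the failure.
    found = []
    for name in METHODS:
        try:
            getattr(estimator, name)
        except AttributeError:
            continue
        found.append(name)
    return found


def available_getattr_default(estimator):
    # Compact variant: a missing attribute falls back to None.
    return [name for name in METHODS
            if getattr(estimator, name, None) is not None]
```

Both helpers return the same list for `OnlyPredict()`; `hasattr(estimator, name)` would be a third equivalent spelling when an attribute explicitly set to None does not need to be treated as missing.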

agamemnonc · 2019-02-08T11:02:26Z

Yes, that's what the code currently looks like:

msg = ("{} instance is not fitted yet. Call 'fit' with appropriate "
       "arguments".format(estimator.__class__.__name__))

for method in ('decision_function', 'predict', 'predict_proba',
               'predict_log_proba'):
    if getattr(estimator, method, None) is not None:
        assert_raises_regex(NotFittedError, msg,
                            getattr(estimator, method), X)

The problem is that I am not sure how to include the deprecation here, i.e. how to cover the case where an AttributeError or ValueError is raised with the previous error message ("fit"), so that it is still allowed but issues a DeprecationWarning.

glemaitre · 2019-02-08T12:34:53Z

Oh I see why you wanted a try except then. Basically use pytest.raises and check https://docs.pytest.org/en/latest/assert.html

I think something around:

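with pytest.raises(NotFittedError) as excinfo:
    getattr(estimator, method)(X)

if msg not in str(excinfo.value) and 'fit' in str(excinfo.value):
    # non-standard message that still mentions 'fit': warn instead of failing
    warnings.warn("Non-standard 'not fitted' error message",
                  DeprecationWarning)
else:
    assert 'fit' in str(excinfo.value)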

I wrote this pretty quickly. That might be buggy
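For reference, the behaviour being discussed (accept the new standard error, tolerate the legacy one with a warning) could be sketched without pytest as follows. This is only an illustration under assumptions, not the check that was merged: `NotFittedError` is a stand-in class, and `check_raises_when_unfitted`, its return labels, and the warning text are hypothetical.

```python
import warnings


class NotFittedError(ValueError, AttributeError):
    """Stand-in mirroring the inheritance of sklearn's NotFittedError."""


def check_raises_when_unfitted(method, X, name):
    """Hypothetical check: classify how an unfitted call fails."""
    msg = ("{} instance is not fitted yet. Call 'fit' with appropriate "
           "arguments".format(name))
    try:
        method(X)
    except (AttributeError, ValueError) as exc:
        text = str(exc)
        if isinstance(exc, NotFittedError) and msg in text:
            return "standard"
        # Legacy behaviour: another error type or wording that still
        # mentions 'fit' is tolerated, but flagged as deprecated.
        assert "fit" in text, "error message does not mention 'fit'"
        warnings.warn("Unfitted estimators should raise NotFittedError "
                      "with the standard message", DeprecationWarning)
        return "legacy"
    raise AssertionError("no exception raised by an unfitted estimator")
```

The plain `try`/`except` is used here because the legacy case has to catch a broader exception type than the new one, which a single `pytest.raises(NotFittedError)` block would not do.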

jnothman · 2019-03-12T11:11:59Z

Please resolve conflicts with master.

jnothman

I appreciate the consistent use of check_is_fitted within the library, but I'm not entirely sure we should be forcing all library developers to use the exact same error message. The default message does not, for instance, mention partial_fit.
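To illustrate the concern, here is a sketch of an estimator that wants a different wording. scikit-learn's `check_is_fitted` accepts a `msg` argument in which `%(name)s` is replaced by the estimator name; the function below is a simplified stand-in for it, and `ToyIncrementalModel` and its message are hypothetical.

```python
class NotFittedError(ValueError, AttributeError):
    """Stand-in mirroring the inheritance of sklearn's NotFittedError."""


DEFAULT_MSG = ("This %(name)s instance is not fitted yet. Call 'fit' with "
               "appropriate arguments before using this method.")


def check_is_fitted(estimator, attributes, msg=None):
    """Simplified stand-in supporting a custom message template."""
    if msg is None:
        msg = DEFAULT_MSG
    if isinstance(attributes, str):
        attributes = [attributes]
    if not all(hasattr(estimator, attr) for attr in attributes):
        raise NotFittedError(msg % {"name": type(estimator).__name__})


class ToyIncrementalModel:
    """Hypothetical estimator that can be trained with partial_fit."""

    def partial_fit(self, X, y):
        self.n_seen_ = getattr(self, "n_seen_", 0) + len(X)
        return self

    def predict(self, X):
        check_is_fitted(
            self, "n_seen_",
            msg="This %(name)s instance is not fitted yet. Call 'fit' or "
                "'partial_fit' before using this method.")
        return [0 for _ in X]
```

A common test that matches the full default message would reject this estimator even though it raises the right exception type, which is the argument for checking the type (or a looser pattern) rather than the exact text.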

sklearn/utils/estimator_checks.py

doc/whats_new/v0.21.rst

jnothman · 2019-05-01T06:57:42Z

When you get around to resolving my comments, please also move your change log entry to v0.22.rst as version 0.21 has been released.

glemaitre · 2019-05-02T12:28:29Z

@jnothman GaussianProcessRegressor can work without calling fit. I added a tag requires_fit but I am not sure if it would be something that we want.

WDYT?

jnothman · 2019-05-02T23:37:22Z

I think we have explicitly tried to support GPs without fit. Otherwise you upset all the bayesians.

sklearn/utils/estimator_checks.py

glemaitre · 2019-05-03T08:25:53Z

> I think we have explicitly tried to support GPs without fit. Otherwise you upset all the bayesians.

And do you think that having the tag requires_fit (default to True) is a good idea?
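As a rough sketch of how such a tag could gate the common test: the tag name `requires_fit` and its default come from this discussion, while `get_tags`, `should_run_unfitted_check` and the two toy classes are hypothetical and only approximate how estimator tags are looked up.

```python
DEFAULT_TAGS = {"requires_fit": True}


def get_tags(estimator):
    """Hypothetical helper: defaults overridden by the estimator's tags."""
    tags = dict(DEFAULT_TAGS)
    more_tags = getattr(estimator, "_more_tags", None)
    if more_tags is not None:
        tags.update(more_tags())
    return tags


class NeedsFit:
    """Hypothetical estimator that keeps the default tag value."""


class PriorOnlyModel:
    """Hypothetical estimator that can predict from its prior without fit."""

    def _more_tags(self):
        return {"requires_fit": False}

    def predict(self, X):
        return [0.0 for _ in X]


def should_run_unfitted_check(estimator):
    # The "raises when unfitted" check only applies when the tag is True.
    return get_tags(estimator)["requires_fit"]
```

With a default of True, existing estimators keep being checked and only those that opt out, such as a model that predicts from its prior, are skipped.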

glemaitre · 2019-05-03T08:34:18Z

@jnothman could you have another look? Apart from the tag, I think that the PR is good to be merged.

NicolasHug

A few nitpicks.

I think the introduction of the estimator tag is appropriate here.

I'm not pressing "Approve" because TBH I'm not entirely sure what should be done regarding the deprecation, but I'm tending towards LGTM.

sklearn/utils/estimator_checks.py

doc/whats_new/v0.22.rst

doc/developers/contributing.rst

agamemnonc · 2019-05-06T10:10:58Z

Thanks @NicolasHug for reviewing.

I think I have now addressed the issues you raised, otherwise please let me know.

jnothman

Thanks @agamemnonc!

glemaitre · 2019-05-07T07:34:43Z

Thanks @agamemnonc

agamemnonc · 2019-05-07T08:04:42Z

> Thanks @agamemnonc

Thank you @glemaitre for your input and everyone else for reviewing.

agamemnonc added 7 commits January 17, 2019 15:14

online_lda and tests

d72aa97

forest

7012b2e

linear model base

cdcb151

logistic

9a43ca9

Merge branch 'master' into check_is_fitted_replacements

00f9f0c

birch

d4dc367

pep8 fix

cf20a7d

adrinjalali reviewed Jan 18, 2019

sklearn/decomposition/online_lda.py

agamemnonc added 2 commits January 18, 2019 11:12

fix formatting issue

594c495

exceptions doc example

874b17f

jnothman reviewed Jan 21, 2019

sklearn/ensemble/forest.py

sklearn/ensemble/forest.py

sklearn/exceptions.py

agamemnonc added 2 commits January 29, 2019 15:49

revert style changes

2ed78de

one more style revert

e3cc8e3

glemaitre requested changes Jan 29, 2019

glemaitre self-requested a review January 29, 2019 18:53

agamemnonc added 4 commits January 30, 2019 11:41

update utils/estimator_checks and tests

03084c9

style fix

7616d2b

Merge branch 'master' into check_is_fitted_replacements

0f5f1a0

remove ellipsis

1996aab

style fix

3e97a04

jnothman reviewed Mar 12, 2019

sklearn/utils/estimator_checks.py

doc/whats_new/v0.21.rst

doc/whats_new/v0.21.rst

doc/whats_new/v0.21.rst

glemaitre self-assigned this May 2, 2019

glemaitre added 3 commits May 2, 2019 12:14

Merge remote-tracking branch 'origin/master' into pr/agamemnonc/13013

e384f66

reviews

3f45e05

add tag

6bd4d06

jnothman reviewed May 2, 2019

sklearn/utils/estimator_checks.py

be more permissive and document tag

c95d73f

Merge remote-tracking branch 'origin/master' into pr/agamemnonc/13013

7fa2e7e

NicolasHug reviewed May 5, 2019

sklearn/utils/estimator_checks.py

doc/whats_new/v0.22.rst

doc/developers/contributing.rst

doc/developers/contributing.rst

agamemnonc added 4 commits May 6, 2019 10:54

replace getattr with hasattr

0141773

fix rst file entry

0dce273

fix typo in developers/contributing

8d46b8f

developers/contributing links to glossary

f043105

jnothman approved these changes May 6, 2019

jnothman merged commit 19192c0 into scikit-learn:master May 7, 2019

agamemnonc deleted the check_is_fitted_replacements branch May 7, 2019 05:28

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

MAINT Replace manual checks with check_is_fitted (scikit-learn#13013)

d5bf14f

glemaitre mentioned this pull request Jul 16, 2019

Nicer error in num_samples if shape is not valid and there's no __len__ #14369

Merged
