Skip to content

[MRG+1] MAINT Replace manual checks with check_is_fitted #13013

Merged
jnothman merged 36 commits intoscikit-learn:masterfrom
agamemnonc:check_is_fitted_replacements
May 7, 2019
Merged

[MRG+1] MAINT Replace manual checks with check_is_fitted #13013
jnothman merged 36 commits intoscikit-learn:masterfrom
agamemnonc:check_is_fitted_replacements

Conversation

@agamemnonc
Copy link
Copy Markdown
Contributor

Reference Issues/PRs

Fixes #12991.

What does this implement/fix? Explain your changes.

Replaces manual checks with check_is_fitted utility function in various places.

Any other comments?

All modified files have been checked with flake8 and autopep8 and any formatting issues have been addressed.

@adrinjalali
Copy link
Copy Markdown
Member

Although there are quite some changes that are PEP8 related and not directly related to this PR, LGTM, if tests pass.

@agamemnonc
Copy link
Copy Markdown
Contributor Author

Although there are quite some changes that are PEP8 related and not directly related to this PR, LGTM, if tests pass.

Thanks. Indeed, as per the description above, I addressed all formatting issues in the modified files so that autopep8/flake8 checks passed before submitting the PR.

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The core parts of this PR seem okay, when I can find them

@agamemnonc
Copy link
Copy Markdown
Contributor Author

The core parts of this PR seem okay, when I can find them

Thanks for reviewing,

OK, I could revert the formatting changes if that would be preferred (your first two comments above)?
Regarding the modification in exceptions.py please see my response above.

@glemaitre
Copy link
Copy Markdown
Member

@agamemnonc Could you revert the style changes. I can make a review then.

@agamemnonc
Copy link
Copy Markdown
Contributor Author

agamemnonc commented Jan 29, 2019

@agamemnonc Could you revert the style changes. I can make a review then.

OK, done; @glemaitre please review.

Of course, there are now some flake8 warnings on the modified files (mostly due to long lines).

@glemaitre
Copy link
Copy Markdown
Member

glemaitre commented Jan 29, 2019

I think that we have some missing occurrences:
EDIT: sorry I did not checkout the good PR

I also think that we should change the common test:

def check_estimators_unfitted(name, estimator_orig):
"""Check that predict raises an exception in an unfitted estimator.
Unfitted estimators should raise either AttributeError or ValueError.
The specific exception type NotFittedError inherits from both and can
therefore be adequately raised for that purpose.
"""
# Common test for Regressors, Classifiers and Outlier detection estimators
X, y = _boston_subset()
estimator = clone(estimator_orig)
msg = "fit"
if hasattr(estimator, 'predict'):
assert_raise_message((AttributeError, ValueError), msg,
estimator.predict, X)
if hasattr(estimator, 'decision_function'):
assert_raise_message((AttributeError, ValueError), msg,
estimator.decision_function, X)
if hasattr(estimator, 'predict_proba'):
assert_raise_message((AttributeError, ValueError), msg,
estimator.predict_proba, X)
if hasattr(estimator, 'predict_log_proba'):
assert_raise_message((AttributeError, ValueError), msg,
estimator.predict_log_proba, X)

with something like:

@ignore_warnings
def check_estimators_unfitted(name, estimator_orig):
    """Check that predict raises an exception in an unfitted estimator.

    Unfitted estimators should raise a NotFittedError.
    """

    # Common test for Regressors, Classifiers and Outlier detection estimators
    X, y = _boston_subset()

    estimator = clone(estimator_orig)

    msg = ("{} instance is not fitted yet. Call 'fit' with appropriate "
           "arguments".format(estimator.__class__.__name__))
    for method in ('decision_function', 'predict', 'predict_proba',
                   'predict_log_proba'):
        if getattr(estimator, method, None) is not None:
            assert_raises_regex(NotFittedError, msg,
                                getattr(estimator, method), X)

@glemaitre
Copy link
Copy Markdown
Member

@jnothman Do you know why we were checking AttributeError and ValueError instead of directly NotFittedError?

Copy link
Copy Markdown
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check if we don't have redundant tests but you can already address those comments.

@glemaitre glemaitre self-requested a review January 29, 2019 18:53
@glemaitre
Copy link
Copy Markdown
Member

Regarding the tests, I would propose to change the following:

def check_unfitted_feature_importances(name):
assert_raises(ValueError, getattr, FOREST_ESTIMATORS[name](random_state=0),
"feature_importances_")
@pytest.mark.parametrize('name', FOREST_ESTIMATORS)
def test_unfitted_feature_importances(name):
check_unfitted_feature_importances(name)

@pytest.mark.parametrize('name', FOREST_ESTIMATORS)
def test_unfitted_feature_importances(name):
    err_msg = ('This {} instance is not fitted yet. Call 'fit' with appropriate '
               'arguments before using this method.'.format(name))
    pytest.raises(NotFittedError, match=err_msg):
         gettattr(FOREST_ESTIMATORS[name](), 'feature_importances')

@glemaitre
Copy link
Copy Markdown
Member

Even if we don't touch a public API, I would add an entry in the what's new since we modified the tests and error message.

@agamemnonc
Copy link
Copy Markdown
Contributor Author

I think that we have some missing occurrences:
EDIT: sorry I did not checkout the good PR

I also think that we should change the common test:

scikit-learn/sklearn/utils/estimator_checks.py

Lines 1586 to 1616 in fdf2f38

def check_estimators_unfitted(name, estimator_orig):
"""Check that predict raises an exception in an unfitted estimator.

 Unfitted estimators should raise either AttributeError or ValueError. 
 The specific exception type NotFittedError inherits from both and can 
 therefore be adequately raised for that purpose. 
 """ 

 # Common test for Regressors, Classifiers and Outlier detection estimators 
 X, y = _boston_subset() 

 estimator = clone(estimator_orig) 

 msg = "fit" 

 if hasattr(estimator, 'predict'): 
     assert_raise_message((AttributeError, ValueError), msg, 
                          estimator.predict, X) 

 if hasattr(estimator, 'decision_function'): 
     assert_raise_message((AttributeError, ValueError), msg, 
                          estimator.decision_function, X) 

 if hasattr(estimator, 'predict_proba'): 
     assert_raise_message((AttributeError, ValueError), msg, 
                          estimator.predict_proba, X) 

 if hasattr(estimator, 'predict_log_proba'): 
     assert_raise_message((AttributeError, ValueError), msg, 
                          estimator.predict_log_proba, X) 

with something like:

@ignore_warnings
def check_estimators_unfitted(name, estimator_orig):
    """Check that predict raises an exception in an unfitted estimator.

    Unfitted estimators should raise a NotFittedError.
    """

    # Common test for Regressors, Classifiers and Outlier detection estimators
    X, y = _boston_subset()

    estimator = clone(estimator_orig)

    msg = ("{} instance is not fitted yet. Call 'fit' with appropriate "
           "arguments".format(estimator.__class__.__name__))
    for method in ('decision_function', 'predict', 'predict_proba',
                   'predict_log_proba'):
        if getattr(estimator, method, None) is not None:
            assert_raises_regex(NotFittedError, msg,
                                getattr(estimator, method), X)

This is now fixed. thanks!

@glemaitre
Copy link
Copy Markdown
Member

You can use getattr(estimator, method, None) is not None: to get the default to None.
It avoids the try ... except

@agamemnonc
Copy link
Copy Markdown
Contributor Author

Yes, that's what the code currently looks like:

msg = ("{} instance is not fitted yet. Call 'fit' with appropriate "
       "arguments".format(estimator.__class__.__name__))

for method in ('decision_function', 'predict', 'predict_proba',
               'predict_log_proba'):
    if getattr(estimator, method, None) is not None:
        assert_raises_regex(NotFittedError, msg,
                            getattr(estimator, method), X)

The problem is I am not too sure how to include the deprecation here, i.e. cover the case where an Attribute or ValueError is raised instead with the previous error message ("fit"), in order to allow that and issue a DeprecationWarning.

@glemaitre
Copy link
Copy Markdown
Member

Oh I see why you wanted a try except then. Basically use pytest.raises and check https://docs.pytest.org/en/latest/assert.html

I think something around:

with pytest.raises(NotFittedError) as excinfo:
    getattr(estimator, method), X)

if not str(excinfo.value) in msg and 'fit' in str.excinfo.value):
    # raise deprecation waring
else:
    assert 'fit' in str.excinfo.value

I wrote this pretty quickly. That might be buggy

@jnothman
Copy link
Copy Markdown
Member

Please resolve conflicts with master.

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I appreciate the consistent use of check_is_fitted within the library, but I'm not entirely sure we should be forcing all library developers to use the exact same error message. The default message does not, for instance, mention partial_fit.

@jnothman
Copy link
Copy Markdown
Member

jnothman commented May 1, 2019

When you get around to resolving my comments, please also move your change log entry to v0.22.rst as version 0.21 has been released.

@glemaitre glemaitre self-assigned this May 2, 2019
@glemaitre
Copy link
Copy Markdown
Member

@jnothman GaussianProcessRegressor can work without calling fit. I added a tag requires_fit but I am not sure if it would be something that we want.

WDYT?

@jnothman
Copy link
Copy Markdown
Member

jnothman commented May 2, 2019 via email

@glemaitre
Copy link
Copy Markdown
Member

I think we have explicitly tried to support GPs without fit. Otherwise you upset all the bayesians.

And do you think that having the tag requires_fit (default to True) is a good idea?

@glemaitre
Copy link
Copy Markdown
Member

@jnothman could you have another look. Apart of the tag I think that the PR is good to be merged.

Copy link
Copy Markdown
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few niticks.

I think the introduction of the estimator tag is appropriate here.

I'm not pressing "Approve" because TBH I'm not entirely sure what should be done regarding the deprecation, but I'm tending towards LGTM.

@agamemnonc
Copy link
Copy Markdown
Contributor Author

Thanks @NicolasHug for reviewing.

I think I have now addressed the issues you raised, otherwise please let me know.

Copy link
Copy Markdown
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @agamemnonc!

@jnothman jnothman merged commit 19192c0 into scikit-learn:master May 7, 2019
@agamemnonc agamemnonc deleted the check_is_fitted_replacements branch May 7, 2019 05:28
@glemaitre
Copy link
Copy Markdown
Member

Thanks @agamemnonc

@agamemnonc
Copy link
Copy Markdown
Contributor Author

Thanks @agamemnonc

Thank you @glemaitre for your input and everyone else for reviewing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Make use of check_is_fitted instead of manual checks

5 participants