ENH Adds support to xfail in check_estimator. #16963
thomasjpfan wants to merge 20 commits into scikit-learn:master
Conversation
I'm just concerned about the complexity of that test logic. IMO if the test is marked as xfail it should be skipped in check_estimator (possibly with a warning) and that's it. If users want nuanced xfail behavior they should use a proper testing framework, i.e. pytest. Sorry, I should have commented on the issue earlier.
Since there is indeed a bug, I would just propose to bypass the test, possibly with a warning that it was skipped. We are using more advanced pytest features to run the tests and check the output status as XPASS/XFAIL, but we shouldn't need to re-implement that for check_estimator IMO. Or anything else that allows fixing the linked issue without adding too much complexity/code.
+1 to keep the logic simple here and just not run the checks marked as xfail.

Updated PR with (hopefully) simpler logic.
    # When a _xfail_checks check passes, raise an error stating that the test
    # passes and can be removed from the tag

My understanding was that we don't care about that scenario, because we don't want to reimplement pytest's logic?

I changed it to issuing a warning.

From an implementation point of view, it only takes two lines to implement this logic in _check_warns_on_fail. I would not want to keep silent about this, since a developer would not know that an xfail check is now passing.
sklearn/utils/estimator_checks.py
Outdated
    if not hasattr(estimator, "_get_tags"):
        return {}

Why is this needed? Estimators inheriting from BaseEstimator should have a _get_tags method.

The following would fail at the incorrect place.

I think we should just change the test then, which is what I did for the one just below. The previous error message wasn't that great either anyway.
sklearn/utils/estimator_checks.py
Outdated
| """Get xfail_check from estimator""" | ||
| if isinstance(estimator, type): | ||
| # try to construct estimator instance, if it is unable to then | ||
| # return then ignore xfail tag |
There was a problem hiding this comment.
| # return then ignore xfail tag | |
| # ignore xfail tag |
sklearn/utils/estimator_checks.py
Outdated
def _get_xfail_checks(estimator):
    """Get xfail_check from estimator"""

Suggested change:

-    """Get xfail_check from estimator"""
+    """Get checks marked with xfail_checks tag from estimator"""
sklearn/utils/estimator_checks.py
Outdated
| """Mark (estimator, check) pairs with xfail according to the | ||
| _xfail_checks_ tag""" |
sklearn/utils/estimator_checks.py
Outdated
    return estimator._get_tags()['_xfail_checks'] or {}


def _check_warns_on_fail(estimator, check, xfail_checks_tag):

Do we really need to pass the estimator? It looks like the logic only affects the check.

It was not needed. To address https://github.com/scikit-learn/scikit-learn/pull/16963/files#r414661838 it would be needed now.
sklearn/utils/estimator_checks.py
Outdated
    return estimator._get_tags()['_xfail_checks'] or {}


def _check_warns_on_fail(estimator, check, xfail_checks_tag):

Call this _make_check_warn_on_fail?
sklearn/utils/estimator_checks.py
Outdated
    warnings.warn(f"{check_name} passed, it can be removed "
                  f"from the _xfail_check tag", SkipTestWarning)

Should we also print the estimator name?
sklearn/utils/estimator_checks.py
Outdated
    # did not fail
    check_name = _set_check_estimator_ids(check)
    warnings.warn(f"{check_name} passed, it can be removed "
                  f"from the _xfail_check tag", SkipTestWarning)

Should this be a FutureWarning?
sklearn/utils/estimator_checks.py
Outdated
def _check_warns_on_fail(estimator, check, xfail_checks_tag):
    """Convert assertion errors to warnings if reason is given"""

We can probably provide more details here:

Suggested change:

-    """Convert assertion errors to warnings if reason is given"""
+    """
+    Wrap the check so that a warning is raised:
+    - when the check is in the `xfail_checks` tag and the check properly failed as expected
+    - when the check is in the `xfail_checks` tag and the check didn't fail but should have
+    Checks that aren't in the `xfail_checks` tag aren't wrapped and are returned as-is.
+    This wrapper basically simulates what pytest would do with the @xfail decorator, but this one can be used with check_estimator(), which doesn't rely on pytest.
+    """
sklearn/utils/estimator_checks.py
Outdated
    return estimator, check

    reason = xfail_checks_tag[check_name]
    name = _get_estimator_name(estimator)

Suggested change:

-    name = _get_estimator_name(estimator)
+    est_name = _get_estimator_name(estimator)
sklearn/utils/estimator_checks.py
Outdated
    return estimator


def _get_estimator_name(estimater):

Suggested change:

-def _get_estimator_name(estimater):
+def _get_estimator_name(estimator):
sklearn/utils/estimator_checks.py
Outdated
    # try to construct estimator instance, if it is unable to then
    # return xfail tag

Suggested change:

-    # try to construct estimator instance, if it is unable to then
-    # return xfail tag
+    # try to construct estimator instance, if it is unable to then
+    # xfail_checks tag is ignored
    # skips check_estimators_fit_returns_self based on _xfail_checks

    assert_warns_message(SkipTestWarning, "This is a bad check",
                         check_estimator, LRXDoesNotRaise)
    assert_warns_message(FutureWarning,
                         "LRXFailTags:check_complex_data passed, it can "
                         "be removed from the _xfail_check tag",
                         check_estimator, LRXFailTags)
    def fit(self, X, y):
        # do not raise error for complex check
        try:
            return super().fit(X, y)
        except ValueError as e:
            if "Complex data not supported" not in str(e):
                raise

        return self

Sorry, I don't understand what this class and the check below are supposed to test. Also, what's the relation with check_estimators_fit_returns_self?

The comment needed to be updated to:

    # skips check_complex_data based on _xfail_checks

    check_estimator, MyEstimator)


class LRXFailTags(LogisticRegression):

Should this be called LRXFailTagButCheckPasses?
    for estimator, check in checks_generator:
        check_name = _set_check_estimator_ids(check)
        if check_name in xfail_checks_tag:

Any reason to remove the previous comments? These can't hurt IMO.
        return self


def test_check_estimator_xfail_tag_skips():

My understanding is that check_estimator does not skip anything?

This checks that it raises a SkipTestWarning. Renamed to test_check_estimator_xfail_tag_raises_skip_test_warning.
def test_check_estimator_xfail_tag_skips():
    # skips check_complex_data based on _xfail_checks

Can't we just check on an estimator with a properly set xfail_tags instead, like the Dummy one? I don't understand why we need to create a new estimator here.

I was thinking that if the estimator gets fixed and the xfail_tags tag gets removed, this test would start failing. Changed to use Dummy.
I think I'd be OK with the proposed changes, but I still agree with @rth that it would be simpler to just purely ignore the xfail_checks in check_estimator.
sklearn/utils/estimator_checks.py
Outdated
    try:
        check(*args, **kwargs)
    except AssertionError:
        warnings.warn(reason, SkipTestWarning)

I find it a bit strange to raise a SkipTestWarning when the test wasn't actually skipped: the check was run, but we got a failure as expected.

I do not think we have a good warning for "expected to fail".

Maybe a fairly ambiguous UserWarning.
rth left a comment

Thanks! A couple of comments. Where I agree is that it's definitely not the most straightforward part of the codebase. Maybe some type annotations couldn't have hurt here, just for readability, particularly for e.g. an estimator input that's actually either a class or an instance. Not asking for this in this PR, but it could have been nice.
sklearn/utils/estimator_checks.py
Outdated
    # got a class
    return estimator.__name__
    # got an instance
    return type(estimator).__name__

Could you re-use this function in _set_check_estimator_ids, where there is currently redundant code?

_set_check_estimator_ids gets the ID from an estimator using str. I removed this function to reduce the size of the diff, since it is not really needed anymore.
sklearn/utils/estimator_checks.py
Outdated
    def wrapped(*args, **kwargs):
        try:
            check(*args, **kwargs)
        except AssertionError:

xfailed tests could fail with any exception, not just AssertionError.
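To illustrate this review point with a runnable toy example (all names hypothetical, not the PR's code): a wrapper that only catches AssertionError lets any other expected failure, such as a ValueError, escape instead of being treated as an xfail.

```python
import warnings


def wrap_assert_only(check, reason):
    # buggy version: only AssertionError counts as an expected failure
    def wrapped():
        try:
            check()
        except AssertionError:
            warnings.warn(reason)
    return wrapped


def check_complex_data():
    # many estimator checks fail with ValueError, not AssertionError
    raise ValueError("Complex data not supported")


try:
    wrap_assert_only(check_complex_data, "known limitation")()
    escaped = False
except ValueError:
    escaped = True

print(escaped)  # True: the ValueError escaped instead of being xfailed
```

Catching `Exception` instead of `AssertionError` in the wrapper would treat any failure as the expected one, matching how pytest's xfail behaves by default.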
This will need an update considering that classes aren't supported anymore. That being said, I proposed an alternative in #17219.
Reference Issues/PRs
Fixes #16958
What does this implement/fix? Explain your changes.
Allows check_estimator to warn according to the _xfail_checks tag.
- If a check is in _xfail_checks and an assert is raised, then a warning will appear.
- If a check is in _xfail_checks and NO assert is raised, then a warning is raised stating that the check can now be removed from _xfail_checks.
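The two behaviors above can be sketched end to end without pytest. Everything below (the estimator class, the checks, `run_checks`, and the tag represented as a plain dict) is a hypothetical, simplified stand-in for check_estimator and scikit-learn's real tag machinery:

```python
import warnings


class MyEstimator:
    # hypothetical stand-in for the _xfail_checks estimator tag
    _xfail_checks = {"check_complex_data": "complex input not supported"}


def check_complex_data(estimator):
    raise AssertionError("complex data should raise")  # known failure


def check_fit_idempotent(estimator):
    pass  # passes normally and is not listed in the tag


def run_checks(estimator, checks):
    for check in checks:
        reason = estimator._xfail_checks.get(check.__name__)
        if reason is None:
            check(estimator)  # not xfailed: failures propagate
            continue
        try:
            check(estimator)
        except AssertionError:
            # expected failure: warn with the reason instead of erroring
            warnings.warn(f"{check.__name__}: expected failure ({reason})")
        else:
            # unexpectedly passed: suggest removing the tag entry
            warnings.warn(f"{check.__name__} passed, it can be removed "
                          "from _xfail_checks")


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    run_checks(MyEstimator(), [check_complex_data, check_fit_idempotent])

print([str(w.message) for w in caught])
```

Only the check listed in the tag produces a warning; the passing, untagged check runs silently, matching the behavior the PR description outlines.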