[MRG] Prototype 3 for strict check_estimator mode #17252
NicolasHug wants to merge 6 commits into scikit-learn:master
Conversation
Third time's the charm, right? This is my preferred version so far.
adrinjalali left a comment:
This looks very neat to me! Thanks @NicolasHug
sklearn/tests/test_common.py (outdated)

    @parametrize_with_checks([LogisticRegression(), NuSVC()], strict_mode=False)
    def test_strict_mode_parametrize_with_checks(estimator, check):
        # Not sure how to test parametrize_with_checks correctly??
Would it be possible to have a dummy estimator to check strict_mode={False, True}? We probably shouldn't rely on the existing estimators here, we may end up fixing them :P
These dummy estimators that we'd create would still need to depend on some of our builtin estimators, otherwise the checks won't pass. So wouldn't that just shift the issue?
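A runnable sketch of the reviewer's idea, under stated assumptions: a purpose-built dummy estimator that passes lenient checks but deliberately fails a strict one, so both strict_mode values can be exercised without depending on real estimators. All names here (`DummyStrictFailer`, the check functions, `run_checks`) are hypothetical, not scikit-learn code.

```python
class DummyStrictFailer:
    """Minimal estimator-like object: fine for lenient checks, bad for strict ones."""
    def fit(self, X, y=None):
        return self

def check_lenient(est):
    # Any object with a callable fit method passes.
    assert callable(getattr(est, "fit", None))

def check_strict_only(est):
    # A stricter requirement the dummy intentionally violates.
    assert hasattr(est, "get_params"), "strict: must expose get_params"

def run_checks(est, checks, strict_checks, strict_mode):
    """Run checks; skip those in strict_checks when strict_mode is False."""
    outcomes = {}
    for check in checks:
        if not strict_mode and check in strict_checks:
            outcomes[check.__name__] = "skipped"
            continue
        try:
            check(est)
            outcomes[check.__name__] = "passed"
        except AssertionError:
            outcomes[check.__name__] = "failed"
    return outcomes
```

With this pair, a test can assert that the strict check is skipped when strict_mode=False and fails when strict_mode=True, which is exactly the behaviour being parametrized.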
rth left a comment:
This version treats strict checks as if they were in the _xfail_checks tag of the estimator.
I think it's a good idea! Just one comment otherwise looks good,
    strict_checks = {
        _set_check_estimator_ids(check):
            'The check is strict and strict mode is off'  # the reason
        for check in _STRICT_CHECKS
    }
I think it might be easier to maintain if it was a decorator on the checks rather than a global list of functions. Didn't we try that in one of the N earlier versions, or were there arguments against it? :)
Edit: although maybe it would make the traceback harder to read?
Didn't we try that in one of the N earlier versions
One version required a global var, and another required passing a strict param to every single check, so they're less ideal.
I guess we could have a @is_strict decorator that would set an attribute on the check, but it seems a bit overkill?
OK, let's start with the current approach, we can always add decorators later.
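For reference, the decorator floated above could look like the following sketch: it tags the check function with an attribute instead of keeping a global list, and unwraps `functools.partial` so parametrized checks are still recognised. The names `is_strict` and `_is_strict` are hypothetical, not the PR's code.

```python
import functools

def is_strict(check):
    """Mark a check function as strict by setting an attribute on it."""
    check._is_strict = True
    return check

@is_strict
def check_example(estimator):
    pass

def check_other(estimator):
    pass

def is_check_strict(check):
    # functools.partial wrappers don't carry the attribute, so unwrap first.
    while isinstance(check, functools.partial):
        check = check.func
    return getattr(check, "_is_strict", False)
```

The unwrapping step is the part a global list avoids, which may be why the current approach was preferred for now.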
jnothman left a comment:
Yes, this is much more elegant, but I'm still not certain that it should be a param to check_estimator, when estimator check conditions are generally defined by estimator tags.
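A hedged sketch of this alternative: instead of a strict_mode parameter on check_estimator, strictness could be declared through estimator tags, the existing mechanism for conditioning checks. The tag name "strict_mode" and the helper below are hypothetical, not scikit-learn API.

```python
class TaggedEstimator:
    """Stand-in for an estimator exposing sklearn-style tags."""
    def _get_tags(self):
        # Hypothetical tag: this estimator opts out of strict checks.
        return {"strict_mode": False}

def should_run_strict_checks(estimator):
    # Default to strict when the estimator doesn't say otherwise.
    return estimator._get_tags().get("strict_mode", True)
```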
rth left a comment:
LGTM, thanks @NicolasHug !
+1 for https://github.com/scikit-learn/scikit-learn/pull/17252/files#r427283093, i.e. maybe creating dummy estimators for tests instead of relying on the behavior of existing estimators.
    if not strict_mode:
        strict_checks = {
            _set_check_estimator_ids(check):
This one is a bit unfortunate. I mean we already have the raw function objects, but I guess not re-using this mechanism means we have to handle partial functions. So it's probably OK for now.
Alternatively, we can also directly use _set_check_estimator_ids(check) in the _STRICT_CHECKS dict? That would be a dict of strings instead of a dict of functions.
I'm only using the same logic as for the _xfail_checks tag (which I find... a bit sloppy, I'll give you that ;) )
Alternatively we can also directly use _set_check_estimator_ids(check) in the _STRICT_CHECKS dict?
No strong feeling about it. It's more a side comment, I'm OK with the proposed solution as well.
It's true that we could also set
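The alternative discussed here, keying the strict-check mapping by id strings instead of function objects, can be sketched as follows. `_check_id` is a rough, hypothetical stand-in for what `_set_check_estimator_ids` does for plain functions and partials; the real helper handles more cases (e.g. estimator instances).

```python
import functools

def _check_id(check):
    """Rough stand-in for _set_check_estimator_ids on functions/partials."""
    if isinstance(check, functools.partial):
        if not check.keywords:
            return check.func.__name__
        kwargs = ",".join(f"{k}={v}" for k, v in check.keywords.items())
        return f"{check.func.__name__}({kwargs})"
    return check.__name__

def check_a(est):
    pass

# A dict of id strings rather than function objects, built once at import time.
STRICT_CHECK_IDS = {
    _check_id(check): "The check is strict and strict mode is off"
    for check in [check_a, functools.partial(check_a, strict=True)]
}
```

A string-keyed dict avoids re-deriving ids at lookup time, at the cost of duplicating the id logic in two places.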
Thanks for the reviews and comments, I've cleaned it up and this is now MRG!
From the perspective of a third-party developer, I find it more intuitive to develop an estimator which follows the API but doesn't necessarily pass all the tests, and only test for the ones that I think are relevant, like API checks. This doesn't mean estimator tags are irrelevant; they're still pretty nice to inspect and to base decisions on in a meta-estimator, for instance, but that may not apply to all the tags we have right now. I think a nice path forward would be to group tests into "api", "sanity", "input validation", "sample weight", etc., and have a

    def check_estimator(..., check_api=True, check_sanity=True, ...):
        ...

Then, in our common tests, we can have something like:

    CHECK_ESTIMATOR_PARAMS = {
        'SVC': dict(check_sample_weight=False, ...),
        ...
    }

And use it when calling check_estimator for our estimators.
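The grouping proposal above can be sketched as a registry of categorised checks plus one boolean per category. The category names, check names, and the `CHECK_REGISTRY` structure are made up for illustration, not scikit-learn code.

```python
# Hypothetical registry mapping check categories to check names.
CHECK_REGISTRY = {
    "api": ["check_fit_returns_self"],
    "sanity": ["check_predict_shape"],
    "sample_weight": ["check_sample_weight_invariance"],
}

def select_checks(check_api=True, check_sanity=True, check_sample_weight=True):
    """Return the names of the checks that would run for the given flags."""
    flags = {
        "api": check_api,
        "sanity": check_sanity,
        "sample_weight": check_sample_weight,
    }
    return [name
            for group, names in CHECK_REGISTRY.items()
            if flags[group]
            for name in names]
```

A per-estimator override table like the `CHECK_ESTIMATOR_PARAMS` above would then just feed these keyword flags into `select_checks`.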
I agree with @adrinjalali that it would be good to have it as a parameter (and apparently that's what we just agreed on in the meeting). I think it's quite important to allow strict and non-strict parts in a single test, say for the error messages.
Closes #14929
Fixes #13969
Fixes #16241
Alternative to #16882 and #16890
This version treats strict checks as if they were in the _xfail_checks tag of the estimator. If we want to, I think this can be extended reasonably easily so that checks have a strict part and a non-strict part.
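The mechanism described here can be sketched roughly as follows: when strict_mode is off, strict checks are routed through the same machinery as an estimator's _xfail_checks tag, i.e. they get a pytest xfail mark. The names `_STRICT_CHECKS` and `mark_checks` below are illustrative, not the PR's exact code.

```python
import pytest

def check_strict(est):
    pass

def check_normal(est):
    pass

# Hypothetical global list of strict checks (as in the PR's approach).
_STRICT_CHECKS = [check_strict]

def mark_checks(checks, strict_mode):
    """Yield (check, marks) pairs, xfailing strict checks when strict_mode is off."""
    for check in checks:
        marks = []
        if not strict_mode and check in _STRICT_CHECKS:
            marks.append(pytest.mark.xfail(
                reason="The check is strict and strict mode is off"))
        yield check, marks
```

Extending this so that a single check has a strict and a non-strict part would then mostly be a matter of what each check function does internally, not of the marking machinery.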