MRG adding test of fit attributes#16286
Conversation
rth
left a comment
There was a problem hiding this comment.
Thanks, a few comments below! Should be move the status to MRG?
| 'StackingRegressor', 'TfidfVectorizer', 'VotingClassifier', | ||
| 'VotingRegressor'] | ||
| if Estimator.__name__ in IGNORED or Estimator.__name__.startswith('_'): | ||
| pytest.xfail( |
There was a problem hiding this comment.
Maybe pytest.skip here since realistically we don't intend to make these work in the future.
| X_reg -= X_reg.min() | ||
|
|
||
| if is_classifier(est): | ||
| X, y = X_classif, y_classif |
There was a problem hiding this comment.
Nit: maybe call make_classification / regression here only when necessary, and then,
X -= X.min()once.
| est.fit(X, y) | ||
|
|
||
| for attr in attributes: | ||
| desc = ' '.join(attr.desc).lower() |
There was a problem hiding this comment.
Maybe we could add a comment what this checks for since it's not very clear after reading the ode.
| continue | ||
| if attr.startswith('_'): | ||
| continue | ||
| assert attr in fit_attr_names |
There was a problem hiding this comment.
I think here it might be better to the filtered fit_attr (with removed private attributes and known exceptions), than do,
undocumented_attrs = set(fit_attr_names).difference(fit_attr)
assert not undocumented_attrs, "Undocumented attributes: {}".format(undocumented_attrs)that way all the undocumented attributes are printed at once, and the user doesn't have to iteratively run this test.
| fit_attr = [k for k in est.__dict__.keys() if k.endswith('_')] | ||
| fit_attr_names = [attr.name for attr in attributes] | ||
| for attr in fit_attr: | ||
| if attr in ['X_offset_', 'X_scale_', 'fit_', 'partial_fit_', 'x_mean_', |
There was a problem hiding this comment.
Maybe let's add a comment that these should be removed from the public API.
…uded in new test for doc of all attributes
…tion of tag requires_positive_X broke test of sklearn/tests/test_common.py
…into doc_check_attributes_fit
|
thx @judithabk6 for taking over. @rth I addressed your last comments. I think it's good enough from my end. more reviews are welcome |
|
@judithabk6 @agramfort the failing test is related to #16545 : could you please sync with upstream? This will hopefully solve the issue. Thanks! |
rth
left a comment
There was a problem hiding this comment.
A few minor comments otherwise LGTM, thanks!
| est.k = 2 | ||
|
|
||
| if Estimator.__name__ == 'DummyClassifier': | ||
| est.strategy = "stratified" |
There was a problem hiding this comment.
I think the following might work for a larger number of estimators,
from sklearn.utils.estimator_checks import _construct_instance, _set_checking_parameters
est = _construct_instance(Estimator)
_set_checking_parameters(est)Not asking to do it now, I can change it in a follow up PR.
| 'SkewedChi2Sampler'} | ||
| if Estimator.__name__ in IGNORED: | ||
| pytest.xfail( | ||
| reason="Classifier has too many undocumented attributes.") |
There was a problem hiding this comment.
FYI: we can now also put these in _xfail_test estimator tag for individual estimators (https://scikit-learn.org/dev/developers/develop.html#estimator-tags) but it's not critical. Not asking to do it.
|
Now tests fails because |
rth
left a comment
There was a problem hiding this comment.
LGTM, assuming CI passes. Thanks!
|
Merging +1 as this is fairly low risk (extends an existing test). Thanks! |
|
Yay! |
What does this implement/fix? Explain your changes.
aims to makes sure that all attributes that appear in the fit are documented and vice versa.