[WIP] Adding tests for estimators implementing partial_fit and a few other related fixes / enhancements #3907
Conversation
|
That seems a nice list (although testing that models are the same is currently done by testing the output of the estimators). I'm not altogether certain about the partial_fit semantics. |
|
What are the current semantics? |
(force-pushed from 7284706 to 8bfbd75)
|
I am stuck on the 2nd alone... The case when … (assuming that the equality checks for the accuracy …). Kindly review my current implementation of the other tests. |
(force-pushed from c3089fd to ee13683)
|
You should probably use predict/transform directly rather than score, as the existing invariance tests do. I think … |
|
You're assuming all estimators are predictors. Something like this might be more general:

from numpy.testing import assert_array_almost_equal

def _assert_same_model_method(method, X, estimator1, estimator2, msg):
    # Only compare outputs when both estimators expose the method.
    if hasattr(estimator1, method):
        assert hasattr(estimator2, method), (
            '{!r} has {}, but {!r} does not'.format(estimator1, method, estimator2))
        assert_array_almost_equal(
            getattr(estimator1, method)(X), getattr(estimator2, method)(X),
            2, 'When testing {}: {}'.format(method, msg))

def assert_same_model(X, estimator1, estimator2, msg):
    _assert_same_model_method('predict', X, estimator1, estimator2, msg)
    _assert_same_model_method('transform', X, estimator1, estimator2, msg)
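For illustration, here is a hypothetical usage of such a helper (the estimator, data, and message below are my own example, not from the PR):

import numpy as np
from sklearn.naive_bayes import MultinomialNB

rng = np.random.RandomState(0)
X = np.abs(rng.randn(20, 5))   # MultinomialNB requires non-negative input
y = rng.randint(0, 2, 20)

# For a counts-based estimator, one fit on all the data and one
# partial_fit over the same single batch should give the same model.
est1 = MultinomialNB().fit(X, y)
est2 = MultinomialNB().partial_fit(X, y, classes=np.unique(y))
assert_same_model(X, est1, est2, 'fit vs single-batch partial_fit')
|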
sklearn/utils/estimator_checks.py
Even I thought so... but this seems to be the convention followed by the other checks...
Could I change all the others uniformly to Estimator too, or should I change this one alone?
Also, what does Alg mean btw?
Feel free to change everything to Estimator; I might have picked the name, but I don't have strong opinions about it.
|
@jnothman For your 2nd comment, thanks a lot for the review... I'll make the changes :) |
@amueller I'm not quite sure. I think it should either raise an error, or it should overwrite and do a new fit from the beginning, if one calls partial_fit after fit. If partial_fit starts from where fit left off, then it does not make a difference whether you fit or partial_fit, right? ;)
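To make the two candidate semantics concrete, here is a minimal sketch (SGDClassifier and the toy data are my own choice of illustration, not prescribed by the PR):

import numpy as np
from sklearn.linear_model import SGDClassifier

X = np.array([[0., 0.], [1., 1.], [0., 1.], [1., 0.]])
y = np.array([0, 1, 1, 0])

est = SGDClassifier(random_state=0)
est.fit(X, y)

# Semantics 1: the next line raises, because mixing fit and partial_fit
# is ambiguous.
# Semantics 2: it continues from the fitted state -- but then fit(X, y)
# followed by partial_fit(X, y) is indistinguishable from one longer
# fit, which is exactly the concern voiced above.
est.partial_fit(X, y)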
|
@MechCoder @amueller From what I observed, … But as you and others suggested, perhaps … |
|
We could do either.
I'm inclined towards the second one, but I'm not sure what the best thing to do is. |
|
I feel 2b is already implemented... I guess it's between 2a or 1... Anyway, thanks for your comments :) |
|
A code-revision-related question... When revisions are suggested to the code, am I expected to add a new commit reflecting those revisions, or do I (soft) reset my previous work, make the changes, and recommit? |
|
I think you can do it either way. But if those revisions are minor and you feel it is not worth adding a new commit for them (like pep8, cosmits and such), you can just amend your commit and force-push (at least, that's what I do). |
|
Thanks... I'll also follow the same :) |
(force-pushed from 562bed4 to 133778a)
|
The point is that the interaction between fit and partial_fit … |
|
@jnothman To your earlier comment, could you expand a little on what you meant? I understood the part where you said all those estimators should not be tested assuming they are predictors, and that we should separately test them based on whether they are a Transformer or a Predictor... but I fail to understand the two code snippets... [ @amueller perhaps such a functionality could be useful for … ] So for estimators that are not Transformers: … And for Transformers: … Excuse me if I understood your comment incorrectly...
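As an aside, a complementary helper in the spirit of assert_same_model might look like this rough sketch (the name matches the helper listed in this PR's description, but the body is only my guess at it):

def assert_not_same_model(X, estimator1, estimator2, msg):
    # "Not the same" here means at least one shared method disagrees,
    # i.e. the equality assertion fails.
    try:
        assert_same_model(X, estimator1, estimator2, msg)
    except AssertionError:
        return
    raise AssertionError(msg)
|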
(force-pushed from 133778a to bcb342d)
Tests for partial_fit and for the utils.testing.all_estimators function.
(force-pushed from 672b06a to 3baac44)
Noted. Thanks :)) |
sklearn/tests/test_common.py
why did you do this in its own test? This means they are not part of check_estimator, right?
refactored (haven't pushed yet) :)
|
I'm not entirely convinced of the usefulness of a general … |
sklearn/utils/testing.py
I don't understand that. Why are these skipped? What is the point of the test if we skip these?
Spectral embedding is still non-deterministic due to the zero-eigenvalue problem, which I am attempting to solve at #4299 ;)
I forgot which algorithm uses the weights_ attribute ;) will comment on why I had skipped it for checking shortly :)
In sklearn/utils/testing.py
#3907 (comment):

    any_method_differed = any((t1, t2, t3, t4))
    if not any_method_differed:
        # If all methods return similar results for both ests or if those
        # methods are not present
        assert_fitted_attributes_not_equal(estimator1, estimator2)

    def _attributes_equal(estimator1, estimator2):
        """Helper function to check if fitted model attributes are equal or not.

        Raises AssertionError if the attributes are not equal.
        """
        # A list of attributes which are known to be inconsistent.
        skip_attributes = ('embedding_', 'weights_')

You should not skip it for all estimators. Hard-coding this in this place is really bad!
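For reference, a skip-list-free way of comparing fitted attributes might look like this sketch (all names below are illustrative, not the PR's actual code):

import numpy as np

def fitted_attributes(estimator):
    # Scikit-learn convention: attributes learned during fit end with '_'.
    return {name: value for name, value in vars(estimator).items()
            if name.endswith('_') and not name.startswith('_')}

def assert_fitted_attributes_equal(estimator1, estimator2):
    attrs1 = fitted_attributes(estimator1)
    attrs2 = fitted_attributes(estimator2)
    assert set(attrs1) == set(attrs2), 'Different sets of fitted attributes'
    for name in attrs1:
        # Handles scalar and array attributes; estimators with nested or
        # non-numeric attributes would need special-casing, not skipping.
        np.testing.assert_array_almost_equal(
            np.asarray(attrs1[name]), np.asarray(attrs2[name]),
            err_msg='Attribute %s differs' % name)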
If you know your estimator is a predictor, then yes. … I'd be interested in seeing these helpers moved to a smaller PR, like #4162, so that they can be reviewed and merged without a large change to the tests. Then work on merging this PR, and on refactoring existing tests to use the helpers, can begin. |
|
Yes, that will be much better! I'll cherry-pick the helpers into a new branch and raise a PR!! |
FIX Return self in partial_fit of BernoulliRBM
* Check if no error is raised when fit is called with a different number of features
* Check if the estimator is reset when fit is called after a partial_fit
* Check if an error is raised when the number of features changes between different partial_fit calls
* Check if partial_fit does not overwrite the previous model
* Check if the classes argument is parsed and validated correctly
* Check if clone(estimator).partial_fit == estimator.partial_fit
* Check if partial_fit returns self (see the sketch after this list)
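A minimal sketch of the last check above, assuming a hypothetical check name and the usual (name, estimator) calling convention of sklearn.utils.estimator_checks:

import numpy as np
from sklearn.base import is_classifier

def check_partial_fit_returns_self(name, estimator, X, y):
    # Hypothetical check: only applies to estimators with partial_fit.
    if not hasattr(estimator, 'partial_fit'):
        return
    if is_classifier(estimator):
        # Classifiers need the full set of classes on the first call.
        result = estimator.partial_fit(X, y, classes=np.unique(y))
    else:
        result = estimator.partial_fit(X, y)
    assert result is estimator, '%s.partial_fit does not return self' % name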
|
Not sure what the state of this PR is at the moment, but please consider #5416. In the case of scalers, the issue came from the fact that we are now implementing partial_fit for them. Ping @lesteve |
|
Is this PR just waiting for review or is there still work to be done? |
|
This is gonna need a fresh start if we're going to do it. And we'd need to do significant work on the semantics of partial_fit. |
Fixes #3896
New partial_fit tests (the below tests are based off @arjoly's suggestion):

* partial_fit returns self. - Thanks @jnothman for the suggestion.
* clone(est).partial_fit gives the same model as est.partial_fit.
* fit after a set of fit/partial_fit calls restarts the estimator.
* partial_fit and then fit with a different number of features works without raising any Exceptions... - Thanks @amueller for the suggestion.
* partial_fit does not overwrite the previous model.
* Mismatch between the classes argument and np.unique(y_i) raises ValueError.
* Check when classes is not specified during the first partial_fit call.
* Check when classes is not specified during subsequent calls.
* _validate_y_against_classes - check label mismatch between y and the classes arg.

New helpers:

* assert_same_model(X, est1, est2) and assert_not_same_model(X, est1, est2) - use predict, transform, decision_function and predict_proba to check equality of models.
* assert_attributes_equal(est1, est2) and assert_attributes_not_equal(est1, est2)
* assert_array_not_equal (a sketch follows after the refs)
* Refactored _partial_fit and _fit in sklearn.utils.estimator_checks to use the appropriate parameters.

Refs:
Also see #406 - this PR fixes the 1st point under "not so easy" for pfit-able estimators, via 3a (of this PR).
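For completeness, a plausible implementation of the assert_array_not_equal helper listed above (the PR's actual code may differ):

from numpy.testing import assert_array_equal

def assert_array_not_equal(x, y, err_msg=''):
    # Passes iff assert_array_equal would fail, i.e. the arrays differ
    # in shape or content somewhere.
    try:
        assert_array_equal(x, y)
    except AssertionError:
        return
    raise AssertionError('Arrays are equal. %s' % err_msg)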