[MRG + 1] Add test for __dict__#7553
[MRG + 1] Add test for __dict__#7553jnothman merged 3 commits intoscikit-learn:masterfrom kiote:feature-test-__dict__
Conversation
|
We certainly change the |
|
Got it. Now I'm checking that
|
|
You should check any of |
|
(Before deprecation of the feature selection transformation, there certainly should have been models that have all of these, such as logistic regression) |
|
okay, what I've done by now:
Can you please tell if it's closer to the desired result or not? Thanks! |
|
I've added some special cases for
error. Any suggestions? |
sklearn/utils/estimator_checks.py
Outdated
| def check_dict(name, Estimator): | ||
| rnd = np.random.RandomState(0) | ||
| if name in ['SpectralCoclustering']: | ||
| return 0 |
There was a problem hiding this comment.
just "return", no 0.
The issues with SpectralCoclustering and SpectralBiclustering may be resolved in #6141
sklearn/utils/estimator_checks.py
Outdated
|
|
||
|
|
||
| @ignore_warnings | ||
| def check_dict(name, Estimator): |
| if name in ['RANSACRegressor']: | ||
| X = 3 * rnd.uniform(size=(20, 3)) | ||
| else: | ||
| X = 2 * rnd.uniform(size=(20, 3)) |
There was a problem hiding this comment.
We can't use 3 * in all cases?
There was a problem hiding this comment.
If I use 3 * for all, I get ValueError: _BinaryGaussianProcessClassifierLaplace supports only binary classification. y contains classes [0 1 2]. Which is sounds fair enough.
With 2 * for all I get on the other side ValueError: No inliers found, possible cause is setting residual_threshold (None) too low. for RANSACRegressor :(
sklearn/utils/estimator_checks.py
Outdated
| set_testing_parameters(estimator) | ||
|
|
||
| if hasattr(estimator, "n_components"): | ||
| estimator.n_components = 3 |
There was a problem hiding this comment.
why 3? other tests with this parameter setting use 1.
There was a problem hiding this comment.
well yes, but for some reason with 1 I catch ValueError: n_best cannot be larger than n_components, but 3 > 1 from SpectralBiclustering
There was a problem hiding this comment.
Hm maybe SpectralBiclustering needs n_best=1?
| estimator = Estimator() | ||
| set_testing_parameters(estimator) | ||
|
|
||
| if hasattr(estimator, "n_components"): |
There was a problem hiding this comment.
*Hmm... I wonder if these should be in set_testing_parameters)
There was a problem hiding this comment.
Can't say anything here, I've seen the same part on several places, for example here: https://github.com/kiote/scikit-learn/blob/feature-test-__dict__/sklearn/utils/estimator_checks.py#L443
So, it looks like it could be moved there, do you think it should?
There was a problem hiding this comment.
yeah feel free to move it there.
sklearn/utils/estimator_checks.py
Outdated
| for method in ["predict", "transform", "decision_function", | ||
| "predict_proba"]: | ||
| if hasattr(estimator, method): | ||
| dict_before = estimator.__dict__ |
There was a problem hiding this comment.
You need to perform .copy() at the end of this. Otherwise, in almost all cases, testing equivalence doesn't test that __dict__ is unchanged, as you're actually comparing an object to itself.
There was a problem hiding this comment.
haha good point, after the real check I have to skip two more estimators: 'NMF' and 'ProjectedGradientNMF', since they actually change __dict__ on given steps.
sklearn/utils/estimator_checks.py
Outdated
| if hasattr(estimator, method): | ||
| dict_before = estimator.__dict__ | ||
| getattr(estimator, method)(X) | ||
| assert_equal(estimator.__dict__, dict_before) |
|
Thanks for working on this, though I suspect you will find many more failures when you correctly perform |
sklearn/utils/estimator_checks.py
Outdated
| rnd = np.random.RandomState(0) | ||
| if name in ['SpectralCoclustering']: | ||
| return 0 | ||
| if name in ['SpectralCoclustering', 'NMF', 'ProjectedGradientNMF']: |
There was a problem hiding this comment.
How do they change the dict? Please leave them in so we can see the error and fix them.
There was a problem hiding this comment.
Both NMF and ProjectedGradientNMF changes n_iter value.
With SpectralCoclustering situation is different, I just can't make it work. It fails with error:
ValueError: Found array with 0 feature(s) (shape=(23, 0)) while a minimum of 1 is required.
sklearn/tests/test_common.py
Outdated
|
|
||
|
|
||
| @ignore_warnings | ||
| def test_predict_does_not_change_estimator_state(): |
There was a problem hiding this comment.
I think you should add this test to the generator that generates all tests in estimator_checks.py and not add a test here. Though for working on it this might be easier.
There was a problem hiding this comment.
got it, I'll remove this from here afterwards then!
There was a problem hiding this comment.
looks good to me once you move this.
|
|
||
| set_random_state(estimator, 1) | ||
|
|
||
| if name in ['SpectralBiclustering']: |
There was a problem hiding this comment.
SpectralBiclustering doesn't take y? I guess that's fixed in #6141.
There was a problem hiding this comment.
yep! those changes helped (I merged with maniteja123:issue6126 branch to check), so I suppose I could remove this after #6141 merge, right?
|
Not really sure, what should I do with failing tests for |
It would be acceptable to skip them for now (by checking for their name) and then creating a new issue for this to be solved. |
sklearn/utils/estimator_checks.py
Outdated
| if not isinstance(estimator, ProjectedGradientNMF): | ||
| estimator.set_params(solver='cd') | ||
|
|
||
| if hasattr(estimator, "n_components"): |
There was a problem hiding this comment.
It seems setting these parameters here is a bad idea. It severely affects other tests.
There was a problem hiding this comment.
Okay, I'll move that back to the concrete test.
How did you check that, btw?
There was a problem hiding this comment.
Click "details" next to "continuous-integration/travis-ci/pr" under "Some changes were not successful"
|
I added some comments right in the code to explain some things, also I'm gong to remove new code from |
|
This looks okay... could you change the title from WIP to MRG and put the code into mergeable form? |
sklearn/utils/estimator_checks.py
Outdated
| def check_dict_unchanged(name, Estimator): | ||
| # these two estimators change the state of __dict__ | ||
| # and need to be fixed | ||
| if name in ['NMF', 'ProjectedGradientNMF']: |
There was a problem hiding this comment.
Actually, it might be simpler to remove this line instead, that should get rid of the issue.
|
done! |
|
can you please rebase? Otherwise LGTM. |
|
okay! now it's rebased from master |
|
Can you check the changed files as shown by github? I think something went wrong during the rebase. |
|
hmm, could you please tell more, what exactly do you mean? Can't see what's wrong there 😕 |
|
@kiote never mind, I didn't have enough coffee... |
|
@jnothman wanna have another look? |
|
Can you please add a test to |
|
done, please see here |
| nls_max_iter=self.nls_max_iter, sparseness=self.sparseness, | ||
| beta=self.beta, eta=self.eta) | ||
|
|
||
| self.n_iter_ = n_iter_ |
There was a problem hiding this comment.
Very awkwardly, there is an Attributes section in the docstring above (and similar ones in the docs for fit and fit_transform!). Please remove it. I suppose this change should also be documented ("Fixed bug where NMF's n_iter_ attribute was set by calls to transform"), though it's a very strange feature.
There was a problem hiding this comment.
I removed all self.n_iter = something in transform, but not really sure where this change should be documented. Could you please point me to the right place for that?
There was a problem hiding this comment.
Our changelog is in doc/whats_new.rst
sklearn/utils/estimator_checks.py
Outdated
| getattr(estimator, method)(X) | ||
| assert_dict_equal(estimator.__dict__, dict_before, | ||
| ('Estimator changes __dict__ during' | ||
| 'predict, transform, decision_function' |
There was a problem hiding this comment.
Could we just specify the one that's being tested?
There was a problem hiding this comment.
Or would we rather list all to make it clear what the API violation is?
There was a problem hiding this comment.
The idea was to list them all, even if only one violates the new rule, but it might be clearer to point directly to the one that's being tested.
sklearn/decomposition/nmf.py
Outdated
| self.l1_ratio = l1_ratio | ||
| self.verbose = verbose | ||
| self.shuffle = shuffle | ||
| self.n_iter_ = 1 |
There was a problem hiding this comment.
without having this attribute here I got
File "/home/travis/sklearn_build_latest/scikit-learn/sklearn/utils/estimator_checks.py", line 1600, in check_transformer_n_iter
assert_greater_equal(estimator.n_iter_, 1)
AttributeError: 'ProjectedGradientNMF' object has no attribute 'n_iter_'
from tests
jnothman
left a comment
There was a problem hiding this comment.
Sorry for the misunderstandings.
| components_ : array-like, shape (n_components, n_features) | ||
| Factorization matrix, sometimes called 'dictionary'. | ||
|
|
||
| n_iter_ : int |
There was a problem hiding this comment.
I was commenting that there should be no Attributes section at all in a method docstring. it belongs in the class docstring.
|
|
||
| self.n_components_ = H.shape[0] | ||
| self.components_ = H | ||
| self.n_iter_ = n_iter_ |
There was a problem hiding this comment.
You should not have removed this. We need it in fit_transform, not in transform
|
okay, now I put back some lines I removed accidentally with I wonder should I add doc about the new check as well? |
sklearn/decomposition/nmf.py
Outdated
| @@ -1066,9 +1059,6 @@ def fit(self, X, y=None, **params): | |||
| components_ : array-like, shape (n_components, n_features) | |||
There was a problem hiding this comment.
Please drop all attributes here too
doc/whats_new.rst
Outdated
| (`#7680 <https://github.com/scikit-learn/scikit-learn/pull/7680>`_). | ||
| By `Ibraim Ganiev`_. | ||
|
|
||
| - Remove params changing inside of `transform` method of :class:`decomposition.NMF` |
There was a problem hiding this comment.
The following would suffice:
- Fixed a bug where :class:`decomposition.NMF` sets its `n_iters_`
attribute in `transform()`. :issue:`7553` by `Ekaterina Krivich`_.
And under Enhancements, something like "check_estimator now attempts to ensure that methods transform, predict, etc. do not set attributes on the estimator."
sklearn/decomposition/nmf.py
Outdated
| self.l1_ratio = l1_ratio | ||
| self.verbose = verbose | ||
| self.shuffle = shuffle | ||
| self.n_iter_ = 1 |
doc/whats_new.rst
Outdated
|
|
||
| - ``check_estimator`` now attempts to ensure that methods transform, predict, etc. | ||
| do not set attributes on the estimator. | ||
| (`#7553 <https://github.com/scikit-learn/scikit-learn/pull/7553>`_) |
There was a problem hiding this comment.
again, please use
:issue:`7533`
There was a problem hiding this comment.
okay, anyway nor my version, not this one is not clickable for me (looks like https://github.com/scikit-learn/scikit-learn/blob/master/doc/whats_new.rst#id17). Not the scope of this issue, obviously. I'll fix that!
doc/whats_new.rst
Outdated
| (`#7553 <https://github.com/scikit-learn/scikit-learn/pull/7553>`_). Since it violates | ||
| new represented rule "estimator state does not change at transform/predict/predict_proba time". | ||
| By `Ekaterina Krivich`_. | ||
| - Fixed a bug where :class:`decomposition.NMF` sets its `n_iters_` |
There was a problem hiding this comment.
to format n_iters_ correctly, you need double-backticks:
``n_iters_``
|
Sigh. Can you update your master, rebase or merge, and fix conflicts in whats_new? |
check that "predict", "transform", "decision_function" or "predict_proba" methods do not change the state of __dict__ of any estimator see #7297
that shows that check_estimator fails on an estimator that violates this
|
oh, it's done |
|
Thanks! |
…cikit-learn#7553) * Add test for __dict__ for estimator checks check that "predict", "transform", "decision_function" or "predict_proba" methods do not change the state of __dict__ of any estimator see scikit-learn#7297 * Add a test to test_check_estimator that shows that check_estimator fails on an estimator that violates this * Fixed bug where NMF's n_iter_ attribute was set by calls to transform
…cikit-learn#7553) * Add test for __dict__ for estimator checks check that "predict", "transform", "decision_function" or "predict_proba" methods do not change the state of __dict__ of any estimator see scikit-learn#7297 * Add a test to test_check_estimator that shows that check_estimator fails on an estimator that violates this * Fixed bug where NMF's n_iter_ attribute was set by calls to transform
…cikit-learn#7553) * Add test for __dict__ for estimator checks check that "predict", "transform", "decision_function" or "predict_proba" methods do not change the state of __dict__ of any estimator see scikit-learn#7297 * Add a test to test_check_estimator that shows that check_estimator fails on an estimator that violates this * Fixed bug where NMF's n_iter_ attribute was set by calls to transform
* tag '0.18.1': (144 commits) skip tree-test on 32bit do the warning test as we do it in other places. Replase assert_equal by assert_almost_equal in cosine test version bump 0.18.1 fix merge conflict mess in whatsnew add the python2.6 warning to 0.18.1 fix learning_curve test that I messed up in cherry-picking the "reentrant cv" PR. sync whatsnew with master [MRG] TST Ensure __dict__ is unmodified by predict, transform, etc (scikit-learn#7553) FIX scikit-learn#6420: Cloning decision tree estimators breaks criterion objects (scikit-learn#7680) Add whats new entry for scikit-learn#6282 (scikit-learn#7629) [MGR + 2] fix selectFdr bug (scikit-learn#7490) fixed whatsnew cherry-pick mess (somewhat) [MRG + 2] FIX LogisticRegressionCV to correctly handle string labels (scikit-learn#5874) [MRG + 2] Fixed parameter setting in SelectFromModel (scikit-learn#7764) [MRG+2] DOC adding separate `fit()` methods (and docstrings) for DecisionTreeClassifier and DecisionTreeRegressor (scikit-learn#7824) Fix docstring typo (scikit-learn#7844) n_features --> n_components [MRG + 1] DOC adding :user: role to whats_new (scikit-learn#7818) [MRG+1] label binarizer not used consistently in CalibratedClassifierCV (scikit-learn#7799) DOC : fix docstring of AIC/BIC in GMM ...
* releases: (144 commits) skip tree-test on 32bit do the warning test as we do it in other places. Replase assert_equal by assert_almost_equal in cosine test version bump 0.18.1 fix merge conflict mess in whatsnew add the python2.6 warning to 0.18.1 fix learning_curve test that I messed up in cherry-picking the "reentrant cv" PR. sync whatsnew with master [MRG] TST Ensure __dict__ is unmodified by predict, transform, etc (scikit-learn#7553) FIX scikit-learn#6420: Cloning decision tree estimators breaks criterion objects (scikit-learn#7680) Add whats new entry for scikit-learn#6282 (scikit-learn#7629) [MGR + 2] fix selectFdr bug (scikit-learn#7490) fixed whatsnew cherry-pick mess (somewhat) [MRG + 2] FIX LogisticRegressionCV to correctly handle string labels (scikit-learn#5874) [MRG + 2] Fixed parameter setting in SelectFromModel (scikit-learn#7764) [MRG+2] DOC adding separate `fit()` methods (and docstrings) for DecisionTreeClassifier and DecisionTreeRegressor (scikit-learn#7824) Fix docstring typo (scikit-learn#7844) n_features --> n_components [MRG + 1] DOC adding :user: role to whats_new (scikit-learn#7818) [MRG+1] label binarizer not used consistently in CalibratedClassifierCV (scikit-learn#7799) DOC : fix docstring of AIC/BIC in GMM ... Conflicts: removed sklearn/externals/joblib/__init__.py sklearn/externals/joblib/_parallel_backends.py sklearn/externals/joblib/testing.py
* dfsg: (144 commits) skip tree-test on 32bit do the warning test as we do it in other places. Replase assert_equal by assert_almost_equal in cosine test version bump 0.18.1 fix merge conflict mess in whatsnew add the python2.6 warning to 0.18.1 fix learning_curve test that I messed up in cherry-picking the "reentrant cv" PR. sync whatsnew with master [MRG] TST Ensure __dict__ is unmodified by predict, transform, etc (scikit-learn#7553) FIX scikit-learn#6420: Cloning decision tree estimators breaks criterion objects (scikit-learn#7680) Add whats new entry for scikit-learn#6282 (scikit-learn#7629) [MGR + 2] fix selectFdr bug (scikit-learn#7490) fixed whatsnew cherry-pick mess (somewhat) [MRG + 2] FIX LogisticRegressionCV to correctly handle string labels (scikit-learn#5874) [MRG + 2] Fixed parameter setting in SelectFromModel (scikit-learn#7764) [MRG+2] DOC adding separate `fit()` methods (and docstrings) for DecisionTreeClassifier and DecisionTreeRegressor (scikit-learn#7824) Fix docstring typo (scikit-learn#7844) n_features --> n_components [MRG + 1] DOC adding :user: role to whats_new (scikit-learn#7818) [MRG+1] label binarizer not used consistently in CalibratedClassifierCV (scikit-learn#7799) DOC : fix docstring of AIC/BIC in GMM ...
…cikit-learn#7553) * Add test for __dict__ for estimator checks check that "predict", "transform", "decision_function" or "predict_proba" methods do not change the state of __dict__ of any estimator see scikit-learn#7297 * Add a test to test_check_estimator that shows that check_estimator fails on an estimator that violates this * Fixed bug where NMF's n_iter_ attribute was set by calls to transform
…cikit-learn#7553) * Add test for __dict__ for estimator checks check that "predict", "transform", "decision_function" or "predict_proba" methods do not change the state of __dict__ of any estimator see scikit-learn#7297 * Add a test to test_check_estimator that shows that check_estimator fails on an estimator that violates this * Fixed bug where NMF's n_iter_ attribute was set by calls to transform
…cikit-learn#7553) * Add test for __dict__ for estimator checks check that "predict", "transform", "decision_function" or "predict_proba" methods do not change the state of __dict__ of any estimator see scikit-learn#7297 * Add a test to test_check_estimator that shows that check_estimator fails on an estimator that violates this * Fixed bug where NMF's n_iter_ attribute was set by calls to transform
Reference Issue
see #7297
What does this implement/fix? Explain your changes.
Check that estimator does not change the state of dict during
fitphase.Any other comments?
I add "WIP = work in progress" for this PR, since I'm not sure it does what it should do. So please give me some feedback first :)
After this, I'll add checking for
transformstage as well.