[MRG + 1] Add test for __dict__ by kiote · Pull Request #7553 · scikit-learn/scikit-learn

kiote · 2016-10-03T08:42:13Z

Reference Issue

What does this implement/fix? Explain your changes.

Check that an estimator does not change the state of its __dict__ during the fit phase.

Any other comments?

I've marked this PR as "WIP = work in progress", since I'm not sure it does what it should do. So please give me some feedback first :)

After this, I'll add a check for the transform stage as well.

amueller · 2016-10-03T13:17:39Z

We certainly change the __dict__ during fit. The issue is not to change it during any other method.

kiote · 2016-10-05T06:59:56Z

Got it. Now I'm checking that __dict__ does not change during transform.
But that raised a lot of questions.

I can see that not all estimators have a transform method. For now, I just check whether the method exists; is that okay?
I see a lot of deprecation warnings like this: "DeprecationWarning: Function transform is deprecated; Support to use estimators as feature selectors will be removed in version 0.19. Use SelectFromModel instead." Which made me think I shouldn't use transform anymore... Right?
Here I had to skip some estimators, because the data doesn't fit, or who knows why. It seems like a dirty hack; I'm not sure how to do that correctly.

amueller · 2016-10-05T19:12:31Z

You should check any of transform, predict, predict_proba and decision_function. I don't think there's a model that has all of these, but you should check any of them if they are available (starting with transform is fine). You can just ignore the deprecation warnings. Check out how that is done in estimator_checks.py. Why were the ones you are skipping failing? These are surprising to me.
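As an illustration of the advice above, here is a minimal sketch of calling whichever of the four methods an estimator provides while silencing deprecation warnings. `ToyClassifier` and `call_available_methods` are hypothetical names made up for this example; the real checks in scikit-learn use the project's own helpers.

```python
import warnings

PREDICTION_METHODS = ("predict", "transform", "decision_function",
                      "predict_proba")


class ToyClassifier:
    """Hypothetical estimator exposing only two of the four methods."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        return self

    def predict(self, X):
        # Emit a warning to mimic a deprecated method.
        warnings.warn("predict is deprecated in this toy", DeprecationWarning)
        return [self.classes_[0] for _ in X]

    def decision_function(self, X):
        return [0.0 for _ in X]


def call_available_methods(estimator, X):
    """Call every method in PREDICTION_METHODS that the estimator defines."""
    called = []
    with warnings.catch_warnings():
        # Deprecation warnings are irrelevant to the state check.
        warnings.simplefilter("ignore", DeprecationWarning)
        for method in PREDICTION_METHODS:
            if hasattr(estimator, method):
                getattr(estimator, method)(X)
                called.append(method)
    return called


est = ToyClassifier().fit([[0], [1]], [0, 1])
called = call_available_methods(est, [[0], [1]])
```

Methods the estimator lacks are simply skipped, so no estimator needs to implement all four.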

jnothman · 2016-10-05T23:27:38Z

(Before deprecation of the feature selection transformation, there certainly should have been models that have all of these, such as logistic regression)

kiote · 2016-10-12T07:55:55Z

okay, what I've done by now:

add check_dict method to estimator_checks
call this method in test_common for (almost) all estimators.

Can you please tell if it's closer to the desired result or not? Thanks!

kiote · 2016-10-13T07:50:48Z

I've added some special cases for RANSACRegressor and SpectralBiclustering in the new check, but I still need to skip SpectralCoclustering because of this error:

ValueError: Found array with 0 feature(s) (shape=(23, 0)) while a minimum of 1 is required.

Any suggestions?

jnothman · 2016-10-13T12:20:22Z

sklearn/utils/estimator_checks.py

+def check_dict(name, Estimator):
+    rnd = np.random.RandomState(0)
+    if name in ['SpectralCoclustering']:
+        return 0


just "return", no 0.

The issues with SpectralCoclustering and SpectralBiclustering may be resolved in #6141

jnothman · 2016-10-13T12:21:21Z

sklearn/utils/estimator_checks.py



 @ignore_warnings
+def check_dict(name, Estimator):


Maybe check_dict_unchanged

jnothman · 2016-10-13T12:21:46Z

sklearn/utils/estimator_checks.py

+    if name in ['RANSACRegressor']:
+        X = 3 * rnd.uniform(size=(20, 3))
+    else:
+        X = 2 * rnd.uniform(size=(20, 3))


We can't use 3 * in all cases?

If I use 3 * for all, I get ValueError: _BinaryGaussianProcessClassifierLaplace supports only binary classification. y contains classes [0 1 2]. Which sounds fair enough.

With 2 * for all I get on the other side ValueError: No inliers found, possible cause is setting residual_threshold (None) too low. for RANSACRegressor :(
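A possible reading of the two errors above, sketched under an assumption the thread does not state: that the labels y are obtained by truncating a column of X to integers. If so, the scale factor decides how many classes can appear. `integer_labels` is a hypothetical helper, and the standard-library `random` module stands in for `np.random.RandomState`.

```python
import random


def integer_labels(scale, n_samples=20, seed=0):
    """Truncate scale * U[0, 1) samples to integers (assumed labelling)."""
    rnd = random.Random(seed)
    return [int(scale * rnd.random()) for _ in range(n_samples)]


labels_scale_2 = integer_labels(2)  # values can only be 0 or 1
labels_scale_3 = integer_labels(3)  # values can be 0, 1 or 2
```

Under that assumption, a factor of 3 can produce a third class, which a binary-only classifier rejects, while a factor of 2 keeps the problem binary.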

jnothman · 2016-10-13T12:23:02Z

sklearn/utils/estimator_checks.py

+    set_testing_parameters(estimator)
+
+    if hasattr(estimator, "n_components"):
+        estimator.n_components = 3


why 3? other tests with this parameter setting use 1.

well yes, but for some reason with 1 I catch ValueError: n_best cannot be larger than n_components, but 3 > 1 from SpectralBiclustering

Hm maybe SpectralBiclustering needs n_best=1?

thanks, that helped!

jnothman · 2016-10-13T12:23:17Z

sklearn/utils/estimator_checks.py

+    estimator = Estimator()
+    set_testing_parameters(estimator)
+
+    if hasattr(estimator, "n_components"):


Hmm... I wonder if these should be in set_testing_parameters.

I can't say anything here. I've seen the same code in several places, for example here: https://github.com/kiote/scikit-learn/blob/feature-test-__dict__/sklearn/utils/estimator_checks.py#L443

So it looks like it could be moved there. Do you think it should?

yeah feel free to move it there.

jnothman · 2016-10-13T12:23:59Z

sklearn/utils/estimator_checks.py

+    for method in ["predict", "transform", "decision_function",
+                   "predict_proba"]:
+        if hasattr(estimator, method):
+            dict_before = estimator.__dict__


You need to perform .copy() at the end of this. Otherwise, in almost all cases, testing equivalence doesn't test that __dict__ is unchanged, as you're actually comparing an object to itself.

haha good point, after the real check I have to skip two more estimators: 'NMF' and 'ProjectedGradientNMF', since they actually change __dict__ on given steps.
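The aliasing problem described above can be reproduced with a toy class. This is a minimal sketch and `Toy` and `MutatesInPlace` are hypothetical names. Without `.copy()`, `dict_before` is the very same dictionary object as `estimator.__dict__`, so the comparison always succeeds.

```python
class Toy:
    def fit(self):
        self.coef_ = 1.0
        return self

    def predict(self):
        self.cache_ = "set during predict"  # the violation we want to catch
        return 0


est = Toy().fit()
alias = est.__dict__            # same object as est.__dict__
snapshot = est.__dict__.copy()  # independent shallow copy
est.predict()

alias_detects = alias != est.__dict__    # compares the dict with itself
copy_detects = snapshot != est.__dict__  # sees the new 'cache_' key


class MutatesInPlace:
    def fit(self):
        self.history_ = []
        return self

    def predict(self):
        self.history_.append(1)  # mutates an existing attribute in place
        return 0


est2 = MutatesInPlace().fit()
shallow = est2.__dict__.copy()
est2.predict()
# The shallow copy shares the same list object, so this change goes unseen.
shallow_detects = shallow != est2.__dict__
```

Note the limitation shown by the second class: a shallow copy catches added, removed or rebound attributes, but not in-place mutation of an attribute that already exists.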

jnothman · 2016-10-13T12:24:24Z

sklearn/utils/estimator_checks.py

+        if hasattr(estimator, method):
+            dict_before = estimator.__dict__
+            getattr(estimator, method)(X)
+            assert_equal(estimator.__dict__, dict_before)


use assert_dict_equal

jnothman · 2016-10-13T12:25:15Z

Thanks for working on this, though I suspect you will find many more failures when you correctly perform .copy()

amueller · 2016-10-14T18:25:22Z

sklearn/utils/estimator_checks.py

    rnd = np.random.RandomState(0)
-    if name in ['SpectralCoclustering']:
-        return 0
+    if name in ['SpectralCoclustering', 'NMF', 'ProjectedGradientNMF']:


How do they change the dict? Please leave them in so we can see the error and fix them.

Both NMF and ProjectedGradientNMF change the n_iter_ value.

With SpectralCoclustering the situation is different: I just can't make it work. It fails with this error:

ValueError: Found array with 0 feature(s) (shape=(23, 0)) while a minimum of 1 is required.

amueller · 2016-10-14T18:26:21Z

sklearn/tests/test_common.py

+
+
+@ignore_warnings
+def test_predict_does_not_change_estimator_state():


I think you should add this test to the generator that generates all tests in estimator_checks.py and not add a test here. Though for working on it this might be easier.

got it, I'll remove this from here afterwards then!

looks good to me once you move this.
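The suggestion to register the check in the generator rather than writing a standalone test could look roughly like the sketch below. All names here (`yield_all_checks_sketch`, `check_has_fit`, the toy classes) are hypothetical and only illustrate the pattern; they are not the functions in estimator_checks.py.

```python
METHODS = ("predict", "transform", "decision_function", "predict_proba")


def check_has_fit(name, estimator):
    assert hasattr(estimator, "fit"), "%s has no fit method" % name


def check_dict_unchanged(name, estimator):
    X = [[0.0], [1.0]]
    estimator.fit(X)
    for method in METHODS:
        if hasattr(estimator, method):
            before = estimator.__dict__.copy()
            getattr(estimator, method)(X)
            assert estimator.__dict__ == before, (
                "%s changes __dict__ during %s" % (name, method))


def yield_all_checks_sketch(name, estimator):
    """Yield every check that applies to the given estimator."""
    yield check_has_fit
    # Only yield the state check when there is something to call.
    if any(hasattr(estimator, m) for m in METHODS):
        yield check_dict_unchanged


class ToyTransformer:
    def fit(self, X, y=None):
        self.n_features_ = len(X[0])
        return self

    def transform(self, X):
        return X


class FitOnly:
    def fit(self, X, y=None):
        return self


ran = []
for check in yield_all_checks_sketch("ToyTransformer", ToyTransformer()):
    check("ToyTransformer", ToyTransformer())  # fresh instance per check
    ran.append(check.__name__)

fit_only_checks = [c.__name__
                   for c in yield_all_checks_sketch("FitOnly", FitOnly())]
```

With this layout, a new check is picked up by every caller of the generator, so no separate loop over estimators is needed in the common tests.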

amueller · 2016-10-17T21:18:44Z

sklearn/utils/estimator_checks.py

+
+    set_random_state(estimator, 1)
+
+    if name in ['SpectralBiclustering']:


SpectralBiclustering doesn't take y? I guess that's fixed in #6141.

yep! those changes helped (I merged with maniteja123:issue6126 branch to check), so I suppose I could remove this after #6141 merge, right?

kiote · 2016-10-18T06:43:31Z

Not really sure, what should I do with failing tests for NMF and ProjectedGradientNMF estimators. Should I try to fix them in the scope of current issue as well?

jnothman · 2016-10-18T11:37:44Z

what should I do with failing tests for NMF and ProjectedGradientNMF estimators. Should I try to fix them in the scope of current issue as well?

It would be acceptable to skip them for now (by checking for their name) and then creating a new issue for this to be solved.

jnothman

Getting there!

jnothman · 2016-10-18T11:40:55Z

sklearn/utils/estimator_checks.py

        if not isinstance(estimator, ProjectedGradientNMF):
            estimator.set_params(solver='cd')

+    if hasattr(estimator, "n_components"):


It seems setting these parameters here is a bad idea. It severely affects other tests.

Okay, I'll move that back to the concrete test.

How did you check that, btw?

Click "details" next to "continuous-integration/travis-ci/pr" under "Some changes were not successful"

oh I see, thanks!

kiote · 2016-10-19T07:08:42Z

I added some comments right in the code to explain some things. Also, I'm going to remove the new code from tests/test_common.py when you approve these changes.

jnothman · 2016-10-19T12:32:23Z

This looks okay... could you change the title from WIP to MRG and put the code into mergeable form?

amueller · 2016-10-19T18:29:27Z

sklearn/utils/estimator_checks.py

+def check_dict_unchanged(name, Estimator):
+    # these two estimators change the state of __dict__
+    # and need to be fixed
+    if name in ['NMF', 'ProjectedGradientNMF']:


Actually, it might be simpler to remove this line instead, that should get rid of the issue.

kiote · 2016-10-20T15:45:10Z

done!

amueller · 2016-10-20T15:54:07Z

can you please rebase? Otherwise LGTM.

kiote · 2016-10-21T07:27:00Z

okay! now it's rebased from master

amueller · 2016-10-21T18:31:18Z

Can you check the changed files as shown by github? I think something went wrong during the rebase.

kiote · 2016-10-24T08:14:21Z

hmm, could you please tell more, what exactly do you mean? Can't see what's wrong there 😕

amueller · 2016-10-24T16:26:43Z

@kiote never mind, I didn't have enough coffee...

amueller · 2016-10-24T16:27:50Z

@jnothman wanna have another look?

amueller · 2016-10-26T19:28:28Z

Can you please add a test to test_check_estimator that shows that check_estimator fails on an estimator that violates this?

kiote · 2016-10-30T15:44:50Z

done, please see here
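For readers without access to the linked commit, a test of this kind might be sketched as follows. The check and both toy estimators are hypothetical stand-ins written for this note; the assertion message names the single method being tested, as discussed elsewhere in this review.

```python
METHODS = ("predict", "transform", "decision_function", "predict_proba")


def check_dict_unchanged_sketch(name, estimator, X):
    """Fail if any available prediction-type method alters __dict__."""
    for method in METHODS:
        if hasattr(estimator, method):
            dict_before = estimator.__dict__.copy()
            getattr(estimator, method)(X)
            assert estimator.__dict__ == dict_before, (
                "%s changes __dict__ during %s" % (name, method))


class WellBehaved:
    def fit(self, X, y=None):
        self.mean_ = sum(X) / len(X)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class ChangesDict(WellBehaved):
    def predict(self, X):
        self.key = 1000  # API violation: attribute set outside fit
        return super().predict(X)


X = [1.0, 2.0, 3.0]

# A compliant estimator passes silently.
check_dict_unchanged_sketch("WellBehaved", WellBehaved().fit(X), X)

# A violating estimator must make the check fail.
try:
    check_dict_unchanged_sketch("ChangesDict", ChangesDict().fit(X), X)
    failure_message = None
except AssertionError as exc:
    failure_message = str(exc)
```

Testing the check against a deliberately broken estimator guards against the check silently passing everything, which is what happened before `.copy()` was added.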

jnothman

Otherwise LGTM

jnothman · 2016-10-31T11:08:29Z

sklearn/decomposition/nmf.py

            nls_max_iter=self.nls_max_iter, sparseness=self.sparseness,
            beta=self.beta, eta=self.eta)

-        self.n_iter_ = n_iter_


Very awkwardly, there is an Attributes section in the docstring above (and similar ones in the docs for fit and fit_transform!). Please remove it. I suppose this change should also be documented ("Fixed bug where NMF's n_iter_ attribute was set by calls to transform"), though it's a very strange feature.

I removed all the self.n_iter_ = ... assignments in transform, but I'm not really sure where this change should be documented. Could you please point me to the right place for that?

Our changelog is in doc/whats_new.rst

jnothman · 2016-10-31T11:12:58Z

sklearn/utils/estimator_checks.py

+            getattr(estimator, method)(X)
+            assert_dict_equal(estimator.__dict__, dict_before,
+                              ('Estimator changes __dict__ during'
+                               'predict, transform, decision_function'


Could we just specify the one that's being tested?

Or would we rather list all to make it clear what the API violation is?

The idea was to list them all, even if only one violates the new rule, but it might be clearer to point directly to the one that's being tested.

Changed here: 4bd07c7

kiote · 2016-11-01T09:34:47Z

sklearn/decomposition/nmf.py

        self.l1_ratio = l1_ratio
        self.verbose = verbose
        self.shuffle = shuffle
+        self.n_iter_ = 1


without having this attribute here I got

File "/home/travis/sklearn_build_latest/scikit-learn/sklearn/utils/estimator_checks.py", line 1600, in check_transformer_n_iter
assert_greater_equal(estimator.n_iter_, 1)
AttributeError: 'ProjectedGradientNMF' object has no attribute 'n_iter_'

from tests

You can remove this now.

jnothman

Sorry for the misunderstandings.

jnothman · 2016-11-01T10:41:30Z

sklearn/decomposition/nmf.py

        components_ : array-like, shape (n_components, n_features)
            Factorization matrix, sometimes called 'dictionary'.

-        n_iter_ : int


I was commenting that there should be no Attributes section at all in a method docstring. it belongs in the class docstring.

jnothman · 2016-11-01T10:41:32Z

sklearn/decomposition/nmf.py


        self.n_components_ = H.shape[0]
        self.components_ = H
-        self.n_iter_ = n_iter_


You should not have removed this. We need it in fit_transform, not in transform

woops! placed back
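The distinction being drawn here (fitted attributes such as n_iter_ are set in fit_transform, never in transform) can be sketched with a toy transformer. `ToyIterativeTransformer` and its `_solve` helper are hypothetical and do not reproduce NMF's solver.

```python
class ToyIterativeTransformer:
    """Hypothetical transformer whose solver reports an iteration count."""

    def __init__(self, max_iter=5):
        self.max_iter = max_iter

    def _solve(self, X):
        # Stand-in for an iterative solver: returns a result and a count.
        n_iter = min(self.max_iter, len(X))
        W = [[v * 0.5 for v in row] for row in X]
        return W, n_iter

    def fit_transform(self, X, y=None):
        W, n_iter = self._solve(X)
        self.n_iter_ = n_iter  # fitted state is recorded here
        return W

    def fit(self, X, y=None):
        self.fit_transform(X)
        return self

    def transform(self, X):
        W, _ = self._solve(X)  # iteration count deliberately discarded
        return W


est = ToyIterativeTransformer(max_iter=5).fit([[1.0], [2.0]])
state_after_fit = est.__dict__.copy()
out = est.transform([[4.0], [6.0], [8.0]])
```

After this, `est.n_iter_` still reflects the fit call, even though the solver ran again inside transform on different data.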

kiote · 2016-11-02T08:48:31Z

okay, now I've put back some n_iter_ lines I removed accidentally, removed the docstrings you pointed to, and added a documentation entry about this change (mostly about the NMF change).

I wonder, should I add docs about the new check as well?

jnothman

Otherwise LGTM.

jnothman · 2016-11-02T10:17:59Z

sklearn/decomposition/nmf.py

@@ -1066,9 +1059,6 @@ def fit(self, X, y=None, **params):
        components_ : array-like, shape (n_components, n_features)


Please drop all attributes here too

jnothman · 2016-11-02T10:26:06Z

doc/whats_new.rst

     (`#7680 <https://github.com/scikit-learn/scikit-learn/pull/7680>`_).
     By `Ibraim Ganiev`_.

+   - Remove params changing inside of `transform` method of :class:`decomposition.NMF`


The following would suffice:

- Fixed a bug where :class:`decomposition.NMF` sets its `n_iters_` attribute in `transform()`. :issue:`7553` by `Ekaterina Krivich`_.

And under Enhancements, something like "check_estimator now attempts to ensure that methods transform, predict, etc. do not set attributes on the estimator."

jnothman · 2016-11-02T10:27:00Z

sklearn/decomposition/nmf.py

        self.l1_ratio = l1_ratio
        self.verbose = verbose
        self.shuffle = shuffle
+        self.n_iter_ = 1


You can remove this now.

jnothman · 2016-11-03T11:41:03Z

doc/whats_new.rst


+   - ``check_estimator`` now attempts to ensure that methods transform, predict, etc.
+     do not set attributes on the estimator.
+     (`#7553 <https://github.com/scikit-learn/scikit-learn/pull/7553>`_)


again, please use

:issue:`7553`

okay, anyway neither my version nor this one is clickable for me (looks like https://github.com/scikit-learn/scikit-learn/blob/master/doc/whats_new.rst#id17). Not in the scope of this issue, obviously. I'll fix that!

jnothman · 2016-11-03T11:41:56Z

doc/whats_new.rst

-     (`#7553 <https://github.com/scikit-learn/scikit-learn/pull/7553>`_). Since it violates
-     new represented rule "estimator state does not change at transform/predict/predict_proba time".
-     By `Ekaterina Krivich`_.
+   - Fixed a bug where :class:`decomposition.NMF` sets its `n_iters_`


to format n_iters_ correctly, you need double-backticks:

``n_iters_``

jnothman · 2016-11-03T12:28:11Z

Sigh. Can you update your master, rebase or merge, and fix conflicts in whats_new?

kiote · 2016-11-04T06:53:02Z

oh, it's done

jnothman · 2016-11-05T10:44:11Z

Thanks!

…cikit-learn#7553) * Add test for __dict__ for estimator checks check that "predict", "transform", "decision_function" or "predict_proba" methods do not change the state of __dict__ of any estimator see scikit-learn#7297 * Add a test to test_check_estimator that shows that check_estimator fails on an estimator that violates this * Fixed bug where NMF's n_iter_ attribute was set by calls to transform

* tag '0.18.1': (144 commits) skip tree-test on 32bit do the warning test as we do it in other places. Replase assert_equal by assert_almost_equal in cosine test version bump 0.18.1 fix merge conflict mess in whatsnew add the python2.6 warning to 0.18.1 fix learning_curve test that I messed up in cherry-picking the "reentrant cv" PR. sync whatsnew with master [MRG] TST Ensure __dict__ is unmodified by predict, transform, etc (scikit-learn#7553) FIX scikit-learn#6420: Cloning decision tree estimators breaks criterion objects (scikit-learn#7680) Add whats new entry for scikit-learn#6282 (scikit-learn#7629) [MGR + 2] fix selectFdr bug (scikit-learn#7490) fixed whatsnew cherry-pick mess (somewhat) [MRG + 2] FIX LogisticRegressionCV to correctly handle string labels (scikit-learn#5874) [MRG + 2] Fixed parameter setting in SelectFromModel (scikit-learn#7764) [MRG+2] DOC adding separate `fit()` methods (and docstrings) for DecisionTreeClassifier and DecisionTreeRegressor (scikit-learn#7824) Fix docstring typo (scikit-learn#7844) n_features --> n_components [MRG + 1] DOC adding :user: role to whats_new (scikit-learn#7818) [MRG+1] label binarizer not used consistently in CalibratedClassifierCV (scikit-learn#7799) DOC : fix docstring of AIC/BIC in GMM ...


jnothman requested changes Oct 13, 2016

amueller reviewed Oct 14, 2016

amueller reviewed Oct 17, 2016

RPGOne approved these changes Oct 17, 2016

jnothman requested changes Oct 18, 2016

amueller reviewed Oct 19, 2016

kiote changed the title ~~WIP: Add test for __dict__~~ [MRG] Add test for __dict__ Oct 20, 2016

amueller changed the title ~~[MRG] Add test for __dict__~~ [MRG + 1] Add test for __dict__ Oct 20, 2016

amueller added the Waiting for Reviewer label Oct 24, 2016

amueller mentioned this pull request Oct 26, 2016

rng attribute is set in sklearn.gaussian_process.GaussianProcessRegressor at fit time #7752

Closed

amueller mentioned this pull request Oct 26, 2016

ensure that estimators only add private attributes and attributes with trailing _ #7763

Closed

jnothman requested changes Oct 31, 2016

kiote commented Nov 1, 2016

jnothman requested changes Nov 1, 2016

jnothman requested changes Nov 2, 2016

jnothman approved these changes Nov 3, 2016

jnothman requested changes Nov 3, 2016

jnothman approved these changes Nov 3, 2016

kiote added 3 commits November 3, 2016 18:00

Add test for __dict__ for estimator checks

b908947

check that "predict", "transform", "decision_function" or "predict_proba" methods do not change the state of __dict__ of any estimator see #7297

Add a test to test_check_estimator

018dbea

that shows that check_estimator fails on an estimator that violates this

Fixed bug where NMF's n_iter_ attribute was set by calls to transform

64adf73

jnothman merged commit 02cc6f5 into scikit-learn:master Nov 5, 2016

kiote mentioned this pull request Nov 11, 2016

[MRG + 1] Add check for estimator: parameters not modified by fit #7846

Merged

rth mentioned this pull request Jun 21, 2019

Common test to check that estimator state does not change at transform/predict/predict_proba time #7297

Closed


