[MRG] Add a few more tests + documentation for re-entrant cross-validation estimators #7823
Conversation
Maybe it's easier just to document under
jnothman
left a comment
Could you explain the issue with assert_equal a little more? Perhaps it needs a comment.
sklearn/model_selection/_search.py
Outdated
 +------------+-----------+------------+-----------------+---+---------+
-|param_kernel|param_gamma|param_degree|split0_test_score|...|rank_....|
+|param_kernel|param_gamma|param_degree|split0_test_score|...| rank... |
sklearn/model_selection/_split.py
Outdated
Note
----

I'd prefer no blank line here...
sklearn/model_selection/_split.py
Outdated
----
Multiple calls to the ``split`` method will not return identical
training or testing sets unless ``random_state`` is set to an integer

TimeSeriesSplit has a random_state...
@jnothman Have addressed your comments... Another look please?
np.testing.assert_equal(
    np.array(list(kf_iter_wrapped.split(X, y))),
    np.array(list(kf_randomized_iter_wrapped.split(X, y))))
except AssertionError:
Ah sorry you asked this before and I failed to give a response!
So the nested lists comparison raises a deprecation warning...
>>> assert_true(np.array(list(kfold.split(X, y))) != np.array(list(kfold.split(X, y))))
DeprecationWarning: elementwise != comparison failed; this will raise an error in the future.
``np.testing.assert_equal``, on the other hand, handles lists of np.ndarrays gracefully...
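A small standalone illustration of the difference (numpy only, not the actual test code): ``np.testing.assert_equal`` recurses into nested containers and compares each ndarray element-wise, which sidesteps the elementwise-comparison pitfall entirely.

```python
import numpy as np

# Nested lists of index arrays, as returned by list(cv.split(X, y)),
# can be ragged (folds of unequal size).
split_a = [np.array([0, 1, 2]), np.array([3, 4])]
split_b = [np.array([0, 1, 2]), np.array([3, 4])]

# np.testing.assert_equal walks the lists and compares each ndarray
# element-wise, so ragged nested structures are handled gracefully;
# a bare `split_a != split_b` would instead attempt an ambiguous
# elementwise comparison of the containers.
np.testing.assert_equal(split_a, split_b)
```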
Argh wait I forgot the else clause. Sorry for that...
cv=KFold(n_splits=n_splits))
gs2.fit(X, y)

# Give generator as a cv parameter
To be sure, we've not got anything that ensures this will remain a generator. Either use a generator expression or test for its type.
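A minimal sketch of the generator-expression option suggested here, using a hypothetical stand-in for ``KFold.split`` (not sklearn code): wrapping any iterable call in a generator expression guarantees the test really exercises a generator, whatever the underlying implementation returns.

```python
from types import GeneratorType

def kfold_split(n_samples, n_splits=5):
    # Hypothetical stand-in for KFold.split, written as a generator
    # function that lazily yields (train, test) index pairs.
    fold_size = n_samples // n_splits
    for i in range(n_splits):
        test = list(range(i * fold_size, (i + 1) * fold_size))
        train = [j for j in range(n_samples) if j not in test]
        yield train, test

# The generator expression guarantees a true generator regardless of
# how the wrapped split call is implemented internally.
cv_gen = (split for split in kfold_split(10))
assert isinstance(cv_gen, GeneratorType)
```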
list(kf_randomized_iter_wrapped.split(X, y)))
assert_true(np.any(np.array(list(kf_iter_wrapped.split(X, y))) !=
            np.array(list(kf_randomized_iter_wrapped.split(X, y)))))
np.testing.assert_array_equal(
still don't understand why you've resorted to np.testing.assert_array_equal

np.testing.assert_array_equal handles nested lists, unlike sklearn.utils.testing's assert_array_equal...

Well, I think this should at least be commented on somewhere in the file, if not imported into sklearn.utils.testing and given a different name, or the tests rewritten to do this comparison of nested lists explicitly.
list(kf_randomized_iter_wrapped.split(X, y)))
assert_true(np.any(np.array(list(kf_iter_wrapped.split(X, y))) !=
            np.array(list(kf_randomized_iter_wrapped.split(X, y)))))
# numpy's assert_array_equal properly compares nested lists
@jnothman would this suffice? Or would you rather have this imported into sklearn.utils.testing?

I suppose this comment is okay. Ideally I think we want all our asserts to come from one place and have clear naming for where they should be applied.
amueller
left a comment
We say that the splits are different with random_state=None, but that's not tested anywhere, is it?
sklearn/model_selection/_split.py
Outdated
Note
----
Multiple calls to the ``split`` method will not return identical
training or testing sets if ``random_state`` parameter exists and is
Maybe "if splitting is randomized and the random_state parameter is not set to an integer"? I had to check where this was and what it means for a parameter to exist. Also, is that true? This class cannot know how the subclasses treat the random_state.
So maybe rather "this might not be deterministic" which is not very explicit. Maybe describe this in the class docstring for each class and link there?
Maybe it's my lack of coffee, but why does split use _iter_test_masks? Currently we create indices, transform them to booleans and then transform them back to indices. It is for making the negation easy?
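One plausible reason for the indices → boolean mask → indices round-trip, as the question itself suggests, is that a mask makes the complement trivial to compute. A small numpy sketch (illustrative only, not the sklearn implementation):

```python
import numpy as np

# With a boolean test mask, the training set is just the negation.
n_samples = 6
test_mask = np.zeros(n_samples, dtype=bool)
test_mask[[1, 4]] = True

test_index = np.flatnonzero(test_mask)    # indices where mask is True
train_index = np.flatnonzero(~test_mask)  # complement via a single ~

assert test_index.tolist() == [1, 4]
assert train_index.tolist() == [0, 2, 3, 5]
```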
Sorry for coming to this very late. What do you suggest as the right thing to do?
"This *may not* be deterministic if `random_state`, if available, is not explicitly set while initializing the class." Sounds okay to you?
How about "Randomized CV splitters may return different results for each call of split. This can be avoided (and identical results returned for each split) by setting random_state to an integer."
I think this belongs in the narrative docs too.
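The behaviour the proposed note describes can be sketched with a minimal, hypothetical randomized splitter (not sklearn code): an integer ``random_state`` makes every call to split reproduce the same folds.

```python
import numpy as np

def shuffled_split(n_samples, n_splits, random_state=None):
    # Minimal sketch of a randomized CV splitter: seeding the RNG
    # with an integer makes every call reproduce the same permutation,
    # so repeated calls return identical folds.
    rng = np.random.RandomState(random_state)
    indices = rng.permutation(n_samples)
    return np.array_split(indices, n_splits)

first = shuffled_split(10, n_splits=2, random_state=0)
second = shuffled_split(10, n_splits=2, random_state=0)
assert all(np.array_equal(a, b) for a, b in zip(first, second))
```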
sklearn/model_selection/_split.py
Outdated
Note
----
Multiple calls to the ``split`` method will not return identical
training or testing sets unless ``random_state`` is set to an integer
I'm not sure what "exists" means here. This class has a random_state parameter.
Btw the if shuffle in the _make_test_fold method seems unnecessary and makes the code harder to follow imho.
Same comment as above
I am really unable to recollect why I did that :@ :@ :@ Arghh. I guess it was done so different split calls would produce the same split? But I guess that was vetoed down (see: #7935).
I'll revert for now. I can reintroduce it when someone complains.
Well, personally, I think having multiple split calls produce the same split is a better design, but not one that we currently implement.
return cv_results

# Check that generators are supported as cv and that the splits are consistent
np.testing.assert_equal(_pop_time_keys(gs3.cv_results_),
wouldn't it be better to somehow test that the same samples have been used? We could have a model that stores the training data and a scorer that produces a hash of the training data in the model and the test data passed to score? Or is that too hacky for testing?
Yeah, that can be done. Maybe I'm being lazy but I feel this is sufficient? I can't see a case where this would pass if different samples were used. The score is quite sensitive to sample order, no?
(Especially given that the dataset is make_classification and the estimator is LinearSVC, and not one of our toy estimators which would return 1 / 0 as the score.)
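The hash-based check suggested above could look roughly like this (all names hypothetical, a sketch rather than the test as written): fingerprint each split by the exact indices used, so list-based and generator-based cv can be compared directly.

```python
# Hypothetical (train, test) index pairs standing in for CV splits.
splits = [([0, 1, 2], [3, 4]), ([3, 4], [0, 1, 2])]

def fingerprints(cv):
    # cv may be a list or a generator of (train, test) index pairs;
    # hashing the index tuples identifies exactly which samples
    # were used in each fold.
    return [hash((tuple(train), tuple(test))) for train, test in cv]

# The same splits passed as a generator or as a list must agree.
assert fingerprints(iter(splits)) == fingerprints(list(splits))
```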
Force-pushed 89593cd to 91963f4
@@ -1075,11 +1076,15 @@ def test_search_cv_results_rank_tie_breaking():
    cv_results['mean_test_score'][2])
pytest: assert cv_results['mean_test_score'][1] != approx(cv_results['mean_test_score'][2])
:|
How about just: assert_false(np.allclose(cv_results['mean_test_score'][1], cv_results['mean_test_score'][2]))
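For reference, a toy check of the suggested ``np.allclose`` form (the score values here are made up, not from the test):

```python
import numpy as np

# Two mean test scores that should NOT be tied after rank tie-breaking.
score_1, score_2 = 0.91, 0.87

# assert_false(np.allclose(...)) reads as: the scores differ beyond
# floating-point tolerance (default rtol=1e-05, atol=1e-08).
assert not np.allclose(score_1, score_2)
```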
shuffle=True, random_state=0).split(X, y),
GeneratorType))

# Give generator as a cv parameter
If you really want to test that it's a generator, you should confirm (or ensure by using a generator expression) that it is indeed a generator. Otherwise the implementation in KFold may change and this test is no longer doing the right thing.
There is a test a few lines above (at L1431), which confirms if KFold indeed returns a GeneratorType
Okay. But then the comment needs to appear before.
Thanks Joel, have addressed your comments
doc/modules/cross_validation.rst
Outdated
* To ensure results are repeatable (*on the same platform*), use a fixed value
  for ``random_state``.

The randomized CV splitters may return different results for each call of
This is described in less clear terms in the preceding bullet point. Please merge them.
sklearn/model_selection/_split.py
Outdated
Note
----
Randomized CV splitters may return different results for each call of
split. This can be avoided (and identical results returned for each
This seems a bit redundant. Maybe "You can make the results identical by setting random_state to an integer"?
This looks good, but I haven't double checked that it addresses all of @jnothman's comments from the original PR.
Done. If you guys are happy, this can be merged now. Unless Travis decides to give me a headache.
Thanks
…ion estimators (scikit-learn#7823)

* DOC Add NOTE that unless random_state is set, split will not be identical
* TST use np.testing.assert_equal for nested lists/arrays
* TST Make sure cv param can be a generator
* DOC rank_ becomes a link when rendered
* Use test_...
* Remove blank line; Add if shuffle is True
* Fix tests
* Explicitly test for GeneratorType
* TST Add the else clause
* TST Add comment on usage of np.testing.assert_array_equal
* TYPO
* MNT Remove if ;
* Address Joel's comments
* merge the identical points in doc
* DOC address Andy's comments
* Move comment to before the check for generator type
TODO

* `random_state` is not set. `np.testing.assert_equal` to test for equality of nested lists fixes that.
* ~~The word rank in the mock table view of `cv_results_` is assumed as a link.~~ Fixed in [MRG + 2] ENH Allow `cross_val_score`, `GridSearchCV` et al. to evaluate on multiple metrics #7388

@jnothman @amueller Pl. review :)