[WIP] Test determinism of estimators #7270
betatim wants to merge 5 commits into scikit-learn:main from betatim:deterministic
Conversation
Created a test that fits two copies of each estimator and compares their predictions on unseen data.
sklearn/tests/test_common.py (Outdated)

    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5)

    for name, Estimator in all_estimators(type_filter=['classifier',
                                                       'regressor']):
Transformers and unsupervised estimators still to do.
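For the transformer case, a minimal sketch of what such a check could look like (the estimator, dataset, and sizes are only illustrative, not part of this PR):

```python
# Illustrative sketch: fit two copies of a transformer and compare their
# transform() output on held-out data, analogous to the predict() check.
from numpy.testing import assert_array_almost_equal
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split

X, _ = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test = train_test_split(X, train_size=0.5, random_state=0)

t1 = PCA(n_components=2, random_state=0)
t2 = PCA(n_components=2, random_state=0)
assert_array_almost_equal(t1.fit(X_train).transform(X_test),
                          t2.fit(X_train).transform(X_test))
```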
sklearn/tests/test_common.py (Outdated)

    est1.fit(X_train, y_train)
    est2.fit(X_train, y_train)

    assert_array_almost_equal(est1.predict(X_test),
Shouldn't this be assert_array_equal instead of assert_array_almost_equal?
This works for classifiers and regressors; for the latter we need almost equal.
But can we call a method deterministic if it does not always give exactly the same result?
On a machine with floating-point precision? I would.
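To make the equal-versus-almost-equal distinction concrete: assert_array_equal demands bit-identical values, whereas assert_array_almost_equal tolerates tiny rounding differences, which is the kind of variation floating-point arithmetic can introduce (for example from a different summation order). A small, generic illustration:

```python
import numpy as np
from numpy.testing import assert_array_almost_equal, assert_array_equal

a = np.array([0.1 + 0.2, 1.0])   # 0.30000000000000004 due to rounding
b = np.array([0.3, 1.0])

assert_array_almost_equal(a, b)  # passes: difference is far below the default tolerance
# assert_array_equal(a, b)       # would raise: the values are not bit-identical
```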
Note that for CV estimators (…)

Interesting, it seems except for …
Do we have good reason to believe there are multiple estimators that should fail such a test in scikit-learn?

In terms of the specifics, equivalence of … You might also consider whether some datasets are more likely to elicit variation than others. For example, I'd consider using a dataset with a random target so that a meaningful classifier is unlikely learnable.
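A sketch of the random-target idea (the sizes and RNG seed are arbitrary, just for illustration): generate the features as usual but draw the labels independently of X, so no classifier can learn a meaningful rule and any prediction differences come from the fitting procedure itself.

```python
import numpy as np
from sklearn.datasets import make_classification

rng = np.random.RandomState(0)
X, _ = make_classification(n_samples=200, n_features=10, random_state=0)
y = rng.randint(0, 2, size=X.shape[0])  # labels drawn independently of X
```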
sklearn/tests/test_common.py (Outdated)

    y_train = np.reshape(y_train, (-1, 1))
    y_test = np.reshape(y_test, (-1, 1))

    needs_state = 'random_state' in signature(Estimator.__init__).parameters
You end up with something like needs_state = 'random_state' in Estimator().get_params(). Is that really nicer than inspecting the __init__ arguments?
there's a helper set_random_state in the estimator_checks.
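For comparison, the three options mentioned in this thread could look roughly like this. Note the import path of set_random_state is an assumption here: it has moved between scikit-learn versions (sklearn.utils.testing in releases of that era, sklearn.utils._testing later).

```python
from inspect import signature

from sklearn.linear_model import SGDClassifier
from sklearn.utils.testing import set_random_state  # sklearn.utils._testing in newer versions

est = SGDClassifier()

# Option 1: inspect the constructor signature (what the diff above does).
needs_state = 'random_state' in signature(SGDClassifier.__init__).parameters

# Option 2: look at the instantiated parameters instead.
needs_state = 'random_state' in est.get_params()

# Option 3: use the existing helper, which sets random_state when the
# estimator supports it.
set_random_state(est, random_state=0)
```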
I should add: without something like …

Given @TomDLT's comment about the *CV estimators not passing on …, I like the idea of the random dataset.
The other thing that comes to mind here is that we may have false negatives.
Starting with the different solvers that may exist in one estimator.
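To make the solver point concrete, a possible (purely illustrative) extension would be to repeat the check once per solver value instead of only testing the defaults, for example with LogisticRegression:

```python
# Illustrative only: run the determinism check separately for each solver.
from numpy.testing import assert_array_almost_equal
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=200, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5,
                                                    random_state=0)

for solver in ('liblinear', 'lbfgs', 'sag'):
    est1 = LogisticRegression(solver=solver, random_state=0, max_iter=500)
    est2 = LogisticRegression(solver=solver, random_state=0, max_iter=500)
    assert_array_almost_equal(est1.fit(X_train, y_train).predict_proba(X_test),
                              est2.fit(X_train, y_train).predict_proba(X_test))
```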
If we would "just" describe the legal space of all parameters in a more …
IMHO having this test is useful, even if it only covers the default argument case (perfection being the enemy of good or some such).
It's pretty weird that Regression uses stratified CV...? And certainly …

LinearRegressionCV uses stratified? That's a bug.
I think several comments here are red herrings.
Our default …

Do we want to continue on with this? Marking as Stalled.
@jnothman: I am interested in carrying on if this is relevant.
Reference Issue
Fixes #7139
What does this implement/fix? Explain your changes.
Fit two instances of the same estimator on a toy problem. Use both to predict on an unseen subset of data and compare the predictions. If an estimator has a random_state argument, provide it; if not, then not.

Any other comments?
There are a few estimators that I am skipping at the moment because they fail the test. Most are blatantly not deterministic, but two or so fail with a different error.
Unsure about the location: should this be a check in utils/estimator_checks.py instead?

- check_deterministic in utils/estimator_checks.py?
- Why do HuberRegressor, LogisticRegressionCV, LinearRegression, RANSACRegressor fail?
- Why does RadiusNeighborsClassifier fail? Failure mode is different to the above.
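For reference, a minimal sketch of what a check_deterministic in utils/estimator_checks.py might look like, following the approach discussed above; the dataset, skipping strategy, and tolerance are placeholders, not the final implementation:

```python
# Sketch only: one possible shape for check_deterministic in
# utils/estimator_checks.py; names and tolerances are placeholders.
from inspect import signature

from numpy.testing import assert_array_almost_equal
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split


def check_deterministic(name, Estimator):
    X, y = make_classification(n_samples=200, n_features=10, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.5,
                                                        random_state=0)

    # Pass random_state only if the estimator accepts it.
    params = {}
    if 'random_state' in signature(Estimator.__init__).parameters:
        params['random_state'] = 0

    est1 = Estimator(**params).fit(X_train, y_train)
    est2 = Estimator(**params).fit(X_train, y_train)

    # Exact equality would be stricter for classifiers; almost-equal is the
    # looser comparison regressors need.
    assert_array_almost_equal(est1.predict(X_test), est2.predict(X_test))
```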