TST test_k_means_fit_predict: do not test fit determinism together with predict/labels_ equality by jnothman · Pull Request #13751 · scikit-learn/scikit-learn

jnothman · 2019-04-30T08:39:19Z

I think this might fix #12644.

The comments there suggest the failure is due to fit and fit_predict learning permuted clusters. I don't think the intention of this test is to check the determinism / idempotence of fit/fit_predict, even if that would also be a good thing to ensure and to test.
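To make the "permuted clusters" failure mode concrete, here is a minimal sketch in plain Python. The helper name `same_partition` and the two label lists are hypothetical and are not taken from the scikit-learn test suite. It shows how two labelings can describe the same grouping yet fail an element-wise equality check.

```python
def same_partition(labels_a, labels_b):
    """Return True if two labelings group the samples identically,
    ignoring which integer each cluster happens to be called."""
    if len(labels_a) != len(labels_b):
        return False
    a_to_b, b_to_a = {}, {}
    for a, b in zip(labels_a, labels_b):
        # setdefault stores the first pairing seen and returns it afterwards,
        # so any later conflicting pairing is detected in either direction.
        if a_to_b.setdefault(a, b) != b or b_to_a.setdefault(b, a) != a:
            return False
    return True


# Hypothetical labels from two separate fits on the same data.
labels_first = [0, 0, 1, 1, 2]
labels_second = [2, 2, 0, 0, 1]

print(labels_first == labels_second)                # element-wise identical?
print(same_partition(labels_first, labels_second))  # identical up to renaming?
```

A test that compares the labels of two independent fits element-wise would fail on inputs like these even though the clustering itself is the same.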

The intention of the test here is:

    # check that fit.predict gives same result as fit_predict
    # There's a very small chance of failure with elkan on unstructured dataset
    # because predict method uses fast euclidean distances computation which
    # may cause small numerical instabilities.

therefore this should not call fit twice, but rather check that calling predict after fit_predict gives the same labels. This is what we now test in the present PR.
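As a rough sketch of the shape of check described above (not the actual test code from this PR), the following fits once via `fit_predict` and then calls `predict` on the same fitted estimator. The dataset, the explicit blob centers, and the helper name `check_fit_predict_matches_predict` are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data only: three well-separated blobs with explicit centers.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 0], [5, 5]],
    cluster_std=0.5,
    random_state=0,
)


def check_fit_predict_matches_predict(algorithm):
    """Fit once, then compare fit_predict's labels with predict on the same model."""
    km = KMeans(n_clusters=3, n_init=10, algorithm=algorithm, random_state=0)
    labels_from_fit_predict = km.fit_predict(X)  # the only fit
    labels_from_predict = km.predict(X)          # same fitted model, no refit
    np.testing.assert_array_equal(labels_from_fit_predict, labels_from_predict)
    return labels_from_fit_predict


labels = check_fit_predict_matches_predict("elkan")
```

Because only one fit happens, any cluster permutation between independent fits cannot affect the comparison.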

qinhanmin2014 · 2019-04-30T08:44:27Z

@jnothman I've reopened #12648; that PR is consistent with 0.20.X.

qinhanmin2014 · 2019-04-30T09:09:42Z

There's a comment at the beginning of the test: # check that fit.predict gives same result as fit_predict.
Perhaps it's acceptable to merge #12648 since that PR is already approved and merged into 0.20.X?

jnothman · 2019-04-30T09:27:14Z

It was approved as a short-term fix, and no other fix was merged on the basis that the problem seemed to have gone away.

jnothman · 2019-04-30T09:49:09Z

I'll remove the reference to fit from the comment, but the point was to check that the iterative updating when fitting matches the assignment by pairwise_distances_argmin.
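A hedged sketch of the relationship mentioned here: after fitting, the stored `labels_` can be compared with a direct nearest-center assignment from `sklearn.metrics.pairwise_distances_argmin`. The data and parameters below are illustrative assumptions, not taken from the test under discussion.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import pairwise_distances_argmin

# Illustrative data only: three well-separated blobs with explicit centers.
X, _ = make_blobs(
    n_samples=300,
    centers=[[-5, -5], [0, 0], [5, 5]],
    cluster_std=0.5,
    random_state=0,
)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Assign each sample to its closest learned center, independently of KMeans.
nearest = pairwise_distances_argmin(X, km.cluster_centers_)

# On converged, well-separated data the two assignments are expected to agree.
np.testing.assert_array_equal(km.labels_, nearest)
```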

jeremiedbb · 2019-04-30T11:05:00Z

> I don't think the intention of this test is to check the determinism / idempotence of fit/fit_predict

Actually it was my intention :)
There's already a test which corresponds to what your changes test: test_predict

jnothman · 2019-04-30T11:18:56Z

Oh!
Hmm. If the point is to test the consistency of multiple fits, why does comparing fit(X).predict(X) to fit_predict(X) come into it??? I agree there is redundancy between those two tests, but I think test_k_means_fit_predict should not be doing what it is, nor named as it is, if its intention is to check the consistency of multiple calls to fit.

jeremiedbb · 2019-04-30T11:24:06Z

My intention was just to test that calling fit(X).predict(X) gives the same result as calling fit_predict(X). There are very similar tests for other estimators, like test_bayesian_mixture_fit_predict for instance.

jnothman · 2019-04-30T11:29:47Z

> My intention was just to test that calling fit(X).predict(X) gives the same result as calling fit_predict(X).

But under the assumption that fit_predict(X) and fit(X).labels_ return the same thing (which they do, even if it's not obviously tested), then test_predict tests exactly that... given that the fit is consistent across calls.

Can we rename the failing test to be test_fit_idempotence or something? And then skip it?? :|

jeremiedbb · 2019-04-30T11:40:43Z

> given that the fit is consistent across calls.

I think it is. If I recall correctly, the failures appear when using the elkan algorithm, because the predict method always uses the lloyd algorithm, and even though both algorithms should give the same results, there might be some numerical instabilities.

I'm ok to rename and skip the test

qinhanmin2014 · 2019-04-30T13:46:01Z

@jnothman maybe close this one and merge #12648?

TST do not test fit determinism together with predict/labels_ equality

f325689

jnothman mentioned this pull request Apr 30, 2019

FIX Optics paper typo #13750

Merged

jnothman mentioned this pull request Apr 30, 2019

TST Ignore Kmeans test failures on MacOS #12648

Merged

Fix comment

b2a6347

jnothman closed this Apr 30, 2019

jnothman wants to merge 2 commits into scikit-learn:master from jnothman:fit_predict

3 participants
