
[MRG] Fix missing assert and parametrize some k-means tests #12368

Merged
rth merged 5 commits into scikit-learn:master from jeremiedbb:fix-test-k-means
Oct 13, 2018
Conversation

@jeremiedbb
Member

Noticed a missing assert in k-means tests, meaning the test would always pass.

I took the opportunity to parametrize some of the k-means tests. I did not change the tests themselves, just removed code redundancy. I was doing this in #11950, but it will be easier to review here, in a separate PR.
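As an illustration only (the fixture names and the trivial test body below are hypothetical, not the actual scikit-learn test code), parametrizing removes the need for one near-identical test per input representation:

```python
import pytest

# Hypothetical stand-ins for the dense and sparse fixtures used by the real tests.
X_dense = [[0.0, 1.0], [1.0, 0.0]]
X_sparse_like = [[0.0, 1.0], [1.0, 0.0]]

@pytest.mark.parametrize('data', [X_dense, X_sparse_like])
def test_kmeans_shared_body(data):
    # A single test body now runs once per representation,
    # instead of being copy-pasted into two separate tests.
    assert len(data) == 2
```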


@NicolasHug (Member) left a comment


Very minor comment, other than that LGTM!

km = KMeans(init=centers.copy(), n_clusters=n_clusters, random_state=42,
n_init=1)
km.fit(X)
@pytest.mark.parametrize('representation', ['dense', 'sparse'])
Member


Why not directly

@pytest.mark.parametrize('data', [X, X_csr])

Member Author


It's just for readability when you run pytest.

With your proposal, the tests will appear as

    test_whatever_test_name[data0]
    test_whatever_test_name[data1]

whereas here they will appear as

    test_whatever_test_name[dense]
    test_whatever_test_name[sparse]

I just find it easier to track which parameters make a test fail.
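The naming behaviour described above can be sketched in plain Python; this mimics pytest's default id scheme for non-string parameters (a simplification, not pytest's actual implementation):

```python
def case_ids(argname, values, ids=None):
    # Explicit ids are used verbatim; otherwise pytest falls back to
    # enumerated, opaque names such as data0, data1 for non-string values.
    if ids is not None:
        return list(ids)
    return ['{}{}'.format(argname, i) for i in range(len(values))]

X, X_csr = object(), object()  # hypothetical dense/sparse test fixtures

print(case_ids('data', [X, X_csr]))                           # ['data0', 'data1']
print(case_ids('data', [X, X_csr], ids=['dense', 'sparse']))  # ['dense', 'sparse']
```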

Member


Yeah, that's a good point.

Member


FYI, it's possible to provide ids as a workaround,

@pytest.mark.parametrize('data', (X, X_csr), ids=('dense', 'sparse'))

maybe that's a bit more direct?

Member Author


I didn't know that. This is better! Thanks.


@rth (Member) left a comment


Thanks @jeremiedbb, this is nice. A few comments below.



# check that models trained on sparse input also works for dense input at
# predict time
assert_array_equal(mb_k_means.predict(X), mb_k_means.labels_)
Member


Should we still keep this line?

Member Author


I moved it into a new function: test_predict_minibatch_dense_sparse.


# sanity check: predict centroid labels
pred = mb_k_means.predict(mb_k_means.cluster_centers_)
assert_array_equal(pred, np.arange(n_clusters))
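The sanity check quoted above works because each centroid is its own nearest centroid, so predicting the centroids themselves must return the labels 0..n_clusters-1 in order. A minimal numpy sketch of that invariant (hypothetical centroids, not the real test data):

```python
import numpy as np

# Hypothetical, well-separated centroids standing in for cluster_centers_.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])

def nearest_centroid(points, centers):
    # Squared Euclidean distance from every point to every centroid,
    # then the index of the closest centroid -- conceptually what
    # a fitted estimator's predict does.
    d = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

print(nearest_centroid(centers, centers))  # [0 1 2]
```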
Member


Should we keep these 2 lines as well?

Member Author


I did keep them :)

Member Author


In test_predict_minibatch, lines 559-560.


@rth (Member) left a comment


LGTM, true I missed those :)

Thanks @jeremiedbb and thank you for the review @NicolasHug !


rth commented Oct 13, 2018

BTW, Circle CI doesn't seem to be triggering now. https://status.circleci.com/ looks fine, so I'm not sure what happened. Anyway it should not affect this PR.

@rth merged commit 76b1078 into scikit-learn:master on Oct 13, 2018
jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Oct 15, 2018
anuragkapale pushed a commit to anuragkapale/scikit-learn that referenced this pull request Oct 23, 2018
@jeremiedbb jeremiedbb deleted the fix-test-k-means branch October 24, 2018 11:53
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019