new tests for mean_shift algo by rajdeepd · Pull Request #13179 · scikit-learn/scikit-learn

rajdeepd · 2019-02-17T11:31:04Z

Reference Issues/PRs

none

What does this implement/fix? Explain your changes.

Add test cases to cover un-tested portions of mean_shift.py

Any other comments?

no other comments

rajdeepd · 2019-02-22T14:02:37Z

@ogrisel, can you help review this?

jnothman · 2019-02-23T21:21:02Z

sklearn/cluster/tests/test_mean_shift.py

+def test_mean_shift_negative_bandwidth():
+    bandwidth = -1
+    ms = MeanShift(bandwidth=bandwidth)
+    msg = \


Use parentheses to enclose expressions and split them over multiple lines, rather than using \ for line continuation.

@jnothman this comment is not clear. Will the following statement work?

msg = "bandwidth needs to be greater than zero or None,"
" got -1.000000"

This will:

msg = ("bandwidth needs to be greater than zero or None,"
       " got -1.000000")
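
For reference, the difference between the spellings discussed here can be sketched in plain Python. Adjacent string literals are joined only when they belong to the same expression, so a bare newline between them ends the assignment early. The variable names below are illustrative and do not come from the test file.

```python
# Backslash continuation: valid, but the style the review asks to avoid.
msg_backslash = "bandwidth needs to be greater than zero or None," \
                " got -1.000000"

# Parentheses: adjacent string literals inside them are concatenated.
msg_parens = ("bandwidth needs to be greater than zero or None,"
              " got -1.000000")

# No parentheses and no backslash: the assignment ends at the newline, and
# the second literal is a separate expression whose value is discarded.
msg_truncated = "bandwidth needs to be greater than zero or None,"
" got -1.000000"

print(msg_parens)
print(msg_truncated)
```

The first two variables hold the full message; the third holds only the part before " got".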

jnothman · 2019-02-23T21:24:47Z

sklearn/cluster/tests/test_mean_shift.py

+
+def test_seeds():
+    ms = MeanShift(seeds=None)
+    _ = ms.fit(X).labels_


Why do you get labels_?

jnothman · 2019-02-23T21:25:44Z

sklearn/cluster/tests/test_mean_shift.py

+    assert_raise_message(ValueError, msg, ms.fit, X)
+
+
+def test_seeds():


I don't get what this is testing. Checking that parameters are maintained should usually be covered by common tests, not by tests for each specific estimator.

jnothman · 2019-02-23T21:26:17Z

sklearn/cluster/tests/test_mean_shift.py

+    labels = ms.fit(X).labels_
+    labels_unique = np.unique(labels)
+    n_clusters_ = len(labels_unique)
+    assert_equal(n_clusters_ > n_clusters, True)


Use a bare assert, as with seeds above.
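
A minimal sketch of that suggestion, using placeholder values rather than the quantities computed in the test file: the condition is asserted directly instead of being compared with True through a helper.

```python
# Placeholder values standing in for the numbers computed in the test.
n_clusters = 3
n_clusters_ = 4

# Before: assert_equal(n_clusters_ > n_clusters, True)
# After: state the condition directly with a bare assert.
assert n_clusters_ > n_clusters
```

When run under pytest, a failing bare assert is reported together with the values of both operands, which is one reason the helper is unnecessary.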

jnothman · 2019-02-23T21:27:39Z

sklearn/cluster/tests/test_mean_shift.py

+    n_clusters_ = len(labels_unique)
+    assert_equal(n_clusters_ > n_clusters, True)
+
+    cluster_centers, labels = mean_shift(X, bandwidth=bandwidth,


Rather than repeat the code, please use pytest.mark.parametrize to test multiple settings of bandwidth.

Changed to use pytest.mark.parametrize.
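
As a rough illustration of what parametrizing over bandwidth can look like, here is a small sketch. The toy data, the test name, and the expected counts are assumptions made for this example; they are not the data or values used in the PR.

```python
import numpy as np
import pytest
from sklearn.cluster import MeanShift

# Two tight, well-separated groups of points (toy data, not the PR's X).
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])


@pytest.mark.parametrize("bandwidth, expected_n_clusters", [
    (2.0, 2),    # small radius: each group forms its own cluster
    (50.0, 1),   # radius covers every point: a single cluster
])
def test_n_clusters_for_bandwidth(bandwidth, expected_n_clusters):
    # One test body runs once per parameter tuple listed above.
    labels = MeanShift(bandwidth=bandwidth).fit(X_toy).labels_
    assert len(np.unique(labels)) == expected_n_clusters
```

Each tuple becomes a separate test case in the pytest report, so a failure points at the specific bandwidth that broke.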

@jnothman please review

@jnothman @ogrisel please review

jnothman

I confirm this covers untested lines.

jnothman · 2019-03-12T10:00:57Z

sklearn/cluster/tests/test_mean_shift.py

+    bandwidth = -1
+    ms = MeanShift(bandwidth=bandwidth)
+    msg = ("bandwidth needs to be greater than zero or None,"
+           "            got -1.000000")


This whitespace looks like an error in the code raising the message. Please change the code to have a single space between the comma and "got"

This is unresolved. Please fix the error message in mean_shift_.py

jnothman · 2019-03-12T10:13:38Z

sklearn/cluster/tests/test_mean_shift.py

+    (1.2, True, 3),
+    (1.2, False, 4)
+])
+def test_eval(bandwidth, cluster_all, expected):


What do you mean by calling this "eval"? Can't we just parametrize test_mean_shift above, rather than adding a new test?

But ideally we should also test that cluster_all=False is actually effective at allowing some points to be left unclustered. Create a dataset where a point will be left with label -1 to test this properly.
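
One possible way to build such a dataset is sketched below. It assumes that, with cluster_all=False, a sample farther than bandwidth from every cluster center is labelled -1. Explicit seeds are passed because, by default, every sample acts as a seed, so an isolated point would otherwise tend to become its own cluster center. The data and seed location are invented for this example.

```python
import numpy as np
from sklearn.cluster import MeanShift

# Four points near the origin plus one distant outlier (toy data).
X_toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1],
                  [10.0, 10.0]])

# Seed only the dense group so that no center forms at the outlier.
seeds = np.array([[0.05, 0.05]])

# cluster_all=False: samples outside every kernel are left unclustered.
ms_orphans = MeanShift(bandwidth=1.0, seeds=seeds, cluster_all=False)
labels_orphans = ms_orphans.fit(X_toy).labels_

# cluster_all=True: every sample is assigned to its nearest center.
ms_all = MeanShift(bandwidth=1.0, seeds=seeds, cluster_all=True)
labels_all = ms_all.fit(X_toy).labels_

print(labels_orphans)
print(labels_all)
```

Comparing the two label arrays shows whether cluster_all actually changes the assignment of the outlier, which is the behaviour this comment asks to test.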

@jnothman fixed as suggested

jnothman · 2019-03-31T22:58:48Z

Please merge the current master

jnothman · 2019-03-31T23:00:12Z

sklearn/cluster/tests/test_mean_shift.py

-def test_mean_shift():
+@pytest.mark.parametrize("bandwidth, cluster_all, expected, "
+                         "first_cluster_label",
+                         [(1.2, True, 3, 0), (1.2, False, 4, -1)])


Much clearer, thanks!

jnothman · 2019-03-31T23:00:57Z

sklearn/cluster/tests/test_mean_shift.py

+    bandwidth = -1
+    ms = MeanShift(bandwidth=bandwidth)
+    msg = ("bandwidth needs to be greater than zero or None,"
+           "            got -1.000000")


This is unresolved. Please fix the error message in mean_shift_.py

rajdeepd · 2019-04-02T00:36:12Z

@jnothman fixed the comments

jnothman · 2019-04-02T08:45:25Z

Thanks @rajdeepd

rajdeepd · 2019-04-05T15:34:32Z

@jnothman how do we get this pull request merged into master?

jnothman · 2019-04-06T11:38:22Z

4 days is not long to wait for a second review, @rajdeepd... hopefully one will come soon.

thomasjpfan · 2019-04-06T17:53:06Z

sklearn/cluster/tests/test_mean_shift.py

+    assert n_clusters_ == expected
+    assert labels_unique[0] == first_cluster_label

-    cluster_centers, labels = mean_shift(X, bandwidth=bandwidth)


Removing this means we are not testing the mean_shift function directly anymore.

we are testing using
ms = MeanShift(bandwidth=bandwidth, cluster_all=cluster_all)
labels = ms.fit(X).labels_

The testing of mean_shift should be independent of ms.fit. At the moment, ms.fit calls mean_shift, but we do not know how the code base will change.

@thomasjpfan do we need another test for mean_shift?

Leaving the original test here will sufficiently test mean_shift.

@thomasjpfan added test for mean_shift as well
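
A short sketch of exercising both entry points separately, so that the function keeps its own coverage even if the estimator stops delegating to it. The toy data and variable names are assumptions for illustration, not the PR's test code.

```python
import numpy as np
from sklearn.cluster import MeanShift, mean_shift

# Two tight, well-separated groups of points (toy data, not the PR's X).
X_toy = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
                  [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]])
bandwidth = 2.0

# Estimator API.
labels_estimator = MeanShift(bandwidth=bandwidth).fit(X_toy).labels_

# Function API, checked on its own rather than through MeanShift.fit.
cluster_centers, labels_function = mean_shift(X_toy, bandwidth=bandwidth)

assert len(np.unique(labels_estimator)) == 2
assert len(np.unique(labels_function)) == 2
```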

thomasjpfan · 2019-04-06T17:55:38Z

sklearn/cluster/tests/test_mean_shift.py

+    ms = MeanShift(bandwidth=bandwidth)
+    msg = ("bandwidth needs to be greater than zero or None,"
+           " got -1.000000")
+    assert_raise_message(ValueError, msg, ms.fit, X)


We are moving to using pytest.raises:

msg = (r"bandwidth needs to be greater than zero or None,"
       r" got -1\.000000")
with pytest.raises(ValueError, match=msg):
    ms.fit(X)

@thomasjpfan fixed
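
The pattern can be tried in isolation with a stand-in function. pytest.raises applies match as a regular expression search against the exception text, which is why the dot in the number is escaped. The check_bandwidth helper below is hypothetical and only imitates the message quoted in this thread; it is not scikit-learn code, and the message raised by current scikit-learn releases may differ.

```python
import pytest


def check_bandwidth(bandwidth):
    """Hypothetical stand-in that raises the message quoted in this thread."""
    if bandwidth is not None and bandwidth <= 0:
        raise ValueError("bandwidth needs to be greater than zero or None,"
                         " got %f" % bandwidth)
    return bandwidth


def test_negative_bandwidth():
    # Raw strings keep the backslash that escapes the literal dot.
    msg = (r"bandwidth needs to be greater than zero or None,"
           r" got -1\.000000")
    with pytest.raises(ValueError, match=msg):
        check_bandwidth(-1)
```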

NicolasHug

LGTM otherwise

NicolasHug · 2019-04-21T14:59:15Z

sklearn/cluster/tests/test_mean_shift.py

-    n_clusters_ = len(labels_unique)
-    assert_equal(n_clusters_, n_clusters)
+    cluster_centers, labels_mean_shift = mean_shift(X, cluster_all=cluster_all)
+    print(cluster_centers)


please remove

NicolasHug · 2019-04-21T15:01:07Z

sklearn/cluster/tests/test_mean_shift.py

    # n_neighbors is set to 1.
    bandwidth = estimate_bandwidth(X, n_samples=1, quantile=0.3)
-    assert_array_almost_equal(bandwidth, 0., decimal=5)
+    assert_equal(bandwidth, 0.)


could just be assert a == b then

updated @NicolasHug

NicolasHug · 2019-04-25T11:04:21Z

Thanks @rajdeepd

jnothman reviewed Feb 23, 2019

rajdeepd force-pushed the test_mean_shift branch 2 times, most recently from b018e99 to 4cf6413 Compare March 1, 2019 14:36

jnothman reviewed Mar 12, 2019

new tests for mean_shift algo

dea8840

rajdeepd force-pushed the test_mean_shift branch from 4cf6413 to dea8840 Compare March 31, 2019 16:00

jnothman reviewed Mar 31, 2019

rajdeepd force-pushed the test_mean_shift branch from bb1dd95 to f40648d Compare April 1, 2019 15:52

jnothman approved these changes Apr 2, 2019

jnothman added the Waiting for Reviewer label Apr 6, 2019

thomasjpfan reviewed Apr 6, 2019

rajdeepd force-pushed the test_mean_shift branch 2 times, most recently from 71df239 to 1b9f928 Compare April 21, 2019 08:56

NicolasHug approved these changes Apr 21, 2019

Merge remote-tracking branch 'upstream/master' into test_mean_shift

aa17ea1

rajdeepd force-pushed the test_mean_shift branch from 1b9f928 to aa17ea1 Compare April 24, 2019 18:06

NicolasHug merged commit 690464b into scikit-learn:master Apr 25, 2019

jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Apr 25, 2019

additional tests for mean_shift algo (scikit-learn#13179)

77ac3df

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

additional tests for mean_shift algo (scikit-learn#13179)

67f53dc

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "additional tests for mean_shift algo (scikit-learn#13179)"

32c640f

This reverts commit 67f53dc.

xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019

Revert "additional tests for mean_shift algo (scikit-learn#13179)"

8273af6

This reverts commit 67f53dc.

rth mentioned this pull request Jun 25, 2019

TST Fix atol in test_estimate_bandwidth_1sample #14187

Merged

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

additional tests for mean_shift algo (scikit-learn#13179)

c50a029
