FIX Adjusts xi in test_extract_xi by thomasjpfan · Pull Request #14201 · scikit-learn/scikit-learn

thomasjpfan · 2019-06-27T01:33:49Z

Reference Issues/PRs

Fixes #13739

What does this implement/fix? Explain your changes.

Adjusts xi in test_extract_xi

Any other comments?

This test passes on i686 with scikit-learn-wheels configured to point to this branch.

Here is the diff of scikit-learn-wheels used to run the test.

cc @qinhanmin2014 @adrinjalali

rth · 2019-06-27T07:41:38Z

sklearn/cluster/optics_.py

            The instance.
        """
-        X = check_array(X, dtype=np.float)
+        X = check_array(X, dtype=np.float32)


Hmm, if something fails in 64bit due to numerical issues, I don't understand why it would not fail in 32bit. What's the advantage of doing this? I would say keeping f64 is always good as far as numerical precision is concerned.

This doesn't sound like a good idea to me. Most estimators accept float64 input, if they don't enforce it, which means users who want to avoid a copy can usually pass float64 data. This change forces a copy of all such data, in addition to losing precision.

Most CI jobs pass and we're unable to find any related bugs, so I guess it's not worthwhile to drop float64 support just to make the CI green. Maybe increase the tolerance or construct another dataset?
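To make the copy and precision concerns above concrete, here is a minimal NumPy-only sketch. It does not call check_array; it only assumes that a dtype conversion behaves like np.asarray with a dtype argument, and the array X is made up for illustration.

```python
import numpy as np

# Hypothetical input: user data that is already float64.
X = np.array([[0.6762013074479917, 1.0], [2.0, 3.0]], dtype=np.float64)

# Requesting the dtype the array already has returns the same buffer (no copy).
same = np.asarray(X, dtype=np.float64)
no_copy = np.shares_memory(X, same)

# Requesting float32 has to allocate a new array and round every value.
down = np.asarray(X, dtype=np.float32)
forced_copy = not np.shares_memory(X, down)

# The rounding is lossy: the float32 value no longer equals the original.
lost_precision = float(down[0, 0]) != X[0, 0]

print(no_copy, forced_copy, lost_precision)
```

All three flags should come out true: float64 input passes through untouched, while the float32 path pays for both a copy and a loss of precision.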

adrinjalali · 2019-06-27T08:15:58Z

If this fixes the issue, maybe we should generate the test data in a better way?

This reverts commit dfba2c4.

rth · 2019-06-27T14:29:30Z

sklearn/cluster/tests/test_optics.py

    X, expected_labels = shuffle(X, expected_labels, random_state=rng)

+    if _IS_32BIT:
+        X = X.astype(np.float32)


float64 should still work on 32bit arch. If it fails it may mean that something internally in OPTICS creates arrays of dtype np.int (or np.intp) instead of X.dtype...
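The distinction drawn here is between the platform's word size and the floating-point dtype of the data. A small sketch of how the two can be checked independently follows; the name is_32bit is a hypothetical stand-in for the _IS_32BIT flag used in the diff, and how scikit-learn actually defines that flag is an assumption.

```python
import struct

import numpy as np

# Pointer width of the running interpreter: 32 on i686, 64 on x86_64.
pointer_bits = 8 * struct.calcsize("P")
is_32bit = pointer_bits == 32  # hypothetical stand-in for _IS_32BIT

# np.intp is the integer type NumPy uses for indexing; its width follows the platform.
intp_bits = 8 * np.dtype(np.intp).itemsize

# float64 is 64 bits wide regardless of the platform word size.
float64_bits = 8 * np.dtype(np.float64).itemsize

print(pointer_bits, intp_bits, float64_bits)
```

On a 32-bit build only the first two numbers change, which is why float64 input is expected to keep working there.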

Time for a deep dive into OPTICS. I hope I have better vision when I surface.

thomasjpfan · 2019-06-27T15:49:59Z

Here is where the error is coming from:

X = np.vstack((C1, C2, C3, C4, C5, np.array([[100, 100]] * 2), C6))
expected_labels = np.r_[[1] * 5, [3] * 5, [2] * 5, [0] * 5, [2] * 5,
                        -1, -1, [4] * 5]
X, expected_labels = shuffle(X, expected_labels, random_state=rng)

clust = OPTICS(min_samples=3, min_cluster_size=3,
               max_eps=20, cluster_method='xi',
               xi=0.1).fit(X)
# this may fail if the predecessor correction is not at work!
assert_array_equal(clust.labels_, expected_labels)

which is failing on:

E       AssertionError: 
E       Arrays are not equal
E       
E       Mismatch: 18.8%
E       Max absolute difference: 3
E       Max relative difference: nan
E        x: array([ 0,  0, -1, -1,  1,  3,  3,  2,  0,  3,  3, -1,  1,  1, -1,  2, -1,
E               4,  0, -1,  4,  0,  4,  2, -1,  1,  1,  4,  2,  3,  4, -1])
E        y: array([ 0,  0,  2,  2,  1,  3,  3,  2,  0,  3,  3,  2,  1,  1,  2,  2, -1,
E               4,  0,  2,  4,  0,  4,  2, -1,  1,  1,  4,  2,  3,  4,  2])

Some of the points in cluster 2 are labeled -1.

thomasjpfan · 2019-06-27T18:38:03Z

In the following line:

scikit-learn/sklearn/cluster/optics_.py

Line 476 in f339609

point = index[np.argmin(reachability_[index])]

the result of np.argmin(reachability_[index]) starts to diverge between 32-bit and 64-bit. At the point of divergence, np.argmin(reachability_[index]) is 15 in 32-bit and 13 in 64-bit.

In 64 bit:

reachability_[index][13] = 0.6762013074479917
reachability_[index][15] = 0.6762013074479917

and in 32 bit:

reachability_[index][13] = 0.6762013074479917
reachability_[index][15] = 0.6762013074479916

This is why in 32bit, 15 is chosen to be the argmin.

Edit:

This happened because core_distances_ differs slightly between 32-bit and 64-bit for point_index 9.
This means the output of _compute_core_distances_ is slightly different between 32-bit and 64-bit.
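The effect described above can be reproduced in isolation. The sketch below is not the OPTICS code: it builds a made-up array with two equal entries at positions 13 and 15, then lowers the second by one unit in the last place with np.nextafter to mimic the 32-bit result.

```python
import numpy as np

a = 0.6762013074479917

# Two exactly tied minima: np.argmin returns the first occurrence.
reach = np.ones(16)
reach[13] = a
reach[15] = a
tie_choice = int(np.argmin(reach))

# Lower entry 15 to the next representable float below a.
reach_perturbed = reach.copy()
reach_perturbed[15] = np.nextafter(a, 0.0)
perturbed_choice = int(np.argmin(reach_perturbed))

print(tie_choice, perturbed_choice)  # 13 15
```

A difference in the last digit is enough to change which point is processed next, and from there the ordering and the extracted labels can differ.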

adrinjalali · 2019-06-28T09:09:59Z

@thomasjpfan that's probably because it's the nearest neighbor algorithm returning different values based on the dtype/architecture.

I had seen this issue before, and more or less solved it by changing the dataset. It doesn't look like an OPTICS issue to me, i.e. I think this OPTICS test is surfacing an issue in the neighbors algorithm.
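As a generic illustration only, and not a diagnosis of the neighbors code, here is one way the same mathematical quantity can differ in its last digit: floating-point addition is not associative, so a different evaluation order on another platform or code path can shift the result by one unit in the last place.

```python
# The same three numbers summed in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

differ = left_to_right != right_to_left
gap = abs(left_to_right - right_to_left)

print(left_to_right, right_to_left, gap)
```

The gap is on the order of 1e-16, the same magnitude as the difference between the two reachability values reported above.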

thomasjpfan · 2019-07-01T03:47:25Z

This PR was reduced to only updating the xi parameter. This seems to work: https://travis-ci.org/thomasjpfan/scikit-learn-wheels/builds/552555734

jnothman

This is quite a comfortable fix!!

rth

Thanks for investigating!

qinhanmin2014 · 2019-07-01T09:40:52Z

LGTM, thanks.

TST Uses float32

dfba2c4

rth reviewed Jun 27, 2019

thomasjpfan added 3 commits June 27, 2019 10:22

Revert "TST Uses float32"

97b39eb

This reverts commit dfba2c4.

Merge remote-tracking branch 'upstream/master' into fix_extract_xi

fb6c2cb

TST Uses float32 when 32bit

e14af46

rth reviewed Jun 27, 2019

REV Remove float32

6b42e13

thomasjpfan changed the title ~~[MRG] Forces float32 in optics~~ [WIP] Forces float32 in optics Jun 28, 2019

thomasjpfan added 2 commits June 30, 2019 21:28

TST Updates xi to 0.3

f6999af

Merge remote-tracking branch 'upstream/master' into fix_extract_xi

514eaa4

jnothman added this to the 0.21.3 milestone Jul 1, 2019

thomasjpfan changed the title ~~[WIP] Forces float32 in optics~~ [MRG] Adjusts xi in test_extract_xi Jul 1, 2019

jnothman reviewed Jul 1, 2019

jnothman approved these changes Jul 1, 2019

rth approved these changes Jul 1, 2019

rth changed the title ~~[MRG] Adjusts xi in test_extract_xi~~ FIX Adjusts xi in test_extract_xi Jul 1, 2019

rth merged commit bd2cc10 into scikit-learn:master Jul 1, 2019

koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019

FIX Adjusts xi in test_extract_xi (scikit-learn#14201)

bbc2a3b

jnothman pushed a commit to jnothman/scikit-learn that referenced this pull request Jul 24, 2019

FIX Adjusts xi in test_extract_xi (scikit-learn#14201)

87a73f3
