-
-
Notifications
You must be signed in to change notification settings - Fork 26.9k
OPTICS test_reach_dists inappropriate #12090
Description
I begin to suspect the correctness of the test. Seems that our implementation and the referenced implementation are not the same. In our implementation, when there're multiple points with the same reachability-distance, we choose the point which is closest to current point. In the referenced implementation, we choose the point with the smallest index. See the example below (taken from the referenced implementation):
X = np.array([[ 15., 70.], [ 31., 87.], [ 45., 32.], [ 32., 83.],
[ 26., 50.], [ 7., 31.], [ 43., 97.]])
###
# result from our implementation
clust = OPTICS(min_samples=5)
clust.fit(X)
clust.core_distances_
# array([38.89730068, 37.33630941, 52.63078947, 33.54101966, 33.54101966,
# 57.69748695, 49.979996 ])
clust.reachability_
# array([ inf, 33.54101966, 33.54101966, 38.89730068, 33.54101966,
# 33.54101966, 33.54101966])
clust.ordering_
# array([0, 3, 1, 6, 4, 2, 5])
###
# result from referenced implementation
RD, CD, order = optics(X, 4)
CD
# array([38.89730068, 37.33630941, 52.63078947, 33.54101966, 33.54101966,
# 57.69748695, 49.979996 ])
RD
# array([ 0. , 38.89730068, 33.54101966, 37.33630941, 33.54101966,
# 33.54101966, 33.54101966])
order
# [0, 1, 3, 4, 2, 5, 6]
We get the same CD, but different order (e.g., after choosing point 0, point 1&3&4&6 have the same RD, our implementation choose point 3 because it's closest to point 0, the referenced implementation choose point 1 because it has the smallest index), thus different RD (RD of current point is only related to previous points)
Correct me if I'm wrong @jnothman @espg @adrinjalali