Skip to content

OPTICS test_reach_dists inappropriate #12090

@qinhanmin2014

Description

@qinhanmin2014

I begin to suspect the correctness of the test. Seems that our implementation and the referenced implementation are not the same. In our implementation, when there're multiple points with the same reachability-distance, we choose the point which is closest to current point. In the referenced implementation, we choose the point with the smallest index. See the example below (taken from the referenced implementation):

X = np.array([[ 15.,  70.], [ 31.,  87.], [ 45.,  32.], [ 32.,  83.],
              [ 26.,  50.], [  7.,  31.], [ 43.,  97.]])
###
# result from our implementation
clust = OPTICS(min_samples=5)
clust.fit(X)
clust.core_distances_
# array([38.89730068, 37.33630941, 52.63078947, 33.54101966, 33.54101966,
#        57.69748695, 49.979996  ])
clust.reachability_
# array([        inf, 33.54101966, 33.54101966, 38.89730068, 33.54101966,
#        33.54101966, 33.54101966])
clust.ordering_
# array([0, 3, 1, 6, 4, 2, 5])
###
# result from referenced implementation
RD, CD, order = optics(X, 4)
CD
# array([38.89730068, 37.33630941, 52.63078947, 33.54101966, 33.54101966,
#        57.69748695, 49.979996  ])
RD
# array([ 0.        , 38.89730068, 33.54101966, 37.33630941, 33.54101966,
#        33.54101966, 33.54101966])
order
# [0, 1, 3, 4, 2, 5, 6]

We get the same CD, but different order (e.g., after choosing point 0, point 1&3&4&6 have the same RD, our implementation choose point 3 because it's closest to point 0, the referenced implementation choose point 1 because it has the smallest index), thus different RD (RD of current point is only related to previous points)
Correct me if I'm wrong @jnothman @espg @adrinjalali

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions