OPTICS: Reconsider whether there are unnecessary parameters #12375
Description
In OPTICS (_extract_optics), we include several parameters that do not appear in the original paper (Automatic Extraction of Clusters from Hierarchical Clustering Representations), including rejection_ratio, significant_min, and the ratio of points in the child node that we check (not exposed to users). I can't see why we need these parameters. I think we took this code from https://github.com/amyxzhang/OPTICS-Automatic-Clustering, and she noted in her code: "An implementation of the following algorithm, with some minor add-ons." As scikit-learn, I think we should check whether these add-ons are reasonable and necessary. E.g.,
- For rejection_ratio, the original paper says that "We experimented with different ratios and in fact, any value in the range 0.7-0.8 always gives good results", so I guess it won't make much difference?
- For significant_min, I don't think it makes sense to users, since we have already normalized the reachability distances (RD) at this point.
- For the magic 0.8 (the ratio of points in the child that we check) inside the code, I think we should remove it, or at least make it public (though I won't vote +1 for making it public at this point).
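To make the dependence on a hidden constant like the 0.8 concrete, here is a minimal, hypothetical sketch of a "significant local maximum" test on a reachability plot. It is not the code in _extract_optics or in the amyxzhang repository: the function name is_significant_split, its arguments, and the averaging rule are assumptions for illustration only. Here ratio is an assumed significance threshold in the 0.7-0.8 range mentioned above, and check_fraction stands in for the unexposed 0.8 (the fraction of each child, taken next to the split, whose reachability is averaged).

```python
import numpy as np


def is_significant_split(reachability, split, left, right,
                         ratio=0.75, check_fraction=1.0):
    """Hypothetical sketch: is the local maximum at ``split`` significant?

    reachability   : 1-D sequence of reachability distances in OPTICS order.
    split          : index of the candidate local maximum.
    left, right    : (start, end) half-open index ranges of the two children.
    ratio          : assumed threshold; the split counts as significant only if
                     both children's average reachability is at most
                     ``ratio`` times the reachability at the split.
    check_fraction : assumed fraction of each child, nearest to the split,
                     that is averaged (1.0 averages the whole child).
    """
    r = np.asarray(reachability, dtype=float)
    limit = ratio * r[split]

    def mean_near_split(start, end, from_end):
        # Number of points to average, at least one.
        n = max(1, int(round(check_fraction * (end - start))))
        # For the left child the points nearest the split are at its end;
        # for the right child they are at its start.
        segment = r[end - n:end] if from_end else r[start:start + n]
        return segment.mean()

    left_avg = mean_near_split(*left, from_end=True)
    right_avg = mean_near_split(*right, from_end=False)
    # Convert numpy's bool_ to a plain Python bool.
    return bool(left_avg <= limit and right_avg <= limit)


if __name__ == "__main__":
    r = [9, 9, 9, 1, 1, 7, 1, 1, 9, 9, 9]
    # Average the whole children vs. only the 40% nearest the split.
    print(is_significant_split(r, 5, (0, 5), (6, 11), check_fraction=1.0))
    print(is_significant_split(r, 5, (0, 5), (6, 11), check_fraction=0.4))
```

With this reachability plot, averaging the whole children rejects the split (mean 5.8 against a limit of 0.75 * 7 = 5.25), while averaging only the 40% of points nearest the split accepts it. The same data can give opposite answers depending on a constant users cannot see, which is the concern raised above.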

- I think we need to allow users to pass an int to min_maxima_ratio, like min_cluster_size. Also, the relationship between min_cluster_size and min_samples is still unclear.
- We're using the number of points when checking whether a point needs to be moved; that seems wrong, right? We should use RD.
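For the min_maxima_ratio point, one way to accept either form is the convention min_cluster_size follows in spirit: an int is an absolute number of points and a float in (0, 1] is a fraction of n_samples. The helper below is hypothetical (resolve_count is not an existing scikit-learn function), and its rounding rule (round up, never below 1) is an assumption, not the behavior of any released version.

```python
import math
import numbers


def resolve_count(value, n_samples, name="min_maxima_ratio"):
    """Hypothetical helper: accept an absolute count or a fraction.

    - An integer >= 1 is taken as an absolute number of points.
    - A float in (0, 1] is taken as a fraction of ``n_samples`` and
      converted to a count, rounding up so the result is at least 1.
    """
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(value, bool):
        raise TypeError(f"{name} must be an int or a float, got bool")
    if isinstance(value, numbers.Integral):
        if value < 1:
            raise ValueError(f"{name} as an int must be >= 1, got {value}")
        return int(value)
    if isinstance(value, numbers.Real):
        if not 0 < value <= 1:
            raise ValueError(f"{name} as a float must be in (0, 1], got {value}")
        return max(1, math.ceil(value * n_samples))
    raise TypeError(f"{name} must be an int or a float, "
                    f"got {type(value).__name__}")
```

Dispatching on type keeps the call site simple, but it means 1 and 1.0 mean different things (one point vs. all points), which is worth stating explicitly in the docstring if this route is taken.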
ping @espg