OPTICS Reconsider whether there're unnecessary parameters

In OPTICS (``_extract_optics``), we include several parameters that do not appear in the original paper (Automatic Extraction of Clusters from Hierarchical Clustering Representations), including ``rejection_ratio``, ``significant_min`` and the ratio of points in the child that we check (not exposed to users). I can't see why we need these parameters. I think we took this code from https://github.com/amyxzhang/OPTICS-Automatic-Clustering, whose author notes in the code: "An implementation of the following algorithm, with some minor add-ons." I think that, as scikit-learn, we should check whether these add-ons are reasonable and necessary. For example:

* For ``rejection_ratio``, the original paper says that "We experimented with different ratios and in fact, any value in the range 0.7-0.8 always gives good results", so I guess tuning it won't make much difference?

* For ``significant_min``, I don't think it is meaningful to users, since the reachability distances (RD) have already been normalized at this point.

* For the magic number 0.8 (the ratio of points in the child that we check) inside the code, I think we should remove it, or at least make it public (though I wouldn't vote +1 for that at this point).

* I think we need to allow users to pass an int to ``min_maxima_ratio``, as they can for ``min_cluster_size``. Also, the relationship between ``min_cluster_size`` and ``min_samples`` is still unclear.

* We're using the number of points when checking whether a point needs to be moved. That seems wrong; shouldn't we use RD instead?
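
To make the discussion of the ratio threshold and the hard-coded 0.8 more concrete, here is a minimal sketch of a significance test for a candidate split point in a reachability plot. It is not the scikit-learn implementation: the function name ``split_is_significant`` and its parameters ``ratio`` and ``check_ratio`` are hypothetical, and it assumes that a split is kept only when the mean reachability on both sides is below ``ratio`` times the reachability at the split. ``check_ratio`` stands in for the fraction of points next to the split that get averaged.

```python
from statistics import mean


def split_is_significant(reachability, split, ratio=0.75, check_ratio=1.0):
    """Decide whether the local maximum at index `split` separates two regions.

    Hypothetical helper for illustration only. Assumption: the split is
    significant when the mean reachability of the points on each side is
    below `ratio` times the reachability at the split itself.
    """
    left = reachability[:split]
    right = reachability[split + 1:]
    if not left or not right:
        return False  # nothing to compare against on one side

    # Only the `check_ratio` fraction of points closest to the split is
    # averaged (at least one point per side).
    n_left = max(1, round(check_ratio * len(left)))
    n_right = max(1, round(check_ratio * len(right)))
    avg_left = mean(left[-n_left:])
    avg_right = mean(right[:n_right])

    threshold = ratio * reachability[split]
    return avg_left < threshold and avg_right < threshold


# A sharp peak between two dense regions versus an almost flat plot.
peaked = [0.1, 0.1, 0.1, 1.0, 0.1, 0.1, 0.1]
flat = [0.9, 0.9, 0.9, 1.0, 0.9, 0.9, 0.9]
print(split_is_significant(peaked, 3), split_is_significant(flat, 3))
```

In this sketch, restricting the average to fewer points near the split can flip the decision for the same data, which is one reason a hidden constant of this kind deserves scrutiny.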

ping @espg

OPTICS Reconsider whether there're unnecessary parameters #12375
