Skip to content

[MRG+1] OPTICS correctly handle multiple infs in reachability array.#12029

Merged
qinhanmin2014 merged 5 commits intoscikit-learn:masterfrom
adrinjalali:optics/infs
Oct 7, 2018
Merged

[MRG+1] OPTICS correctly handle multiple infs in reachability array.#12029
qinhanmin2014 merged 5 commits intoscikit-learn:masterfrom
adrinjalali:optics/infs

Conversation

@adrinjalali
Copy link
Copy Markdown
Member

See #11857

The extraction part of the code assumes only the first reachability_ to be inf, but the array may include multiple infs. This PR fixes the issue, and fixes a test which used to result in all inf values. It also adds a ValueError in case all reachability values are inf. And also tests for that ValueError.

This is yet another PR to extract parts of #11857.

Also see #11857 (review)

Ping @jnothman

@jnothman jnothman added this to the 0.20 milestone Sep 6, 2018
@jnothman
Copy link
Copy Markdown
Member

jnothman commented Sep 7, 2018

Please credit yourself for the OPTICS feature in what's new v0.20

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Sep 7, 2018

I'm not sure why the python 2 ci is suddenly failing

@adrinjalali adrinjalali changed the title OPTICS correctly handle multiple infs in reachability array. [MRG+1] OPTICS correctly handle multiple infs in reachability array. Sep 7, 2018
@adrinjalali
Copy link
Copy Markdown
Member Author

This seems ready to me. A second review is appreciated.

@jnothman jnothman modified the milestones: 0.20, 0.21 Sep 17, 2018

# Extraction wrapper
reachability = reachability / np.max(reachability[1:])
# according to the paper (p. 5), for a small enough generative distance
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which paper? Maybe at least provide the author and the time.


if ((avg_reach1 / reachability_plot[s]) > maxima_ratio or
(avg_reach2 / reachability_plot[s]) > maxima_ratio):
if ((avg_reach1 / maxima_ratio) > reachability_plot[s] or
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Not related) I'm wondering whether we need both maxima_ratio and rejection_ratio. Too many parameters make our OPTICS somehow unfriendly at least to me.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any insights from your side? @adrinjalali

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand it correctly, setting the two parameters the same, would result in each split point to accept only one of the children, since the next condition is exactly the opposite. But I agree this method has quite a few parameters, which can be confusing specially if the user hasn't read both the papers for OPTICS and this method (which we're calling SQLNK I guess).

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I agree this method has quite a few parameters, which can be confusing specially if the user hasn't read both the papers for OPTICS and this method (which we're calling SQLNK I guess).

You find these two parameters in a paper? Which one? I only find one.

if ((average reachability value in any node in NL) / s.r_dist > 0.75)
    // if split point s is not significant, ignore s and continue
    cluster_tree(N, parent_of_N, L); //

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from these two parameters, some other parameters also seems strange, e.g., significant_min (we're normalizing RD so I don't think it makes sense to users) and the magical check_ratio inside the implementation. I notice that amyxzhang noted that this is "An implementation of the following algorithm, with some minor add-ons". I think we need to check these add-ons carefully. As I missing something @adrinjalali, otherwise I'm going to open an issue. Thanks in advance.

Copy link
Copy Markdown
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @adrinjalali

@qinhanmin2014 qinhanmin2014 merged commit 2020867 into scikit-learn:master Oct 7, 2018
@adrinjalali adrinjalali deleted the optics/infs branch October 7, 2018 09:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants