Skip to content

[MRG+1] FIX consensus score on non-square similarity matrices#3640

Merged
amueller merged 2 commits intoscikit-learn:masterfrom
untom:fix_consensus_score
Jan 13, 2015
Merged

[MRG+1] FIX consensus score on non-square similarity matrices#3640
amueller merged 2 commits intoscikit-learn:masterfrom
untom:fix_consensus_score

Conversation

@untom
Copy link
Copy Markdown
Contributor

@untom untom commented Sep 5, 2014

This PR fixes #2445: When similarity matrices are non-square, the bicluster.consensus_score gave wrong results.

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage remained the same when pulling 308c7ef on untom:fix_consensus_score into 3c92686 on scikit-learn:master.

@jnothman
Copy link
Copy Markdown
Member

jnothman commented Sep 6, 2014

fine, but needs a test

@untom
Copy link
Copy Markdown
Contributor Author

untom commented Sep 7, 2014

Added a test that shows the issue (fails in the current master, runs through with this patch).

@coveralls
Copy link
Copy Markdown

Coverage Status

Coverage increased (+0.0%) when pulling 171139f on untom:fix_consensus_score into 3c92686 on scikit-learn:master.

@untom untom changed the title FIX consensus score on non-square similarity matrices [MRG] FIX consensus score on non-square similarity matrices Oct 2, 2014
@untom
Copy link
Copy Markdown
Contributor Author

untom commented Nov 12, 2014

pinging potential reviewers (@arjoly @jnothman ? ). I've been bitten by the bug #2445 again today, I'd like to make sure this patch doesn't fall through the cracks.

@jnothman
Copy link
Copy Markdown
Member

This LGTM.

Aside: I am altogether not very happy with the dense data format that the bicluster module currently produces (see #2484). And do you find, @untom, that the consensus_score performs reasonably efficiently? In other work where I have used a similar metric (for coreference resolution which is a special kind of clustering), there is a lot of sparsity in the similarity matrix passed to Kuhn-Munkres (i.e. most pairs of clusters have no overlap in a standard system output), so one gets a substantial speedup by only performing that algorithm on strongly connected components of the similarity graph. Do you think a similar strategy would be beneficial here?

@jnothman jnothman changed the title [MRG] FIX consensus score on non-square similarity matrices [MRG+1] FIX consensus score on non-square similarity matrices Nov 12, 2014
@jnothman
Copy link
Copy Markdown
Member

And thanks for the ping!

@untom
Copy link
Copy Markdown
Contributor Author

untom commented Nov 12, 2014

I absolutely agree with the Issue raised in #2484. I found the picked data format odd as well, back when I started working on PR #2476. As far as performance goes, I've never seen consensus_score come up as bottleneck in my code (I've never worked with large number of biclusters though, usually < 10-20).

@jnothman
Copy link
Copy Markdown
Member

Ah of course. Thanks.

@untom
Copy link
Copy Markdown
Contributor Author

untom commented Jan 11, 2015

pinging @arjoly : could this get a MRG+2 ?

@untom
Copy link
Copy Markdown
Contributor Author

untom commented Jan 12, 2015

Or maybe @amueller is the right person to ping? (he added the issue to the 0.15.1 milestone)

@amueller
Copy link
Copy Markdown
Member

The fix looks good. Thanks. Sorry the fix was lying around for so long.

amueller added a commit that referenced this pull request Jan 13, 2015
[MRG+1] FIX consensus score on non-square similarity matrices
@amueller amueller merged commit c579244 into scikit-learn:master Jan 13, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

sklearn.metrics.consensus_score potentially gives wrong results

4 participants