sklearn.metrics.consensus_score potentially gives wrong results

Hi!

`sklearn.metrics.consensus_score()` returns wrong scores when the two results being compared contain different numbers of biclusters. The cause is the final line of the function:

```
return np.trace(matrix[:, indices[:, 1]]) / max(n_a, n_b)
```

which uses `np.trace` under the assumption that `matrix` (the similarity matrix) is square, and thus contains the most similar pairs on its diagonal once the columns are reordered.

However, when `matrix` is non-square (i.e., `n_b != n_a` in the code), this fails. An example dataset showing such a case is deposited at https://www.dropbox.com/sh/plmsqof84xhtxry/7lIrdvX0mp and can be run with:

```
import numpy as np
import sklearn.metrics

# Load the row and column arrays of the two bicluster sets
a_rows = np.loadtxt("/home/tom/a_rows.txt")
a_cols = np.loadtxt("/home/tom/a_cols.txt")
b_rows = np.loadtxt("/home/tom/b_rows.txt")
b_cols = np.loadtxt("/home/tom/b_cols.txt")
print(sklearn.metrics.consensus_score((a_rows, a_cols), (b_rows, b_cols)))
```

This gives a consensus score of ~0.328, but the correct score should be ~0.529.

The bug can be fixed by replacing the last line of the function with:

```
return matrix[indices[:, 0], indices[:, 1]].sum() / max(n_a, n_b)
```
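To illustrate the difference between the two return expressions, here is a minimal sketch on a made-up 3x2 similarity matrix. The matrix values are hypothetical, and `scipy.optimize.linear_sum_assignment` is used here only as a stand-in for the assignment step inside `consensus_score`, which is an assumption about how `indices` is obtained rather than a copy of the library code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical similarity matrix: 3 biclusters in result A, 2 in result B.
matrix = np.array([
    [0.1, 0.2],
    [0.9, 0.1],
    [0.2, 0.8],
])
n_a, n_b = matrix.shape

# One-to-one matching that maximises total similarity
# (stand-in for the assignment step; minimises 1 - similarity).
rows, cols = linear_sum_assignment(1.0 - matrix)
indices = np.column_stack([rows, cols])

# Current expression: the trace only reads rows 0..n_b-1 of the
# column-reordered matrix, ignoring which rows were actually matched.
trace_score = np.trace(matrix[:, indices[:, 1]]) / max(n_a, n_b)

# Proposed expression: read exactly the matched (row, column) pairs.
indexed_score = matrix[indices[:, 0], indices[:, 1]].sum() / max(n_a, n_b)

print(indices)
print(trace_score, indexed_score)
```

In this sketch the matching pairs rows 1 and 2 of A with columns 0 and 1 of B, but the trace sums the entries at (0, 0) and (1, 1) instead, so the two scores differ.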

(I can send a pull request if necessary, but since it's a single-line fix I'm not sure it's worth it.)


sklearn.metrics.consensus_score potentially gives wrong results #2445

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

sklearn.metrics.consensus_score potentially gives wrong results #2445

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions