[WIP] ENH Multilabel confusion matrix by jnothman · Pull Request #10628 · scikit-learn/scikit-learn

jnothman · 2018-02-12T22:51:17Z

This PR considers a helper for multilabel/set-wise evaluation metrics such as precision, recall, fbeta, jaccard (#10083), fall-out, miss rate and specificity (#5516). It also incorporates suggestions from #8126 regarding the efficiency of the multilabel true-positives calculation (but does not optimise for micro-averaging, perhaps unfortunately). Unlike confusion_matrix, it is optimised for the multilabel case, but it also handles multiclass problems the way precision_recall_fscore_support does: as binarised one-vs-rest (OvR) problems.
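A minimal NumPy sketch of the idea, for readers following along: one 2x2 matrix per label, laid out as [[tn, fp], [fn, tp]], with 1-D multiclass input binarised one-vs-rest first. `per_label_confusion` is a hypothetical name for illustration, not the code in this PR.

```python
import numpy as np


def per_label_confusion(y_true, y_pred, labels=None):
    """Hypothetical helper: one 2x2 matrix [[tn, fp], [fn, tp]] per label.

    2-D 0/1 inputs are treated as multilabel indicator matrices; 1-D inputs
    are treated as multiclass and binarised one-vs-rest against ``labels``.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.ndim == 1:
        # Multiclass: each label becomes its own binary OvR problem.
        if labels is None:
            labels = np.union1d(y_true, y_pred)
        labels = np.asarray(labels)
        y_true = y_true[:, None] == labels[None, :]
        y_pred = y_pred[:, None] == labels[None, :]
    yt = y_true.astype(bool)
    yp = y_pred.astype(bool)
    n_samples = yt.shape[0]
    # Column sums give the per-label sufficient statistics directly.
    tp = np.sum(yt & yp, axis=0)
    fp = np.sum(yp, axis=0) - tp
    fn = np.sum(yt, axis=0) - tp
    tn = n_samples - tp - fp - fn
    # Rows of [tn, fp, fn, tp] reshaped into one 2x2 matrix per label.
    return np.stack([tn, fp, fn, tp], axis=1).reshape(-1, 2, 2)
```

Each per-label matrix sums to n_samples. For reference, the function that was eventually merged via #11179 is `sklearn.metrics.multilabel_confusion_matrix`, which documents the same [[tn, fp], [fn, tp]] layout.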

It benefits us by greatly simplifying the precision_recall_fscore_support and future jaccard implementations, and it allows for further refactoring between them. It also makes the calculation of sufficient statistics explicit (although perhaps it computes more statistics than necessary), so that standard metrics become simple calculations on top of them: it makes the code less mystifying. In that sense this is mostly a cosmetic change, but it also gives users an easy way to generalise the P/R/F/S implementation to related metrics.
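To illustrate "standard metrics are a simple calculation": given per-label tp, fp and fn counts, precision, recall and F-beta are a few divisions. This is a sketch; `prf_from_counts` is a hypothetical name, and returning 0.0 for labels with a zero denominator is an assumption about how ill-defined cases are handled.

```python
import numpy as np


def prf_from_counts(tp, fp, fn, beta=1.0):
    """Hypothetical helper: per-label precision, recall and F-beta.

    Labels with a zero denominator get a score of 0.0 here; this is an
    assumption, not necessarily how the library handles ill-defined cases.
    """
    tp, fp, fn = (np.asarray(a, dtype=float) for a in (tp, fp, fn))

    def safe_div(num, den):
        # Divide only where the denominator is non-zero; elsewhere keep 0.0.
        out = np.zeros_like(num)
        np.divide(num, den, out=out, where=den != 0)
        return out

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    beta2 = beta ** 2
    fbeta = safe_div((1 + beta2) * precision * recall,
                     beta2 * precision + recall)
    return precision, recall, fbeta
```

Micro-averaging would sum tp, fp and fn over labels before dividing; macro-averaging would take the mean of the per-label scores.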

TODO:

- implement multilabel_confusion_matrix and use it in precision_recall_fscore_support as an indirect form of testing
- fix up edge cases that fail tests
- benchmark multiclass implementation against incumbent P/R/F/S
- benchmark multilabel implementation with benchmarks/bench_multilabel_metrics.py extended to consider non-micro averaging, sample_weight and perhaps other cases
- directly test multilabel_confusion_matrix
- document under model_evaluation.rst
- document how to calculate fall-out, miss-rate, sensitivity, specificity from multilabel_confusion_matrix
- refactor jaccard similarity implementation once [MRG] average parameter for jaccard_similarity_score #10083 is merged
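For the documentation item on fall-out, miss rate, sensitivity and specificity, a sketch of how they fall out of an array of per-label [[tn, fp], [fn, tp]] matrices. `rates_from_mcm` is a hypothetical name, and it assumes every label has at least one positive and one negative sample (otherwise the divisions yield nan).

```python
import numpy as np


def rates_from_mcm(mcm):
    """Per-label rates from an array of 2x2 matrices [[tn, fp], [fn, tp]]."""
    mcm = np.asarray(mcm, dtype=float)
    tn, fp = mcm[:, 0, 0], mcm[:, 0, 1]
    fn, tp = mcm[:, 1, 0], mcm[:, 1, 1]
    # Assumes every label has at least one positive and one negative sample;
    # otherwise the divisions below produce nan.
    sensitivity = tp / (tp + fn)  # true positive rate, aka recall
    specificity = tn / (tn + fp)  # true negative rate
    miss_rate = fn / (tp + fn)    # false negative rate = 1 - sensitivity
    fall_out = fp / (fp + tn)     # false positive rate = 1 - specificity
    return {"sensitivity": sensitivity, "specificity": specificity,
            "miss_rate": miss_rate, "fall_out": fall_out}
```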

If another contributor would like to take this on, I would welcome it. I have marked this as Easy because the code and technical knowledge involved are not hard, but it will take a bit of work and a clear understanding.

sklearn-lgtm · 2018-02-12T23:25:29Z

This pull request fixes 2 alerts - view on lgtm.com

fixed alerts:

2 for Potentially uninitialized local variable

Comment posted by lgtm.com

sklearn-lgtm · 2018-02-13T02:13:47Z

This pull request fixes 2 alerts - view on lgtm.com

fixed alerts:

2 for Potentially uninitialized local variable

Comment posted by lgtm.com

sklearn-lgtm · 2018-02-13T12:46:43Z

This pull request fixes 2 alerts - view on lgtm.com

fixed alerts:

2 for Potentially uninitialized local variable

Comment posted by lgtm.com

sklearn-lgtm · 2018-03-19T02:18:57Z

This pull request fixes 2 alerts when merging 542ec86 into e78263f - view on lgtm.com

fixed alerts:

2 for Potentially uninitialized local variable

Comment posted by lgtm.com

ShangwuYao · 2018-05-31T17:38:53Z

Hi @jnothman, I am continuing your work on this. I am not familiar with codecov, though: this check seems to be failing. Do I need to fix it?
And by benchmark, do you mean comparing the test results? How do I report this, then? I don't think this will go into the code, right?
Thanks!
Edit: I figured out the benchmark thing.

jnothman · 2018-06-01T04:51:51Z

Benchmark means seeing whether this is as fast as or faster than the existing precision/recall implementation. Codecov tells you whether there are tests that run every line of new code. There should be.
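A minimal timing harness with the standard library's `timeit.repeat`, as one way to do such a comparison locally. The two true-positive counters are stand-ins for "old" and "new" code, not the PR's implementations; the fuller comparison the TODO list calls for uses benchmarks/bench_multilabel_metrics.py.

```python
import timeit

import numpy as np


def compare_speed(funcs, args, number=20, repeat=5):
    """Return the best mean seconds-per-call of each function on ``args``."""
    results = {}
    for name, func in funcs.items():
        # timeit.repeat returns one total time per repeat; keep the fastest.
        times = timeit.repeat(lambda: func(*args), number=number, repeat=repeat)
        results[name] = min(times) / number
    return results


# Two stand-in ways of counting per-label true positives.
def tp_bitwise(y_true, y_pred):
    return np.sum(y_true & y_pred, axis=0)


def tp_multiply(y_true, y_pred):
    return np.sum(y_true.astype(int) * y_pred.astype(int), axis=0)


rng = np.random.RandomState(0)
Y_TRUE = rng.randint(2, size=(1000, 50)).astype(bool)
Y_PRED = rng.randint(2, size=(1000, 50)).astype(bool)
timings = compare_speed({"bitwise": tp_bitwise, "multiply": tp_multiply},
                        (Y_TRUE, Y_PRED))
```

Taking the minimum over repeats reduces noise from other processes; it is worth also checking that both versions return identical results before comparing their speed.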

ShangwuYao · 2018-06-01T14:34:43Z

Thanks a lot for the help.
I think it would be better for me to check with you first before I mess up your code...
The benchmarking results are:

Metric                                                               csc     csr   dense
precision_recall_fscore_support                                    0.007   0.003   0.007
precision_recall_fscore_support_with_multilabel_confusion_matrix   0.008   0.005   0.009

Since the new implementation of precision_recall_fscore_support is slower than the original one, should I just remove the new one? You are just using it for testing, correct?

And I think your implementation doesn't support multiclass-multioutput (it supports multilabel-indicator), so I should probably raise a ValueError.
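A sketch of that input check in plain NumPy: 1-D targets are multiclass, 2-D 0/1 matrices are multilabel-indicator, and 2-D targets with other values are multiclass-multioutput and get a ValueError. `check_mcm_targets` is a hypothetical name; inside scikit-learn, `sklearn.utils.multiclass.type_of_target` is the usual way to classify targets.

```python
import numpy as np


def check_mcm_targets(y_true, y_pred):
    """Hypothetical validator: accept 1-D multiclass or 2-D 0/1 indicator
    matrices, and reject multiclass-multioutput with a ValueError."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred have different shapes: %r vs %r"
                         % (y_true.shape, y_pred.shape))
    if y_true.ndim == 1:
        return "multiclass"
    if y_true.ndim == 2:
        # Any value other than 0/1 in a 2-D target means multiclass-multioutput.
        values = np.union1d(y_true, y_pred)
        if np.all(np.isin(values, [0, 1])):
            return "multilabel-indicator"
        raise ValueError("multiclass-multioutput is not supported")
    raise ValueError("expected 1-D or 2-D targets, got %d-D" % y_true.ndim)
```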

And the use of sample_weight in multilabel_confusion_matrix doesn't seem correct.

>>> y_true
array([[1, 0, 1],
       [0, 1, 0],
       [1, 1, 0]])
>>> y_pred
array([[1, 0, 0],
       [0, 1, 1],
       [0, 0, 1]])
>>> sample_weight
array([[3, 2, 1],
       [1, 2, 3],
       [2, 3, 4]])
>>> multilabel_confusion_matrix(y_true, y_pred, sample_weight=sample_weight)
array([[[-2,  0],
        [ 2,  3]],

       [[-2,  0],
        [ 3,  2]],

       [[-5,  7],
        [ 1,  0]]])

Edit: this is the multiclass case, not multilabel-indicator.

jnothman · 2018-06-02T11:43:32Z

Yes, it looks a lot slower, at least in some cases. Can you profile it and work out where it's much slower? sample_weight should be 1d; 2d should raise an exception.
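A sketch of the 1-D sample_weight behaviour for multilabel-indicator input: each sample's weight is broadcast across all labels, and a 2-D array like the one in the REPL example is rejected. `weighted_mcm` is a hypothetical name. Because every cell is a sum of non-negative weights, the counts cannot go negative, and each label's matrix sums to the total weight.

```python
import numpy as np


def weighted_mcm(y_true, y_pred, sample_weight=None):
    """Hypothetical multilabel-indicator version with per-sample weights."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    n_samples = y_true.shape[0]
    if sample_weight is None:
        sample_weight = np.ones(n_samples)
    sample_weight = np.asarray(sample_weight, dtype=float)
    if sample_weight.ndim != 1 or sample_weight.shape[0] != n_samples:
        raise ValueError("sample_weight must be 1-D with one weight per sample")
    w = sample_weight[:, None]  # broadcast each sample's weight over labels
    tp = np.sum(w * (y_true & y_pred), axis=0)
    fp = np.sum(w * (~y_true & y_pred), axis=0)
    fn = np.sum(w * (y_true & ~y_pred), axis=0)
    tn = np.sum(w * (~y_true & ~y_pred), axis=0)
    return np.stack([tn, fp, fn, tp], axis=1).reshape(-1, 2, 2)
```

With the y_true and y_pred from the REPL example and weights [3, 1, 2], the first label's matrix is [[1, 0], [2, 3]]: sample 0 is a weighted true positive (3), sample 2 a false negative (2), sample 1 a true negative (1).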

jnothman added 2 commits February 13, 2018 09:03

ENH Multilabel confusion matrix

d80c6bb

Add see also references

7b97b8c

jnothman added the Easy, Enhancement and help wanted labels Feb 12, 2018

jnothman changed the title from "ENH Multilabel confusion matrix" to "[WIP] ENH Multilabel confusion matrix" Feb 12, 2018

jnothman mentioned this pull request Feb 12, 2018

[MRG] average parameter for jaccard_similarity_score #10083

Closed

Fix messy edge cases

1d30de7

jnothman added 2 commits February 13, 2018 14:27

Rm unnecessary comment

1ab72f1

Fix for old scipy

eca4dbb

jnothman mentioned this pull request Mar 6, 2018

multiclass jaccard_similarity_score should not be equal to accuracy_score #7332

Closed

FIX for old scipy

542ec86

jnothman mentioned this pull request Mar 19, 2018

[MRG] Add specificity score as a metric #10831

Closed

jnothman mentioned this pull request May 30, 2018

Adding Fall-out, Miss rate, specificity as metrics #5516

Open

ShangwuYao mentioned this pull request May 31, 2018

[MRG] FEA multilabel confusion matrix #11179

Merged

qinhanmin2014 closed this in #11179 Oct 30, 2018

jnothman wants to merge 6 commits into scikit-learn:master from jnothman:multilabel-confusion
