[WIP] score function computing balanced accuracy#6752
xyguo wants to merge 12 commits into scikit-learn:master
Conversation
According to the latest comment under issue #6747, the balanced accuracy should only be computed for binary classification problems as well as multi-label problems. Fixing the implementation.
You don't necessarily need to support multilabel initially. You do need to ensure this has:
@jnothman I see. By the way, I think it'd be better not to accept multilabel input, because this is essentially not a metric for multilabel problems. Maybe we should leave it to the user.
You may be right that it's not often reported for multilabel problems, but any metric applicable to binary problems is applicable to each label of a multilabel problem: a multilabel problem can be seen as multiple binary tasks. But as I said, we can leave multilabel support out for the moment.
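To illustrate the decomposition described above, here is a small pure-Python sketch (not the PR's implementation; the helper names and data are hypothetical) that treats each label column of a multilabel indicator matrix as a separate binary task and averages the per-label balanced accuracies:

```python
def balanced_accuracy_binary(y_true, y_pred):
    # Mean of per-class recall over the two classes (0 and 1).
    recalls = []
    for c in (0, 1):
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)

def balanced_accuracy_multilabel(Y_true, Y_pred):
    # View each label (column) as an independent binary task,
    # then average the per-label scores.
    n_labels = len(Y_true[0])
    per_label = [
        balanced_accuracy_binary([row[j] for row in Y_true],
                                 [row[j] for row in Y_pred])
        for j in range(n_labels)
    ]
    return sum(per_label) / n_labels

Y_true = [[1, 0], [0, 1], [1, 1], [0, 0]]
Y_pred = [[1, 0], [0, 0], [1, 1], [0, 1]]
print(balanced_accuracy_multilabel(Y_true, Y_pred))  # 0.75
```

Here the first label is predicted perfectly (score 1.0) and the second has recall 0.5 on each class (score 0.5), so the average is 0.75.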
Now I've made a preliminary version of the metric function, with corresponding documentation; all tests pass.
sklearn/metrics/classification.py
Outdated
> The balanced accuracy is used in binary classification problems to deal
> with imbalanced datasets. It is defined as the arithmetic mean of sensitivity
> (true positive rate) and specificity (true negative rate), or the average
> accuracy obtained on either class.
Either use "recall" here or be explicit that it's "on either class's gold standard instances" or something.
@jnothman I have updated the doc as you commented. What should I do next? (This is the first time I contribute code to an open source project >_<) Currently I'm trying to extend it to multilabel problems. As your comments mentioned, this quantity is equivalent to
Yes, I think wrapping
> conventional accuracy (i.e., the number of correct predictions divided by the total
> number of predictions). In contrast, if the conventional accuracy is above chance only
> because the classifier takes advantage of an imbalanced test set, then the balanced
> accuracy, as appropriate, will drop to chance.
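A toy illustration of that claim (hypothetical data, plain Python rather than the PR's code): a classifier that always predicts the majority class gets high conventional accuracy on an imbalanced test set, while its balanced accuracy sits at the 0.5 chance level.

```python
y_true = [0] * 90 + [1] * 10   # heavily imbalanced test set
y_pred = [0] * 100             # degenerate classifier: always predict class 0

# Conventional accuracy rewards the majority-class bias.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Balanced accuracy: mean of per-class recall.
recall_0 = 90 / 90  # every class-0 instance is recovered
recall_1 = 0 / 10   # no class-1 instance is recovered
balanced = (recall_0 + recall_1) / 2

print(accuracy)  # 0.9
print(balanced)  # 0.5
```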
0.5 or 50% is clearer than 'chance'
For the most part, this looks great. I'm not sure if more specific tests for balanced accuracy are needed, or whether the doctests + common tests suffice. One problem with using
Yes, I've noticed the problem caused by sparse input.
Maybe I'll have to reimplement it from scratch. -_-
Maybe we should also support multi-class problems; the definition of balanced accuracy generalizes to multi-class settings naturally (although it may not be so useful when the number of classes exceeds two).
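The generalization presumably meant here is the unweighted mean of per-class recall over all classes, which reduces to the binary definition when there are two classes. A minimal sketch, with hypothetical names and data:

```python
def balanced_accuracy_multiclass(y_true, y_pred):
    # Macro-averaged recall: mean of per-class recall over the classes
    # appearing in y_true; for two classes this is the familiar
    # (sensitivity + specificity) / 2.
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, t in enumerate(y_true) if t == c]
        recalls.append(sum(1 for i in idx if y_pred[i] == c) / len(idx))
    return sum(recalls) / len(recalls)

y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 1, 0, 2, 1]
# per-class recalls: 1.0, 0.5, 0.5 -> mean = 2/3
print(balanced_accuracy_multiclass(y_true, y_pred))
```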
How does it generalise to multiclass naturally? I don't think it's obvious. I don't think the need to exclude labels is important for multilabel; it is important for multiclass, which is why it is supported in
FYI, #5588 was an existing PR attempting this enhancement. I don't know why we didn't just continue on that one... but between these two PRs we should attempt some convergence...
I have finished the support for multilabel, but several tests fail in

And there are several other cases that fail due to similar problems. I wonder if we should clarify the interface for different types of metrics ...
So you need to get those tests to check for
@jnothman got it, I will work on it soon.
@xyguo are you still working on this?
@amueller Yes. I have been writing my thesis and don't have much time for this project. I plan to resume it later this month.
@xyguo Are you still working on this, or can I take this up?
@dalmia Please take this up, I'm just too busy to work on it recently. Thanks!
Thanks @xyguo
* add function computing balanced accuracy
* documentation for the balanced_accuracy_score
* apply common tests to balanced_accuracy_score
* constrained to binary classification problems only
* add balanced_accuracy_score for CLF test
* add scorer for balanced_accuracy
* reorder the place of importing balanced_accuracy_score to be consistent with others
* eliminate an accidentally added non-ascii character
* remove balanced_accuracy_score from METRICS_WITH_LABELS
* eliminate all non-ascii characters in the doc of balanced_accuracy_score
* fix doctest for nonexistent scoring function
* fix documentation, clarify linkages to recall and auc
* FIX: added changes as per last review. See #6752, fixes #6747
* FIX: fix typo
* FIX: remove flake8 errors
* DOC: merge fixes
* DOC: remove unwanted files
* DOC: update what's new
Closed by #8066.
Reference Issue
This PR addresses issue #6747, which suggests implementing a score function calculating the balanced accuracy.
What does this implement/fix? Explain your changes.
The balanced accuracy is actually an unweighted average of the recall scores for each class, and that functionality is already provided by `sklearn.metrics.recall_score` -- just pass the argument `average='macro'` (and `pos_label=None` for versions before 0.18). So the `balanced_accuracy_score` in this PR is a simple wrapper of `recall_score`.

Any other comments?
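The wrapper idea can be sketched as follows (a sketch of the approach described above, not the exact code in this PR; the function name mirrors the proposed `balanced_accuracy_score`):

```python
from sklearn.metrics import recall_score

def balanced_accuracy(y_true, y_pred):
    # Balanced accuracy == unweighted (macro) mean of per-class recall.
    return recall_score(y_true, y_pred, average='macro')

# Imbalanced binary example: recall is 1.0 for class 0 and 0.5 for class 1,
# so the balanced accuracy is 0.75.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1]
print(balanced_accuracy(y_true, y_pred))  # 0.75
```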
I'm not sure if there should be a test case for this function, since the corresponding scenario is already tested for `recall_score`.