[MRG+1] Completely support binary y_true in roc_auc_score #9828
TomDLT merged 7 commits into scikit-learn:master from qinhanmin2014:my-feature-3
Conversation
sklearn/metrics/tests/test_common.py (outdated)

@@ -595,7 +595,8 @@ def test_invariance_string_vs_numbers_labels():
    for name, metric in THRESHOLDED_METRICS.items():
        if name in ("log_loss", "hinge_loss", "unnormalized_log_loss",
What are we excluding by this? What happens when we remove the condition?
@jnothman Thanks for the review.

The test will fail because some metrics still only support {0, 1} y_true or {-1, 1} y_true. After this PR, the following THRESHOLDED_METRICS are still excluded by the if statement:

```
# average_precision_score
average_precision_score
macro_average_precision_score
micro_average_precision_score
samples_average_precision_score
weighted_average_precision_score
# Multilabel ranking metrics
coverage_error
label_ranking_average_precision_score
label_ranking_loss
```
I can't check now, but is that set complementary in some way, like supporting pos_label? I'd rather the condition be a blacklist (suggesting "not yet implemented" or "not applicable") than a whitelist, which would seem to defy the purpose of common tests.
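To make the distinction concrete, here is a hypothetical, self-contained sketch of the blacklist pattern (not the actual test code; the names mirror those in the thread):

```python
# Hypothetical sketch: a blacklist skips only metrics with a known
# limitation, so any newly added metric is covered by the common test
# by default; a whitelist would silently exclude new metrics instead.
THRESHOLDED_METRICS = {"roc_auc_score": None,
                       "average_precision_score": None,
                       "coverage_error": None}
# "not yet implemented" / "not applicable" for arbitrary binary labels
METRIC_UNDEFINED_BINARY = {"average_precision_score", "coverage_error"}

for name, metric in THRESHOLDED_METRICS.items():
    if name in METRIC_UNDEFINED_BINARY:
        continue  # documented exception rather than an implicit whitelist
    print("common test would run for", name)
```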
Would it be a good idea to merge the PRs so we can see the state of affairs more completely?

Should we have a regression test to show that cross_val_score works regardless of which label is positive? (A sketch follows below.)

And I'm not sure I get why average precision would be undefined-binary, but I'm still not in a position to look at the code.
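A hedged sketch of such a regression test (hypothetical code, not from the actual test suite; it assumes a scikit-learn version that includes this PR):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(0)
X = rng.randn(100, 10)
y = rng.randint(2, size=100)
classes = np.array(['good', 'not-good'])

# Swap which samples carry the 'not-good' label and recompute CV AUC.
scores_a = cross_val_score(LogisticRegression(), X, classes[y],
                           scoring='roc_auc', cv=3)
scores_b = cross_val_score(LogisticRegression(), X, classes[1 - y],
                           scoring='roc_auc', cv=3)

# Refitting on flipped labels negates the decision function, and ROC AUC
# is invariant under that relabelling, so the two runs should agree
# (up to solver tolerance).
np.testing.assert_allclose(scores_a, scores_b, rtol=1e-3)
```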
@jnothman Thanks for your instant reply.

From my perspective, #9786 actually solves a different problem (improving the stability of roc_auc_score) and is almost finished. There's hardly any direct relationship between the two PRs, so it might be better not to combine #9786 and this PR unless you insist.

Sorry, but I don't quite understand the necessity of such a test. If roc_auc_score works appropriately, then it seems that the scorer based on roc_auc_score, as well as cross_val_score, should also work appropriately. I can't find a similar test currently; if you can point out something similar, I will be able to further understand the problem.

According to the doc and a glance at the source code, it seems we should also move average_precision_score out of METRIC_UNDEFINED_BINARY. I'll take care of it after #9786. I have wrapped up our discussions about the common test, along with some opinions of my own, in #9829.
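To illustrate the point about scorers, a minimal hypothetical check (the data and estimator are made up for illustration): the 'roc_auc' scorer is a thin wrapper around roc_auc_score, so the two should agree whenever the metric itself handles the labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import get_scorer, roc_auc_score

rng = np.random.RandomState(0)
X = rng.randn(50, 5)
y = np.array(['good', 'not-good'])[rng.randint(2, size=50)]
est = LogisticRegression().fit(X, y)

# The scorer computes the same quantity as calling the metric directly
# on the estimator's decision function.
scorer = get_scorer('roc_auc')
print(scorer(est, X, y))
print(roc_auc_score(y, est.decision_function(X)))  # should match
```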
Sounds good.
@jnothman Now we can get rid of the awkward list. Is it OK with you? Thanks.
That looks much better!
@jnothman Could you please give me some suggestions on how to make lgtm run? Thanks :)
This is indeed much better than #9567, #6874, #2616. The basic use case seems to work:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

np.random.seed(0)
n_samples, n_features = 100, 10
est = LogisticRegression()
X = np.random.randn(n_samples, n_features)
y = np.random.randint(2, size=n_samples)
classes = np.array(['good', 'not-good'])
for y_true in (classes[y], classes[1 - y]):
    est.fit(X, y_true)
    y_score = est.decision_function(X)
    print(roc_auc_score(y_true, y_score))
# 0.678090575275
# 0.678090575275
```

LGTM
Reference Issue
Fixes #2723, proposed by @jnothman
Also see the discussions in #9805, #9567, #6874, #6873, #2616
What does this implement/fix? Explain your changes.
Currently, roc_auc_score only supports binary y_true encoded as {0, 1} or {-1, 1}.
This PR completely supports binary y_true. The basic idea is that, for binary y_true, y_score is taken to be the score of the class with the greater label.
The common test is used as the regression test.
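As an illustration of this convention, a minimal hypothetical example (made-up values, assuming a version that includes this change):

```python
from sklearn.metrics import roc_auc_score

# With binary string labels, y_score is interpreted as the score of the
# class with the greater label: 'b' > 'a', so 'b' is the positive class.
y_true = ['a', 'b', 'a', 'b']
y_score = [0.1, 0.9, 0.2, 0.8]
print(roc_auc_score(y_true, y_score))  # 1.0: every 'b' outscores every 'a'
```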
Any other comments?
cc @jnothman