FIX Infer pos_label automatically in plot_roc_curve#15316
FIX Infer pos_label automatically in plot_roc_curve#15316qinhanmin2014 wants to merge 16 commits intoscikit-learn:masterfrom
Conversation
The reason we lack the parameter in roc_auc_score is that the score is symmetric with respect to the choice of the positive label. |
Sorry but I can't understand this part, should be |
Well we can keep pos_label parameter and define pos_label=None as the greater label (though this will be inconsistent with roc_auc_curve), but I can't understand why the choice of pos_label is significant here, the roc_auc_score of a model can be less than 0.5? |
|
Yes, you're right... now trying to remember which similar lemma is the
reason we don't need pos_label in roc_auc_score. I know I've justified it
to myself before!
But you can certainly choose either class as the positive one, and give
the corresponding ranking and get the same score: roc_auc_score(y_true ==
1, score) == roc_auc_score(y_true == 0, -score)
|
Yes the decision was (mainly) made by you (#2723 (comment), #2723 (comment)) and the PR was submitted by me. Honestly I'm still unable to understand the reason. (Please kindly forgive me.) Perhaps it's better to drop pos_label parameter here, because in roc_auc_score, we regard the class with greater label as the positive class. If we want to support pos_lable here, perhaps we need to support it in roc_auc_score at the same time. |
|
Firstly can we agree that plot_roc_curve is like roc_curve, *not* like
roc_auc_score? So support for pos_label should reflect roc_curve.
But yes, while I don't think it makes sense for the user to configure the
pos_label for roc_auc_score where the scores are coming from a scikit-learn
binary estimator, I suppose there's a little contradiction to then support
it in roc_curve... Tired, can't think further about it.
|
If we keep consistent with roc_curve (we do so currently), seems that we won't be able to solve Andy's issue, because for roc_curve,
Sorry I can'r understand this part. average_precision_score also comes from binary estimator but we support pos_label there? |
|
ping @thomasjpfan ready for a review |
|
Sorry for late reaction but I have a |
ogrisel
left a comment
There was a problem hiding this comment.
@qinhanmin2014 I pushed some improvements. Otherwise LGTM.
|
Based on the discussion in #15405 (comment), maybe we should actually remove |
|
See #15405 (comment) Unless we want to change the behavior to also slice predict_proba, then we should remove it. |
Fixes #15303
I think we should drop parameter
pos_labelinplot_roc_curve, because we do not have parameterpos_labelinroc_auc_score. Inroc_auc_score, for binary y_true, y_score is supposed to be the score of the class with greater label.