[MRG] Ensure that ROC curve starts at (0, 0)#10093

Merged
jnothman merged 4 commits into scikit-learn:master from qinhanmin2014:roc_curve
Nov 10, 2017
@qinhanmin2014
Member

Reference Issues/PRs

Fixes #9790
See also #9850

What does this implement/fix? Explain your changes.

Currently, when the first point of the ROC curve lies on the y-axis, we don't add the point (0, 0). This is inconsistent with the docs, several papers, and some R packages.
Reference:
(1) scikit-learn doc:
thresholds : array, shape = [n_thresholds]
Decreasing thresholds on the decision function used to compute fpr and tpr. thresholds[0] represents no instances being predicted and is arbitrarily set to max(y_score) + 1.
(2) @jnothman's comment:
our concern should be that in all cases, the curve includes a point that represents predicting nothing in the positive class, and that every further point represents predicting more than nothing, for every threshold at which this changes the fpr or tpr, until all are predicted.
(3) "An introduction to ROC analysis" (cited more than 7000 times)
(4) R package ROCR:

library(ROCR)
pred <- prediction(c(0.1, 0.4, 0.35, 0.8), c(0, 0, 1, 1))
perf <- performance(pred,"tpr","fpr")
plot(perf)
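
For comparison, here is a rough pure-Python sketch of the behaviour described above, using the same four samples as the ROCR call. It is only an illustration: `roc_points` is a hypothetical helper, not scikit-learn's `roc_curve`, it assumes binary 0/1 labels with both classes present, and it does not drop collinear points.

```python
def roc_points(y_true, y_score):
    """Return (fpr, tpr, thresholds) for binary 0/1 labels.

    Hypothetical helper for illustration only; not scikit-learn's code.
    """
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos

    # The first point represents predicting nothing as positive: (0, 0),
    # with the threshold arbitrarily set to max(y_score) + 1.
    fpr, tpr, thresholds = [0.0], [0.0], [max(y_score) + 1]

    # Every distinct score, in decreasing order, is a candidate threshold.
    for t in sorted(set(y_score), reverse=True):
        tp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, y_score) if s >= t and y == 0)
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
        thresholds.append(t)
    return fpr, tpr, thresholds


fpr, tpr, thresholds = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(list(zip(fpr, tpr)))
```

With this data the first real threshold (0.8) already gives a point on the y-axis, (0, 0.5), which is exactly the case where the leading (0, 0) was previously missing.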

Any other comments?

@jnothman
Member

jnothman commented Nov 9, 2017

Looks good at a glance. Please add to what's new. Can this change affect the auc? If so, document carefully.

Also, perhaps add that reference on roc analysis to the docs

@qinhanmin2014
Member Author

@jnothman Thanks a lot for the instant review :)
I have updated the doc and what's new accordingly. Since the fix only adds a vertical segment (overlapping the y-axis) at the beginning of the curve, and a segment of zero width contributes no area, I believe it will not influence roc_auc_score.
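
A quick way to check this claim by hand, as a sketch only: `trapezoid_auc` below is a hypothetical helper that applies the trapezoidal rule, and the point lists are made-up values for a curve whose first point sits on the y-axis. It does not call scikit-learn.

```python
def trapezoid_auc(fpr, tpr):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(fpr)):
        width = fpr[i] - fpr[i - 1]
        area += width * (tpr[i] + tpr[i - 1]) / 2
    return area


# Curve as it would look without the leading (0, 0): it starts at (0, 0.5).
fpr_old = [0.0, 0.5, 0.5, 1.0]
tpr_old = [0.5, 0.5, 1.0, 1.0]

# Same curve with (0, 0) prepended; the new segment has zero width.
fpr_new = [0.0] + fpr_old
tpr_new = [0.0] + tpr_old

auc_old = trapezoid_auc(fpr_old, tpr_old)
auc_new = trapezoid_auc(fpr_new, tpr_new)
print(auc_old, auc_new)
```

The extra segment runs from (0, 0) to (0, 0.5), so its width along the fpr axis is zero and both areas come out the same.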

@massich
Contributor

massich commented Nov 9, 2017

LGTM

non-integer sample weights. :issue:`9786` by :user:`Hanmin Qin <qinhanmin2014>`.

- Fixed a bug where :func:`metrics.roc_curve` sometimes starts on y-axis instead
of (0, 0), which is inconsistent with the document and other implementations.
Member


Note that this does not affect auc

@qinhanmin2014
Member Author

@jnothman Thanks. Comment addressed.

@jnothman
Member

I don't think this is controversial: I'll take Joan's +1 on this... Let's merge. Thanks!

@jnothman jnothman merged commit 3e85359 into scikit-learn:master Nov 10, 2017
@qinhanmin2014 qinhanmin2014 deleted the roc_curve branch November 14, 2017 03:48
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
Development

Successfully merging this pull request may close these issues.

roc_curve doesn't always arbitrarily set thresholds[0] to max(y_score) + 1

3 participants