ROC functions break down with low scores variance. #6688
Description
The ROC functionality reduces its workload by first taking forward differences between the sorted scores and then discarding candidate thresholds that are too close together, as judged by utils.fixes.isclose. The problem is that this happens without notifying the user and without any chance to intervene (setting the drop_intermediate argument to False does not change anything). Moreover, the tolerance is absolute: it is not scaled with respect to the variance of the score vector, so scores that are all very small get merged.
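To make the mechanism concrete, here is a minimal sketch in plain NumPy. It assumes the closeness check amounts to np.isclose(np.diff(sorted_scores), 0) with NumPy's default tolerances (rtol=1e-05, atol=1e-08); the helper count_distinct is hypothetical and is not part of scikit-learn.

```python
import numpy as np

original_scores = np.array([0.1, 0.4, 0.35, 0.8])


def count_distinct(scores):
    """Hypothetical helper: count the scores that stay distinct when
    neighbouring sorted scores are compared with np.isclose."""
    ordered = np.sort(scores)[::-1]
    # Comparing against 0 makes the relative term vanish, so the test
    # reduces to |difference| <= atol (1e-8 by default): purely absolute.
    merged = np.isclose(np.diff(ordered), 0)
    return int(ordered.size - merged.sum())


distinct_counts = {
    scale: count_distinct(scale * original_scores)
    for scale in (1.0, 1e-6, 1e-7, 1e-8)
}
for scale, count in distinct_counts.items():
    print(scale, count)
```

Under these assumptions all four scores stay distinct at scales 1 and 1e-6, two of them merge at 1e-7, and all of them collapse into a single value at 1e-8.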
Steps/Code to Reproduce
This is the (slightly simplified) example given in the documentation of metrics.roc_curve, repeated with the scores scaled down by several orders of magnitude:
import numpy as np
from sklearn import metrics
y = np.array([False, False, True, True])
original_scores = np.array([0.1, 0.4, 0.35, 0.8])
scores = original_scores
fpr, tpr, thresholds = metrics.roc_curve(y, scores)
print fpr, tpr, thresholds, metrics.roc_auc_score(y, scores)
scores = 1e-6*original_scores
fpr, tpr, thresholds = metrics.roc_curve(y, scores)
print fpr, tpr, thresholds, metrics.roc_auc_score(y, scores)
scores = 1e-7*original_scores
fpr, tpr, thresholds = metrics.roc_curve(y, scores)
print fpr, tpr, thresholds, metrics.roc_auc_score(y, scores)
scores = 1e-8*original_scores
fpr, tpr, thresholds = metrics.roc_curve(y, scores)
print fpr, tpr, thresholds, metrics.roc_auc_score(y, scores)
Expected Results
All printed lines should yield the same fpr, tpr and AUC values, with the thresholds differing only by the scale factor.
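Until the tolerance is scaled, one possible workaround is to rescale the scores before passing them to metrics.roc_curve. This relies on the ROC curve being invariant under positive linear rescaling of the scores. The following is only a sketch: rescale_scores is a hypothetical helper, not a scikit-learn function, and the returned thresholds would then be in rescaled units.

```python
import numpy as np


def rescale_scores(scores):
    """Hypothetical helper: map scores linearly onto [0, 1]."""
    scores = np.asarray(scores, dtype=float)
    span = scores.max() - scores.min()
    if span == 0:
        # All scores identical: nothing to separate.
        return np.zeros_like(scores)
    return (scores - scores.min()) / span


original_scores = np.array([0.1, 0.4, 0.35, 0.8])
rescaled = [
    rescale_scores(scale * original_scores)
    for scale in (1.0, 1e-6, 1e-7, 1e-8)
]
# Each rescaled array could be passed to metrics.roc_curve
# in place of the raw scores.
```

After rescaling, the gaps between neighbouring scores no longer depend on the original scale, so an absolute tolerance of 1e-8 should not merge them in this example.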
Versions
Windows-10-10.0.10586
('Python', '2.7.11 |Anaconda 2.5.0 (64-bit)| (default, Jan 29 2016, 14:26:21) [MSC v.1500 64 bit (AMD64)]')
('NumPy', '1.10.4')
('SciPy', '0.17.0')
('Scikit-Learn', '0.17')