[MRG+1] Add scorer based on brier_score_loss #9521
jnothman merged 11 commits into scikit-learn:master from qinhanmin2014:my-feature-1
Conversation
sklearn/metrics/scorer.py (Outdated)

```python
log_loss_scorer = make_scorer(log_loss, greater_is_better=False,
                              needs_proba=True)
log_loss_scorer._deprecation_msg = deprecation_msg
brier_score_loss_scorer = make_scorer(brier_score_loss)
Hey @qinhanmin2014, I think that the greater_is_better parameter in make_scorer should be set to False because brier_score_loss is a loss function. It looks like it is True by default (see https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/scorer.py#L400)
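The sign convention is easy to check directly: with greater_is_better=False, make_scorer negates the metric, so a smaller loss maps to a larger (less negative) score. A minimal sketch (the toy data and estimator below are my own, not from the PR):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss, make_scorer

X = np.array([[0.0], [0.1], [0.9], [1.0]] * 5)
y = np.array([0, 0, 1, 1] * 5)
clf = LogisticRegression().fit(X, y)

# With greater_is_better=False, the scorer returns the negated metric.
scorer = make_scorer(brier_score_loss, greater_is_better=False)
# Note: without needs_proba this scorer feeds hard 0/1 predictions to the
# metric, which is enough to check the sign convention.
score = scorer(clf, X, y)
print(score <= 0.0)  # True: a negated loss is never positive
```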
@jnothman Sorry to disturb you, but I think we currently can't implement a scorer based on brier_score_loss: like other similar metrics (e.g., log_loss, which has the scorer log_loss_scorer), it depends on predicted probabilities, but it simply can't handle the result returned by predict_proba (shape = (n_samples, n_classes)).
you just need to pass just the second column, maybe using

```python
make_scorer(lambda y_true, y_pred: brier_score_loss(y_true, y_pred[:, 1]),
            needs_proba=True, greater_is_better=False)
```

(untested)
@amueller Thanks a lot. It works! I also added a comment explaining the reason for (maybe the limitation of) the current implementation.
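A quick sanity check of the column slicing in that suggestion (a sketch; the values below are made up):

```python
import numpy as np
from sklearn.metrics import brier_score_loss

y_true = np.array([0, 1, 1, 0])
# predict_proba-style output, shape (n_samples, 2)
y_prob = np.array([[0.9, 0.1], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]])

# brier_score_loss expects the positive-class column, shape (n_samples,)
loss = brier_score_loss(y_true, y_prob[:, 1])
print(loss)  # 0.125
```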
sklearn/metrics/scorer.py (Outdated)

```python
log_loss_scorer = make_scorer(log_loss, greater_is_better=False,
                              needs_proba=True)
log_loss_scorer._deprecation_msg = deprecation_msg
# Currently brier_score_loss don't support the shape of result
```
But surely we need the same for roc_auc_score (although that can fall back on decision_function when available). What's going on in that case?
@jnothman Thanks. I checked the documentation. roc_auc_score supports shape = [n_samples] or [n_samples, n_classes]. (Note that roc_auc_scorer belongs to _ThresholdScorer, and log_loss_scorer belongs to _ProbaScorer.)
If brier_score is the odd one out in not supporting the standard predict_proba shape, we might want to fix it there instead? I'm not entirely certain, though...
hmm, it's not altogether the fault of the metric function. I think something like what you have here is okay, except that it needs some validating, to make sure the input is of the right shape...
I'm struggling to see where roc selects one column from the input. Can you trace what's going on there?
@jnothman Thanks.
@jnothman Here's my observation:

```python
for c in range(n_classes):
    y_true_c = y_true.take([c], axis=not_average_axis).ravel()
    y_score_c = y_score.take([c], axis=not_average_axis).ravel()
    score[c] = binary_metric(y_true_c, y_score_c,
                             sample_weight=score_weight)
```

Note that inside the loop the shape of y_true_c and y_score_c has become [n_samples]. That's why roc_auc_score can be based on roc_curve even though roc_curve only supports shape = [n_samples].
But if someone does cross_val_score(DecisionTreeClassifier(), X, y, scoring="roc_auc"), that works with binary even though predict_proba is (n_samples, 2) and y is not multi-label.

yes, the binary case is what I was wondering about. How is that working? Is it working?
@jnothman @amueller (kindly forgive the long comment and possible typos :))

```python
y_true_1 = np.array([[0, 1], [1, 0], [1, 0], [0, 1]] * 10)  # multilabel-indicator
y_true_2 = np.array([1, 0, 0, 1] * 10)  # binary
y_prob_1 = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]] * 10)
y_prob_2 = np.array([0.9, 0.2, 0.7, 0.6] * 10)

roc_auc_score(y_true_1, y_prob_1)  # no error, can extend to multiclass
roc_auc_score(y_true_1, y_prob_2)  # error, shape not consistent
roc_auc_score(y_true_2, y_prob_1)  # error, shape not consistent
roc_auc_score(y_true_2, y_prob_2)
# no error, result same as the first one, only for binary classification
```

However, since roc_auc_scorer is wrapped by _ThresholdScorer, it can support more complicated situations:

```python
if y_type == "binary":
    y_pred = y_pred[:, 1]
```

So roc_auc_scorer can actually support binary y_true (shape = [n_samples]) together with the output of predict_proba (shape = [n_samples, n_classes]):

```python
X = y_prob_1  # only for convenience
cross_val_score(DecisionTreeClassifier(), X, y_true_1, scoring="roc_auc")
# no problem, like roc_auc_score
cross_val_score(DecisionTreeClassifier(), X, y_true_2, scoring="roc_auc")
# no problem with the wrapper
```

When implementing a scorer based on brier_score_loss, we need to use _ProbaScorer. But unlike _ThresholdScorer, it is naive, since the only scorer based on it (log_loss_scorer) has done everything by itself:

```python
log_loss(y_true_1, y_prob_1)  # no error
log_loss(y_true_1, y_prob_2)  # no error
log_loss(y_true_2, y_prob_1)  # no error
log_loss(y_true_2, y_prob_2)  # no error
# they are the same
```
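The binary-slicing step described above can be reproduced outside the scorer (a sketch; the if-block mirrors the wrapper's logic):

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

y_true = np.array([1, 0, 0, 1] * 10)
# predict_proba-style output, shape (n_samples, 2)
y_pred = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]] * 10)

# mirror what the scorer wrapper does for binary targets
if type_of_target(y_true) == "binary":
    y_pred = y_pred[:, 1]  # keep only the positive-class column
print(y_pred.shape)  # (40,)
```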
The question was, for binary classification, "why does it work at all?", and the answer is …. I think we can be flexible and do the same in _ProbaScorer. Btw, it looks like ….
jnothman left a comment:
Yes, I think this approach is fine, copying the ThresholdScorer behaviour to ProbaScorer. LGTM
Alright, I'm fine with merging this as well. Note, though, that it changes what is passed for …. Anyway, I'm ok to merge.
@jnothman @amueller Thanks. I updated what's new, along with some minor fixes of my own (things in 0.19.X but not in master, and a strange duplicate entry).

slight ping @jnothman Thanks :)
sklearn/metrics/scorer.py

```python
y_type = type_of_target(y)
y_pred = clf.predict_proba(X)
if y_type == "binary":
    y_pred = y_pred[:, 1]
```
I think this change needs a non-regression test.
Otherwise LGTM.
@jnothman Sorry, but I don't quite understand what you mean. Currently it seems we don't test the scorer's wrapper directly; instead, we test the scorer itself to ensure reasonable behaviour. And I have added the new scorer to the tests.
Oh. I think I got confused about what we were looking at. LGTM.
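For reference, a non-regression check along these lines might look as follows (a hypothetical sketch, not the exact test added in the PR; it uses the long-standing "neg_log_loss" scorer name): it exercises a proba-based scorer on a binary problem where predict_proba returns shape (n_samples, 2).

```python
import numpy as np
from sklearn.metrics import get_scorer
from sklearn.tree import DecisionTreeClassifier

X = np.array([[0.0], [1.0], [0.0], [1.0]] * 10)
y = np.array([0, 1, 0, 1] * 10)

clf = DecisionTreeClassifier().fit(X, y)
# predict_proba returns (n_samples, 2); the _ProbaScorer wrapper must cope
score = get_scorer("neg_log_loss")(clf, X, y)
print(score <= 0.0)  # True: a negated loss is never positive
```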


Reference Issue
Fixes #9512
What does this implement/fix? Explain your changes.
Add scorer based on brier_score_loss.
Any other comments?