[MRG+1] Add scorer based on brier_score_loss by qinhanmin2014 · Pull Request #9521 · scikit-learn/scikit-learn

qinhanmin2014 · 2017-08-11T00:49:46Z

Reference Issue

What does this implement/fix? Explain your changes.

Add scorer based on brier_score_loss.

Any other comments?

jrbourbeau · 2017-08-11T02:30:54Z

sklearn/metrics/scorer.py

 log_loss_scorer = make_scorer(log_loss, greater_is_better=False,
                              needs_proba=True)
 log_loss_scorer._deprecation_msg = deprecation_msg
+brier_score_loss_scorer = make_scorer(brier_score_loss)


Hey @qinhanmin2014, I think that the greater_is_better parameter in make_scorer should be set to False because brier_score_loss is a loss function. It looks like it is True by default (see https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/scorer.py#L400)

qinhanmin2014 · 2017-08-11T15:07:36Z

@jnothman Sorry to disturb but I think currently we can't implement scorer based on brier_score_loss because like other similar metrics(e.g., log_loss with a scorer log_loss_scorer), it depends on predicted probabilities but it simply can't support the result returned by predict_proba(shape= (n_samples, n_classes)).
From log_loss(already has a scorer):

From brier_score_loss:

Do I make a mistake? Thanks.

amueller · 2017-08-11T22:20:26Z

you just need to pass just the second column, maybe using

make_scorer(lambda y_true, y_pred: brier_score_loss(y_true, y_pred[:, 1]),
            needs_proba=True, greater_is_better=False)

(untested)

qinhanmin2014 · 2017-08-12T03:46:58Z

@amueller Thanks a lot. It works! I also add a comment to address the reason(maybe limitation) of current implementation.

jnothman · 2017-08-13T13:43:36Z

sklearn/metrics/scorer.py

 log_loss_scorer = make_scorer(log_loss, greater_is_better=False,
                              needs_proba=True)
 log_loss_scorer._deprecation_msg = deprecation_msg
+# Currently brier_score_loss don't support the shape of result


But surely we need the same for roc_auc_score (although that can fall back on decision_function when available). What's going on in that case?

@jnothman Thanks. I checked the document. roc_auc_score support shape = [n_samples] or [n_samples, n_classes]. (note that roc_auc_score belongs to _ThresholdScorer, and log_loss_score belongs to _ProbaScorer)

If brier_score is the odd one in not supporting the standard predict_proba shape, we might want to fix it there instead?

jnothman · 2017-08-13T22:25:24Z

hmm it's not altogether the fault of the metric function. I think something like what you have here is okay, except that it needs some validating, to make sure the input is of the right shape... I'm struggling to see where roc selects one column from the input. Can you trace what's going on there?

…

On 14 Aug 2017 1:50 am, "Andreas Mueller" ***@***.***> wrote: ***@***.**** commented on this pull request. ------------------------------ In sklearn/metrics/scorer.py <#9521 (comment)> : > @@ -514,6 +514,13 @@ def make_scorer(score_func, greater_is_better=True, needs_proba=False, log_loss_scorer = make_scorer(log_loss, greater_is_better=False, needs_proba=True) log_loss_scorer._deprecation_msg = deprecation_msg +# Currently brier_score_loss don't support the shape of result If brier_score is the odd one in not supporting the standard predict_proba shape, we might want to fix it there instead? I'm not entirely certain, though... — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#9521 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAEz6yEe34tFk7qI4_4pK2W1f4z9PAfEks5sXxtYgaJpZM4O0JGq> .

qinhanmin2014 · 2017-08-14T04:01:51Z

@jnothman Thanks.
(1)According to the docstring, brier_score_loss is only used in binary cases. If this requirement is satisfied, the implementation of the scorers seems to be correct.
(2)However currently, we don't have such check in brier_score_loss, e.g., if we change y_true from [0,1,1,0] to [0,2,2,1], we get the same result(because we regard the biggest value as 1, the rest as 0).
I'm inspecting roc_auc_score and will reply soon :)

qinhanmin2014 · 2017-08-14T15:19:45Z

@jnothman Here's my observation.
For roc_auc_score, when the shape of y_score is [n_samples, n_classes], the shape of y_true also need to be [n_samples, n_classes] (multilabel-indicator). Then each corresponding line is selected. The scores are then calculated and averaged.
Core function:_average_binary_score(in base.py)

for c in range(n_classes):
        y_true_c = y_true.take([c], axis=not_average_axis).ravel()
        y_score_c = y_score.take([c], axis=not_average_axis).ravel()
        score[c] = binary_metric(y_true_c, y_score_c,
                                 sample_weight=score_weight)

Note that in the loop, the shape of y_true_c and y_score_c has become [n_samples]. So that's why roc_auc_score is based on roc_curve and roc_curve only supports shape = [n_samples].

amueller · 2017-08-14T17:21:51Z

But if someone does cross_val_score(DecisionTreeClassifier(), X, y, scoring="roc_auc") that works with binary even though predict_proba is (n_samples, 2) and y is not multi-label.

jnothman · 2017-08-14T23:58:33Z

yes, the binary case is what I was wondering about. How is that working? Is it working?

…

On 15 Aug 2017 3:21 am, "Andreas Mueller" ***@***.***> wrote: But if someone does cross_val_score(DecisionTreeClassifier(), X, y, scoring="roc_auc") that works with binary even though predict_proba is (n_samples, 2) and y is not multi-label. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#9521 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAEz65fxrCKhiu79o6r2ZXZhEdQCd8-Kks5sYIIzgaJpZM4O0JGq> .

qinhanmin2014 · 2017-08-15T09:24:12Z

@jnothman @amueller (kindly regret me for a long comment and possible typo:))
From my perspective, both @amueller and myself are right. The core issue here is that scorer has a wrapper(_PredictScorer/_ProbaScorer/_ThresholdScorer), which can help it support more complicated situations.
roc_auc_score support binary y_true(shape=[n_samples]) or multilabel-indicator y_true(shape=[n_samples, n_classes]). For binary classification, we can use binary y_true or multilabel-indicator y_true, but for multiclass classification, we can only use multilabel-indicator y_true. The shape of y_pred has to be consistent with y_true.

y_true_1 = np.array([[0, 1], [1, 0], [1, 0], [0, 1]]*10) # multilabel-indicator
y_true_2 = np.array([1, 0, 0, 1]*10) # binary
y_prob_1 = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]*10)
y_prob_2 = np.array([0.9, 0.2, 0.7, 0.6]*10)

roc_auc_score(y_true_1, y_prob_1) # no error, can extent to multiclass
roc_auc_score(y_true_1, y_prob_2) # error, shape not consistent
roc_auc_score(y_true_2, y_prob_1) # error, shape not consistent
roc_auc_score(y_true_2, y_prob_2) 
# no error, result same as the first one, only for binary classification

However, since roc_auc_scorer is wrapped by _ThresholdScorer, it can support more complicated situations.
For example, through the following code in _ThresholdScorer:

if y_type == "binary":
    y_pred = y_pred[:, 1]

roc_auc_scorer can actually support binary y_true(shape=[n_samples]) and the output of predict_proba (shape=[n_samples, n_classes]).

X = y_prob_1 # only for convinience
cross_val_score(DecisionTreeClassifier(), X, y_true_1, scoring="roc_auc") 
# no problem like roc_auc_score
cross_val_score(DecisionTreeClassifier(), X, y_true_2, scoring="roc_auc") 
# no problem with the wrapper

When implementing scorer based on brier_loss_score, we need to use _ThresholdScorer. But unlike _ProbaScorer, it is naive since the only scorer based on it(log_loss_scorer) has done everything by itself.

log_loss(y_true_1, y_prob_1) # no error
log_loss(y_true_1, y_prob_2) # no error
log_loss(y_true_2, y_prob_1) # no error
log_loss(y_true_2, y_prob_2) # no error
# they are the same

qinhanmin2014 · 2017-08-15T11:26:38Z

@jnothman @amueller
I have come up with another way (might better) for the scorer. Could you please have a look? Thanks.

amueller · 2017-08-15T20:09:06Z

The question was for binary classification "why does it work at all" and the answer is: _ThresholdScorer has an if y_type == 'binary' and we're using predict_proba we slice the probabilities appropriately.
However _ProbaScorer doesn't do that. log_loss, the only other probabilistic score that we have right now, accepts either (n_samples,) and assumes it's the positive class (the larger one) or (n_samples, 2).

I think we can be flexible and do the same in brier_score_loss as we do in log_loss. Making the interface more flexible there also benefits people that directly use the function.

Btw, it looks like brier_score_loss might want to error if only one class is present and pos_label=None as we do in log_loss.

jnothman

Yes, I think this approach is fine, copying the ThresholdScorer behaviour to ProbaScorer. LGTM

qinhanmin2014 · 2017-08-16T05:01:48Z

@jnothman @amueller
I opened a new pull request #9562 to address your concerns about brier_score_loss along with a bug fix. Could you please have a look? Thanks:)

amueller · 2017-08-16T20:58:36Z

Alright, I'm fine with merging this as well. Note, though, that it changes what is passed for log_loss, but that's inconsequential (unless someone provided a malformed probability that doesn't sum to 1, in which case this patch will change results).

Anyway, I'm ok to merge.

qinhanmin2014 · 2017-08-17T01:47:30Z

@jnothman @amueller Thanks. I updated what's new along with some minor fix about myself (things in 0.19.X but not in master and a strange duplicate entry.)
From my perspective, @jnothman could you please check @amueller 's previous comment. Current implementation will slightly change the behaviour of the scorer based on log_loss when users manage to pass probability that doesn't sum to 1 in binary case.
e.g.
y_pred=np.array([[0.1, 0.1], [0.9, 0.9], [0.2, 0.2], [0.8, 0.8]])
is previously perceived to be equal to y_pred=np.array([0.5, 0.5, 0.5, 0.5]) (normalized)
but now equal to y_pred=np.array([0.1, 0.9, 0.2, 0.8]) (we only take the second column and it's considered to be the probability of the positive class)
Note that the doc state that we only support predicted probabilities for log_loss. We can go back to the previous solution to remain everything the same but that might seems awkward for the scorer.

qinhanmin2014 · 2017-08-21T13:30:38Z

slight ping @jnothman Thanks :)

jnothman · 2017-08-21T13:40:27Z

sklearn/metrics/scorer.py

+        y_type = type_of_target(y)
        y_pred = clf.predict_proba(X)
+        if y_type == "binary":
+            y_pred = y_pred[:, 1]


I think this change needs a non-regression test.

Otherwise LGTM.

@jnothman Sorry but I don't quite understand your meaning. Currently, it seems that we don't test the wrapper of the scorer. Instead, we directly test the scorer to ensure a reasonable behaviour. And I have added the new scorer to the test.

Oh. I think I got confused about what we were looking at. LGTM.

qinhanmin2014 added 2 commits August 11, 2017 08:46

add scorer based on brier_score_loss

da0f673

fix test

844f08e

jrbourbeau reviewed Aug 11, 2017

View reviewed changes

update parameter

063c9f3

qinhanmin2014 changed the title ~~[MRG] Add scorer based on brier_score_loss~~ [WIP] Add scorer based on brier_score_loss Aug 11, 2017

update parameter

995b1a3

update parameter

c8773ab

qinhanmin2014 changed the title ~~[WIP] Add scorer based on brier_score_loss~~ [MRG] Add scorer based on brier_score_loss Aug 12, 2017

qinhanmin2014 added 2 commits August 12, 2017 16:38

better format

1be7c7e

make test pass?

6d27fca

qinhanmin2014 mentioned this pull request Aug 12, 2017

test_kernel_ridge_singular_kernel sometimes fails on Travis #9535

Closed

jnothman reviewed Aug 13, 2017

View reviewed changes

new idea, kindly review if CIs green

9239c2b

jnothman reviewed Aug 15, 2017

View reviewed changes

jnothman changed the title ~~[MRG] Add scorer based on brier_score_loss~~ [MRG+1] Add scorer based on brier_score_loss Aug 15, 2017

qinhanmin2014 mentioned this pull request Aug 16, 2017

[MRG] Improvement and bug fix for brier_score_loss #9562

Closed

Merge branch 'master' into my-feature-1

2b3d51d

qinhanmin2014 added 2 commits August 17, 2017 08:41

update what's new

cb40c5c

update what's new

3b096e8

jnothman reviewed Aug 21, 2017

View reviewed changes

jnothman merged commit ee2025f into scikit-learn:master Aug 21, 2017

qinhanmin2014 deleted the my-feature-1 branch August 22, 2017 00:40

AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017

[MRG+1] Add scorer based on brier_score_loss (scikit-learn#9521)

e06ad15

maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017

[MRG+1] Add scorer based on brier_score_loss (scikit-learn#9521)

3ac32ae

qinhanmin2014 mentioned this pull request Dec 5, 2017

Assymetry of roc_auc_score #10247

Closed

jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017

[MRG+1] Add scorer based on brier_score_loss (scikit-learn#9521)

2fe624f

alien3211 pushed a commit to alien3211/scikit-learn that referenced this pull request Dec 21, 2018

Update make_scorer docs. Missing information after scikit-learn#9521 PR.

f8c4d5f

qinhanmin2014 mentioned this pull request Jul 13, 2019

DOC make_scorer now requires score function to accpet 1d y_pred when needs_proba is True #14318

Merged

Uh oh!

Conversation

qinhanmin2014 commented Aug 11, 2017

Reference Issue

What does this implement/fix? Explain your changes.

Any other comments?

Uh oh!

jrbourbeau Aug 11, 2017

Choose a reason for hiding this comment

Uh oh!

qinhanmin2014 commented Aug 11, 2017

Uh oh!

amueller commented Aug 11, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

qinhanmin2014 commented Aug 12, 2017

Uh oh!

jnothman Aug 13, 2017

Choose a reason for hiding this comment

Uh oh!

qinhanmin2014 Aug 13, 2017

Choose a reason for hiding this comment

Uh oh!

amueller Aug 13, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jnothman commented Aug 13, 2017 via email

Uh oh!

qinhanmin2014 commented Aug 14, 2017

Uh oh!

qinhanmin2014 commented Aug 14, 2017

Uh oh!

amueller commented Aug 14, 2017

Uh oh!

jnothman commented Aug 14, 2017 via email

Uh oh!

qinhanmin2014 commented Aug 15, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

qinhanmin2014 commented Aug 15, 2017

Uh oh!

amueller commented Aug 15, 2017

Uh oh!

jnothman left a comment

Choose a reason for hiding this comment

Uh oh!

qinhanmin2014 commented Aug 16, 2017

Uh oh!

amueller commented Aug 16, 2017

Uh oh!

qinhanmin2014 commented Aug 17, 2017 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

qinhanmin2014 commented Aug 21, 2017

Uh oh!

jnothman Aug 21, 2017

Choose a reason for hiding this comment

Uh oh!

qinhanmin2014 Aug 21, 2017

Choose a reason for hiding this comment

Uh oh!

jnothman Aug 21, 2017

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

amueller commented Aug 11, 2017 •

edited

Loading

amueller Aug 13, 2017 •

edited

Loading

qinhanmin2014 commented Aug 15, 2017 •

edited

Loading

qinhanmin2014 commented Aug 17, 2017 •

edited

Loading