Incorrect interpretation of Brier score loss in docstring #10883

https://github.com/scikit-learn/scikit-learn/blob/a24c8b464d094d2c468a16ea9f8bf8d42d949f84/sklearn/metrics/classification.py#L1852

> Therefore, the lower the Brier score is for a set of predictions, the better the predictions are calibrated.

As far as I can tell, this claim is incorrect (the cited Wikipedia entry makes the same mistake). The sentence should be either deleted or qualified: in the two-term (calibration-refinement) decomposition of the Brier score, only one of the two components assesses calibration.
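
For reference, here is a sketch of that decomposition. It assumes the forecasts take only $K$ distinct values $f_1, \dots, f_K$, which holds for both models in the example below. The notation is introduced here for illustration. $p_i$ is the predicted probability and $y_i \in \{0, 1\}$ the label of sample $i$; $G_k$ is the set of the $n_k$ samples with $p_i = f_k$; and $\bar{o}_k$ is the observed fraction of positives in $G_k$:

$$
\mathrm{BS} = \frac{1}{N}\sum_{i=1}^{N}(p_i - y_i)^2
= \underbrace{\frac{1}{N}\sum_{k=1}^{K} n_k\,(f_k - \bar{o}_k)^2}_{\text{calibration}}
+ \underbrace{\frac{1}{N}\sum_{k=1}^{K} n_k\,\bar{o}_k\,(1 - \bar{o}_k)}_{\text{refinement}}
$$

The identity follows group by group. The cross term vanishes, so $\sum_{i \in G_k}(f_k - y_i)^2 = n_k (f_k - \bar{o}_k)^2 + \sum_{i \in G_k}(y_i - \bar{o}_k)^2$, and for binary labels $\sum_{i \in G_k}(y_i - \bar{o}_k)^2 = n_k\,\bar{o}_k(1 - \bar{o}_k)$. Only the first term is zero for a perfectly calibrated model. When continuous forecasts are binned instead, the split becomes approximate.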

Here's a counter-example on a dataset with 12 points and 2 estimators.

model A: P(y = 1) | model B: P(y = 1) | y (true label)
-- | -- | --
0.00 | 0.00 | 0
0.25 | 0.25 | 0
0.25 | 0.25 | 0
0.25 | 0.25 | 0
0.25 | 0.75 | 1
0.50 | 0.50 | 0
0.50 | 0.50 | 1
0.75 | 0.25 | 0
0.75 | 0.75 | 1
0.75 | 0.75 | 1
0.75 | 0.75 | 1
1.00 | 1.00 | 1


Brier scores (lower is better):

model A | model B
-- | --
0.1667 (= 2/12) | 0.0833 (= 1/12)
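
These scores can be double-checked by computing the mean squared difference between predicted probability and label directly. The following is a minimal plain-Python sketch, and the helper name `brier_score` is hypothetical. For binary labels, `sklearn.metrics.brier_score_loss(y, model_a)` computes the same quantity and should return the same values.

```python
# Data from the table: predicted P(y = 1) for each model, and the true labels.
model_a = [0.00, 0.25, 0.25, 0.25, 0.25, 0.50, 0.50, 0.75, 0.75, 0.75, 0.75, 1.00]
model_b = [0.00, 0.25, 0.25, 0.25, 0.75, 0.50, 0.50, 0.25, 0.75, 0.75, 0.75, 1.00]
y = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1]


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and binary outcome."""
    return sum((p - t) ** 2 for p, t in zip(y_prob, y_true)) / len(y_true)


print(f"model A: {brier_score(y, model_a):.4f}")  # prints "model A: 0.1667"
print(f"model B: {brier_score(y, model_b):.4f}")  # prints "model B: 0.0833"
```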

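Model A is perfectly calibrated. Among the samples it scores 0.25, 0.50 and 0.75, the observed fraction of positives is exactly 0.25, 0.50 and 0.75, and its single 0.00 and 1.00 predictions are also correct. Model B is not calibrated: all four samples it scores 0.25 are negative, and all four it scores 0.75 are positive. Yet model B has the lower Brier score.

Calibration curves: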

![image](https://user-images.githubusercontent.com/2240469/38032701-944774ea-329e-11e8-8a77-6835ae244d45.png)
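
The calibration curves can also be checked numerically. The sketch below groups samples by their exact predicted value. This works only because both models output a handful of discrete probabilities; `sklearn.calibration.calibration_curve` bins continuous predictions instead. The code then splits each Brier score into the calibration and refinement terms of the two-term decomposition. The helper names `calibration_table` and `brier_decomposition` are hypothetical.

```python
from collections import defaultdict

# Data from the table: predicted P(y = 1) for each model, and the true labels.
model_a = [0.00, 0.25, 0.25, 0.25, 0.25, 0.50, 0.50, 0.75, 0.75, 0.75, 0.75, 1.00]
model_b = [0.00, 0.25, 0.25, 0.25, 0.75, 0.50, 0.50, 0.25, 0.75, 0.75, 0.75, 1.00]
y = [0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1]


def calibration_table(y_true, y_prob):
    """Return {predicted value: (count, observed fraction of positives)}.

    Groups by exact predicted value, which assumes the predictions are discrete.
    """
    groups = defaultdict(list)
    for t, p in zip(y_true, y_prob):
        groups[p].append(t)
    return {p: (len(ts), sum(ts) / len(ts)) for p, ts in sorted(groups.items())}


def brier_decomposition(y_true, y_prob):
    """Split the Brier score into (calibration, refinement); their sum is the score."""
    n = len(y_true)
    calibration = refinement = 0.0
    for f, (n_k, o_k) in calibration_table(y_true, y_prob).items():
        calibration += n_k * (f - o_k) ** 2  # gap between forecast and observed frequency
        refinement += n_k * o_k * (1 - o_k)  # spread of outcomes within the group
    return calibration / n, refinement / n


for name, probs in [("A", model_a), ("B", model_b)]:
    print(f"model {name}:")
    for f, (n_k, o_k) in calibration_table(y, probs).items():
        print(f"  predicted {f:.2f} -> observed {o_k:.2f} (n={n_k})")
    cal, ref = brier_decomposition(y, probs)
    print(f"  calibration={cal:.4f} refinement={ref:.4f} total={cal + ref:.4f}")
```

Model A's calibration term is zero, while model B's is about 0.0417. B's lower Brier score comes from its smaller refinement term (about 0.0417 against 0.1667), which more than offsets its worse calibration.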
