[MRG+1] Isotonic calibration #1176
examples/plot_calibration.py
You should use sklearn.cross_validation.train_test_split to turn this block into two one-liners:
http://scikit-learn.org/dev/modules/generated/sklearn.cross_validation.train_test_split.html
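(Note: the link above points at the old sklearn.cross_validation module; in current scikit-learn the same helper lives in sklearn.model_selection. A minimal sketch of the suggested one-liner split, with purely illustrative toy data:)

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, 4 features (shapes are illustrative only)
X = np.arange(400, dtype=float).reshape(100, 4)
y = np.arange(100) % 2

# The block under review collapses to a single call: hold out 25% of the
# samples (here they would serve as the calibration/evaluation set)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
```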
"Brier score" seems to be the accepted name, so I'm inclined to keep it that way. IIRC, in "Transforming Classifier Scores into Accurate Multiclass Probability Estimates" (KDD 2002), they study several methods and conclude that one-vs-rest is the most practical solution.
hum. I am sure it will pass the consistency brigade :)
OK, so IsotonicCalibrator should use OneVsRestClassifier internally and
I guess
About the Brier score not being a sklearn-consistent score (higher == worse in this case): I don't really know what would be best, as the sklearn naming convention indeed conflicts with the official name: http://en.wikipedia.org/wiki/Brier_score I would be +0 for keeping the
sklearn/metrics/metrics.py
Indeed, in their experimental results they use OvR. I think it's a good idea to fit an IR (isotonic regression) to each decision score.
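The one-vs-rest idea discussed here can be sketched as follows: fit one IsotonicRegression per class on that class's scores against binary (class vs. rest) targets, then renormalize the calibrated columns row-wise. The function names are illustrative, not the PR's actual API:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

def ovr_isotonic_calibrate(scores, y, n_classes):
    """Fit one isotonic regressor per class, one-vs-rest style."""
    calibrators = []
    for k in range(n_classes):
        ir = IsotonicRegression(out_of_bounds="clip")
        # Binary target: 1 for "class k", 0 for "rest"
        ir.fit(scores[:, k], (y == k).astype(float))
        calibrators.append(ir)
    return calibrators

def ovr_predict_proba(calibrators, scores):
    """Calibrate each score column, then renormalize rows to sum to one."""
    probs = np.column_stack([ir.predict(scores[:, k])
                             for k, ir in enumerate(calibrators)])
    # Assumes at least one nonzero calibrated probability per row
    return probs / probs.sum(axis=1, keepdims=True)
```

A real implementation would need to guard against rows where every per-class calibrated probability is zero before renormalizing.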
Can we call it brier? Or brier_error? As it's a mean squared error.
One solution would be to introduce decorators to label some functions as scores and others as losses. Of course, this is out of the scope of this PR... (This idea would also be useful to define whether a metric accepts predicted labels or predicted scores, c.f. the AUC issue.) Another solution is to introduce a function So adding a note to the documentation as @ogrisel suggested seems like a good temporary solution to me.
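For reference, the Brier score under discussion is just the mean squared error between predicted probabilities and the binary outcomes, which is why lower is better (it is a loss despite the "score" name). A numpy-only sketch:

```python
import numpy as np

def brier_score(y_true, y_prob):
    """Mean squared error between predicted probabilities and 0/1 outcomes.

    Lower is better: 0.0 for perfect probabilities, 1.0 at worst.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    return np.mean((y_prob - y_true) ** 2)

# A confident, correct forecaster scores better (lower) than a hedging one
brier_score([1, 0], [0.9, 0.1])  # 0.01
brier_score([1, 0], [0.6, 0.4])  # 0.16
```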
Some quick comments ...
Hi Paolo, thanks for this valuable feedback. I won't work on this in the next few
+1
If you use more than one fold, how do you combine the results? |
sklearn/calibration.py
The above is not consistent with our API. So, a CV object passed to the constructor is a good idea in any case.
Rebased on master + addressed some comments. I needed a coding break... ;) If you feel like playing with it, please do...
You should provide a default value for the
I've rebased and cleaned up the example. I'm open to discussion regarding the API and multiclass handling. I don't know how to take care of the OOB (out-of-bag) samples to estimate the probas.
@mblondel any chance you can provide feedback on this? I'll get back to it tomorrow morning.
sklearn/calibration.py
How about generating the calibration data with a cv object (parameter of the constructor)?
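A rough sketch of the cv-object idea, which also suggests one answer to the "how do you combine the results?" question above: fit the classifier on each train split, fit the calibrator on the held-out split, and average the calibrated probabilities over folds. This is an illustration under assumed names (cv_calibrated_proba is a hypothetical helper), not the PR's actual implementation:

```python
import numpy as np
from sklearn.base import clone
from sklearn.isotonic import IsotonicRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold

def cv_calibrated_proba(clf, X, y, X_test, cv=None):
    """Cross-fitted isotonic calibration sketch for binary problems.

    For each fold: fit the classifier on the train split and an isotonic
    calibrator on the held-out split; combine by averaging the calibrated
    probabilities for X_test over the folds.
    """
    cv = cv if cv is not None else KFold(n_splits=3)
    fold_probs = []
    for train_idx, cal_idx in cv.split(X):
        model = clone(clf).fit(X[train_idx], y[train_idx])
        ir = IsotonicRegression(out_of_bounds="clip")
        ir.fit(model.predict_proba(X[cal_idx])[:, 1], y[cal_idx])
        fold_probs.append(ir.predict(model.predict_proba(X_test)[:, 1]))
    return np.mean(fold_probs, axis=0)

# Toy demo: binary labels driven by the first feature
rng = np.random.RandomState(0)
X = rng.randn(60, 2)
y = (X[:, 0] > 0).astype(int)
probs = cv_calibrated_proba(LogisticRegression(), X, y, X[:10])
```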
sklearn/metrics/metrics.py
Alright, I squashed it now into 5 generic commits (calibration module, brier score, tests, examples, narrative doc). I've also added @mblondel to the list of authors in calibration.py.
I pushed a couple of commits (cosmit + coverage improvement). The coverage could be slightly improved. Currently we have 95% coverage.
Great, thank you very much!
Indeed, there are a couple of exceptions that should be covered by additional tests: https://coveralls.io/builds/1947924/source?filename=sklearn%2Fcalibration.py It should be easy to raise the coverage close to 99% on that file.
I've added some assert_raises; coverage of calibration.py should now be effectively 100%.
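For context, assert_raises here refers to the nose-style test helper used at the time; a minimal stand-in showing the pattern used to cover error-raising branches (the helper below is illustrative, not scikit-learn's actual implementation):

```python
def assert_raises(exc_type, func, *args, **kwargs):
    # Minimal nose-style helper: pass iff calling func raises exc_type.
    try:
        func(*args, **kwargs)
    except exc_type:
        return
    raise AssertionError("%s not raised" % exc_type.__name__)

# Covers an error branch: int("nope") raises ValueError, so this passes
assert_raises(ValueError, int, "nope")
```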
Thanks. I think this is ok for merge. @agramfort @mblondel any more comments?
+1 for merge on my side.
Just to confirm: do we really want sigmoid_calibration to be public? Other than that, +1 as well!
Let's make sigmoid_calibration private indeed.
+1 for a private sigmoid_calibration.
Can you test these functions using the common tests in https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/metrics/tests/test_common.py ?
Done (these tests are pretty nice; I didn't know they existed). I had to modify test_invariance_string_vs_numbers_labels() slightly so that pos_label is also set for THRESHOLDED_METRICS.
Oops, I had already merged; I will cherry-pick this.
Done! 🍻 Thank you very much everyone!
yeah 🍻 !!!
🍻 indeed people. Congratulations!
Thanks for merging! 🍻
Congrats! When was this effort started again? Two? Three years ago? :)
This raises a warning, as GaussianNB doesn't support sample_weight. If this is expected behavior, I think the warning should be caught.
Do we really want examples to raise warnings? ;) I thought the use of a classifier that doesn't have sample_weight was intentional.
Is it me, or is there no use of sample_weight in this example?
I thought it was used internally in the calibration. But you are right, it shouldn't warn if it is not used. Will investigate!
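Catching (or asserting on) the warning discussed above can be sketched with the standard library's warnings machinery; record=True also lets a test verify that the warning does, or does not, fire. fit_like is a hypothetical stand-in for an estimator that ignores sample_weight:

```python
import warnings

def fit_like(sample_weight=None):
    # Hypothetical stand-in: warns when a sample_weight it cannot honor
    # is passed, mimicking the behavior discussed in the review.
    if sample_weight is not None:
        warnings.warn("sample_weight is ignored by this estimator",
                      UserWarning)

# Capture warnings instead of letting them print, keeping example output clean
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fit_like(sample_weight=[1.0, 2.0])

assert len(caught) == 1 and issubclass(caught[0].category, UserWarning)
```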
Calibration module with Platt and isotonic calibration.
A few issues: