
Calibration and Refinement loss for Brier score loss #21774

@ColdTeapot273K

Describe the workflow you want to enable

As per Brier score User Guide:

“Only when refinement loss remains the same does a lower Brier score loss always mean better calibration”

But the current API exposes neither refinement loss nor calibration loss,
which makes it hard to measure the quality of probabilistic estimates.

Describe your proposed solution

I propose implementing the approach described in the paper [Flach2008], a reference that is conveniently already cited in the User Guide.

Namely, estimating Calibration loss and Refinement loss without any binning, on raw data/predictions.
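
To make the idea concrete, here is a minimal sketch of a binning-free decomposition. It is not the implementation referred to in this issue and not necessarily the exact estimator from [Flach2008]. It assumes binary labels in {0, 1} and groups samples by identical predicted probability, so the name `brier_decomposition` and its signature are hypothetical. For each distinct score s with n_s samples and empirical positive rate p_s, calibration loss accumulates n_s (s - p_s)^2 and refinement loss accumulates n_s p_s (1 - p_s); both are divided by the total sample count, and their sum equals the Brier score.

```python
from collections import defaultdict


def brier_decomposition(y_true, y_prob):
    """Return (calibration_loss, refinement_loss) for binary labels.

    Hypothetical helper: samples are grouped by identical predicted
    probability instead of being assigned to bins.
    """
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("y_true and y_prob must be non-empty and of equal length")

    # Map each distinct score to [sample count, number of positives].
    groups = defaultdict(lambda: [0, 0])
    for label, score in zip(y_true, y_prob):
        groups[score][0] += 1
        groups[score][1] += label

    n_samples = len(y_true)
    calibration = 0.0
    refinement = 0.0
    for score, (count, positives) in groups.items():
        positive_rate = positives / count
        # Squared gap between the score and the observed positive rate.
        calibration += count * (score - positive_rate) ** 2
        # Label variance that remains within the group.
        refinement += count * positive_rate * (1.0 - positive_rate)

    return calibration / n_samples, refinement / n_samples


y_true = [0, 0, 1, 1, 1, 0]
y_prob = [0.2, 0.2, 0.2, 0.8, 0.8, 0.8]
calibration_loss, refinement_loss = brier_decomposition(y_true, y_prob)
brier = sum((s - y) ** 2 for y, s in zip(y_true, y_prob)) / len(y_true)
print(calibration_loss, refinement_loss, brier)
```

One caveat with this simplification: if every predicted probability is distinct, each group holds a single sample, so the refinement term is zero and the calibration term equals the Brier score. A usable estimator has to merge scores in some principled way, and that step is deliberately left out of this sketch.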

If the community decides this is a valuable addition, I humbly present my implementation. It is still a work in progress in terms of compliance with scikit-learn codebase conventions and corner-case handling, but it works in essence.

Describe alternatives you've considered, if relevant

Visual comparison of calibration curves
<not exact, not scalable

Making custom ad hoc metrics to estimate probability errors
<fragile, dubious

Using sklearn.calibration.calibration_curve
<requires binning [1]

Using #11096
<requires binning [1]

[1] The problem with binning is described in [Bella2012]. To quote:

The problem of using bins is that if too few bins are defined, the real probabilities are
not properly detailed to give an accurate evaluation. If too many bins are defined, the real
probabilities are not properly estimated. A partial solution to this problem is to make the bins
overlap.

And using overlapping bins seems like an additional degree of freedom, one more parameter to keep in mind, tune, and argue about with colleagues.
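
The sensitivity to the number of bins can be illustrated with a small sketch. It assumes uniform-width bins on [0, 1] and a count-weighted squared gap between the mean predicted probability and the observed positive rate in each bin. The function name `binned_calibration_loss` is hypothetical and is not the estimator from #11096 or from scikit-learn.

```python
def binned_calibration_loss(y_true, y_prob, n_bins):
    """Hypothetical binned calibration loss with uniform-width bins."""
    # Per-bin accumulators: sample count, sum of scores, sum of labels.
    counts = [0] * n_bins
    score_sums = [0.0] * n_bins
    label_sums = [0.0] * n_bins
    for label, score in zip(y_true, y_prob):
        # A score of exactly 1.0 falls into the last bin.
        index = min(int(score * n_bins), n_bins - 1)
        counts[index] += 1
        score_sums[index] += score
        label_sums[index] += label

    loss = 0.0
    for count, score_sum, label_sum in zip(counts, score_sums, label_sums):
        if count == 0:
            continue  # empty bins contribute nothing
        mean_score = score_sum / count
        positive_rate = label_sum / count
        loss += count * (mean_score - positive_rate) ** 2
    return loss / len(y_true)


y_true = [0, 1, 0, 1, 0, 1]
y_prob = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]
for n_bins in (1, 2, 10):
    print(n_bins, binned_calibration_loss(y_true, y_prob, n_bins))
```

On this toy data the same predictions look almost perfectly calibrated with one bin and much worse with ten, so the reported calibration loss depends heavily on a parameter that has no obvious correct value.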

Additional context

I'm aware that contributors to #11096 have worked on implementing calibration loss with binning and on clarifying the docs on calibration in general. So I'd like to get feedback on this proposal, and I'm open to suggestions on how to proceed.

UPD:
Closes #18268, #21718
