Feature Request: function to calculate Expected Calibration Error (ECE) #18268

@chrisyeh96

Describe the workflow you want to enable

I would like to add the ability to calculate Expected Calibration Error (ECE) within scikit-learn. ECE is defined in equation (3) of Guo et al., "On Calibration of Modern Neural Networks" (2017). This is a well-cited paper (over 700 citations as of Aug. 26, 2020), and the ECE metric is now widely used in academic papers on model calibration. TensorFlow Probability even provides a function for computing ECE: tfp.stats.expected_calibration_error().
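
For reference, the binned definition from that paper can be written as below. The notation is my own shorthand: B_m is the set of samples whose predicted probability falls in bin m, M is the number of bins, n is the total number of samples, and acc and conf are the per-bin accuracy and mean confidence. In the binary setting of calibration_curve(), I am assuming prob_true plays the role of acc and prob_pred the role of conf.

```latex
\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n} \,
    \bigl| \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \bigr|
```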

Describe your proposed solution

I can see several possibilities for how the ECE calculation could be added to scikit-learn, with pros and cons for each:

  1. Make ECE an extra return value of sklearn.calibration.calibration_curve(). This function already does 99% of the work towards calculating ECE. Computing ECE is simply a one-liner added to the end of that function:

    def calibration_curve(...):
        ...
        ece = np.sum(np.abs(prob_true - prob_pred) * (bin_total[nonzero] / len(y_true)))
        return prob_true, prob_pred, ece

    The downside is that introducing a third return value is a breaking change. One possible mitigation is to add a boolean return_ece parameter to the function definition with a default value of False, and to return ece only if return_ece=True.

  2. Add a dedicated function to sklearn.metrics, i.e., sklearn.metrics.expected_calibration_error(y_true, y_pred). The downside here is that it would need to re-compute everything that calibration_curve() already computes, so there is a performance penalty for a user who wants to both calculate ECE and plot a calibration curve.

  3. Add a dedicated function to sklearn.calibration, i.e., sklearn.calibration.expected_calibration_error(y_true, y_pred). This would keep the ECE calculation within the calibration module. Same downside as in option 2.
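
To make options 2 and 3 concrete, here is a rough sketch of what a standalone function might look like. It is not existing scikit-learn code: the name expected_calibration_error, its signature, and the n_bins default are all hypothetical, and the uniform binning is only meant to resemble what calibration_curve() does (behavior exactly on bin edges may differ). It assumes binary labels in {0, 1} and predicted probabilities for the positive class.

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, n_bins=5):
    """Hypothetical helper: binned ECE for binary labels.

    Not part of scikit-learn; a sketch of the proposal only.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)

    # Uniform-width bins on [0, 1].
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    # Use interior edges only, so a probability of exactly 1.0
    # falls into the last bin instead of an extra overflow bin.
    bin_ids = np.digitize(y_prob, bins[1:-1])

    # Per-bin sample count, sum of labels, and sum of predicted probabilities.
    bin_total = np.bincount(bin_ids, minlength=n_bins)
    bin_true = np.bincount(bin_ids, weights=y_true, minlength=n_bins)
    bin_pred = np.bincount(bin_ids, weights=y_prob, minlength=n_bins)

    # Skip empty bins to avoid division by zero.
    nonzero = bin_total > 0
    prob_true = bin_true[nonzero] / bin_total[nonzero]
    prob_pred = bin_pred[nonzero] / bin_total[nonzero]

    # Weighted average of the per-bin |observed - predicted| gap.
    weights = bin_total[nonzero] / len(y_true)
    return float(np.sum(np.abs(prob_true - prob_pred) * weights))


ece = expected_calibration_error([0, 0, 1, 1], [0.1, 0.4, 0.6, 0.9], n_bins=2)
```

The last line of the function is the same one-liner proposed for option 1. Everything before it repeats the binning that calibration_curve() already performs, which is the duplicated work described as the downside of options 2 and 3.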

Additional context

I am happy to write the code and tests to add ECE calculation to scikit-learn. After all, the code to calculate ECE is just a one-line addition to calibration_curve(). However, as I am not a regular contributor to scikit-learn, I am unfamiliar with the best place within the scikit-learn library to add such a feature. Please advise!
