Feature Request: function to calculate Expected Calibration Error (ECE) #18268

#### Describe the workflow you want to enable

I would like to add the ability to calculate Expected Calibration Error (ECE) within scikit-learn. ECE is defined in equation (3) of [Guo et al. *On Calibration of Modern Neural Networks.* (2017).](https://arxiv.org/abs/1706.04599) This is a well-cited paper (over 700 citations as of Aug. 26, 2020), and the ECE metric is now widely used in academic papers on model calibration. TensorFlow Probability already provides a function for computing ECE: [tfp.stats.expected_calibration_error()](https://www.tensorflow.org/probability/api_docs/python/tfp/stats/expected_calibration_error).
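
For reference, the definition from the paper can be written as below. The symbols follow Guo et al.: $B_m$ is the set of samples whose predicted confidence falls into bin $m$, $M$ is the number of bins, and $n$ is the total number of samples. In the binary setting of `calibration_curve()`, the one-liner proposed in this issue uses `prob_true` in place of $\mathrm{acc}(B_m)$ and `prob_pred` in place of $\mathrm{conf}(B_m)$; that substitution is my reading of the proposal, not something the paper states.

```latex
\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n} \left| \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \right|
```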


#### Describe your proposed solution

I can see several possibilities for how the ECE calculation could be added to scikit-learn, with pros and cons for each:

1. Make ECE an extra return value of `sklearn.calibration.calibration_curve()`. This function already does nearly all of the work needed to calculate ECE. Computing ECE is a one-line addition at the end of the function:

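    ```python
    def calibration_curve(...):
        ...
        # bin_total and nonzero are existing local variables of calibration_curve;
        # each non-empty bin is weighted by its share of the samples.
        ece = np.sum(np.abs(prob_true - prob_pred) * (bin_total[nonzero] / len(y_true)))
        return prob_true, prob_pred, ece
    ```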

    The downside is that introducing a third return value is a breaking change. One possible mitigation is to add a boolean `return_ece` parameter to the function definition with a default value of `False`, and to return `ece` only if `return_ece=True`.

2. Add a dedicated function to `sklearn.metrics`, i.e., `sklearn.metrics.expected_calibration_error(y_true, y_pred)`. The downside is that it would need to recompute everything that `calibration_curve()` already computes, so there is a performance penalty for a user who wants to both calculate ECE and plot a calibration curve.

3. Add a dedicated function to `sklearn.calibration`, i.e., `sklearn.calibration.expected_calibration_error(y_true, y_pred)`. This would keep the ECE calculation within the calibration module. It has the same downside as option 2.
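
To make options 2 and 3 concrete, here is a minimal sketch of what a standalone function might look like. It is only an illustration: the function name comes from the proposal above, while the `n_bins` parameter, its default, and the uniform-width binning are my assumptions, loosely modeled on what `calibration_curve()` does. It is not existing scikit-learn API.

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, n_bins=5):
    """Hypothetical helper (not part of scikit-learn): binary ECE.

    Assumes y_true holds 0/1 labels and y_prob holds predicted
    probabilities of the positive class in [0, 1].
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)

    # Uniform-width bin edges on [0, 1]; the interior edges assign bin ids.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    bin_ids = np.searchsorted(bins[1:-1], y_prob)

    # Per-bin sums of predicted probabilities, positives, and sample counts.
    bin_sums = np.bincount(bin_ids, weights=y_prob, minlength=n_bins)
    bin_true = np.bincount(bin_ids, weights=y_true, minlength=n_bins)
    bin_total = np.bincount(bin_ids, minlength=n_bins)

    # Skip empty bins, then weight each bin's gap by its share of samples.
    nonzero = bin_total != 0
    prob_true = bin_true[nonzero] / bin_total[nonzero]
    prob_pred = bin_sums[nonzero] / bin_total[nonzero]
    weights = bin_total[nonzero] / len(y_true)
    return float(np.sum(np.abs(prob_true - prob_pred) * weights))


ece = expected_calibration_error([0, 0, 1, 1], [0.1, 0.4, 0.6, 0.9], n_bins=2)
```

The sketch repeats the binning step, which illustrates the duplicated work mentioned as the downside of options 2 and 3.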

#### Additional context

I am happy to write the code and tests to add ECE calculation to scikit-learn. After all, the code to calculate ECE is just a one-line addition to `calibration_curve()`. However, as I am not a regular contributor to scikit-learn, I am unsure where in the library such a feature belongs. Please advise!
