Feature Request: function to calculate Expected Calibration Error (ECE) #18268
Description
Describe the workflow you want to enable
I would like to add the ability to calculate Expected Calibration Error (ECE) within scikit-learn. ECE is defined in equation (3) of Guo et al., "On Calibration of Modern Neural Networks" (2017). This is a well-cited paper (over 700 citations as of Aug. 26, 2020), and the ECE metric is now widely used in academic papers on model calibration. TensorFlow Probability already provides a function for computing it: tfp.stats.expected_calibration_error().
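For reference, the definition being requested can be written out as below. This is a restatement of the binned ECE from Guo et al., not something scikit-learn currently exposes: $B_m$ is the set of samples falling in bin $m$, $n$ is the total number of samples, and $\mathrm{acc}$ and $\mathrm{conf}$ are the per-bin accuracy and mean confidence. In the binary setting of calibration_curve(), I am assuming these correspond roughly to prob_true and prob_pred.

```latex
\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n}
  \left| \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \right|
```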
Describe your proposed solution
I can see several possibilities for how the ECE calculation could be added to scikit-learn, with pros and cons for each:
1. Make ECE an extra return value of sklearn.calibration.calibration_curve(). This function already does 99% of the work towards calculating ECE; computing ECE is a one-liner added to the end of it:

       def calibration_curve(...):
           ...
           ece = np.sum(np.abs(prob_true - prob_pred) * (bin_total[nonzero] / len(y_true)))
           return prob_true, prob_pred, ece

   The downside is that introducing a third return value is a breaking change. One possible mitigation is to add a boolean return_ece parameter to the function definition with a default value of False, and only return ece if return_ece=True.
2. Add a dedicated function to sklearn.metrics, i.e., sklearn.metrics.expected_calibration_error(y_true, y_pred). The downside here is that it would need to re-compute everything that calibration_curve() already computes, so there is a performance penalty for a user who wants to both calculate ECE and plot a calibration curve.
3. Add a dedicated function to sklearn.calibration, i.e., sklearn.calibration.expected_calibration_error(y_true, y_pred). This would keep the ECE calculation within the calibration subpackage. Same downside as in option 2.
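To make the dedicated-function options concrete, here is a minimal sketch of what such a standalone function might look like for binary classification. The name expected_calibration_error, its signature, and the n_bins default are hypothetical (none of this exists in scikit-learn), and the uniform binning below is my own approximation of what calibration_curve() does rather than a copy of its internals.

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, n_bins=5):
    """Hypothetical binary ECE: weighted mean of |prob_true - prob_pred| per bin."""
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)

    # Uniform-width bins on [0, 1]; assign each prediction to a bin index.
    bins = np.linspace(0.0, 1.0, n_bins + 1)
    binids = np.searchsorted(bins[1:-1], y_prob)

    # Per-bin sums of predicted probabilities, positive labels, and counts.
    bin_sums = np.bincount(binids, weights=y_prob, minlength=len(bins))
    bin_true = np.bincount(binids, weights=y_true, minlength=len(bins))
    bin_total = np.bincount(binids, minlength=len(bins))

    # Ignore empty bins, as they contribute nothing to the weighted sum.
    nonzero = bin_total != 0
    prob_true = bin_true[nonzero] / bin_total[nonzero]
    prob_pred = bin_sums[nonzero] / bin_total[nonzero]

    # Same expression as the proposed one-liner.
    weights = bin_total[nonzero] / len(y_true)
    return float(np.sum(np.abs(prob_true - prob_pred) * weights))


ece = expected_calibration_error([0, 0, 1, 1], [0.1, 0.4, 0.6, 0.9], n_bins=2)
print(ece)
```

The sketch also shows the downside mentioned for options 2 and 3: everything before the final two lines duplicates the binning work that calibration_curve() would already have done.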
Additional context
I am happy to write the code and tests to add ECE calculation to scikit-learn. After all, the code to calculate ECE is just a one-line addition to calibration_curve(). However, as I am not a regular contributor to scikit-learn, I am unsure where in the library such a feature belongs. Please advise!