Describe the workflow you want to enable
I would like to add the ability to calculate Expected Calibration Error (ECE) in scikit-learn. ECE is defined in equation (3) of Guo et al., "On Calibration of Modern Neural Networks" (2017). This is a well-cited paper (over 700 citations as of Aug. 26, 2020), and ECE is now widely used in academic papers on model calibration. TensorFlow Probability even provides a function for computing it: tfp.stats.expected_calibration_error().
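For reference, equation (3) of Guo et al. splits the n predictions into M equal-width confidence bins B_m and takes the sample-weighted average gap between accuracy and confidence in each bin:

$$
\mathrm{ECE} = \sum_{m=1}^{M} \frac{|B_m|}{n} \left| \mathrm{acc}(B_m) - \mathrm{conf}(B_m) \right|
$$

Here acc(B_m) is the fraction of correctly classified samples in bin B_m and conf(B_m) is their mean predicted confidence. The one-liner proposed below applies the same weighted-gap formula to the binary output of calibration_curve(), with the per-bin fraction of positives (prob_true) in place of accuracy and the mean predicted probability (prob_pred) in place of confidence.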
Describe your proposed solution
I can see several possibilities for how the ECE calculation could be added to scikit-learn, with pros and cons for each:
- Make ECE an extra return value of sklearn.calibration.calibration_curve(). This function already does 99% of the work needed to calculate ECE; computing ECE is a one-liner added to the end of it:

  ```python
  def calibration_curve(...):
      ...  # existing code computes prob_true, prob_pred, bin_total and nonzero
      # Weight each non-empty bin's |fraction of positives - mean predicted probability|
      # by the fraction of all samples that fall into that bin.
      ece = np.sum(np.abs(prob_true - prob_pred) * (bin_total[nonzero] / len(y_true)))
      return prob_true, prob_pred, ece
  ```

  The downside is that introducing a third return value is a breaking change. One possible mitigation is to add a boolean return_ece parameter with a default value of False, and to return ece only if return_ece=True.
- Add a dedicated function to sklearn.metrics, i.e., sklearn.metrics.expected_calibration_error(y_true, y_pred). The downside is that it would need to recompute everything that calibration_curve() already computes, so a user who wants both to calculate ECE and to plot a calibration curve pays a performance penalty.
- Add a dedicated function to sklearn.calibration, i.e., sklearn.calibration.expected_calibration_error(y_true, y_pred). This would keep the ECE calculation within the calibration module. It has the same downside as option 2.
Additional context
I am happy to write the code and tests to add ECE calculation to scikit-learn. After all, the code to calculate ECE is just a one-line addition to calibration_curve(). However, as I am not a regular contributor to scikit-learn, I am not sure where in the library such a feature would best fit. Please advise!