Mean Standardized Log Loss (MSLL) for uncertainty aware regression models #21665
Description
Describe the workflow you want to enable
Why MSLL?
Traditional metrics such as `mean_squared_error` or `r2_score` cannot correctly evaluate uncertainty-aware models because they do not take the predictive standard deviation (`y_std`) into account.
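As a small illustration of this point (a sketch with synthetic data; the `msll` helper is hypothetical and simply implements Eq. 2.34 of the GPML book averaged over samples, it is not an existing sklearn function): two models that produce identical point predictions have identical RMSE, but MSLL separates the well-calibrated one from the overconfident one.

```python
import numpy as np


def msll(y_true, y_pred, y_std):
    # Standardized log loss (GPML Eq. 2.34), averaged over samples
    return np.mean(0.5 * np.log(2 * np.pi * y_std**2)
                   + (y_true - y_pred) ** 2 / (2 * y_std**2))


rng = np.random.default_rng(0)
y_true = rng.normal(0.0, 1.0, size=1000)
y_pred = np.zeros_like(y_true)  # same point predictions for both models

# RMSE is identical for both models, since it ignores the reported uncertainty
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

calibrated = msll(y_true, y_pred, np.full_like(y_true, 1.0))     # honest std
overconfident = msll(y_true, y_pred, np.full_like(y_true, 0.1))  # too-small std

print(rmse)
print(calibrated, overconfident)  # MSLL is much larger for the overconfident model
```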
Why sklearn?
To the best of my knowledge, no other widely used Python library offers a well-organized, standardized set of metrics for ML regression models.
Mean Standardized Log Loss (MSLL)
For a single observation with true value y, predicted mean y_pred and predicted standard deviation y_std, the standardized log loss is

SLL = 0.5 * log(2 * pi * y_std^2) + (y - y_pred)^2 / (2 * y_std^2)

Averaging this quantity over all the values in y_pred and y_std gives the Mean Standardized Log Loss (MSLL). The equation is derived from Eq. 2.34, p. 23, in the GPML book - cited by 23541.
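Note that the standardized log loss of a single point is exactly the negative log density of a Gaussian N(y_pred, y_std^2) evaluated at y_true, so an implementation can be cross-checked against `scipy.stats.norm` (a sketch; the arrays below are made-up illustrative values):

```python
import numpy as np
from scipy.stats import norm

# Made-up illustrative values
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
y_std = np.array([0.1, 0.2, 0.05, 0.3])

# MSLL computed directly from the formula above
msll = np.mean(0.5 * np.log(2 * np.pi * y_std**2)
               + (y_true - y_pred) ** 2 / (2 * y_std**2))

# The same quantity as the mean negative log predictive density
nlpd = -np.mean(norm(loc=y_pred, scale=y_std).logpdf(y_true))

assert np.isclose(msll, nlpd)
```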
Properties
- MSLL is useful to evaluate probabilistic models (or any model that can output both `y_pred` and `y_std`, like `GaussianProcessRegressor`).
- The lower the MSLL, the better the model.
Which algorithms from sklearn can use this metric?
- sklearn.gaussian_process.GaussianProcessRegressor
- sklearn.linear_model.BayesianRidge
- sklearn.linear_model.ARDRegression
- ...
- Maybe more
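All of these estimators expose `return_std=True` in `predict`, so the same metric applies uniformly across them. A minimal sketch with `BayesianRidge` on synthetic data (the `msll` helper is hypothetical, implementing the formula above; it is not an existing sklearn function):

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import train_test_split


def msll(y_true, y_pred, y_std):
    # Standardized log loss (GPML Eq. 2.34), averaged over samples
    return np.mean(0.5 * np.log(2 * np.pi * y_std**2)
                   + (y_true - y_pred) ** 2 / (2 * y_std**2))


rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = BayesianRidge().fit(X_train, y_train)

# BayesianRidge.predict supports return_std=True, like GaussianProcessRegressor
y_pred, y_std = model.predict(X_test, return_std=True)
print(msll(y_test, y_pred, y_std))
```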
A demo
A comparison between RMSE and MSLL
MSLL is high if the prediction intervals are not well calibrated; RMSE, on the other hand, cannot take this criterion into account.
```python
# Original Author: Vincent Dubourg <vincent.dubourg@gmail.com>
#                  Jake Vanderplas <vanderplas@astro.washington.edu>
#                  Jan Hendrik Metzen <jhm@informatik.uni-bremen.de>
# License: BSD 3 clause
#
# Modified by: Zeel B Patel <patel_zeel@iitgn.ac.in>
import numpy as np
from matplotlib import pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

np.random.seed(1)


def f(x):
    """The function to predict."""
    return x * np.sin(x)


def msll(y_true, y_pred, y_std):
    first_term = 0.5 * np.log(2 * np.pi * y_std**2)
    second_term = ((y_true - y_pred) ** 2) / (2 * y_std**2)
    return np.mean(first_term + second_term)


X = np.linspace(0.1, 9.9, 20)
X = np.atleast_2d(X).T

# Observations and noise
y = f(X).ravel()
dy = 0.5 + 1.0 * np.random.random(y.shape)
noise = np.random.normal(0, dy)
y += noise

# Mesh the input space for evaluations of the real function and the prediction
x = np.atleast_2d(np.linspace(0, 10, 1000)).T

# Instantiate a Gaussian Process model
kernel = C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel, alpha=dy**2, n_restarts_optimizer=1)

# Fit to data using Maximum Likelihood Estimation of the parameters
gp.fit(X, y)

# Make the prediction on the meshed x-axis (ask for the std as well)
y_pred, sigma = gp.predict(x, return_std=True)

# Plot the function, the prediction and the 95% confidence interval
plt.figure()
plt.plot(x, f(x), "r:", label=r"$f(x) = x\,\sin(x)$")
plt.errorbar(X.ravel(), y, dy, fmt="r.", markersize=10, label="Observations")
plt.plot(x, y_pred, "b-", label="Prediction")
plt.fill(
    np.concatenate([x, x[::-1]]),
    np.concatenate([y_pred - 1.9600 * sigma, (y_pred + 1.9600 * sigma)[::-1]]),
    alpha=0.5,
    fc="b",
    ec="None",
    label="95% confidence interval",
)
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
plt.ylim(-10, 20)
plt.legend(loc="upper left")
plt.title("MSLL: " + str(msll(f(x).ravel(), y_pred, sigma)))
plt.show()
```

Describe your proposed solution
A PR to include `mean_standardized_log_loss` in `sklearn.metrics`.

Potential usage

```python
y_pred, y_std = model.predict(x, return_std=True)
msll = mean_standardized_log_loss(y_true, y_pred, y_std)
```

A rough version of the `mean_standardized_log_loss` function:
```python
def mean_standardized_log_loss(
    y_true, y_pred, y_std, *, sample_weight=None, multioutput="uniform_average"
):
    """Mean standardized log loss.

    Read more in the :ref:`User Guide <mean_standardized_log_loss>`.

    Parameters
    ----------
    y_true : array-like of shape (n_samples,) or (n_samples, n_outputs)
        Ground truth (correct) target values.

    y_pred : array-like of shape (n_samples,) or (n_samples, n_outputs)
        Estimated target values.

    y_std : array-like of shape (n_samples,) or (n_samples, n_outputs)
        Estimated standard deviation of the predictions.

    sample_weight : array-like of shape (n_samples,), default=None
        Sample weights.

    multioutput : {'raw_values', 'uniform_average'} or array-like of shape \
            (n_outputs,), default='uniform_average'
        Defines aggregating of multiple output values.
        Array-like value defines weights used to average errors.

        'raw_values' :
            Returns a full set of errors in case of multioutput input.

        'uniform_average' :
            Errors of all outputs are averaged with uniform weight.

    Returns
    -------
    loss : float or ndarray of floats
        MSLL value (lower is better; the value can be negative), or an
        array of floating point values, one for each individual target.

    Examples
    --------
    >>> from sklearn.metrics import mean_standardized_log_loss
    >>> y_true = [3, -0.5, 2, 7]
    >>> y_pred = [2.5, 0.0, 2, 8]
    >>> y_std = [0.1, 0.2, 0.05, 0.3]
    >>> mean_standardized_log_loss(y_true, y_pred, y_std)
    4.186...
    >>> y_true = [[0.5, 1], [-1, 1], [7, -6]]
    >>> y_pred = [[0, 2], [-1, 2], [8, -5]]
    >>> y_std = [[0.01, 0.02], [0.01, 0.04], [0.03, 0.04]]
    >>> mean_standardized_log_loss(y_true, y_pred, y_std)
    610.50...
    >>> mean_standardized_log_loss(y_true, y_pred, y_std, multioutput='raw_values')
    array([598.53..., 622.46...])
    >>> mean_standardized_log_loss(y_true, y_pred, y_std, multioutput=[0.3, 0.7])
    615.28...
    """
    # y_type, y_true, y_pred, multioutput = _check_reg_targets(
    #     y_true, y_pred, multioutput
    # )
    # check_consistent_length(y_true, y_pred, sample_weight)
    ###########
    # Checks like the above ones to be implemented.
    ###########
    y_true, y_pred, y_std = np.asarray(y_true), np.asarray(y_pred), np.asarray(y_std)
    first_term = 0.5 * np.log(2 * np.pi * y_std**2)
    second_term = ((y_true - y_pred) ** 2) / (2 * y_std**2)
    output_errors = np.average(first_term + second_term, axis=0, weights=sample_weight)
    if isinstance(multioutput, str):
        if multioutput == "raw_values":
            return output_errors
        elif multioutput == "uniform_average":
            # pass None as weights to np.average: uniform mean
            multioutput = None
    return np.average(output_errors, weights=multioutput)
```

Describe alternatives you've considered, if relevant
No response
Additional context
A set of publications that have used MSLL as a metric
| Publication | Venue | Citations |
|---|---|---|
| Rasmussen, C. E. (2003, February). Gaussian processes in machine learning. In Summer School on Machine Learning (pp. 63-71). Springer, Berlin, Heidelberg. | Summer School on ML '03 | 23541 |
| Alvarez, M. A., & Lawrence, N. D. (2011). Computationally efficient convolved multiple output Gaussian processes. The Journal of Machine Learning Research, 12, 1459-1500. | JMLR '11 | 289 |
| Chalupka, K., Williams, C. K., & Murray, I. (2013). A framework for evaluating approximation methods for Gaussian process regression. Journal of Machine Learning Research, 14, 333-350. | JMLR '13 | 144 |
| Andrew Gordon Wilson, David A. Knowles, and Zoubin Ghahramani. 2012. Gaussian process regression networks. In Proceedings of the 29th International Conference on International Conference on Machine Learning (ICML'12). Omnipress, Madison, WI, USA, 1139-1146. | ICML '12 | 143 |
| Wilson, A. G., Gilboa, E., Cunningham, J. P., & Nehorai, A. (2014, December). Fast Kernel Learning for Multidimensional Pattern Extrapolation. In NIPS (pp. 3626-3634). | NIPS '14 | 141 |
| Chen, Z., & Wang, B. (2018). How priors of initial hyperparameters affect Gaussian process regression models. Neurocomputing, 275, 1702-1710. | Neurocomputing '18 | 44 |
| Nguyen, D. T., Filippone, M., & Michiardi, P. (2019, April). Exact gaussian process regression with distributed computations. In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing (pp. 1286-1295). | SAC '19 | 8 |
| Lederer, A., Conejo, A. J. O., Maier, K. A., Xiao, W., Umlauft, J., & Hirche, S. (2021, July). Gaussian Process-Based Real-Time Learning for Safety Critical Applications. In International Conference on Machine Learning (pp. 6055-6064). PMLR. | ICML '21 | 2 |
| Gümbel, S., & Schmidt, T. (2020). Machine learning for multiple yield curve markets: fast calibration in the Gaussian affine framework. Risks, 8(2), 50. | Risks '20 | 1 |
Acknowledgement
I would like to thank @wesselb for suggesting this metric for the first time to me.