Mean Standardized Log Loss (MSLL) for uncertainty aware regression models #21665
Description
Describe the workflow you want to enable
Why MSLL?
Traditional metrics such as `mean_squared_error` or `r2_score` cannot correctly evaluate uncertainty-aware models because they do not take the predictive standard deviation (`y_std`) into account.
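As a small illustration of this point (a sketch with synthetic data; the `msll` helper is hypothetical and simply implements Eq. 2.34 of the GPML book averaged over samples, it is not an existing sklearn function): two models that produce identical point predictions have identical RMSE, but MSLL separates the well-calibrated one from the overconfident one.

```python
import numpy as np


def msll(y_true, y_pred, y_std):
    # Standardized log loss (GPML Eq. 2.34), averaged over samples
    return np.mean(0.5 * np.log(2 * np.pi * y_std**2)
                   + (y_true - y_pred) ** 2 / (2 * y_std**2))


rng = np.random.default_rng(0)
y_true = rng.normal(0.0, 1.0, size=1000)
y_pred = np.zeros_like(y_true)  # same point predictions for both models

# RMSE is identical for both models, since it ignores the reported uncertainty
rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))

calibrated = msll(y_true, y_pred, np.full_like(y_true, 1.0))     # honest std
overconfident = msll(y_true, y_pred, np.full_like(y_true, 0.1))  # too-small std

print(rmse)
print(calibrated, overconfident)  # MSLL is much larger for the overconfident model
```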
Why sklearn?
To the best of my knowledge, no other widely used Python library offers a well-organized, standardized set of metrics for ML regression models.
Mean Standardized Log Loss (MSLL)
For a single observation with true value y, predicted mean y_pred and predicted standard deviation y_std, the standardized log loss is

SLL = 0.5 * log(2 * pi * y_std^2) + (y - y_pred)^2 / (2 * y_std^2)

Averaging this quantity over all the values in y_pred and y_std gives the Mean Standardized Log Loss (MSLL). The equation is derived from Eq. 2.34, p. 23, in the GPML book - cited by 23541.
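Note that the standardized log loss of a single point is exactly the negative log density of a Gaussian N(y_pred, y_std^2) evaluated at y_true, so an implementation can be cross-checked against `scipy.stats.norm` (a sketch; the arrays below are made-up illustrative values):

```python
import numpy as np
from scipy.stats import norm

# Made-up illustrative values
y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
y_std = np.array([0.1, 0.2, 0.05, 0.3])

# MSLL computed directly from the formula above
msll = np.mean(0.5 * np.log(2 * np.pi * y_std**2)
               + (y_true - y_pred) ** 2 / (2 * y_std**2))

# The same quantity as the mean negative log predictive density
nlpd = -np.mean(norm(loc=y_pred, scale=y_std).logpdf(y_true))

assert np.isclose(msll, nlpd)
```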
Properties
- MSLL is useful to evaluate probabilistic models (or any model that can output both `y_pred` and `y_std`, like `GaussianProcessRegressor`).
- The lower the MSLL, the better the model.
Which algorithms from sklearn can use this metric?
- sklearn.gaussian_process.GaussianProcessRegressor
- sklearn.linear_model.BayesianRidge
- sklearn.linear_model.ARDRegression
- ...
- Maybe more
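All of these estimators expose `return_std=True` in `predict`, so the same metric applies uniformly across them. A minimal sketch with `BayesianRidge` on synthetic data (the `msll` helper is hypothetical, implementing the formula above; it is not an existing sklearn function):

```python
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.model_selection import train_test_split


def msll(y_true, y_pred, y_std):
    # Standardized log loss (GPML Eq. 2.34), averaged over samples
    return np.mean(0.5 * np.log(2 * np.pi * y_std**2)
                   + (y_true - y_pred) ** 2 / (2 * y_std**2))


rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = BayesianRidge().fit(X_train, y_train)

# BayesianRidge.predict supports return_std=True, like GaussianProcessRegressor
y_pred, y_std = model.predict(X_test, return_std=True)
print(msll(y_test, y_pred, y_std))
```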
A demo
A comparison between RMSE and MSLL
MSLL is high if the prediction intervals are not well calibrated; RMSE, on the other hand, cannot take this criterion into account.
```python
# Original Author: Vincent Dubourg <vincent.dubourg@gmail.com>
#                  Jake Vanderplas <vanderplas@astro.washington.edu>
#                  Jan Hendrik Metzen <jhm@informatik.uni-bremen.de>
# License: BSD 3 clause
#
# Modified by: Zeel B Patel <patel_zeel@iitgn.ac.in>
import numpy as np
from matplotlib import pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

np.random.seed(1)


def f(x):
    """The function to predict."""
    return x * np.sin(x)


def msll(y_true, y_pred, y_std):
    first_term = 0.5 * np.log(2 * np.pi * y_std**2)
    second_term = ((y_true - y_pred) ** 2) / (2 * y_std**2)
    return np.mean(first_term + second_term)


X = np.linspace(0.1, 9.9, 20)
X = np.atleast_2d(X).T

# Observations and noise
y = f(X).ravel()
dy = 0.5 + 1.0 * np.random.random(y.shape)
noise = np.random.normal(0, dy)
y += noise

# Mesh the input space for evaluations of the real function and the prediction
x = np.atleast_2d(np.linspace(0, 10, 1000)).T

# Instantiate a Gaussian Process model
kernel = C(1.0, (1e-3, 1e3)) * RBF(10, (1e-2, 1e2))
gp = GaussianProcessRegressor(kernel=kernel, alpha=dy**2, n_restarts_optimizer=1)

# Fit to data using Maximum Likelihood Estimation of the parameters
gp.fit(X, y)

# Make the prediction on the meshed x-axis (ask for the std as well)
y_pred, sigma = gp.predict(x, return_std=True)

# Plot the function, the prediction and the 95% confidence interval
plt.figure()
plt.plot(x, f(x), "r:", label=r"$f(x) = x\,\sin(x)$")
plt.errorbar(X.ravel(), y, dy, fmt="r.", markersize=10, label="Observations")
plt.plot(x, y_pred, "b-", label="Prediction")
plt.fill(
    np.concatenate([x, x[::-1]]),
    np.concatenate([y_pred - 1.9600 * sigma, (y_pred + 1.9600 * sigma)[::-1]]),
    alpha=0.5,
    fc="b",
    ec="None",
    label="95% confidence interval",
)
plt.xlabel("$x$")
plt.ylabel("$f(x)$")
plt.ylim(-10, 20)
plt.legend(loc="upper left")
plt.title("MSLL: " + str(msll(f(x).ravel(), y_pred, sigma)))
plt.show()
```

Describe your proposed solution
A PR to include `mean_standardized_log_loss` in `sklearn.metrics`.

Potential usage

```python
y_pred, y_std = model.predict(x, return_std=True)
msll = mean_standardized_log_loss(y_true, y_pred, y_std)
```

A rough version of the `mean_standardized_log_loss` function:
```python
def mean_standardized_log_loss(
    y_true, y_pred, y_std, *, sample_weight=None, multioutput="uniform_average"
):
    """Mean standardized log loss.

    Read more in the :ref:`User Guide <mean_standardized_log_loss>`.

    Parameters
    ----------
    y_true : array-like of shape (n_samples,) or (n_samples, n_outputs)
        Ground truth (correct) target values.

    y_pred : array-like of shape (n_samples,) or (n_samples, n_outputs)
        Estimated target values.

    y_std : array-like of shape (n_samples,) or (n_samples, n_outputs)
        Estimated standard deviation of the predictions.

    sample_weight : array-like of shape (n_samples,), default=None
        Sample weights.

    multioutput : {'raw_values', 'uniform_average'} or array-like of shape \
            (n_outputs,), default='uniform_average'
        Defines aggregating of multiple output values.
        Array-like value defines weights used to average errors.

        'raw_values' :
            Returns a full set of errors in case of multioutput input.

        'uniform_average' :
            Errors of all outputs are averaged with uniform weight.

    Returns
    -------
    loss : float or ndarray of floats
        MSLL value (lower is better; the value can be negative), or an
        array of floating point values, one for each individual target.

    Examples
    --------
    >>> from sklearn.metrics import mean_standardized_log_loss
    >>> y_true = [3, -0.5, 2, 7]
    >>> y_pred = [2.5, 0.0, 2, 8]
    >>> y_std = [0.1, 0.2, 0.05, 0.3]
    >>> mean_standardized_log_loss(y_true, y_pred, y_std)
    4.186...
    >>> y_true = [[0.5, 1], [-1, 1], [7, -6]]
    >>> y_pred = [[0, 2], [-1, 2], [8, -5]]
    >>> y_std = [[0.01, 0.02], [0.01, 0.04], [0.03, 0.04]]
    >>> mean_standardized_log_loss(y_true, y_pred, y_std)
    610.50...
    >>> mean_standardized_log_loss(y_true, y_pred, y_std, multioutput='raw_values')
    array([598.53..., 622.46...])
    >>> mean_standardized_log_loss(y_true, y_pred, y_std, multioutput=[0.3, 0.7])
    615.28...
    """
    # y_type, y_true, y_pred, multioutput = _check_reg_targets(
    #     y_true, y_pred, multioutput
    # )
    # check_consistent_length(y_true, y_pred, sample_weight)
    ###########
    # Checks like the above ones to be implemented.
    ###########
    y_true, y_pred, y_std = np.asarray(y_true), np.asarray(y_pred), np.asarray(y_std)
    first_term = 0.5 * np.log(2 * np.pi * y_std**2)
    second_term = ((y_true - y_pred) ** 2) / (2 * y_std**2)
    output_errors = np.average(first_term + second_term, axis=0, weights=sample_weight)
    if isinstance(multioutput, str):
        if multioutput == "raw_values":
            return output_errors
        elif multioutput == "uniform_average":
            # pass None as weights to np.average: uniform mean
            multioutput = None
    return np.average(output_errors, weights=multioutput)
```

Describe alternatives you've considered, if relevant
No response
Additional context
A set of publications that have used MSLL as a metric
| Publication | Venue | Citations |
|---|---|---|
| Rasmussen, C. E. (2003, February). Gaussian processes in machine learning. In Summer School on Machine Learning (pp. 63-71). Springer, Berlin, Heidelberg. | Summer School on ML '03 | 23541 |
| Alvarez, M. A., & Lawrence, N. D. (2011). Computationally efficient convolved multiple output Gaussian processes. The Journal of Machine Learning Research, 12, 1459-1500. | JMLR '11 | 289 |
| Chalupka, K., Williams, C. K., & Murray, I. (2013). A framework for evaluating approximation methods for Gaussian process regression. Journal of Machine Learning Research, 14, 333-350. | JMLR '13 | 144 |
| Andrew Gordon Wilson, David A. Knowles, and Zoubin Ghahramani. 2012. Gaussian process regression networks. In Proceedings of the 29th International Conference on International Conference on Machine Learning (ICML'12). Omnipress, Madison, WI, USA, 1139-1146. | ICML '12 | 143 |
| Wilson, A. G., Gilboa, E., Cunningham, J. P., & Nehorai, A. (2014, December). Fast Kernel Learning for Multidimensional Pattern Extrapolation. In NIPS (pp. 3626-3634). | NIPS '14 | 141 |
| Chen, Z., & Wang, B. (2018). How priors of initial hyperparameters affect Gaussian process regression models. Neurocomputing, 275, 1702-1710. | Neurocomputing '18 | 44 |
| Nguyen, D. T., Filippone, M., & Michiardi, P. (2019, April). Exact gaussian process regression with distributed computations. In Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing (pp. 1286-1295). | SAC '19 | 8 |
| Lederer, A., Conejo, A. J. O., Maier, K. A., Xiao, W., Umlauft, J., & Hirche, S. (2021, July). Gaussian Process-Based Real-Time Learning for Safety Critical Applications. In International Conference on Machine Learning (pp. 6055-6064). PMLR. | ICML '21 | 2 |
| Gümbel, S., & Schmidt, T. (2020). Machine learning for multiple yield curve markets: fast calibration in the Gaussian affine framework. Risks, 8(2), 50. | Risks '20 | 1 |
Acknowledgement
I would like to thank @wesselb for suggesting this metric for the first time to me.