
Regression metrics - which strategy ? #13482

@smarie

I recently came across #12895 (with PR #13467) and the older #6457, which revived an old topic that I would like to share.

In our team, we needed to provide performance metrics for regression models. This is a slightly different goal from using metrics for grid search or model selection: the metric is not only used to "select the best model" but to give users feedback on "how good a model is".

For regression models I introduced three categories of metrics that turned out to be quite intuitive (a minimal code sketch follows the list):

  • Absolute performance (L2 RMSE, L1 MAE): these metrics can all be interpreted as an "average prediction error" ("average" in the broad sense here), expressed in the unit of the prediction target (e.g. "average error of 12 kWh").

  • Relative performance (L2 CVRMSE, L1 CVMAE, and per-point relative metrics such as MAPE or MARE, MARES, MAREL...): these metrics can all be interpreted as an "average relative prediction error" expressed as a percentage of the target (e.g. "average error of 10%").

  • Comparison to a dummy model (L2 RRSE, L1 RAE): these metrics can all be interpreted as a ratio between the performance of the model at hand and the performance of a dummy, constant model (always predicting the average). They need to be inverted to be intuitive, e.g. a ratio of 20% means "5 times better than the dummy model".
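
To make the three categories concrete, here is a minimal sketch with NumPy. The helper name is hypothetical (only mean_absolute_error and mean_squared_error exist in sklearn today), and the CVRMSE/MAPE/RRSE/RAE formulas are the common textbook definitions:

```python
import numpy as np

def regression_metric_examples(y_true, y_pred):
    """Hypothetical helper: a couple of metrics per proposed category."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    dummy_err = y_true - y_true.mean()  # errors of a constant (mean) dummy model

    # 1. Absolute performance, in the unit of the target
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))

    # 2. Relative performance, as a fraction of the target
    cvrmse = rmse / y_true.mean()         # assumes a nonzero mean target
    mape = np.mean(np.abs(err / y_true))  # assumes no zero targets

    # 3. Comparison to the dummy model (lower is better, < 1 beats the dummy)
    rrse = np.sqrt(np.sum(err ** 2) / np.sum(dummy_err ** 2))
    rae = np.sum(np.abs(err)) / np.sum(np.abs(dummy_err))

    return {"rmse": rmse, "mae": mae, "cvrmse": cvrmse,
            "mape": mape, "rrse": rrse, "rae": rae}
```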

Of course these categories are application-oriented: they all make sense from a user's point of view. However, as far as model selection is concerned, only two make sense (MAE and RMSE). Not even R², because R² = 1 - RRSE², so it is a comparison-to-dummy metric rather than a performance metric (but I don't want to open the debate here, so please refrain from objecting on that one :) ).
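
As a quick numerical sanity check of the R² = 1 - RRSE² identity, using sklearn's existing r2_score (the data here is just made up for illustration):

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.RandomState(0)
y_true = rng.uniform(10, 100, size=50)
y_pred = y_true + rng.normal(0, 5, size=50)

err = y_true - y_pred
dummy_err = y_true - y_true.mean()
rrse = np.sqrt(np.sum(err ** 2) / np.sum(dummy_err ** 2))

# r2_score is 1 - SS_res / SS_tot, i.e. exactly 1 - RRSE**2
print(np.isclose(r2_score(y_true, y_pred), 1 - rrse ** 2))  # True
```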

Anyway, my question for the core sklearn team is: shall I propose a pull request with all these metrics? I'm ready to go, since we have already implemented them in our private repo, aligned with sklearn's regression.py file. So it is rather a matter of deciding whether this is a good idea. If so, introducing categories might be needed to help users better understand.

An alternative might be to create a small independent project containing all these metrics, leaving only mean_absolute_error (L1) and mean_squared_error (L2) in sklearn.

Any thoughts on this?
