
Regression metrics - which strategy ? #13482

@smarie

I recently came across #12895 (with PR #13467) and the older #6457, which revived an old topic that I would like to share.

In our team, we needed to provide performance metrics for regression models. This is a slightly different goal from using metrics for grid search or model selection: the metric is not only used to "select the best model" but to give users feedback on "how good a model is".

For regression models I introduced three categories of metrics that turned out to be quite intuitive (a minimal code sketch follows the list):

  • Absolute performance (L2 RMSE, L1 MAE): these metrics can all be interpreted as an "average prediction error" ("average" in the broad sense here), expressed in the unit of the prediction target (e.g. "average error of 12 kWh").

  • Relative performance (L2 CVRMSE, L1 CVMAE, and per-point relative metrics such as MAPE or MARE, MARES, MAREL...): these metrics can all be interpreted as an "average relative prediction error" expressed as a percentage of the target (e.g. "average error of 10%").

  • Comparison to a dummy model (L2 RRSE, L1 RAE): these metrics can all be interpreted as a ratio between the performance of the model at hand and the performance of a dummy, constant model (always predicting the average). They need to be inverted to be intuitive, e.g. a ratio of 20% means "5 times better than the dummy model".
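
To make the three categories concrete, here is a minimal sketch with NumPy. The helper name is hypothetical (only mean_absolute_error and mean_squared_error exist in sklearn today), and the CVRMSE/MAPE/RRSE/RAE formulas are the common textbook definitions:

```python
import numpy as np

def regression_metric_examples(y_true, y_pred):
    """Hypothetical helper: a couple of metrics per proposed category."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_true - y_pred
    dummy_err = y_true - y_true.mean()  # errors of a constant (mean) dummy model

    # 1. Absolute performance, in the unit of the target
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))

    # 2. Relative performance, as a fraction of the target
    cvrmse = rmse / y_true.mean()         # assumes a nonzero mean target
    mape = np.mean(np.abs(err / y_true))  # assumes no zero targets

    # 3. Comparison to the dummy model (lower is better, < 1 beats the dummy)
    rrse = np.sqrt(np.sum(err ** 2) / np.sum(dummy_err ** 2))
    rae = np.sum(np.abs(err)) / np.sum(np.abs(dummy_err))

    return {"rmse": rmse, "mae": mae, "cvrmse": cvrmse,
            "mape": mape, "rrse": rrse, "rae": rae}
```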

Of course these categories are application-oriented: they all make sense from a user's point of view. However, as far as model selection is concerned, only two make sense (MAE and RMSE). Not even R², because R² = 1 - RRSE², so it is a comparison-to-dummy metric rather than a performance metric (but I don't want to open the debate here, so please refrain from objecting on that one :) ).
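
As a quick numerical sanity check of the R² = 1 - RRSE² identity, using sklearn's existing r2_score (the data here is just made up for illustration):

```python
import numpy as np
from sklearn.metrics import r2_score

rng = np.random.RandomState(0)
y_true = rng.uniform(10, 100, size=50)
y_pred = y_true + rng.normal(0, 5, size=50)

err = y_true - y_pred
dummy_err = y_true - y_true.mean()
rrse = np.sqrt(np.sum(err ** 2) / np.sum(dummy_err ** 2))

# r2_score is 1 - SS_res / SS_tot, i.e. exactly 1 - RRSE**2
print(np.isclose(r2_score(y_true, y_pred), 1 - rrse ** 2))  # True
```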

Anyway, my question for the core sklearn team is: shall I propose a pull request with all these metrics? I'm ready to go, since we have already implemented them in our private repo, aligned with sklearn's regression.py file. So it is rather a matter of deciding whether this is a good idea. If so, introducing categories might be needed to help users better understand.

An alternative might be to create a small independent project containing all these metrics, leaving only mean_absolute_error (L1) and mean_squared_error (L2) in sklearn.

Any thoughts on this?
