[MRG] Enhancement: Add MAPE as an evaluation metric #10711
mohamed-ali wants to merge 55 commits into scikit-learn:master from mohamed-ali:Add-MAPE-as-evaluation-metric
Conversation
jnothman
left a comment
I think you should note the sensitivity to small y_true, and perhaps we need to validate that y_true is non-zero.
Thanks for the thoroughness!
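(For illustration, a quick NumPy demo of that sensitivity to small y_true; the numbers are made up for the example:)

```python
import numpy as np

y_true = np.array([100.0, 100.0, 0.001])
y_pred = np.array([101.0, 99.0, 0.1])

# Each large target is off by only 1%, but the near-zero target contributes
# |0.001 - 0.1| / 0.001 = 9900%, so it completely dominates the mean.
print(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)  # ~3300.67
```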
doc/modules/model_evaluation.rst
Outdated
**Regression**
'explained_variance'                    :func:`metrics.explained_variance_score`
'neg_mean_absolute_error'               :func:`metrics.mean_absolute_error`
'neg_mean_absolute_percentage_error'    :func:`metrics.mean_absolute_percentage_error`
@amueller suggested neg_mape here. I'm not entirely sure which is better. But you'll need to widen the table if we go with this name!
Yes, I am open to changing the name or keeping it as is. I'll make the necessary changes once we agree on a name ;)
doc/modules/model_evaluation.rst
Outdated
>>> cross_val_score(model, X, y, scoring='wrong_choice')
Traceback (most recent call last):
ValueError: 'wrong_choice' is not a valid scoring value. Valid options are ['accuracy', 'adjusted_mutual_info_score', 'adjusted_rand_score', 'average_precision', 'balanced_accuracy', 'brier_score_loss', 'completeness_score', 'explained_variance', 'f1', 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', 'fowlkes_mallows_score', 'homogeneity_score', 'mutual_info_score', 'neg_log_loss', 'neg_mean_absolute_error', 'neg_mean_absolute_percentage_error', 'neg_mean_squared_error', 'neg_mean_squared_log_error', 'neg_median_absolute_error', 'normalized_mutual_info_score', 'precision', 'precision_macro', 'precision_micro', 'precision_samples', 'precision_weighted', 'r2', 'recall', 'recall_macro', 'recall_micro', 'recall_samples', 'recall_weighted', 'roc_auc', 'v_measure_score']
sklearn/metrics/regression.py
Outdated
    if y_type == 'continuous-multioutput':
        raise ValueError("Multioutput not supported "
                         "in mean_absolute_percentage_error")
    return np.average(np.abs((y_true - y_pred)/y_true))*100
Call mean rather than average just to match the name
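(A sketch of what the function body could look like after that change; apart from the np.mean swap and the validation shown in the diff above, everything here, including the *100 factor still present at this point, is an assumption about the surrounding code:)

```python
import numpy as np

def mean_absolute_percentage_error(y_true, y_pred):
    # Simplified version: the real function would also run scikit-learn's
    # input validation (check_array, consistent-length checks, etc.).
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100
```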
This pull request introduces 2 alerts when merging 7c516ef into 2e30df3 - view the new alerts on lgtm.com.
Comment posted by lgtm.com
Yes, I think we should raise an error, but I don't know much about MAPE so I can't
say if there are alternative approaches (except the one on Wikipedia which
suggests taking differences relative to the mean of y_true).
"mean_absolute_percentage_error requires y_true to never be zero" would be
sufficient.
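(A sketch of that validation, reusing the error message suggested above; the helper name is made up for illustration:)

```python
import numpy as np

def _check_non_zero_targets(y_true):
    # Hypothetical guard to run before computing the metric.
    if np.any(np.asarray(y_true) == 0):
        raise ValueError("mean_absolute_percentage_error requires "
                         "y_true to never be zero")
```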
@amueller, @jnothman, @qinhanmin2014: the test scenarios in tests/test_common.py and tests/test_score_objects.py generate y_true values that contain zeros, which explains the failures on Travis CI. Could you please suggest what to do?
PEP8 consistency is a very poor excuse for naming inconsistency. I'm unsure what to do with the name, but PEP8 is the last of our concerns (stick
You might need to add a new category in the common tests for regression metrics that require non-zero y, and update y in the tests accordingly.
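(One way such a category could look; the set name and the resampling below are illustrative, not the PR's actual test code:)

```python
import numpy as np

# Hypothetical category: regression metrics whose y_true must avoid zeros.
METRICS_REQUIRING_NON_ZERO_Y = {"mean_absolute_percentage_error"}

rng = np.random.RandomState(0)
y_true = rng.randint(0, 2, size=20).astype(float)

# For metrics in the category, shift the targets away from zero before testing.
y_true_non_zero = y_true + 1.0
```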
@jnothman thanks for the tip (
sklearn/metrics/tests/test_common.py
Outdated
    # matrix instead of a number. Testing of
    # confusion_matrix with sample_weight is in
    # test_classification.py
    "confusion_matrix",  # Left this one here because the tests in this file do
Please do not change unrelated things. It makes your contribution harder to review and may introduce merge conflicts to other pull requests.
Ok, my bad. I used autopep8 on the file.
@lesteve the Travis build finished successfully, so there is no regression with the new changes that you pushed. Thanks!
A few comments.
Just curious, maybe for @amueller who opened the associated issue: the Wikipedia page does not seem very kind to MAPE, e.g.:
"Although the concept of MAPE sounds very simple and convincing, it has major drawbacks in practical application". Is it still used despite its drawbacks?
**Regression**
'explained_variance'          :func:`metrics.explained_variance_score`
'neg_mean_absolute_error'     :func:`metrics.mean_absolute_error`
'neg_mape'                    :func:`metrics.mean_absolute_percentage_error`
Why not spell it out fully here, like all the other metrics? i.e. neg_mean_absolute_percentage_error
@lesteve I clarified in the PR description above that the name has to be chosen/voted on by all of us. Initially I used neg_mean_absolute_percentage_error but then, since MAPE is already a well-known acronym, which also makes the scorer name cleaner, I chose to switch to neg_mape. However, we can change back to the long version if most of us think that's the right thing to do.
I would be in favour of the neg_mean_absolute_percentage_error version personally. It is more consistent with neg_mean_absolute_error and with the metric name (metrics.mean_absolute_percentage_error). Happy to hear what others think.
I would also be in favor of using the explicit expanded name by default and introducing neg_mape as an alias, as we do for neg_mse.
Actually we do not have neg_mse. I thought we had.
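(If an alias were added anyway, the registration might look along these lines; SCORERS here is a stand-in for the module's scorer registry, and the metric body is a placeholder for the PR's implementation:)

```python
import numpy as np
from sklearn.metrics import make_scorer

def mean_absolute_percentage_error(y_true, y_pred):
    # Placeholder for the metric introduced in this PR.
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

neg_mape_scorer = make_scorer(mean_absolute_percentage_error,
                              greater_is_better=False)

# Both the explicit name and the short alias point at the same scorer object.
SCORERS = dict(
    neg_mean_absolute_percentage_error=neg_mape_scorer,
    neg_mape=neg_mape_scorer,
)
```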
doc/whats_new/v0.20.rst
Outdated
- Added the :func:`metrics.balanced_accuracy_score` metric and a corresponding
  ``'balanced_accuracy'`` scorer for binary classification.
  :issue:`8066` by :user:`xyguo` and :user:`Aman Dalmia <dalmia>`.
- Added the :func:`metrics.mean_absolute_percentage_error` metric and the associated
"Model evaluation" is not the best section for this I would say, in doc/whats_new/v0.19.0.rst there is a "Metrics" section. I think you can do the same here.
Nice catch, I will do that.
sklearn/metrics/scorer.py
Outdated
mean_absolute_error_scorer._deprecation_msg = deprecation_msg
neg_mape_scorer = make_scorer(mean_absolute_percentage_error,
                              greater_is_better=False)
Maybe remove the new line here. When there is no clear rule, my advice would be to follow the same implicit convention as the code you are changing.
@amueller can you approve if all the changes that you requested have been addressed? Thanks!
'scorer' isn't part of the name, firstly.
Secondly, I think the conventional name of the metric is stupid when I
think it's better described as "mean relative error". The only "absolute"
thing about it is that it's not squared, as far as I can tell. (And if this
measure is known in the literature as "mean relative error" as well, I'd be
tempted to use that name.)
Given the weirdness of its name, I'm tempted to say that the name "MAPE"
has a special meaning beyond the words that constitute it, and hence we
should consider the short name despite the inconsistency.
But I don't like the inconsistency either.
I also don't think this is something to agonize over.
I don't see any reason this shouldn't be considered for merge, @mohamed-ali. I think the docs should emphasise a little more that the metric is not reported as a percentage here.
Would you mind fixing the conflicts please?
Yes sure, I'll spend some time in the next two days to fix them. Thanks for letting me know.
Thanks, and sorry about the delay in reviewing.
hm looks like there's a bunch of errors now :-/
Yes, many things changed since this PR was created. I'll spend more time debugging. If that doesn't work, I might recreate the PR with a new branch from master.
Closing in favor of #15007. |
Reference Issues/PRs
Fixes #10708
Closes #6605
What does this implement/fix? Explain your changes.
This PR adds:

- the sklearn.metrics.mean_absolute_percentage_error metric and a neg_mape scorer for regression problems.
- tests in tests/test_common.py and tests/test_score_objects.py.
- a y_true that doesn't include zeros in tests/test_common.py and tests/test_score_objects.py.
- documentation in doc/modules/model_evaluation.rst and doc/modules/classes.rst.

Any other comments?

The scorer name is open for discussion: neg_mean_absolute_percentage_error_scorer or neg_mape_scorer.
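(For reference, a minimal end-to-end sketch of how such a scorer could be exercised; the metric body is a stand-in for the PR's implementation, and the data is synthetic with strictly positive targets so MAPE is well defined:)

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score

def mean_absolute_percentage_error(y_true, y_pred):
    # Stand-in for the metric added in this PR (percentage scale).
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

neg_mape = make_scorer(mean_absolute_percentage_error, greater_is_better=False)

rng = np.random.RandomState(0)
X = rng.uniform(1.0, 10.0, size=(50, 2))
y = X @ np.array([2.0, 3.0]) + 1.0  # strictly positive targets

# Passing the scorer object directly avoids depending on the final
# registered string name ('neg_mape' vs the long spelling).
print(cross_val_score(LinearRegression(), X, y, scoring=neg_mape, cv=5))
```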