
New forecasting metrics#801

Merged
mloning merged 72 commits into sktime:main from RNKuhns:new-forecasting-metrics
Apr 30, 2021
Conversation

@RNKuhns
Contributor

@RNKuhns RNKuhns commented Apr 12, 2021

Reference Issues/PRs

Fixes #671. See discussion on closed PR #672 that this pull request will complete.

What does this implement/fix? Explain your changes.

Adds additional forecasting metrics as discussed in #671.

This is a draft pull request. I still need to finish the docstrings (I'm adding examples to them) and write unit tests, but I wanted to give you a heads-up about the functionality.

The pull request adds new forecasting performance metrics (see list below), each of which can work with multivariate series (but defaults to univariate forecasts). It also changes naming conventions to align with scikit-learn (e.g. mean_absolute_error rather than mae or mae_loss).

The end result would be the following performance metric functions (each with a corresponding class version for scoring):

relative_error,
mean_asymmetric_error,
mean_absolute_scaled_error,
median_absolute_scaled_error,
mean_squared_scaled_error,
median_squared_scaled_error,
mean_absolute_error,
mean_squared_error,
median_absolute_error,
median_squared_error,
mean_absolute_percentage_error,
median_absolute_percentage_error,
mean_squared_percentage_error,
median_squared_percentage_error,
mean_relative_absolute_error,
median_relative_absolute_error,
geometric_mean_relative_absolute_error,
geometric_mean_relative_squared_error
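To illustrate what the scaled metrics in this list compute, here is a minimal hand-rolled sketch of the MASE formula (not the PR's actual implementation; the function name and the sp default simply mirror the list above):

```python
import numpy as np

def mean_absolute_scaled_error(y_true, y_pred, y_train, sp=1):
    """Forecast MAE scaled by the in-sample MAE of a (seasonal) naive
    forecast on the training series -- a sketch of the MASE formula."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    y_train = np.asarray(y_train, float)
    mae_forecast = np.mean(np.abs(y_true - y_pred))
    # naive benchmark: predict the value observed sp periods earlier
    mae_naive = np.mean(np.abs(y_train[sp:] - y_train[:-sp]))
    return mae_forecast / mae_naive

# A forecast exactly as accurate as the naive benchmark scores 1.0:
mean_absolute_scaled_error([3, 4], [2, 5], y_train=[1, 2, 3, 4])  # → 1.0
```

The median/squared variants in the list swap the mean for a median and the absolute error for a squared one, but follow the same scaling idea.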

Does your contribution introduce a new dependency? If yes, which one?

None. Pinning the scikit-learn dependency to >= 0.24.* was done separately.

What should a reviewer concentrate their feedback on?

Focus on changes in ./performance_metrics/forecasting/_functions.py and ./performance_metrics/forecasting/_classes.py.

Also check the follow-up fixes elsewhere in the module where the old performance metric names (e.g. sMAPE) were imported.

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors.
  • Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
  • I've added unit tests and made sure they pass locally.
For new estimators
  • I've added the estimator to the online documentation.
  • I've updated the existing example notebooks or provided a new one to showcase how my estimator works.

RNKuhns and others added 30 commits December 26, 2020 11:30
Other minor tweaks to forecasting functions that were identified as part of unit testing.
Fixed __init__ in all classes.
Verification applies to relative error functions.
Since the naming convention of forecasting performance metrics was changed, old references to things like sMAPE and smape_loss had to be updated.
Added examples to docstrings.

Fixed squared percentage error functions to match the formula in sources. Also fixed geometric_mean_relative_squared_error and geometric_mean_relative_absolute_error.
Fixed merge after conversion of the master branch to the main branch.
@fkiraly
Collaborator

fkiraly commented Apr 22, 2021

Having classes and functions comes from scikit-learn I think, where all performance metrics are principally functions, but you wrap them in a scorer when passing them to higher-level functionality like model evaluation or tuning.

I do agree with that, @mloning, but I feel they are not homogeneously exposed, which contributes to interface inhomogeneity.

Having said that, MetricFunctionWrapper should be private and only used internally for constructing classes.

That's indeed an instance of what I mean above.
Template metrics can be public, but factories should probably be private. There should also be public classes corresponding to concrete metrics - for every function as well, no?

In terms of software design, even if we only expose classes, I think it still makes sense to encapsulate performance metrics in functions and then wrap them in classes to attach additional information like greater_is_better. This seems to be the common pattern in other libraries like PyTorch too.

Also agreed, but properties shouldn't be constructor parameters (it's not something you can choose/set, but an intrinsic property - a static class variable).
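A rough sketch of the pattern being discussed: a private function wrapper exposing public per-metric classes, with greater_is_better as a static class variable rather than a constructor parameter (all names hypothetical; the real sktime classes would also inherit sklearn's BaseEstimator):

```python
def _mean_absolute_error(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

class _MetricFunctionWrapper:
    """Private base: wraps a metric function and attaches metadata."""
    # intrinsic property: a static class variable, not something callers set
    greater_is_better = False

    def __call__(self, y_true, y_pred):
        return self._func(y_true, y_pred)

class MeanAbsoluteError(_MetricFunctionWrapper):
    """Public class corresponding to the mean_absolute_error function."""
    name = "mean_absolute_error"
    _func = staticmethod(_mean_absolute_error)

metric = MeanAbsoluteError()
metric([1, 2], [2, 2])  # → 0.5
```

Keeping the metric itself a plain function means it stays usable on its own, while the class carries the metadata higher-level tools need.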

@mloning
Contributor

mloning commented Apr 22, 2021

I do agree with that, @mloning, but I feel they are not homogeneously exposed, which contributes to interface inhomogeneity.

Okay, what changes do you suggest to make the interface for classes and functions more consistent? Should we do them in this PR or a follow-up PR?

There should also be public classes corresponding to concrete metrics - for every function as well, no?

In principle, yes. Not sure if there is a class for every function already @RNKuhns?

Also agreed, but properties shouldn't be constructor parameters (it's not something you can choose/set, but an intrinsic property - a static class variable).

I see. I suggest making the factory methods private for now and changing that in a future PR.

Contributor

@mloning mloning left a comment


One last change from my side!

mloning previously approved these changes Apr 24, 2021
Contributor

@mloning mloning left a comment


Looks good to be merged to me!

@fkiraly let me know if you want to take another look before we merge.

@fkiraly
Collaborator

fkiraly commented Apr 26, 2021

Nice!

I have a few comments:

  • the non-optional call signature of the functions is inconsistent and violates the "uniformity" pattern - some functions require y_pred_benchmark. I would recommend letting all functions accept this as an argument (e.g. via a **kwargs catch-all) and letting the metrics that don't need it simply ignore it when passed (instead of raising an error).
  • I would strongly suggest looping the multioutput and sp parameters through to the class constructors - otherwise you can't use the classes with multi-output metrics; perhaps even horizon_weight (though I'm not sure about that one)
  • I would suggest that the classes inherit from sklearn's BaseEstimator, otherwise you can't tune parameters in metrics, e.g. when using regularization or penalty parameters in the metric to tune for the same, unregularized/unpenalized metric as a generalization criterion

Minor issues:

  • strange capitalization of mixin (should be Mixin not MixIn)
  • sp appears in docstrings but not in constructor
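The uniform-signature suggestion in the first bullet above could look roughly like this (a sketch with hypothetical implementations): every metric accepts the same keyword arguments, and those that don't need y_pred_benchmark silently ignore it:

```python
import numpy as np

def mean_absolute_error(y_true, y_pred, **kwargs):
    """Does not need a benchmark; **kwargs swallows y_pred_benchmark."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred))

def mean_relative_absolute_error(y_true, y_pred, y_pred_benchmark=None, **kwargs):
    """Needs the benchmark forecast and actually uses it."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    y_bench = np.asarray(y_pred_benchmark, float)
    return np.mean(np.abs(y_true - y_pred) / np.abs(y_true - y_bench))

# Both can now be called uniformly, e.g. from a generic evaluation loop:
for metric in (mean_absolute_error, mean_relative_absolute_error):
    metric([2, 4], [1, 3], y_pred_benchmark=[0, 0])
```

The benefit is that higher-level code (tuning, evaluation) can call any metric with one fixed signature without special-casing the relative metrics.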

@RNKuhns
Contributor Author

RNKuhns commented Apr 26, 2021

@fkiraly thanks for the feedback. I've made most of the changes you've discussed.

  • Good catch on sp in docstrings and not constructor. I've got that updated in the 4 scaled metrics that use it.
  • I'm updating the mixin capitalization now (I was thinking of it as two words, mix in, making it MixIn per the CapWords Python class-naming convention, but I see that scikit-learn uses Mixin).
  • _MetricFunctionWrapper is already set up to inherit from BaseEstimator. The other metrics inherit from _MetricFunctionWrapper, so they should all have its attributes/methods built in.

I haven't made the update with y_pred_benchmark because Markus and I are planning to discuss how to deal with y_train and y_pred_benchmark as a follow-on PR (see #712, I believe we noted y_pred_benchmark in the discussion even though the title just mentions y_train).

We could also fold the potential inclusion of multioutput and horizon_weight into the PR related to #712; however, I'm open to adding them to each class's constructor since they are parameters of all the underlying functions. It would be really quick to do (though for the time being we aren't actually using that functionality, so it could also be added in a future PR). @fkiraly and @mloning let me know what you want me to do on those.

Contributor

@mloning mloning left a comment


I'd prefer to merge this now and add additional functionality in separate PRs.

I'm not sure if all functions should accept y_train and y_pred_benchmark, but let's discuss this in a separate issue/PR.

@mloning
Contributor

mloning commented Apr 29, 2021

@fkiraly any more comments? Otherwise I'll merge this tomorrow :)

@mloning
Contributor

mloning commented Apr 30, 2021

Now merged @RNKuhns - thanks again 🎉 👍

@jerronl

jerronl commented Apr 30, 2021

from sktime.performance_metrics.forecasting import smape_loss

ImportError: cannot import name 'smape_loss' from 'sktime.performance_metrics.forecasting' (/usr/local/lib/python3.6/dist-packages/sktime/sktime/performance_metrics/forecasting/__init__.py)

@mloning
Contributor

mloning commented Apr 30, 2021

@jerronl smape_loss was renamed to mean_absolute_percentage_error
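For context, the quantity smape_loss computed is a symmetric MAPE: the absolute error scaled by the average magnitude of the actual and forecast values. A minimal hand-rolled sketch of that formula (not sktime's actual code):

```python
import numpy as np

def symmetric_mape(y_true, y_pred):
    """Symmetric MAPE: |error| scaled by the mean of |y_true| and |y_pred|."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean(np.abs(y_true - y_pred) / ((np.abs(y_true) + np.abs(y_pred)) / 2))

symmetric_mape([2], [1])  # |2 - 1| scaled by (2 + 1) / 2
```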

@mloning
Contributor

mloning commented Apr 30, 2021

Perhaps we should add deprecation warnings for the old loss functions?
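One way such deprecation warnings might be wired up: a small alias factory that forwards to the new function while warning about the old name (a sketch with a stand-in metric, not what sktime actually shipped):

```python
import warnings

def _deprecated_alias(new_func, old_name):
    """Return a wrapper that forwards to new_func but warns that
    old_name is deprecated."""
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated; use {new_func.__name__} instead.",
            FutureWarning,
            stacklevel=2,
        )
        return new_func(*args, **kwargs)
    wrapper.__name__ = old_name
    return wrapper

def mean_absolute_percentage_error(y_true, y_pred):
    # stand-in for the real metric
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)

# the old name keeps working for a release cycle but emits a FutureWarning
smape_loss = _deprecated_alias(mean_absolute_percentage_error, "smape_loss")
```

This would have let imports like the one above keep working for a deprecation period instead of failing with an ImportError.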



Development

Successfully merging this pull request may close these issues.

Add Additional Forecasting Evaluation Functions/Classes

4 participants