Update metric to handle y_train#858

Merged
mloning merged 27 commits into sktime:main from RNKuhns:metric-handle-y_train
Jun 3, 2021

Conversation

@RNKuhns
Contributor

@RNKuhns RNKuhns commented May 4, 2021

Reference Issues/PRs

This addresses functionality in #712.

What does this implement/fix? Explain your changes.

Add functionality to accept y_train in the __call__ method and pass it to the underlying function if that function requires y_train.

Also updated underlying metric classes to inherit from BaseMetric.
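As a sketch of the idea (hypothetical minimal code, not sktime's actual BaseMetric implementation), __call__ can inspect the wrapped function's signature and forward y_train only when it is needed:

```python
import inspect


class BaseMetric:
    """Minimal sketch (illustrative, not sktime's code): a metric class
    whose __call__ forwards y_train only to functions that accept it."""

    def __init__(self, func):
        self._func = func

    def __call__(self, y_true, y_pred, y_train=None, **kwargs):
        # Only pass y_train if the wrapped function's signature accepts it.
        if "y_train" in inspect.signature(self._func).parameters:
            kwargs["y_train"] = y_train
        return self._func(y_true, y_pred, **kwargs)


def mae(y_true, y_pred):
    # Plain mean absolute error; does not need y_train.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def scaled_mae(y_true, y_pred, y_train):
    # Scale by the mean absolute first difference of the training series,
    # as MASE does with the naive one-step-ahead forecast.
    naive = sum(
        abs(a - b) for a, b in zip(y_train[1:], y_train[:-1])
    ) / (len(y_train) - 1)
    return mae(y_true, y_pred) / naive
```

With this pattern, both `BaseMetric(mae)(y_true, y_pred)` and `BaseMetric(scaled_mae)(y_true, y_pred, y_train=y_train)` work through the same __call__ signature.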

Does your contribution introduce a new dependency? If yes, which one?

What should a reviewer concentrate their feedback on?

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors.
  • Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
  • I've added unit tests and made sure they pass locally.
For new estimators
  • I've added the estimator to the online documentation.
  • I've updated the existing example notebooks or provided a new one to showcase how my estimator works.

@RNKuhns
Contributor Author

RNKuhns commented May 7, 2021

@mloning and @fkiraly the more I look at this while trying to simplify the unit tests for these metrics, the more sense it makes to me that the metric functions in _functions.py should also be updated to accept y_train via `**kwargs` so we have a uniform interface.

The difference for users passing info to the functions will be relatively minimal.

The benefit is that y_train would be passed to the function the same way as to the __call__ of the class version of the metric:

mean_absolute_scaled_error(y_true, y_pred, y_train=y_train, ... # Any other keyword args like multioutput)
mase = MeanAbsoluteScaledError()
mase(y_true, y_pred, y_train=y_train)

versus the current function setup, which accepts y_train as a positional arg while the class version of the metric accepts it as a keyword:

mean_absolute_scaled_error(y_true, y_pred, y_train, ... # Any other keyword args like multioutput)
mase = MeanAbsoluteScaledError()
mase(y_true, y_pred, y_train=y_train)

Note that I'll do whatever we decide for y_train for metrics that accept y_pred_benchmark.

I've got this coded up but wanted to get your feedback on the approach before committing and pushing to GitHub.
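For illustration, the proposed uniform function interface might look like this minimal sketch (the real sktime signature and validation logic differ):

```python
def mean_absolute_scaled_error(y_true, y_pred, **kwargs):
    """Sketch of the proposed uniform interface: y_train arrives via
    **kwargs rather than as a positional argument (illustrative only,
    not sktime's actual implementation)."""
    y_train = kwargs.pop("y_train", None)
    if y_train is None:
        raise ValueError("mean_absolute_scaled_error requires y_train")
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    # Scale by the in-sample MAE of the naive one-step-ahead forecast.
    naive_mae = sum(
        abs(a - b) for a, b in zip(y_train[1:], y_train[:-1])
    ) / (len(y_train) - 1)
    return mae / naive_mae
```

Callers then always write `mean_absolute_scaled_error(y_true, y_pred, y_train=y_train)`, mirroring the class-based usage exactly.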

@fkiraly
Collaborator

fkiraly commented May 11, 2021

@mloning and @fkiraly the more I look at this while trying to simplify the unit tests for these metrics, the more sense it makes to me that the metric functions in _functions.py should also be updated to accept y_train via `**kwargs` so we have a uniform interface.

I've got this coded up but wanted to get your feedback on the approach before committing and pushing to Github.

Yes, very much agreed! I think this indeed makes unit tests easier, and any other exercise that requires a unified interface on the function level.

@RNKuhns
Contributor Author

RNKuhns commented May 12, 2021

@fkiraly that sounds great. I think this should basically be ready for review. But I see the PR is failing the manylinux build. @mloning the details look like there is an issue with numba versioning in the linux builds, but not entirely sure what is going on there.

@RNKuhns RNKuhns marked this pull request as ready for review May 12, 2021 00:29
@RNKuhns RNKuhns requested review from aiwalter and mloning as code owners May 12, 2021 00:29
Contributor

@mloning mloning left a comment


Thanks @RNKuhns - the manylinux CI will hopefully be fixed in #870.

I had a first look at the code and left a few minor comments below.

Contributor

@mloning mloning left a comment


@RNKuhns I had a look at the changes, I think we're almost there, just a few more comments.

@RNKuhns
Contributor Author

RNKuhns commented May 19, 2021

@mloning I think I've incorporated all your comments. As a bonus I've tweaked all the docstrings to better align with NumPy conventions and pass the pydocstyle checks.

I also opened an issue related to the creation of a BaseObject per our discussion above. I can start tackling that next.

@RNKuhns RNKuhns mentioned this pull request May 19, 2021
mloning
mloning previously approved these changes May 20, 2021
Contributor

@mloning mloning left a comment


Looks all good to me - I'll merge in the next few days in case anyone else wants to take a look!

Contributor

@mloning mloning left a comment


Hi @RNKuhns, we should also update evaluate in sktime.forecasting.model_evaluation to pass y_train as a keyword argument to the scoring: https://github.com/alan-turing-institute/sktime/blob/c611e6a3587d7b3a44cb7deefd4b6baa4897fc9b/sktime/forecasting/model_evaluation/_functions.py#L106

If we do that here, we should also add a metric that requires y_train to the current test cases here (perhaps add MASE and remove MSE which doesn't add much as an additional test case): https://github.com/alan-turing-institute/sktime/blob/c611e6a3587d7b3a44cb7deefd4b6baa4897fc9b/sktime/forecasting/model_evaluation/tests/test_evaluate.py#L80
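The suggested change to evaluate could be sketched like this (all names here are illustrative stand-ins, not sktime's actual evaluate code):

```python
class NaiveForecaster:
    """Toy last-value forecaster used only to make the sketch runnable."""

    def fit(self, y, fh=None):
        self._last = y[-1]
        return self

    def predict(self, fh):
        return [self._last] * len(fh)


def evaluate_fold(forecaster, y_train, y_test, fh, scoring):
    """Sketch of one CV fold: fit on y_train, forecast, then pass y_train
    to the metric as a keyword so y_train-dependent metrics like MASE work."""
    forecaster.fit(y_train, fh=fh)
    y_pred = forecaster.predict(fh)
    # Note the keyword argument: scoring(y_true, y_pred, y_train=...).
    return scoring(y_test, y_pred, y_train=y_train)


def mase(y_true, y_pred, y_train=None):
    # Minimal MASE for the example.
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
    naive = sum(
        abs(a - b) for a, b in zip(y_train[1:], y_train[:-1])
    ) / (len(y_train) - 1)
    return mae / naive
```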

@RNKuhns
Contributor Author

RNKuhns commented May 22, 2021

@mloning I'm working on update to evaluate.

Ran into two minor hitches. The first is that in the call to scoring within evaluate() the y_true and y_pred arguments were flipped (this doesn't really matter for most metrics, but I fixed it).

The bigger hitch is that the test cases include in-sample predictions. In MeanAbsoluteScaledError() we have a check to make sure that y_train comes before y_true. This makes sense in the context of forecast evaluation in general (and MASE's definition), but some of the test cases evaluate in-sample predictions, and in those cases the check fails; hence, the tests fail.

I can think of two options for approaching this:

  1. Keep a separate test function for MASE that only tests the common configs with out-of-sample forecasting horizons, or simply skip the evaluate tests for MASE when the forecasting horizon is negative
  2. Remove the check in mean_absolute_scaled_error that requires y_train to come before y_true

I am in the camp that MASE is designed to be applied when evaluating out-of-sample forecasts, so we could just include a line of code to exclude tests of evaluate for MASE when the forecasting horizon is negative (in-sample). That still ensures the MASE functionality works in a way that makes sense for users (evaluating in-sample forecasts is not a good gauge of a model's future predictive ability). But what do you think?

I could also add a quick check in evaluate that raises an informative error if the user tries to use a metric that requires y_train with a CV object that has a negative forecasting horizon (I'll also add a note in the docstring).

RNKuhns added 3 commits May 23, 2021 09:55
Test tune was creating scorer from scikit-learn
mean_squared_error. Using sktime metric class now.

Also updated documentation of
make_forecasting_scorer to make input function
signature clear.
@mloning
Contributor

mloning commented May 24, 2021

@RNKuhns I agree, but perhaps the check that y_train comes before y_true is a bit over-eager, and we make everything a bit simpler by not enforcing it. In principle MASE still works for in-sample predictions, so perhaps we shouldn't exclude that option. This moves the responsibility to users for making sure they're evaluating models on genuine forecasts, which I think is fine. But happy to follow your lead here @RNKuhns.

@fkiraly
Collaborator

fkiraly commented May 25, 2021

@RNKuhns, I'm with @mloning on this one - in my opinion, the metric should just model the "bare" mathematical/scientific object, not make any checks regarding a plausible use case. This is because of user expectations - they will expect the code object to behave like the scientific one - and because of the domain-driven design principle of following this mapping.

A secondary point - in my opinion not the main argument here, but to the same effect - is that it can in principle make sense for a metric to have training and test time points overlap, for example when you are trying to estimate the over-optimism of in-sample estimates. We shouldn't exclude a rare but sensible use case.

Contributor

@mloning mloning left a comment


Thanks @RNKuhns - looks all good to me now! Will merge in the next few days in case anyone else wants to take a look.

@mloning
Contributor

mloning commented Jun 3, 2021

Now merged - great work @RNKuhns 🎉
