Conversation
|
@mloning and @fkiraly the more I look at this while trying to simplify the unit tests for these metrics, the more sense it makes to me that the metric functions should also accept `y_train` as a keyword argument. The difference to the user passing info to functions will be relatively minimal. The benefit is that the function and class versions share the same calling convention:

```python
mean_absolute_scaled_error(y_true, y_pred, y_train=y_train)  # plus any other keyword args like multioutput
mase = MeanAbsoluteScaledError()
mase(y_true, y_pred, y_train=y_train)
```

versus the current setup, where the function accepts `y_train` as a positional argument while the class version of the metric accepts it as a keyword:

```python
mean_absolute_scaled_error(y_true, y_pred, y_train)  # plus any other keyword args like multioutput
mase = MeanAbsoluteScaledError()
mase(y_true, y_pred, y_train=y_train)
```

Note that I'll do whatever we decide for metrics that require `y_pred_benchmark` as well. I've got this coded up but wanted to get your feedback on the approach before committing and pushing to GitHub. |
Yes, very much agreed! I think this indeed makes unit tests easier, and any other exercise that requires a unified interface on the function level. |
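For readers following along, here is a minimal sketch of MASE with the keyword-argument interface discussed above. This is a hypothetical standalone implementation for illustration, not sktime's actual code:

```python
import numpy as np

def mean_absolute_scaled_error(y_true, y_pred, y_train=None):
    """MAE of the forecast, scaled by the MAE of the one-step naive
    forecast on the training series (hence the need for y_train)."""
    if y_train is None:
        raise ValueError("y_train is required to compute the MASE denominator")
    y_true, y_pred, y_train = map(np.asarray, (y_true, y_pred, y_train))
    mae = np.mean(np.abs(y_true - y_pred))
    # denominator: errors of the one-step naive forecast on y_train
    naive_mae = np.mean(np.abs(np.diff(y_train)))
    return mae / naive_mae
```

Because `y_train` is a keyword argument, this signature matches the class version's `__call__` one-for-one.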
sktime/performance_metrics/tests/test_performance_metrics_forecasting.py
…time into metric-handle-y_train
|
@mloning I think I've incorporated all your comments. As a bonus I've tweaked all the docstrings to better align with NumPy conventions and pass the pydocstyle checks. I also opened an issue related to the creation of a |
mloning
left a comment
Looks all good to me - I'll merge in the next few days in case anyone else wants to take a look!
Hi @RNKuhns, we should also update evaluate in sktime.forecasting.model_evaluation to pass y_train as a keyword argument to the scoring: https://github.com/alan-turing-institute/sktime/blob/c611e6a3587d7b3a44cb7deefd4b6baa4897fc9b/sktime/forecasting/model_evaluation/_functions.py#L106
If we do that here, we should also add a metric that requires y_train to the current test cases here (perhaps add MASE and remove MSE which doesn't add much as an additional test case): https://github.com/alan-turing-institute/sktime/blob/c611e6a3587d7b3a44cb7deefd4b6baa4897fc9b/sktime/forecasting/model_evaluation/tests/test_evaluate.py#L80
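A rough sketch of the proposed change to `evaluate` (function and variable names here are simplified stand-ins for the real implementation linked above):

```python
import numpy as np

def evaluate_window(y, cutoff, fh, fit_predict, scoring):
    """Simplified single-window evaluation: split at the cutoff,
    fit/predict, then pass y_train to the metric as a keyword argument
    so that metrics which need it (e.g. MASE) receive it."""
    y_train, y_test = y[:cutoff], y[cutoff:cutoff + fh]
    y_pred = fit_predict(y_train, fh)
    # y_train passed as a keyword, matching the unified metric interface
    return scoring(y_test, y_pred, y_train=y_train)
```

Metrics that do not need `y_train` can simply accept and ignore the keyword.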
|
@mloning I'm working on the update to evaluate. I ran into two minor hitches. The first is a minor issue in the call to scoring within `evaluate`. The bigger hitch is that the test cases include in-sample predictions. In MeanAbsoluteScaledError() we have a check to make sure that y_train comes before y_true in time. This makes sense in the context of forecast evaluation in general (and MASE's definition). But some of the test cases look at in-sample predictions, and for those the check fails; hence, the tests fail. I can think of two options for approaching this:
I am in the camp that MASE is designed for evaluating out-of-sample forecasts, so we could just include a line of code to exclude the evaluate tests for MASE when the forecasting horizon is negative (in-sample). That still ensures the MASE functionality is tested in a way that makes sense for users (evaluating in-sample forecasts is not a good gauge of a model's future predictive ability). But what do you think? I could also figure out how to add a quick check in |
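The temporal check under discussion could look roughly like this. This is a hypothetical helper for illustration, not the exact code inside MeanAbsoluteScaledError:

```python
import pandas as pd

def check_y_train_precedes(y_train, y_true):
    """Raise if the training series does not end before the test series
    begins, as MASE's out-of-sample definition assumes."""
    if y_train.index.max() >= y_true.index.min():
        raise ValueError(
            "y_train must be temporally prior to y_true; "
            "this fails for in-sample (negative-horizon) predictions"
        )
```

In-sample predictions overlap the training index, so any such check necessarily rejects them, which is exactly the tension raised above.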
The tuning test was creating a scorer from scikit-learn's mean_squared_error; it now uses the sktime metric class. Also updated the documentation of make_forecasting_scorer to make the expected input function signature clear.
|
@RNKuhns I agree, but perhaps the check that |
|
@RNKuhns, I'm with @mloning on this one - in my opinion, the metric should just model the "bare" mathematical/scientific object, not make any checks regarding a plausible use case. This is because of user expectations - they will expect the code object to behave like the scientific one - and because of the domain-driven design principle of following this mapping. A secondary point - in my opinion not the main argument here, but to the same effect - is that it can make sense in principle for a metric to have training and test time points overlapping, for example when you are trying to estimate the over-optimism of in-sample estimates. We shouldn't exclude a rare but sensible use case. |
|
Now merged - great work @RNKuhns 🎉 |
Reference Issues/PRs
This addresses functionality in #712.
What does this implement/fix? Explain your changes.
Add functionality to accept `y_train` in the `__call__` method and pass it to the underlying function if that function requires `y_train`. Also updated the underlying metric classes to inherit from BaseMetric.
Does your contribution introduce a new dependency? If yes, which one?
What should a reviewer concentrate their feedback on?
Any other comments?
PR checklist
For all contributions
For new estimators