Multivariate forecasting (prototype) by mloning · Pull Request #1074 · sktime/sktime

mloning · 2021-06-28T19:35:22Z

Reference Issues/PRs

Alternative design proposal for multivariate forecasting, as a counterpart to the design proposal in #980

What does this implement/fix? Explain your changes.

multivariate forecasting framework based on existing input validation and tag system
column forecaster as example

Does your contribution introduce a new dependency? If yes, which one?

No

Notes

Requires changes to unit testing framework to accommodate multivariate forecasting

fkiraly · 2021-06-28T19:44:47Z

Questions:

would this approach, if carried out, require re-working all forecasters individually?
would a forecaster that has the multivariate tag accept only pd.DataFrame and not pd.Series anymore?

mloning · 2021-06-28T19:49:24Z

No, not as far as I can see. We would have to implement new multivariate forecasters and make changes to the existing univariate forecasters if we want them to accept multivariate series natively, rather than using the ColumnForecaster.
In this PR yes, but that could easily be accommodated if desired, by casting a pd.Series into a pd.DataFrame (e.g. via y.to_frame()) in the check_y function. The internal type in _fit would always be pd.DataFrame for pure multivariate forecasters (forecasters with tags multivariate-only=True and univariate-only=False), and always pd.Series for pure univariate forecasters.
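The casting described above could look roughly like the following minimal sketch. It assumes a simplified, hypothetical `check_y` helper with a `multivariate_only` flag; neither the signature nor the flag name is taken from the PR.

```python
import pandas as pd


def check_y(y, multivariate_only=False):
    """Hypothetical input check (not the sktime signature).

    For a pure multivariate forecaster, a pd.Series is cast to a
    one-column pd.DataFrame so that _fit always sees a DataFrame.
    """
    if multivariate_only and isinstance(y, pd.Series):
        return y.to_frame()
    return y


y = pd.Series([1.0, 2.0, 3.0], name="sales")
y_internal = check_y(y, multivariate_only=True)
```

With this kind of cast in place, user code that passes a `pd.Series` would keep working after a forecaster is switched to multivariate-only.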

fkiraly · 2021-06-28T19:51:49Z

Is there a STEP that you are implementing here?

mloning · 2021-06-28T19:54:55Z

This is my STEP in a way. There are a few things missing from the PR yet to be a full implementation, but I hope the idea becomes clear.

fkiraly · 2021-06-28T21:00:15Z

There should be a STEP for any major interface changes to a core module like this, outlining precisely what the end result would look like. I don't think it's fair to ask a reviewer to extrapolate from an isolated change to the final state without any specs, or to ask reviewers to approve without a clearly worked-out design.

In particular, I think a number of questions remain open about what the implications would be, for instance:

what precisely is the required work, for individual classes - those that exist, and those that you would be planning to add? When is work needed, and expected, in the _fit, _predict, etc?
which interface contracts will this break, if any? For instance, will changing an estimator from univariate to multivariate throw an error in code relying on the older pd.Series interface because it now requires pd.DataFrame?
what is the meaning of the proposed tags? What happens if both multivariate-only and univariate-only are false? What is the intended behaviour in the permissible combination cases?
which internal type conversions precisely is this proposing, or not? You say conversions could be easily accommodated, but the important question is which conversions will actually happen in the proposed end state? And how precisely would this happen?

Lovkush-A · 2021-06-28T21:16:57Z

There should be a STEP for any major interface changes to a core module like this, outlining precisely what the end result would look like. I don't think it's fair to ask a reviewer to extrapolate from an isolated change to the final state without any specs, or to ask reviewers to approve without a clearly worked-out design.

I think this PR is still in its draft stages and so is not expecting a review yet. I'd be surprised if Markus is expecting you or anybody else to make a final decision with an incomplete design.

sktime/forecasting/compose/_column.py

fkiraly · 2021-06-29T09:36:45Z

I think this PR is still in its draft stages and so is not expecting a review yet. I'd be surprised if Markus is expecting you or anybody else to make a final decision with an incomplete design.

Yes, that makes sense, @Lovkush-A.

What is worth noting is that this design is incompatible with #980, so logically this is suggesting a decision between: #980, this PR, or something else.
I would be keen to hear @mloning's thoughts about this, regarding how you suggest we proceed with the discussion.

mloning · 2021-06-29T12:14:03Z

Thanks for the review @Lovkush-A! Note that the unit tests are currently not running on the multivariate forecaster. Adding multivariate forecasting and the tag will require some new tests and changes to the existing tests, which I haven't implemented yet. The main purpose of this PR is to show what I had in mind for the multivariate forecasting framework.

@fkiraly I think these questions are good discussion points. Here are my quick answers, but happy to discuss them in more depth.

what precisely is the required work, for individual classes - those that exist, and those that you would be planning to add? When is work needed, and expected, in the _fit, _predict, etc?

No work is needed in _fit or _predict, except in those cases where we want a forecaster to handle both univariate and multivariate y.

which interface contracts will this break, if any? For instance, will changing an estimator from univariate to multivariate throw an error in code relying on the older pd.Series interface because it now requires pd.DataFrame?

It doesn't break any existing contracts as far as I can see. As said above, changing an estimator from univariate to multivariate will throw an error if a pd.Series is passed, but that could be handled easily.

what is the meaning of the proposed tags? What happens if both multivariate-only and univariate-only are false? What is the intended behaviour in the permissible combination cases?

We need to align on tag meaning more generally. Here, univariate-only means it only supports univariate y, multivariate-only means it only supports multivariate y. They cannot both be true. They can both be false.
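The tag combinations described above can be sketched as a small lookup. The function name `supported_y_types` is hypothetical, and reading "both false" as "accepts either kind of y" is an assumption based on the earlier remark about forecasters that handle both univariate and multivariate y.

```python
def supported_y_types(tags):
    """Map the two proposed tags to the kinds of y a forecaster accepts.

    Hypothetical helper for illustration; missing tags default to False.
    """
    univariate_only = tags.get("univariate-only", False)
    multivariate_only = tags.get("multivariate-only", False)

    if univariate_only and multivariate_only:
        # The only combination that is ruled out.
        raise ValueError(
            "univariate-only and multivariate-only cannot both be True"
        )
    if univariate_only:
        return {"univariate"}
    if multivariate_only:
        return {"multivariate"}
    # Assumption: both False means the forecaster handles either kind of y.
    return {"univariate", "multivariate"}
```

Making the invalid combination raise early keeps the remaining three cases unambiguous for downstream input checks.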

which internal type conversions precisely is this proposing, or not? You say conversions could be easily accommodated, but the important question is which conversions will actually happen in the proposed end state? And how precisely would this happen?

Currently, no new conversions happen besides the ones that are already happening. Including user-convenience conversion if desired would be easily possible (e.g. pd.Series to pd.DataFrame for pure multivariate forecasters).

Shall we arrange another call on this? I think there are some fundamental questions that we need to align on, particularly:

which input types do we want to support
which internal types do we want to support
which output types do we want to support

AngelPone · 2021-07-01T06:17:15Z

@mloning In my opinion, there are two main scenarios that need multivariate forecasting:

forecast each time series individually, and then combine the results (e.g. forecast combination, ensembling, stacking, bagging, hierarchical forecasting, etc.), obtaining multiple outputs or a single output.
train a model using all time series and generate multivariate forecasts (e.g. ML-based global forecasting models, VAR, VARMA, etc.)

The ColumnForecaster in your prototype is suitable for the first scenario, though additional preprocessing and output processing are needed for specific models. The prototype is not (at least for now) suitable for models that use all time series.
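To illustrate the first scenario, here is a minimal sketch of fitting one univariate forecaster per column. It is not the ColumnForecaster from this PR: `LastValueForecaster`, `ColumnwiseForecaster`, and the `steps` argument are hypothetical names chosen for this example.

```python
import copy

import pandas as pd


class LastValueForecaster:
    """Toy univariate forecaster: repeats the last observed value."""

    def fit(self, y):
        self.last_ = y.iloc[-1]
        return self

    def predict(self, steps):
        return pd.Series([self.last_] * steps)


class ColumnwiseForecaster:
    """Fit an independent copy of a univariate forecaster to each column."""

    def __init__(self, forecaster):
        self.forecaster = forecaster

    def fit(self, y):
        # One fitted copy per column; the columns never see each other.
        self.forecasters_ = {
            col: copy.deepcopy(self.forecaster).fit(y[col]) for col in y.columns
        }
        return self

    def predict(self, steps):
        # Reassemble the per-column forecasts into a single DataFrame.
        return pd.DataFrame(
            {col: f.predict(steps) for col, f in self.forecasters_.items()}
        )


y = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
y_pred = ColumnwiseForecaster(LastValueForecaster()).fit(y).predict(steps=2)
```

Because each column is fitted in isolation, this pattern cannot capture dependence between series, which is why it does not cover the second scenario.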

Lovkush-A · 2021-07-01T08:45:08Z

@AngelPone. This PR is primarily intended as a comparison and alternative design to the input/output conversion framework in #980.

satya-pattnaik · 2021-07-10T15:08:05Z

2. ML-based global forecasting models

@AngelPone Can you elaborate on ML Based global methods a bit?
Do you mean boosted trees/decision trees where we pass an ID for "each panel" (for example store ID/category ID in the M5 data) as a categorical exogenous variable? A single model for the entire data?
Something like this?

AngelPone · 2021-07-12T06:27:49Z

@satya-pattnaik yes, a single GBDT-based or NN-based global model for the whole dataset. Typically each time series is transformed into tabular form (make_reduction in sktime) and all time series are concatenated into the training set of the ML model. Explanatory variables include lagged values and their statistics, categorical variables, price, etc.
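The tabular transformation described above can be sketched in plain Python. This is an illustration only and does not call sktime's `make_reduction`; the helper name `make_lag_table` and the toy data are assumptions.

```python
def make_lag_table(series_by_id, n_lags):
    """Stack sliding windows from every series into one training table."""
    rows, targets = [], []
    for series_id, values in series_by_id.items():
        for t in range(n_lags, len(values)):
            # Features: the series identifier plus the n_lags previous values.
            rows.append([series_id] + list(values[t - n_lags:t]))
            # Target: the value that follows the window.
            targets.append(values[t])
    return rows, targets


series_by_id = {"store_1": [1, 2, 3, 4], "store_2": [10, 20, 30]}
X, y = make_lag_table(series_by_id, n_lags=2)
```

A single regressor trained on `X` and `y` then serves as the global model, with the series identifier acting as the categorical variable.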

fkiraly · 2021-07-12T10:44:26Z

@satya-pattnaik, @AngelPone, I feel your discussion is on a very important/useful topic, but off-topic for this thread.

this is about the core interface logic rather than concrete algorithms.
yours seems to be about an important (but concrete) class of composite multivariate models.

Suggest opening or contributing to issues that deal with suggestions for a multivariate ML model wishlist? Topic being about concretes, atomic or composites, to implement?

mloning · 2021-07-14T07:02:12Z

Closed in favour of design agreed on in PR #980

@mloning

…s - working prototype (#980)

This PR introduces multiple input/output type support for `X` and `y` in forecasters, including generic support for multivariate `y`, i.e., `pd.DataFrame`, and the possibility to pass `np.ndarray` and `pd.Series` to either argument.

This is loosely based on the design in [STEP no.5](https://github.com/sktime/enhancement-proposals/tree/main/steps/05_scitype_based_IO_checks) and subsequent discussions with @mloning around #1074.

The key ingredients are:

* converters parameterized by from, to, and as - in the `convertIO` module. Besides the obvious conversion functionality, the converters can be given access to a dictionary via reference in the `store` argument, where information for inverting lossy conversions (like from `pd.DataFrame` to `np.ndarray`) can be stored
* a new tag `y:scitype` which can be `"univariate"`, `"multivariate"`, or `"both"`, indicating what type of `y` are supported (multivariate here means 2 dims or more)
* new tags which encode the type of `y` and `X` that the private `_fit`, `_predict`, and `_update` assume internally - for now, it's just one type and not a list of compatible types
* some logic in the public "plumbing" area of `fit`, `predict`, `update`, which converts inputs to the public layer to the desired input of the logic layer and back
* expanding tests and checks that ensure that errors are raised when the wrong types are passed, and changes that ensure the newly allowed inputs such as `pd.DataFrame` are allowed rather than blocked by the checks
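The idea of a `store` dictionary for inverting lossy conversions can be sketched as follows. This does not reproduce the `convertIO` API from #980; the function names `frame_to_array` and `array_to_frame` and the keys used in `store` are hypothetical.

```python
import numpy as np
import pandas as pd


def frame_to_array(df, store):
    """Lossy conversion: the array drops labels, so keep them in `store`."""
    store["index"] = df.index
    store["columns"] = df.columns
    return df.to_numpy()


def array_to_frame(arr, store):
    """Inverse conversion: restore the labels saved in `store`."""
    return pd.DataFrame(arr, index=store["index"], columns=store["columns"])


store = {}
df = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]}, index=["t0", "t1"])
arr = frame_to_array(df, store)
restored = array_to_frame(arr, store)
```

Passing the dictionary by reference lets the public `fit`/`predict` layer convert inputs on the way in and restore the user's original type on the way out.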

Multivariate forecasting + ColumnForecaster

781263a

fkiraly mentioned this pull request Jun 28, 2021

Forecasting support for multivariate y and multiple input/output types - working prototype #980

Merged

Lovkush-A reviewed Jun 28, 2021

sktime/forecasting/compose/_column.py

fkiraly mentioned this pull request Jun 29, 2021

[ENH] VAR and VECM models #929

Closed

2 tasks

Add unit tests

c6427be

fkiraly mentioned this pull request Jun 30, 2021

TSC base template refactor #1026

Merged

Lovkush-A mentioned this pull request Jun 30, 2021

Design and implementation of ColumnEnsembleForecaster #1081

Closed

satya-pattnaik mentioned this pull request Jul 13, 2021

[ENH] Global/Panel Forecasting Using Decision Trees/Decision Tree Ensembles #1132

Closed
