Multivariate forecasting (prototype)#1074

Closed
mloning wants to merge 2 commits into main from multivariate-forecasting

Conversation

@mloning
Contributor

@mloning mloning commented Jun 28, 2021

Reference Issues/PRs

Alternative design proposal for multivariate forecasting, as a counterpart to the design proposal in #980

What does this implement/fix? Explain your changes.

  • multivariate forecasting framework based on existing input validation and tag system
  • column forecaster as example

Does your contribution introduce a new dependency? If yes, which one?

No

Notes

  • Requires changes to unit testing framework to accommodate multivariate forecasting

@fkiraly
Collaborator

fkiraly commented Jun 28, 2021

Questions:

  • would this approach, if carried out, require re-working all forecasters individually?
  • would a forecaster that has the multivariate tag accept only pd.DataFrame and not pd.Series anymore?

@mloning
Contributor Author

mloning commented Jun 28, 2021

  • No, not as far as I can see. We would have to implement new multivariate forecasters and make changes to the existing univariate forecasters if we want them to accept multivariate series natively, rather than using the ColumnForecaster.
  • In this PR yes, but that could easily be accommodated if desired, by casting a pd.Series into a pd.DataFrame (e.g. via y.to_frame()) in the check_y function. The internal type in _fit would always be pd.DataFrame for pure multivariate forecasters (forecasters with tags multivariate-only=True and univariate-only=False), and always pd.Series for pure univariate forecasters.
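
The cast mentioned in the second answer could look roughly like the sketch below. It is only an illustration: `check_y` here is a stand-in written for this example, not the function in the PR, and the `multivariate_only` / `univariate_only` keyword arguments are hypothetical Python spellings of the two tags. Squeezing a one-column frame back into a series for univariate forecasters is an extra assumption that the comment above does not mention.

```python
import pandas as pd


def check_y(y, multivariate_only=False, univariate_only=False):
    """Hypothetical input check that coerces y to the forecaster's internal type."""
    if not isinstance(y, (pd.Series, pd.DataFrame)):
        raise TypeError(f"y must be pd.Series or pd.DataFrame, got {type(y).__name__}")
    if multivariate_only and isinstance(y, pd.Series):
        # convenience cast: a single series becomes a one-column frame
        return y.to_frame()
    if univariate_only and isinstance(y, pd.DataFrame):
        if y.shape[1] != 1:
            raise ValueError("univariate forecaster received more than one column")
        # assumption: a one-column frame is squeezed back into a series
        return y.iloc[:, 0]
    return y


y = pd.Series([1.0, 2.0, 3.0], name="sales")
y_frame = check_y(y, multivariate_only=True)
print(type(y_frame).__name__, list(y_frame.columns))
```

With a cast like this, `_fit` of a pure multivariate forecaster would only ever see a `pd.DataFrame`, whichever of the two types the user passed.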

@fkiraly
Collaborator

fkiraly commented Jun 28, 2021

Is there a STEP that you are implementing here?

@mloning
Contributor Author

mloning commented Jun 28, 2021

This is my STEP in a way. A few things are still missing from the PR for it to be a full implementation, but I hope the idea becomes clear.

@fkiraly
Collaborator

fkiraly commented Jun 28, 2021

There should be a STEP for any major interface change to a core module like this, outlining precisely what the end result would look like. I don't think it's fair to ask a reviewer to extrapolate from an isolated change to the final state without any specs, or to ask reviewers to approve without a clearly worked-out design.

In particular, I think a number of questions remain open about what the implications would be, for instance:

  • what precisely is the required work, for individual classes - those that exist, and those that you would be planning to add? When is work needed, and expected, in the _fit, _predict, etc?
  • which interface contracts will this break, if any? For instance, will changing an estimator from univariate to multivariate throw an error in code relying on the older pd.Series interface because it now requires pd.DataFrame?
  • what is the meaning of the proposed tags? What happens if both multivariate-only and univariate-only are false? What is the intended behaviour in the permissible combination cases?
  • which internal type conversions precisely is this proposing, or not? You say conversions could be easily accommodated, but the important question is which conversions will actually happen in the proposed end state? And how precisely would this happen?

@Lovkush-A
Collaborator

> There should be a STEP for any major interface change to a core module like this, outlining precisely what the end result would look like. I don't think it's fair to ask a reviewer to extrapolate from an isolated change to the final state without any specs, or to ask reviewers to approve without a clearly worked-out design.

I think this PR is still in its draft stages and so is not expecting a review yet. I'd be surprised if Markus is expecting you or anybody else to make a final decision with an incomplete design.

@fkiraly
Collaborator

fkiraly commented Jun 29, 2021

> I think this PR is still in its draft stages and so is not expecting a review yet. I'd be surprised if Markus is expecting you or anybody else to make a final decision with an incomplete design.

Yes, that makes sense, @Lovkush-A.

What is worth noting is that this design is incompatible with #980, so logically this is suggesting a decision between: #980, this PR, or something else.
I would be keen to hear @mloning's thoughts about this, regarding how you suggest we proceed with the discussion.

@fkiraly fkiraly mentioned this pull request Jun 29, 2021
2 tasks
@mloning
Contributor Author

mloning commented Jun 29, 2021

Thanks for the review @Lovkush-A! Note that the unit tests are currently not running on the multivariate forecaster. Adding multivariate forecasting and the tag will require some new tests and changes to the existing tests, which I haven't implemented yet. The main purpose of this PR is to show what I had in mind for the multivariate forecasting framework.

@fkiraly I think these questions are good discussion points. Here are my quick answers, but happy to discuss them in more depth.

> • what precisely is the required work, for individual classes - those that exist, and those that you would be planning to add? When is work needed, and expected, in the _fit, _predict, etc?

No work is needed in _fit or _predict, except in those cases where we want a forecaster to handle both univariate and multivariate y.

> • which interface contracts will this break, if any? For instance, will changing an estimator from univariate to multivariate throw an error in code relying on the older pd.Series interface because it now requires pd.DataFrame?

It doesn't break any existing contracts as far as I can see. As said above, changing an estimator from univariate to multivariate will throw an error if a pd.Series is passed, but that could be handled easily.

> • what is the meaning of the proposed tags? What happens if both multivariate-only and univariate-only are false? What is the intended behaviour in the permissible combination cases?

We need to align on tag meaning more generally. Here, univariate-only means the forecaster only supports univariate y, and multivariate-only means it only supports multivariate y. They cannot both be true. They can both be false.
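
As a rough illustration of these tag semantics, the combinations could be resolved as in the sketch below. The helper name `supported_y` is made up for this example, and reading "both false" as "supports both univariate and multivariate y" is an interpretation of the comment above rather than something the PR states explicitly.

```python
def supported_y(tags):
    """Return the kinds of y a forecaster accepts, given its tag dictionary."""
    uni_only = tags.get("univariate-only", False)
    multi_only = tags.get("multivariate-only", False)
    if uni_only and multi_only:
        # the one combination that is ruled out
        raise ValueError("univariate-only and multivariate-only cannot both be True")
    if uni_only:
        return {"univariate"}
    if multi_only:
        return {"multivariate"}
    # assumption: both tags False means the forecaster handles either kind of y
    return {"univariate", "multivariate"}


print(supported_y({"univariate-only": True, "multivariate-only": False}))
print(sorted(supported_y({"univariate-only": False, "multivariate-only": False})))
```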

> • which internal type conversions precisely is this proposing, or not? You say conversions could be easily accommodated, but the important question is which conversions will actually happen in the proposed end state? And how precisely would this happen?

Currently, no new conversions happen besides the ones that are already happening. User-convenience conversions could easily be added if desired (e.g. pd.Series to pd.DataFrame for pure multivariate forecasters).

Shall we arrange another call on this? I think there are some fundamental questions that we need to align on, particularly:

  • which input types do we want to support
  • which internal types do we want to support
  • which output types do we want to support

@AngelPone
Contributor

AngelPone commented Jul 1, 2021

@mloning In my opinion, there are two main scenarios that need multivariate forecasting:

  1. forecast each time series individually, then combine the results (e.g. forecast combination, ensembling, stacking, bagging, hierarchical forecasting, etc.) to obtain multiple outputs or a single output.
  2. train a model on all time series and generate multivariate forecasts (e.g. ML-based global forecasting models, VAR, VARMA, etc.)

The ColumnForecaster in your prototype is suitable for the first scenario, although additional preprocessing and output processing are needed depending on the specific model. The prototype is not (at least for now) suitable for models that use all time series.
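
To make the first scenario concrete, here is a minimal sketch of column-wise forecasting. It is not the ColumnForecaster from the prototype: `ColumnwiseForecaster` and `NaiveLastValue` are hypothetical toy classes, and `horizon` is an assumed parameter name. Each column gets its own independently fitted univariate forecaster, so no information is shared across series.

```python
import pandas as pd


class NaiveLastValue:
    """Toy univariate forecaster that repeats the last observed value."""

    def fit(self, y):
        self.last_ = y.iloc[-1]
        return self

    def predict(self, horizon):
        return [self.last_] * horizon


class ColumnwiseForecaster:
    """Fits an independent univariate forecaster to each column of Y."""

    def __init__(self, make_forecaster):
        # make_forecaster is a zero-argument factory, e.g. a class
        self.make_forecaster = make_forecaster

    def fit(self, Y):
        self.forecasters_ = {
            col: self.make_forecaster().fit(Y[col]) for col in Y.columns
        }
        return self

    def predict(self, horizon):
        # one forecast column per input column
        return pd.DataFrame(
            {col: f.predict(horizon) for col, f in self.forecasters_.items()}
        )


Y = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
forecast = ColumnwiseForecaster(NaiveLastValue).fit(Y).predict(horizon=2)
print(forecast)
```

A model of the second kind, such as VAR, cannot be expressed this way, because it needs all columns at once in a single `fit`.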

@Lovkush-A
Collaborator

@AngelPone. This PR is primarily intended as a comparison and alternative design to the input/output conversion framework in #980.

@satya-pattnaik
Collaborator

> 2. ML-based global forecasting models

@AngelPone Can you elaborate on ML Based global methods a bit?
Do you mean boosted trees/decision trees where we pass an ID for "each panel" (for example store ID/category ID in the M5 data) as a categorical exogenous variable? A single model for the entire data?
Something like this?

@AngelPone
Contributor

@satya-pattnaik yes, a single GBDT-based or NN-based global model for the whole dataset. Typically each time series is transformed into tabular form (make_reduction in sktime) and all time series are concatenated into the training set of the ML model. Explanatory variables include lag values and their statistics, categorical variables, price, etc.
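
A small sketch of that tabular reduction, written with plain pandas: it does not call sktime's make_reduction, and the function name `to_tabular`, the `series_id` column, and the toy data are all made up for illustration. Each series is turned into rows of lagged values, and the rows of all series are stacked into one training table with the series identifier kept as a categorical feature.

```python
import pandas as pd


def to_tabular(series_by_id, n_lags):
    """Stack lagged windows of every series into one training table."""
    tables = []
    for series_id, y in series_by_id.items():
        table = pd.DataFrame({"target": y})
        for lag in range(1, n_lags + 1):
            # lag_k holds the value observed k steps before the target
            table[f"lag_{lag}"] = y.shift(lag)
        table["series_id"] = series_id
        # the first n_lags rows have incomplete windows and are dropped
        tables.append(table.dropna())
    return pd.concat(tables, ignore_index=True)


series_by_id = {
    "store_1": pd.Series([1.0, 2.0, 3.0, 4.0]),
    "store_2": pd.Series([10.0, 20.0, 30.0, 40.0]),
}
train = to_tabular(series_by_id, n_lags=2)
print(train)
```

Any tabular regressor could then be fitted on the lag and identifier columns against `target`, which gives one model for the whole dataset.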

@fkiraly
Collaborator

fkiraly commented Jul 12, 2021

@satya-pattnaik, @AngelPone, I feel your discussion is on a very important/useful topic, but off-topic for this thread.

This thread is about the core interface logic rather than concrete algorithms.
Yours seems to be about an important (but concrete) class of composite multivariate models.

I suggest opening, or contributing to, issues that collect suggestions for a multivariate ML model wishlist, i.e. which concrete models, atomic or composite, to implement.

@mloning
Contributor Author

mloning commented Jul 14, 2021

Closed in favour of design agreed on in PR #980

fkiraly added a commit that referenced this pull request Jul 22, 2021
…s - working prototype (#980)

This PR introduces multiple input/output type support for `X` and `y` in forecasters, including generic support for multivariate `y`, i.e., `pd.DataFrame`, and the possibility to pass `np.ndarray` and `pd.Series` to either argument. This is loosely based on the design in [STEP no.5](https://github.com/sktime/enhancement-proposals/tree/main/steps/05_scitype_based_IO_checks) and subsequent discussions with @mloning around #1074.

The key ingredients are:

* converters parameterized by from, to, and as - in the `convertIO` module. Besides the obvious conversion functionality, the converters can be given access to a dictionary via reference in the `store` argument, where information for inverting lossy conversions (like from `pd.DataFrame` to `np.ndarray`) can be stored
* a new tag `y:scitype` which can be `"univariate"`, `"multivariate"`, or `"both"`, indicating what type of `y` are supported (multivariate here means 2 dims or more)
* new tags which encode the type of `y` and `X` that the private `_fit`, `_predict`, and `_update` assume internally - for now, it's just one type and not a list of compatible types
* some logic in the public "plumbing" area of `fit`, `predict`, `update`, which converts inputs to the public layer to the desired input of the logic layer and back
* expanding tests and checks that ensure that errors are raised when the wrong types are passed, and changes that ensure the newly allowed inputs such as `pd.DataFrame` are accepted rather than blocked by the checks
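
The role of the `store` argument can be illustrated with a small sketch. The function names `frame_to_array` and `array_to_frame` are hypothetical and are not the converters in the `convertIO` module; the sketch only shows the idea of recording what a lossy conversion discards so that it can be inverted later.

```python
import numpy as np
import pandas as pd


def frame_to_array(df, store):
    """Lossy conversion: drops the labels, but records them in store."""
    store["index"] = df.index
    store["columns"] = df.columns
    return df.to_numpy()


def array_to_frame(arr, store):
    """Inverse conversion: restores the labels recorded by frame_to_array."""
    return pd.DataFrame(arr, index=store["index"], columns=store["columns"])


store = {}  # passed by reference, so both converters see the same dictionary
df = pd.DataFrame(
    {"a": [1.0, 2.0], "b": [3.0, 4.0]},
    index=pd.date_range("2021-01-01", periods=2, freq="D"),
)
arr = frame_to_array(df, store)
restored = array_to_frame(arr, store)
print(isinstance(arr, np.ndarray), restored.equals(df))
```

In a forecaster the array coming back from `_predict` covers the forecasting horizon rather than the training index, so in practice only the column labels would be restored this way; the round trip on the same rows is used here just to keep the example short.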

Labels

  • API design: API design & software architecture
  • module:forecasting: forecasting module, incl. probabilistic and hierarchical forecasting

6 participants