Conversation
|
Questions:
|
|
|
Is there a STEP that you are implementing here? |
|
This is my STEP in a way. A few things are still missing from the PR before it is a full implementation, but I hope the idea is clear. |
|
There should be a STEP for any major interface change to a core module like this, outlining precisely what the end result would look like. I don't think it's fair to ask a reviewer to extrapolate from an isolated change to the final state without any specs, or to ask reviewers to approve without a clearly worked-out design. In particular, I think a number of questions about the implications remain open, for instance:
|
I think this PR is still in its draft stages and so is not expecting a review yet. I'd be surprised if Markus is expecting you or anybody else to make a final decision with an incomplete design. |
Yes, that makes sense, @Lovkush-A. What is worth noting is that this design is incompatible with #980, so logically this is suggesting a decision between: #980, this PR, or something else. |
|
Thanks for the review @Lovkush-A! Note that the unit tests are currently not running on the multivariate forecaster. Adding multivariate forecasting and the tag will require some new tests and changes to the existing tests, which I haven't implemented yet. The main purpose of this PR is to show what I had in mind for the multivariate forecasting framework. @fkiraly I think these questions are good discussion points. Here are my quick answers, but I'm happy to discuss them in more depth.
No work is needed in
It doesn't break any existing contracts as far as I can see. As said above, changing an estimator from univariate to multivariate will throw an error if a
We need to align on tag meaning more generally. Here,
Currently, no conversions happen beyond the ones that already exist. User-convenience conversions could easily be included if desired (e.g. pd.Series to pd.DataFrame for pure multivariate forecasters). Shall we arrange another call on this? I think there are some fundamental questions that we need to align on, particularly:
|
|
@mloning In my opinion, there are two main scenarios that need multivariate forecasting
The ColumnForecaster in your prototype is suitable for the first situation, although additional preprocessing and output processing are needed for specific models. The prototype is not (at least for now) suitable for models that use all time series. |
|
@AngelPone. This PR is primarily intended as a comparison and alternative design to the input/output conversion framework in #980. |
@AngelPone Can you elaborate on ML Based global methods a bit? |
|
@satya-pattnaik yes, a single gbdt-based or nn-based global model for the whole dataset. Typically each time series is transformed into tabular( |
|
@satya-pattnaik, @AngelPone, I feel your discussion is on a very important/useful topic, but off-topic for this thread. This thread is about the core interface logic rather than concrete algorithms. I suggest opening or contributing to issues that collect a wishlist of multivariate ML models, i.e., which concrete estimators, atomic or composite, to implement. |
|
Closed in favour of the design agreed on in PR #980 |
…s - working prototype (#980)
This PR introduces multiple input/output type support for `X` and `y` in forecasters, including generic support for multivariate `y`, i.e., `pd.DataFrame`, and the possibility to pass `np.ndarray` and `pd.Series` to either argument.
This is loosely based on the design in [STEP no.5](https://github.com/sktime/enhancement-proposals/tree/main/steps/05_scitype_based_IO_checks) and subsequent discussions with @mloning around #1074.
The key ingredients are:
* converters parameterized by from, to, and as - in the `convertIO` module. Besides the obvious conversion functionality, the converters can be given access to a dictionary via reference in the `store` argument, where information for inverting lossy conversions (like from `pd.DataFrame` to `np.ndarray`) can be stored
* a new tag `y:scitype` which can be `"univariate"`, `"multivariate"`, or `"both"`, indicating which types of `y` are supported (multivariate here means 2 dims or more)
* new tags which encode the type of `y` and `X` that the private `_fit`, `_predict`, and `_update` assume internally - for now, it's just one type and not a list of compatible types
* some logic in the public "plumbing" area of `fit`, `predict`, `update`, which converts inputs to the public layer to the desired input of the logic layer and back
* expanded tests and checks that ensure that errors are raised when the wrong types are passed, and changes that ensure the newly allowed inputs, such as `pd.DataFrame`, pass the checks rather than being blocked
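For readers who have not seen #980, here is a minimal sketch of the converter-with-`store` idea and the `y:scitype` check described in the commit message above. It is an illustration only, not the code from #980 or the sktime API: the function names `convert` and `check_y_scitype`, their signatures, and the string type names are hypothetical, and "multivariate" is assumed here to mean a `pd.DataFrame` with two or more columns.

```python
import numpy as np
import pandas as pd


def convert(obj, from_type, to_type, store=None):
    """Hypothetical converter (illustration only, not the sktime API).

    `store` is a dict passed by reference. Lossy conversions write the
    metadata needed to invert them into it.
    """
    if from_type == to_type:
        return obj
    if (from_type, to_type) == ("pd.DataFrame", "np.ndarray"):
        if store is not None:
            # A bare array has no labels, so remember them for the way back.
            store["index"] = obj.index
            store["columns"] = obj.columns
        return obj.to_numpy()
    if (from_type, to_type) == ("np.ndarray", "pd.DataFrame"):
        store = store or {}
        # Missing entries fall back to pandas' default integer labels.
        return pd.DataFrame(
            obj, index=store.get("index"), columns=store.get("columns")
        )
    raise TypeError(f"no converter from {from_type} to {to_type}")


def check_y_scitype(y, supported):
    """Raise if `y` does not match a (hypothetical) `y:scitype` tag value.

    Assumption: a DataFrame with two or more columns counts as multivariate,
    anything else as univariate.
    """
    is_multivariate = isinstance(y, pd.DataFrame) and y.shape[1] > 1
    scitype = "multivariate" if is_multivariate else "univariate"
    if supported not in ("both", scitype):
        raise ValueError(f"y is {scitype}, but only {supported} y is supported")
    return scitype


# Round trip: the public layer converts to the inner type and back again.
y = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0]},
    index=pd.date_range("2021-01-01", periods=3, freq="D"),
)
store = {}
y_inner = convert(y, "pd.DataFrame", "np.ndarray", store=store)
y_back = convert(y_inner, "np.ndarray", "pd.DataFrame", store=store)
```

In this sketch the round trip restores the original index and column labels because both were written into `store` during the lossy step. A public `fit` or `predict` could call the same pair of conversions around the private `_fit` or `_predict`.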
Reference Issues/PRs
Alternative to the design proposal in #980 for multivariate forecasting
What does this implement/fix? Explain your changes.
Does your contribution introduce a new dependency? If yes, which one?
No
Notes