Skip to content

Forecasting support for multivariate y and multiple input/output types - working prototype#980

Merged
fkiraly merged 63 commits intomainfrom
forecasting-input-output-conv
Jul 22, 2021
Merged

Forecasting support for multivariate y and multiple input/output types - working prototype#980
fkiraly merged 63 commits intomainfrom
forecasting-input-output-conv

Conversation

@fkiraly
Copy link
Copy Markdown
Collaborator

@fkiraly fkiraly commented Jun 19, 2021

This is a merge/review ready sketch for how multivariate y could be supported for multivariate forecasters, alongside the possibility to pass np.array and pd.Series. This is based on the design in STEP no.5, adapted to the new forecasting base class.

The key ingredient are:

  • converters parameterized by from, to, and as - in the convertIOmodule. Besides the obvious conversion functionality, the converters can be given access to a dictionary via reference in the store argument, where information for inverting lossy conversions (like from pd.DataFrame to np.array) can be stored
  • a new tag y_type which encodes the type of y that the private _fit, _predict, and _update assume internally - for now, it's just one type and not a list of compatible types
  • some logic in the public "plumbing" area of fit, predict, update, which converts inputs to the public layer to the desired input of the logic layer and back

This is for discussion, and I expect major change before we merge.

It might be especially interesting to discuss the following interaction with typing:
the third argument of the converter, the "as" argument, is a scitype, and designed to have the same name as the respective typing composite type. Also see discussion in #965, #973 and #974.

Thinking of the logical continuation of this:

  • extend this to the other arguments, e.g., X
  • the individual argument checks like check_X etc should be replaced by checks of the kind check(what, as : scitype), e.g., check(y, "Series") should replace the less generic check_y.
  • extend this to time series classification, to support 3D numpy arrays, awkward arrays, nested data frames

A "visionary" pathway may see this working with type annotations that are scitypes, not typing machine types, e.g.,

fit(X : Series, Y : Series)
etc
_fit(X: Series, Y : UnivariateSeries)

and checks/conversions being done automatically.

@fkiraly
Copy link
Copy Markdown
Collaborator Author

fkiraly commented Jun 19, 2021

pinging @chrisholder since otherwise I can't add him reviewer

@fkiraly
Copy link
Copy Markdown
Collaborator Author

fkiraly commented Jun 20, 2021

note: this fails with some of the forecasters that are not yet refactored, since it can happen that fit is overridden but predict is not, in which case the variables written in fit are not addressed.

I see two ways to address this:

My preference is no.1, since it might add redundant checks and impose less stringent interface compatibility conditions on new forecasters.

@fkiraly
Copy link
Copy Markdown
Collaborator Author

fkiraly commented Jun 22, 2021

I've found a better option no.3 - initialize the fit variables with "safe" defaults that are what they would be for all of the forecasters that haven't been refactored. We can change that later, though it might be in general a good idea to have these "safe" defaults.

@fkiraly fkiraly mentioned this pull request Jun 22, 2021
@fkiraly fkiraly marked this pull request as ready for review June 22, 2021 11:59
@fkiraly
Copy link
Copy Markdown
Collaborator Author

fkiraly commented Jun 22, 2021

ready for review

@fkiraly fkiraly changed the title [DRAFT] Forecasting support for multivariate y and multiple input/output types Forecasting support for multivariate y and multiple input/output types - working prototype Jun 22, 2021
@fkiraly
Copy link
Copy Markdown
Collaborator Author

fkiraly commented Jul 21, 2021

from verbal discussion with @mloning, @mloning reviewed this and did not like two things:

  • that output conversions were still implemented, just turned off - he thought the option should be removed entirely. That I highlighted as deviation from the agreed plan. @mloning's argument here is the unnecessary code complexity when looking at fit/predict. I concurred with this and agreed to remove entirely.
  • that I didn't remove the tag for the internal machine type, y_inner_mtype after our discussion. @mloning thought I deviated from the agreement here, though it didn't occur to me that this should have been removed (see list above). @mloning thought that the inner type should be fully determined by the y:scitype tag, and some translation between the y:scitype tag and the convert functionality should happen in fit and update. @mloning thought this was a strong point of contention. I disagreed that the tag should be removed, since I didn't think it was part of the agreement to change this part, and it would add the "glue code" to fit/update as unnecessary complexity in there.

Overall, we agreed that I can make the call to merge this PR, in the interest of time and dependent development activities of @GuzalBulatova and @thayeylolu. Further, we can have detail discussions on the necessary tags later, and see it as a delta to this PR rather than a blocker for this PR.

The call I made is as follows:

  • I remove the output conversions entirely, in line with the spirit of our agreement above, and the argument of code complexity.
  • I'm keeping y_inner_mtype for the moment, to keep the conversion logic and the content of fit/update simpler; also, in consideration of the time series classification case and facebook prophet where the supposed 1:1 mapping of inner type and scitype is broken. Also, in consideration of potentially slowly adding geomstats like data container functionality to sktime, and development activity where it's not entirely clear yet whether this mapping would be easy to adhere to.

FYI, @aiwalter, @TonyBagnall.

@Lovkush-A
Copy link
Copy Markdown
Collaborator

@fkiraly Great effort, Franz! This is one monster of a PR.

I have seen all your comments/resolutions to my review conversations and am happy with decisions made. There was one conversation in which you gave two opposite answers to the same question (whether MvS is always pandas frames or not), but I do not think that is actionable in this PR.

@fkiraly
Copy link
Copy Markdown
Collaborator Author

fkiraly commented Jul 22, 2021

FYI, @mloning: I also made changes to the test which parallel yours in #1074 - the enforce functions were not working properly.

@fkiraly fkiraly merged commit 5bab347 into main Jul 22, 2021
@fkiraly fkiraly deleted the forecasting-input-output-conv branch July 22, 2021 07:32
@fkiraly
Copy link
Copy Markdown
Collaborator Author

fkiraly commented Jul 22, 2021

@GuzalBulatova, @thayeylolu, this is now merged - and it should be useable in your work on multivariate forecasters.

fkiraly added a commit that referenced this pull request Jul 22, 2021
The pd.DataFrame->pd.Series conversion was incorrectly missing from `_convertIO` (from #980), this is fixed now.

Merging quickly since urgent bugfix
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:forecasting forecasting module: forecasting, incl probabilistic and hierarchical forecasting

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants