Conversation
…itute/sktime into transform-types
…itute/sktime into transform-types
|
This is a huge pull request... Starting review with the proposed typing (from the STEP), questions/comments.
|
|
Quick Q: would you be able to explain why the replacement for If I understand correctly, the |
|
? |
|
?? @mloning |
|
Yes @fkiraly, wanted to discuss your points in our next meeting, thanks for the feedback! To answer your quick question: the reason is that unequal length intervals are unequal length panel data, even though we can represent this in the nested pd.DataFrame, we don't really have functionality to deal with that yet, the previous implementation was very hacky, that's why I removed it. |
|
Note on the modular time series forest: the source of the problem is not being able to properly handle samples of multivariate series, sampled and unequal time points, and of unequal lengths.
while numpy array is not possible due to the unequal length issue. |
|
I've looked at the code in greater detail now.
I agree with the general design, I think this is making things much clearer and cleaner. Major comments:
Minor comments:
|
|
Regarding the issue of simultaneous use of I think the most elegant way would be to use the typing, and perhaps annotated types, to introduce generic type checks for function arguments. This can be covered by two parts:
The |
fkiraly
left a comment
There was a problem hiding this comment.
Questions for discussion above.
Won't approve/request since I think the typing and doc issue may be good to discuss.
|
@fkiraly thanks again for the review, merged now, but happy to discuss this further and push new changes! |
|
so, is the idea to have this as a "working version" and deal with the type checks later? |
|
Sorry if that wasn't clear @fkiraly! As discussed on Friday, other work depends on this and I want to move forward. You also didn't request these changes, so I thought they were points for discussion rather than change requests. Regarding your comments:
|
But not all learners can be given
I suspected this, but what is the "rule" in assigning letters here? Is it consistent across scitypes?
Well, at least the extension guidelines are straightforward.
My intuition would tell me to get the "row wise panel transformers" from consistently applying a factory, decorator, or wrapper/reduction method to row wise transformers, to obtain panel versions. Perhaps add a tag that says a panel transformer is row-wise, rather than a full scitype, but not sure whether that's useful anywhere. |
Reference Issues/PRs
STEP03 on transform typing
What does this implement/fix? Explain your changes.
Major changes
There are some backward-incompatible changes here and I'm planning to make this release v0.5.0 to indicate this.
RowTransformerbased on new transformer types, but no longer handles multivariate unequal length data, so we cannot use it together with multiple random intervals, to do this properly we need better support for unequal length data, so this basically removes the most modular version of time series forestRandomIntervalFeatureExtractionTransformerto no longer inherit but now wrapRandomIntervalSegmenter(avoids change of transform type through inheritance)fit(y, fh=None, X=None)tofit(y, X=None, fh=None)to make it easier to pass exogenous arguments viafit(y, X)asfhis rarely required infitseries_as_featurestopaneleverywhere including module/function names aspanelis shorter, easier to read, and more widely used, exceptsktime/series_as_featuresmodule which exclusively refers to classification/regression/clustering setting and not other panel data related functionalityMinor changes
all_estimatorsfunction to acceptexcludeandreturn_nameskeyword argumenttime_series_slopeto consistently handle multidimensional inputs, combined with_lsq_fitused in TSFKey files to review
sktime/transformers/base.py