coherent interface for data_processing (panel conversions) module#1061
Conversation
TonyBagnall
left a comment
Not looked at the code yet, but we so need this; it's great you have taken it on. I'll look in detail on Monday.
@TonyBagnall - which Monday?
That's a bit sarcastic. I've been busy this week, and thought we should resolve 980 first in any case. I'm not sure why, after months of inaction, everything now has to be rushed through to your agenda. We are all volunteers here.
Hah, indeed 😃
No, don't worry! Btw, I think this is independent of #980 - it just provides a nicer interface to the existing converters, that's all.
Well, we had the dev sprint and a large number of PRs are open. I'm just trying to avoid PRs getting forgotten just because no one looks at them, especially the ones that are fresh. It's more difficult once they get older and more PRs pile up. You are doing the same with the old PRs.
Not since you've clicked "I agree" when you signed up for slack. Did you read the fine print?
TonyBagnall
left a comment
I'm not sure about the terminology used here. "one column per variable" doesn't mean much to me, I'm not sure what wide-format is, and as I understand it there is no standard definition for long format? Could we link to an example of each, either in a file or through a loader from ts format? Also, n_instances and n_timepoints are clear to me, but n_columns means nothing. I would prefer n_dimensions if this is for TSC.
MTYPE_REGISTER_PANEL = [
("nested_univ", "pd.DataFrame with one column per variable, pd.Series in cells"),
("numpy3D", "3D np.array of format (n_instances, n_columns, n_timepoints)"),
("numpyflat", "2D np.array of format (n_instances, n_columns*n_timepoints)"),
("pd-multiindex", "pd.DataFrame with multi-index (instance, time point)"),
("pd-wide", "pd.DataFrame in wide format, cols = (instance*time point)"),
("pd-long", "pd.DataFrame in long format, cols = (index, time_index, column)"),
]
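To make the register entries above concrete, here is a minimal sketch (plain numpy/pandas, not the actual sktime converters; names like `var_0` are made up for the example) of what the `numpy3D` and `nested_univ` layouts look like for the same data:

```python
import numpy as np
import pandas as pd

# numpy3D: a 3D array of shape (n_instances, n_columns, n_timepoints),
# i.e. one variable ("column"/"dimension") per axis-1 slice
n_instances, n_columns, n_timepoints = 2, 3, 4
X_3d = np.arange(
    n_instances * n_columns * n_timepoints, dtype=float
).reshape(n_instances, n_columns, n_timepoints)

# nested_univ: a DataFrame with one row per instance and one column per
# variable, where each cell holds a whole univariate series as a pd.Series
X_nested = pd.DataFrame(
    {
        f"var_{j}": [pd.Series(X_3d[i, j, :]) for i in range(n_instances)]
        for j in range(n_columns)
    }
)

print(X_nested.shape)  # (2, 3): 2 instances, 3 variables; time lives inside cells
```

Note that in both layouts the same `(instance, variable, time)` information is present; only the container differs, which is why lossless conversion between them is possible.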
This is how it is in the docstrings of the existing converters, I thought you would recognize it in the module that you have been curating, @TonyBagnall 😃 But I agree, this entire business is in dire need of:
I would very much hope so, but that would be a different scope imo. This is just cleaning up the existing module with existing docstrings.
I think it's relatively standard, at least in the medical world? Not sure whether it's R-specific terminology though.
TonyBagnall
left a comment
Based on the discussion I'm happy to approve this, it's certainly an improvement.
I'll put this in unless @mloning wants to review it? No rush if you do want a review, but also I'm doing some housekeeping, good to get things done, and this seems pretty uncontentious.
…rs (#1225) This PR further consolidates the datatypes module introduced in #1061 and #1201:
* adding example fixtures for the most important panel data containers
* consolidated checks for "is (some type)" in module, added some missing ones
* adding registry constants for panel data containers
* renaming leftover "what" variables to "obj"
* bugfix in converter from nested to multi-index
* adding converters for the list-of-data-frames panel type used in the distance module
* added tests for checking functionality
* adding tests for conversion functionality, testing converters against fixtures
* adding docstrings where they were missing
This PR introduces a clean interface for the panel data types conversions module.
This is fully downwards compatible, but adds the `convert(from, to, scitype)(what)` syntax on top of the `utils.data_processing` module. This should provide users with an easy way to convert their data into the format `sktime` requires for TSC, without having to sift through the code to find the right converter utility. I am also planning to put this in a future section 1 on "data loading" in the TSC introduction notebook.
A number of conversions are missing, and some are lossy, so this may be a good start for future "good first issues".