[ENH] Adapter from multiindex to ListDataset by TNTran92 · Pull Request #2893 · sktime/sktime

TNTran92 · 2022-06-29T01:26:26Z

Reference Issues/PRs

Partial solution to Issue #2860

What does this implement/fix? Explain your changes.

Implements an adapter that converts the pd-multiindex format to a ListDataset in gluon-ts.
The method takes a pd-multiindex DataFrame as input and, optionally, a list of categorical features, the start date, and the frequency of the dataset.
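
For readers unfamiliar with the two formats, here is a minimal sketch of the kind of conversion described above. It is not the code from this PR: the function name `multiindex_to_entries`, the `item_id` field, and the target layout (1D for a single column, `(n_vars, n_times)` otherwise) are assumptions for illustration. The sketch stops at a plain list of dicts and does not import gluon-ts.

```python
import pandas as pd


def multiindex_to_entries(df):
    """Turn a two-level (instance, time) DataFrame into a list of dict entries.

    Hypothetical helper: the entry layout is an assumption about what a
    gluon-ts style list dataset expects, not taken from this PR.
    """
    entries = []
    # level 0 holds the instance id, level 1 holds the time index
    for instance_id, group in df.groupby(level=0, sort=False):
        times = group.index.get_level_values(1)
        values = group.to_numpy().T  # shape (n_vars, n_times)
        # a single column is flattened to a 1D target
        target = values[0] if values.shape[0] == 1 else values
        entries.append({"item_id": instance_id, "start": times[0], "target": target})
    return entries


index = pd.MultiIndex.from_product(
    [[0, 1], pd.date_range("2022-01-01", periods=3, freq="D")],
    names=["instances", "timepoints"],
)
df = pd.DataFrame({"var_0": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=index)
entries = multiindex_to_entries(df)
```

Each entry carries the first timestamp of its instance and the values as an array; handing such a list to gluon-ts together with a frequency would be the remaining step.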

Does your contribution introduce a new dependency? If yes, which one?

Dependency: gluon-ts

What should a reviewer concentrate their feedback on?

Testing with different datasets in sktime to weed out bugs. So far, ArrowHead, ItalyPowerDemand and StandWalkJump have been tested.

Any other comments?

PR checklist

For all contributions

I've added myself to the list of contributors.
Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
I've added unit tests and made sure they pass locally.
The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.

For new estimators

I've added the estimator to the online documentation.
I've updated the existing example notebooks or provided a new one to showcase how my estimator works.

fkiraly

Thanks!

May I suggest some changes:

remove dependency on convert_to and check_is. I'll integrate it into the conversion framework later. Assume the input is pd-multiindex.
do you need to input frequency? Can that not be inferred from the pandas input time index? This should be inferred and appropriately translated, instead of being overwritten, in the vanilla call.
could you move this to datatypes.adapters?
a back-conversion function would be nice, we need that for forecasts, do we not? Maybe a separate PR
can you kindly add tests please, with example input and outputs?
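
The back-conversion mentioned in the list above could look roughly like the following sketch. It assumes univariate entries with `start` and `target` fields and a known frequency; the function name `entries_to_multiindex`, the column name `var_0`, and the index level names are hypothetical and not taken from sktime or gluon-ts.

```python
import pandas as pd


def entries_to_multiindex(entries, freq):
    """Rebuild a two-level (instance, time) DataFrame from univariate entries.

    Hypothetical helper for illustration only.
    """
    frames = []
    for i, entry in enumerate(entries):
        target = list(entry["target"])
        # regenerate the time index from the start timestamp and the frequency
        times = pd.date_range(start=entry["start"], periods=len(target), freq=freq)
        idx = pd.MultiIndex.from_product(
            [[i], times], names=["instances", "timepoints"]
        )
        frames.append(pd.DataFrame({"var_0": target}, index=idx))
    return pd.concat(frames)


entries = [
    {"start": pd.Timestamp("2022-01-01"), "target": [1.0, 2.0, 3.0]},
    {"start": pd.Timestamp("2022-01-01"), "target": [4.0, 5.0]},
]
df = entries_to_multiindex(entries, freq="D")
```

Instances of unequal length are supported because each instance gets its own time index before concatenation.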

TNTran92 · 2022-07-03T17:37:52Z

could you move this to datatypes.adapters?

I don't see this in the repo. Can you help me locate it?

TNTran92 · 2022-07-03T18:53:09Z

a back-conversion function would be nice, we need that for forecasts, do we not? Maybe a separate PR

Yes, this will be in a separate PR. That one is in progress.

fkiraly · 2022-07-03T19:31:24Z

I don't see this in the repo. Can you help me locate it?

It does not exist, sorry for being unclear - I suggest creating a new submodule, so we can isolate soft dependencies there.


TNTran92 · 2022-07-05T02:48:33Z

remove dependency on convert_to and check_is. I'll integrate it into the conversion framework later. Assume the input is pd-multiindex.

do you need to input frequency? Can that not be inferred from the pandas input time index? This should be inferred and appropriately translated, instead of being overwritten, in the vanilla call.

could you move this to datatypes.adapters?

a back-conversion function would be nice, we need that for forecasts, do we not? Maybe a separate PR

can you kindly add tests please, with example input and outputs?

I use convert_to because the code works with nested pd.Series. If I remove convert_to, I will have to write my own code somewhere in the function that will do something similar.
Frequency and start_date are now inferred from input data.
The converter has been moved to datatypes._adapters
Lastly, I am still working on making a pytest...
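
As an illustration of inferring start date and frequency from the data, here is a minimal sketch that relies on `pd.infer_freq`. The helper name `infer_start_and_freq` is hypothetical and the code is not taken from this PR. Note that `pd.infer_freq` needs at least three timestamps and returns `None` for irregular spacing.

```python
import pandas as pd


def infer_start_and_freq(time_index):
    """Return the first timestamp and a frequency string for a DatetimeIndex."""
    # prefer a frequency already stored on the index, otherwise infer it
    freq = time_index.freqstr or pd.infer_freq(time_index)
    return time_index[0], freq


# an index built from a list carries no frequency, so it has to be inferred
time_index = pd.DatetimeIndex(["2022-01-01", "2022-01-02", "2022-01-03"])
start, freq = infer_start_and_freq(time_index)
```

Time values pulled out of a MultiIndex level typically carry no stored frequency, which is why the fallback to inference matters here.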

fkiraly · 2022-07-06T22:17:17Z

Lastly, I am still working on making a pytest...

I can help with the testing, there is a conversion testing framework already in datatypes. What it needs are examples that are the same in the format - so you could add examples to the panel._examples module?

TNTran92 · 2022-07-07T05:07:35Z

Lastly, I am still working on making a pytest...

I can help with the testing, there is a conversion testing framework already in datatypes. What it needs are examples that are the same in the format - so you could add examples to the panel._examples module?

I added two examples, one univariate and one multivariate, and also added listdataset to the registry. I'm still trying to figure out pytest.

TNTran92 · 2022-07-07T14:27:06Z

This is for my information only. After adding the datatypes examples, I encountered the following error:

    def test_check_metadata_inference(scitype, mtype, fixture_index):
        """Tests that check_is_mtype correctly infers metadata of examples.

        Parameters
        ----------
        scitype : str - scitype of fixture
        mtype : str - mtype of fixture
        fixture_index : int - index of fixture tuple with that scitype and mtype

        Raises
        ------
        RuntimeError if scitype is not defined or has no mtypes or examples
        AssertionError if example metadata is not correctly inferred
        error if check itself raises an error
        """
        # retrieve fixture for checking
        fixture, _, expected_metadata = get_examples(
            mtype=mtype, as_scitype=scitype, return_metadata=True
        ).get(fixture_index)

    E   TypeError: cannot unpack non-iterable NoneType object

    fixture_index = 1
    mtype = 'listdataset'
    scitype = 'Panel'

    sktime/datatypes/tests/test_check.py:153: TypeError

fkiraly · 2022-07-08T14:48:32Z

for the record, this is as we discussed today in person:

the _examples are of a form where every "abstract example" should be spelled out in each of the mtypes. These should either be None or some data container. You added examples with index 3 and 4, so those are not available in the other mtypes.
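
To make the point above concrete, here is a small sketch with a hypothetical fixture store keyed by `(mtype, scitype, index)`. The real sktime storage is organised differently; the sketch only shows why an index defined for one mtype but not the others produces a `None` lookup and the unpacking error reported earlier.

```python
# Hypothetical fixture store; keys are (mtype, scitype, example index).
examples = {
    ("pd-multiindex", "Panel", 0): ("panel data 0", "metadata 0"),
    ("pd-multiindex", "Panel", 1): ("panel data 1", "metadata 1"),
    ("listdataset", "Panel", 3): ("list dataset 3", "metadata 3"),
    ("listdataset", "Panel", 4): ("list dataset 4", "metadata 4"),
}


def missing_examples(examples, scitype):
    """List (mtype, index) pairs that have no entry for the given scitype."""
    mtypes = {m for (m, s, _) in examples if s == scitype}
    indices = {i for (_, s, i) in examples if s == scitype}
    return sorted(
        (m, i) for m in mtypes for i in indices if (m, scitype, i) not in examples
    )


gaps = missing_examples(examples, "Panel")

# Looking up an index that was never defined for this mtype yields None,
# and unpacking None raises the TypeError seen in the test run.
lookup = examples.get(("listdataset", "Panel", 1))
try:
    fixture, metadata = lookup
    message = ""
except TypeError as err:
    message = str(err)
```

Filling every gap with either a data container or an explicit `None` placeholder that the tests know to skip would avoid the failure.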

A secondary question I have is how we deal with mtypes that have soft dependencies, I need to think about that more carefully.

fkiraly

@TNTran92, I've thought about this.
Given the complexity of some design questions, see #2957, I would suggest:

Let's separate this PR from the datatypes testing framework, i.e., split off the _examples and possibly _registry.

I would recommend to:

put the examples and tests in a sub-folder of _adapters. Don't try to integrate with the current test system, just make new tests with expected input/output pairs.
then, focus next on the back-conversion, while we think about #2957 and integration with the current testing framework (the soft dep question needs to be addressed first).
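
A standalone test built from expected input/output pairs, as recommended above, might be laid out like the following sketch. The converter `to_entries` is a univariate stand-in written for this illustration, not the adapter from this PR, and the helper `_make_input` and the `CASES` list are hypothetical.

```python
import pandas as pd


def to_entries(df):
    """Stand-in converter (hypothetical): one dict per instance in level 0."""
    return [
        {"start": g.index.get_level_values(1)[0], "target": g.iloc[:, 0].tolist()}
        for _, g in df.groupby(level=0, sort=False)
    ]


def _make_input(values_per_instance, start):
    """Build a two-level (instance, time) DataFrame with daily timestamps."""
    frames = []
    for i, values in enumerate(values_per_instance):
        times = pd.date_range(start, periods=len(values), freq="D")
        idx = pd.MultiIndex.from_product([[i], times])
        frames.append(pd.DataFrame({"var_0": values}, index=idx))
    return pd.concat(frames)


# each case pairs an input container with the exact output expected from it
CASES = [
    (
        _make_input([[1.0, 2.0], [3.0, 4.0]], "2022-01-01"),
        [
            {"start": pd.Timestamp("2022-01-01"), "target": [1.0, 2.0]},
            {"start": pd.Timestamp("2022-01-01"), "target": [3.0, 4.0]},
        ],
    ),
]


def test_conversion_matches_expected():
    for df, expected in CASES:
        assert to_entries(df) == expected
```

Because the pairs live next to the test, this layout does not depend on the shared example registry and can be collected by pytest on its own.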

That would prevent us from getting stuck, and allows @AurumnPegasus to use the conversions already for gluonts neural networks.

(this is nothing related to the quality of your work - great work! This is only about removing a conditionality on integration to give @AurumnPegasus access to the functionality, and delaying the integration to a later piece of work)

TNTran92 · 2022-07-08T17:50:24Z

@TNTran92, I've thought about this. Given the complexity of some design questions, see #2957, I would suggest:

Let's separate this PR from the datatypes testing framework, i.e., split off the _examples and possibly _registry.

I would recommend to:

put the examples and tests in a sub-folder of _adapters. Don't try to integrate with the current test system, just make new tests with expected input/output pairs.

then, focus next on the back-conversion, while we think about [ENH] How to deal with mtypes that have soft dependencies? #2957 and integration with the current testing framework (the soft dep question needs to be addressed first).

That would prevent us from getting stuck, and allows @AurumnPegasus to use the conversions already for gluonts neural networks.

(this is nothing related to the quality of your work - great work! This is only about removing a conditionality on integration to give @AurumnPegasus access to the functionality, and delaying the integration to a later piece of work)

Understood. I will try to get the unit test and the back-conversion done as soon as possible.

TNTran92 · 2022-07-11T03:48:21Z

I have a merge conflict on this branch that I can't resolve, so I will close this pull request and add the unit test, plus everything done so far, to PR #2976.

TNTran92 added 3 commits June 28, 2022 20:13

pyprojectfile commit

5535d4e

soft_dependencies added

a7e36b0

_data_io commit

013dc27

TNTran92 marked this pull request as ready for review June 29, 2022 02:48

TNTran92 requested review from aiwalter and fkiraly as code owners June 29, 2022 02:48

TNTran92 mentioned this pull request Jun 29, 2022

[ENH] data conversion adapters to gluonts #2860

Closed

fkiraly requested changes Jun 29, 2022


Merge branch 'main' into List_Dataset_2

b79b6c0

fkiraly linked an issue Jun 30, 2022 that may be closed by this pull request

[ENH] data conversion adapters to gluonts #2860

Closed

Merge branch 'main' into List_Dataset_2

dc7e9cf

Remove mtype check

4337c21

TNTran92 added 5 commits July 3, 2022 16:35

Merge branch 'main' into List_Dataset_2

e899e15

Merge branch 'List_Dataset_2' of https://github.com/TNTran92/sktime i…

f4f704b

…nto List_Dataset_2

convert_to

28a0940

Add freq infer and moved to datatypes.adapters

89f5970

Merge branch 'main' into List_Dataset_2

31f87f8

TNTran92 added 7 commits July 4, 2022 21:53

Modified __Init__.py

a1bb6d2

Add __init__

81acc14

Fix _adapter

a34f542

Wrong folder's name

c1f9a2a

Merge branch 'main' into List_Dataset_2

a983b53

Merge branch 'main' into List_Dataset_2

e3e32ba

Merge branch 'main' into List_Dataset_2

e79851c

Merge branch 'main' into List_Dataset_2

b739226

TNTran92 added 2 commits July 6, 2022 23:18

Remove redundant param

2bd16ba

Update _examples and _registry

cd08862

fkiraly assigned TNTran92 and fkiraly Jul 8, 2022

fkiraly mentioned this pull request Jul 8, 2022

[ENH] How to deal with mtypes that have soft dependencies? #2957

Closed

fkiraly requested changes Jul 8, 2022


TNTran92 added 2 commits July 10, 2022 21:24

Add unit test for conversion from multiindex to listdataset

4e88daa

Stage this commit

995b91c

TNTran92 requested review from MatthewMiddlehurst, TonyBagnall and patrickzib as code owners July 11, 2022 03:15

TNTran92 closed this Jul 11, 2022

TNTran92 wants to merge 23 commits into sktime:main from TNTran92:List_Dataset_2
