Skip to content

[ENH] Adapter from multiindex to ListDataset #2893

Closed
TNTran92 wants to merge 23 commits intosktime:mainfrom
TNTran92:List_Dataset_2
Closed

[ENH] Adapter from multiindex to ListDataset #2893
TNTran92 wants to merge 23 commits intosktime:mainfrom
TNTran92:List_Dataset_2

Conversation

@TNTran92
Copy link
Copy Markdown
Contributor

@TNTran92 TNTran92 commented Jun 29, 2022

Reference Issues/PRs

Partial solution to Issue #2860

What does this implement/fix? Explain your changes.

Implement an adapter to convert multiindex format to ListDataset in gluon-ts
The method takes as input a pd-multiindex DataFrame, (optional) a list of categorical feature, (optional) the startdate and (optional) frequency of the dataset.

Does your contribution introduce a new dependency? If yes, which one?

Dependency: gluon-ts

What should a reviewer concentrate their feedback on?

Testing with different dataset in sktime to weed out the bugs. So far, Arrowhead, ItalyPoowerDemand and StandWalkJump have been tested.

Any other comments?

PR checklist

For all contributions
  • I've added myself to the list of contributors.
  • Optionally, I've updated sktime's CODEOWNERS to receive notifications about future changes to these files.
  • I've added unit tests and made sure they pass locally.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG] indicating whether the PR topic is related to enhancement, maintenance, documentation, or bug.
For new estimators
  • I've added the estimator to the online documentation.
  • I've updated the existing example notebooks or provided a new one to showcase how my estimator works.

@TNTran92 TNTran92 marked this pull request as ready for review June 29, 2022 02:48
Copy link
Copy Markdown
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

May I suggest some changes:

  • remove dependency on convert_to and check_is. I'll integrate it into the conversion framework later. Assume the input is pd-multiindex.
  • do you need to input frequency? Can that not be inferred from the pandas input time index? This should be inferred and appropriately translated, instead of being overwritten, in the vanilla call.
  • could you move this to datatypes.adapters?
  • a back-conversion function would be nice, we need that for forecasts, do we not? Maybe a separate PR
  • can you kindly add tests please, with example input and outputs?

@fkiraly fkiraly linked an issue Jun 30, 2022 that may be closed by this pull request
@TNTran92
Copy link
Copy Markdown
Contributor Author

TNTran92 commented Jul 3, 2022

  • could you move this to datatypes.adapters?

I don't see this in the repo. Can you help me locate it?

@TNTran92
Copy link
Copy Markdown
Contributor Author

TNTran92 commented Jul 3, 2022

  • a back-conversion function would be nice, we need that for forecasts, do we not? Maybe a separate PR

Yes, this will be in a separate PR. That one is in progress.

@fkiraly
Copy link
Copy Markdown
Collaborator

fkiraly commented Jul 3, 2022

I don't see this in the repo. Can you help me locate it?

It does not exist, sorry for being unclear - I suggest creating a new submodule, so we can isolate soft dependencies there.

@TNTran92
Copy link
Copy Markdown
Contributor Author

TNTran92 commented Jul 5, 2022

  • remove dependency on convert_to and check_is. I'll integrate it into the conversion framework later. Assume the input is pd-multiindex.
  • do you need to input frequency? Can that not be inferred from the pandas input time index? This should be inferred and appropriately translated, instead of being overwritten, in the vanilla call.
  • could you move this to datatypes.adapters?
  • a back-conversion function would be nice, we need that for forecasts, do we not? Maybe a separate PR
  • can you kindly add tests please, with example input and outputs?
  • I use convert_to because the code works with nested pd.Series. If I remove convert_to, I will have to write my own code somewhere in the function that will do something similar.
  • Frequency and start_date are now inferred from input data.
  • The converter has been moved to datatypes._adapters
  • Lastly, I am still working on making a pytest...

@fkiraly
Copy link
Copy Markdown
Collaborator

fkiraly commented Jul 6, 2022

Lastly, I am still working on making a pytest...

I can help with the testing, there is a conversion testing framework already in datatypes. What it needs are examples that are the same in the format - so you could add examples to the panel._examples module?

@TNTran92
Copy link
Copy Markdown
Contributor Author

TNTran92 commented Jul 7, 2022

Lastly, I am still working on making a pytest...

I can help with the testing, there is a conversion testing framework already in datatypes. What it needs are examples that are the same in the format - so you could add examples to the panel._examples module?

I added 2 examples: 1 for univariate and 1 for multivariate. Also added listdataset into registry. I'm still trying to figure out pytest

@TNTran92
Copy link
Copy Markdown
Contributor Author

TNTran92 commented Jul 7, 2022

This is for my information only. After adding datatypes example, encounter
` def test_check_metadata_inference(scitype, mtype, fixture_index):
"""Tests that check_is_mtype correctly infers metadata of examples.

    Parameters
    ----------
    scitype : str - scitype of fixture
    mtype : str - mtype of fixture
    fixture_index : int - index of fixture tuple with that scitype and mtype

    Raises
    ------
    RuntimeError if scitype is not defined or has no mtypes or examples
    AssertionError if example metadata is not correctly inferred
    error if check itself raises an error
    """
    # retrieve fixture for checking
  fixture, _, expected_metadata = get_examples(
        mtype=mtype, as_scitype=scitype, return_metadata=True
    ).get(fixture_index)

E TypeError: cannot unpack non-iterable NoneType object

fixture_index = 1
mtype = 'listdataset'
scitype = 'Panel'

sktime/datatypes/tests/test_check.py:153: TypeError

@fkiraly
Copy link
Copy Markdown
Collaborator

fkiraly commented Jul 8, 2022

for the record, this is as we discussed today in person:

the _examples are of a form where every "abstract example" should be spelled out in each of the mtypes. These should either be None or some data container. You added examples with index 3 and 4, so those are not available in the other mtypes.

A secondary question I have is how we deal with mtypes that have soft dependencies, I need to think about that more carefully.

Copy link
Copy Markdown
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@TNTran92, I´ve thought about this.
Given the complexity of some design questions, see #2957, I would suggest:

Let´s separate this PR from the datatypes testing framework., i.e., split off the _examples and possibly _registry.

I would recommend to:

  • put the examples and tests in a sub-folder of _adapters. Don´t try to integrate with the current test system, just make new tests with expected input/output pairs.
  • then, focus next on the back-conversion, while we think about #2957 and integration with the current testing framework (the soft dep question needs to be addressed first).

That would prevent us from getting stuck, and allows @AurumnPegasus to use the conversions already for gluonts neural networks.

(this is nothing related to the quality of your work - great work! This is only about removing a conditionality on integration to give @AurumnPegasus access to the functionality, and delaying the integration to a later piece of work)

@TNTran92
Copy link
Copy Markdown
Contributor Author

TNTran92 commented Jul 8, 2022

@TNTran92, I´ve thought about this. Given the complexity of some design questions, see #2957, I would suggest:

Let´s separate this PR from the datatypes testing framework., i.e., split off the _examples and possibly _registry.

I would recommend to:

  • put the examples and tests in a sub-folder of _adapters. Don´t try to integrate with the current test system, just make new tests with expected input/output pairs.
  • then, focus next on the back-conversion, while we think about [ENH] How to deal with mtypes that have soft dependencies? #2957 and integration with the current testing framework (the soft dep question needs to be addressed first).

That would prevent us from getting stuck, and allows @AurumnPegasus to use the conversions already for gluonts neural networks.

(this is nothing related to the quality of your work - great work! This is only about removing a conditionality on integration to give @AurumnPegasus access to the functionality, and delaying the integration to a later piece of work)

Understood. I will try to get the unit test and the back-conversion done as soon as possible.

@TNTran92
Copy link
Copy Markdown
Contributor Author

TNTran92 commented Jul 11, 2022

I have merge conflict that can't be resolved on this branch, so I will be closing this pull request and add the unit test plus everything done so far to the PR #2976

@TNTran92 TNTran92 closed this Jul 11, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[ENH] data conversion adapters to gluonts

2 participants