Skip to content

Fixed STC Bug#329

Merged
mloning merged 21 commits intosktime:devfrom
ABostrom:shapelet_bug
Jul 15, 2020
Merged

Fixed STC Bug#329
mloning merged 21 commits intosktime:devfrom
ABostrom:shapelet_bug

Conversation

@ABostrom
Copy link
Copy Markdown
Contributor

@ABostrom ABostrom commented Jul 2, 2020

Reference Issues/PRs

Fixes #321

What does this implement/fix? Explain your changes.

Checks if pd.series passed and converts to numpy for internal use.

Any other comments?

Added some additional tests for STC although may be unneccessary

Added fix to STC to check if pd.Series and convert
@ABostrom ABostrom changed the title added two shapelet tests to demonstrate pd/np fix. Fixed STC Bug Jul 2, 2020
@mloning
Copy link
Copy Markdown
Contributor

mloning commented Jul 2, 2020

Since you're on it, if you have time, it'd be great to make shapelet comply with our interface more generally see #257 :)

@ABostrom
Copy link
Copy Markdown
Contributor Author

ABostrom commented Jul 2, 2020

Okay. I'll figure out whatever that error is. I suspect my tests are playing up. I'll have a look at #257 but I'll probably do it tomorrow :)

@ABostrom
Copy link
Copy Markdown
Contributor Author

ABostrom commented Jul 6, 2020

@mloning I will get round to this, suddenly got loads of work to juggle at the moment. Hopefully try to get a PR done for end of the week

@ABostrom
Copy link
Copy Markdown
Contributor Author

ABostrom commented Jul 9, 2020

Okay @mloning noticed another bug with random seeding which im also fixing to make sure STC is correct. Taking longer than expected.

@ABostrom
Copy link
Copy Markdown
Contributor Author

ABostrom commented Jul 9, 2020

@mloning I also noticed in base.py for the Base Classifier the predict method doesn't called check_is_fitted(). See my newest commit

@mloning
Copy link
Copy Markdown
Contributor

mloning commented Jul 9, 2020

@ABostrom if you're feeling brave you can remove all Shapelet related estimators from this list here, this will run all standard unit tests on them.

You may have to change the test hyper-params to make it faster (e.g. contract time to 10 secs or so) in the same file in this dictionary.

@ABostrom
Copy link
Copy Markdown
Contributor Author

ABostrom commented Jul 9, 2020

oooh. scary, will see what happens.

ABostrom added 2 commits July 10, 2020 09:09
Fixed various issues with the checks. Idemptoency being a particular sticking point.
@ABostrom
Copy link
Copy Markdown
Contributor Author

So I think i've resolved the testing issues across all the shapelet family of algorithms. Learned alot about pytest so thanks for that @mloning :)

Copy link
Copy Markdown
Contributor

@mloning mloning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

few quick comments

X, y = check_X_y(X, y, enforce_univariate=True)
self.n_classes = np.unique(y).shape[0]
self.classes_ = class_distribution(np.asarray(y).reshape(-1, 1))[0][0]
_X, _y = check_X_y(X, y, enforce_univariate=True)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you create the underscore objects? Should work without the underscores, no?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

created the underscores as that was a suggestion you left me before about not overwriting input args
I'm happy to remove that.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually there was also a weird error, i think with the fact X was getting shuffled so i was getting an idemptoency failure because the shapes were mismatched.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah sorry I meant not overwriting hyper-parameters set in the constructor, input args can be overwritten

_y = _y.to_numpy()

# generate pipeline in fit so that random state can be propogated properly.
self.classifier = Pipeline([
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd call it self.classifier_, by sklearn convention, attributes changed in fit have a trailing underscore

# TODO fix estimators to pass all tests
EXCLUDED = [
'ContractedShapeletTransform',
# 'ContractedShapeletTransform',
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove all of the commented lines including the # 'MrSEQLClassifier' line to clean this up a bit

ABostrom added 3 commits July 10, 2020 12:21
…onds, rather than 6seconds.

This gives slightly less variability in contracting.
Also added a 25% of the runtime of the max shapelet runtime to give some leeway on multiple runtime incase of CPU discrepancies
@ABostrom
Copy link
Copy Markdown
Contributor Author

@mloning @jasonlines So I think the current build errors, show case the current differences between my tsml implementation and the sktime one. Between the same run of the same datasets you cannot guarantee that you will evaluate the same amount of shapelets in the same time frame. Especially on different architectures etc. Which results in idemptoency failures and inconsistent outputs on the same seeds / datasets.

Might need to re-include them in the exclusion list, and review this particular check for future contract approaches.

@mloning
Copy link
Copy Markdown
Contributor

mloning commented Jul 10, 2020

@ABostrom does this only apply to the contracted version? Or also the ShapeletTransformClassifier?

@ABostrom
Copy link
Copy Markdown
Contributor Author

@mloning shapelet transform classifier uses the contracted shapelets in a pipeline with radnom forest

@mloning
Copy link
Copy Markdown
Contributor

mloning commented Jul 11, 2020

Let's wait if @jasonlines has any idea how to fix this, otherwise we could try to write some code to skip/ignore the idempotency test for shapelets

@ABostrom
Copy link
Copy Markdown
Contributor Author

@mloning I agree, I will have a little look at making some changes to _config.py and test_all_estimators to have an exclusion list for certain estimators.

@mloning
Copy link
Copy Markdown
Contributor

mloning commented Jul 13, 2020

@ABostrom decorators may work and may be the cleanest solution, something along the following lines:

@skip_if(estimator=[ShapeletTransform])
def check_idempotency():
    ....

Pytest may even have decorators for that already

@ABostrom
Copy link
Copy Markdown
Contributor Author

my preference and ofc feel free to overrule on this is that all config for tests should stay in config. Seems like with decorators, whilst addmittedly a clean solution could make it difficult where to track down or reconfigure tests in the future.

EXCLUDED_FROM_TESTS = { "ShapeletTransformClassifier": [check_fit_idempotent], "ContractedShapeletTransform": [check_fit_idempotent], }

@mloning
Copy link
Copy Markdown
Contributor

mloning commented Jul 13, 2020

Sounds good to me!

@ABostrom
Copy link
Copy Markdown
Contributor Author

@mloning I'm not sure how this works. If i change the config / pytests would that run my version of the tests or the old versions through travis etc?

@mloning
Copy link
Copy Markdown
Contributor

mloning commented Jul 13, 2020

It'll run whatever is on the PR I think, of course you can run them locally first to see if everything is working, so feel free to change things and push to the same branch. You also need to sync with the latest changes on dev.

@ABostrom
Copy link
Copy Markdown
Contributor Author

Yeah they have all been passing locally. Might be I'm being stupid. will debug it.

@ABostrom
Copy link
Copy Markdown
Contributor Author

changed the config params for ST to make sure it can actually finish in a time manner suitable for the build/CI server

@ABostrom
Copy link
Copy Markdown
Contributor Author

@mloning if you can check it over Markus. I think it's looking pretty good. Turned into a bit more than some "simple" shapelet fix. As always these things do :)

Copy link
Copy Markdown
Contributor

@mloning mloning left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ABostrom thanks for all the work! Looks good, left a few minor comments! Happy to merge afterwards :)


# if y is a pd.series then convert to array.
if isinstance(_y, pd.Series):
_y = _y.to_numpy()
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not also simply have y instead of _y?

# from sktime.datasets import load_italy_power_demand


# def test_stc_with_pd():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please remove the file if no longer needed - if we want to support y as a np.array we should probably add this to our collection of estimator checks and run it on all

This estimator
"""
X = check_X(X, enforce_univariate=True)
_X, _y = check_X_y(X, y, enforce_univariate=True)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, having X and y should be fine no (without the underscores)?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just makes it consistent with most other code in sklearn and sktime



def check_estimator(Estimator):
def check_estimator(Estimator, Check_Exclusions=None):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for args please use lower case, so check_exclusions or simply exclude

check(Estimator)

# check if associated test is not included in the exclusion list
if not Check_Exclusions.__contains__(check.__name__):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not this? :)

Suggested change
if not Check_Exclusions.__contains__(check.__name__):
if check.__name__ not in Check_Exclusions:

from sktime.transformers.series_as_features.interpolate import TSInterpolator

# TODO fix estimators to pass all tests
EXCLUDED = [
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe rename to EXCLUDED_ESTIMATORS

'ShapeletTransformClassifier',
]

EXCLUDED_FROM_TESTS = {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this one to EXCLUDED_TESTS

@ABostrom
Copy link
Copy Markdown
Contributor Author

Will do these now. Was just sorting a PR for sktime-dl. but now thats done, I'll tidy this up.

@mloning
Copy link
Copy Markdown
Contributor

mloning commented Jul 15, 2020

@ABostrom looks all good to me now! Thanks for the clean up of shapelets! :) Ready to merge when you are.

@ABostrom
Copy link
Copy Markdown
Contributor Author

good to go Markus. Think it looks pretty comprehensive.

@ABostrom ABostrom deleted the shapelet_bug branch July 15, 2020 11:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants