Fixed STC Bug by ABostrom · Pull Request #329 · sktime/sktime

ABostrom · 2020-07-02T14:26:51Z

Reference Issues/PRs

Fixes #321

What does this implement/fix? Explain your changes.

Checks whether a pd.Series is passed and converts it to a NumPy array for internal use.
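A minimal sketch of the kind of check described above. The helper name `as_numpy_labels` is hypothetical and not taken from the PR. The thread does not say what actually failed in #321, so the non-default index below is only one common reason positional code can trip over a pd.Series.

```python
import numpy as np
import pandas as pd


def as_numpy_labels(y):
    """Return class labels as a NumPy array, converting a pd.Series if needed."""
    if isinstance(y, pd.Series):
        # to_numpy() drops the pandas index, so later positional indexing
        # cannot be confused by non-default index labels.
        return y.to_numpy()
    return np.asarray(y)


# A Series whose index does not start at 0: y[0] would be a label lookup on
# the Series, while positional access works on the converted array.
y = pd.Series(["a", "b", "a"], index=[10, 11, 12])
labels = as_numpy_labels(y)
print(type(labels).__name__, labels[0])
```

Lists and arrays pass through `np.asarray`, so callers can hand in either container.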

Any other comments?

Added some additional tests for STC, although they may be unnecessary.

Added fix to STC to check if pd.Series and convert

mloning · 2020-07-02T14:52:02Z

Since you're on it, if you have time, it'd be great to make shapelet comply with our interface more generally, see #257 :)

ABostrom · 2020-07-02T16:02:14Z

Okay. I'll figure out whatever that error is. I suspect my tests are playing up. I'll have a look at #257 but I'll probably do it tomorrow :)

ABostrom · 2020-07-06T13:09:09Z

@mloning I will get round to this, I've suddenly got loads of work to juggle at the moment. Hopefully I'll get a PR done by the end of the week.

ABostrom · 2020-07-09T15:07:47Z

Okay @mloning, I noticed another bug with random seeding, which I'm also fixing to make sure STC is correct. It's taking longer than expected.

Added tests for verifying correctness of STC and underlying rotf/contracted transform.

linting error fix

ABostrom · 2020-07-09T15:35:39Z

@mloning I also noticed that in base.py the predict method of the base classifier doesn't call check_is_fitted(). See my newest commit.
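To illustrate the point about predict, here is a toy sketch of a classifier that refuses to predict before fit. `ToyClassifier`, `NotFittedError` and the `_is_fitted` flag are assumptions made for this example; this is not the sktime base class, only the method name `check_is_fitted` is taken from the thread.

```python
class NotFittedError(ValueError):
    """Raised when predict is called before fit (hypothetical for this sketch)."""


class ToyClassifier:
    def __init__(self):
        self._is_fitted = False

    def fit(self, X, y):
        # Remember the most frequent label; a stand-in for real training.
        labels = list(y)
        self.majority_ = max(set(labels), key=labels.count)
        self._is_fitted = True
        return self

    def check_is_fitted(self):
        if not self._is_fitted:
            raise NotFittedError(
                f"{type(self).__name__} has not been fitted yet; call fit first."
            )

    def predict(self, X):
        # Fail early with a clear message instead of an AttributeError later.
        self.check_is_fitted()
        return [self.majority_ for _ in X]
```

Without the guard, calling predict on an unfitted instance would surface as a missing-attribute error on `majority_`, which is harder to interpret.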

mloning · 2020-07-09T16:02:40Z

@ABostrom if you're feeling brave you can remove all Shapelet related estimators from this list here, this will run all standard unit tests on them.

You may have to change the test hyper-params to make it faster (e.g. contract time to 10 secs or so) in the same file in this dictionary.

ABostrom · 2020-07-09T17:50:58Z

oooh. scary, will see what happens.

Fixed various issues with the checks. Idempotency being a particular sticking point.

ABostrom · 2020-07-10T08:46:53Z

So I think I've resolved the testing issues across the whole shapelet family of algorithms. Learned a lot about pytest, so thanks for that @mloning :)

mloning

few quick comments

mloning · 2020-07-10T10:14:09Z

sktime/classification/shapelet_based/_stc.py

-        X, y = check_X_y(X, y, enforce_univariate=True)
-        self.n_classes = np.unique(y).shape[0]
-        self.classes_ = class_distribution(np.asarray(y).reshape(-1, 1))[0][0]
+        _X, _y = check_X_y(X, y, enforce_univariate=True)


Why do you create the underscore objects? Should work without the underscores, no?

I created the underscores because of a suggestion you left me before about not overwriting input args.
I'm happy to remove that.

Actually there was also a weird error, I think because X was getting shuffled, so I was getting an idempotency failure because the shapes were mismatched.

Ah sorry, I meant not overwriting hyper-parameters set in the constructor; input args can be overwritten.

mloning · 2020-07-10T10:15:25Z

sktime/classification/shapelet_based/_stc.py

+            _y = _y.to_numpy()
+
+# generate pipeline in fit so that random state can be propogated properly.
+        self.classifier = Pipeline([


I'd call it self.classifier_, by sklearn convention, attributes changed in fit have a trailing underscore
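The two conventions discussed in this review (leave constructor hyper-parameters untouched, and give attributes created in fit a trailing underscore) can be sketched with a toy estimator. `ToyEstimator`, `rng_` and `sample_` are made up for this example, and Python's `random.Random` stands in for the real pipeline; the sketch only shows why building the seeded component inside fit makes repeated fits reproducible.

```python
import random


class ToyEstimator:
    def __init__(self, random_state=None):
        # Hyper-parameters are stored as given and never modified later.
        self.random_state = random_state

    def fit(self, X, y=None):
        # Build the stochastic component inside fit so that every call
        # starts from the same seed. The trailing underscore marks these
        # attributes as state learned or created during fit.
        self.rng_ = random.Random(self.random_state)
        self.sample_ = self.rng_.sample(range(len(X)), k=min(3, len(X)))
        return self


est = ToyEstimator(random_state=0)
first = est.fit(list(range(10))).sample_
second = est.fit(list(range(10))).sample_
print(first == second)
```

If the generator were created once in `__init__` instead, the second fit would continue from the advanced generator state rather than from the seed.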

mloning · 2020-07-10T10:16:28Z

sktime/tests/_config.py

 # TODO fix estimators to pass all tests
 EXCLUDED = [
-    'ContractedShapeletTransform',
+    # 'ContractedShapeletTransform',


please remove all of the commented lines including the # 'MrSEQLClassifier' line to clean this up a bit

ABostrom · 2020-07-10T13:57:17Z

@mloning @jasonlines So I think the current build errors showcase the differences between my tsml implementation and the sktime one. Between two runs on the same dataset you cannot guarantee that you will evaluate the same number of shapelets in the same time frame, especially on different architectures etc. This results in idempotency failures and inconsistent outputs for the same seeds / datasets.
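A rough sketch of why a wall-clock contract breaks idempotency even with a fixed seed. Both functions are hypothetical and far simpler than a shapelet search. The count-based budget is included only for contrast; it is not something proposed in this thread.

```python
import random
import time


def search_with_time_contract(candidates, seconds):
    """Evaluate candidates in order until the wall-clock budget runs out."""
    deadline = time.perf_counter() + seconds
    evaluated = []
    for candidate in candidates:
        if time.perf_counter() >= deadline:
            break  # how far we got depends on machine speed and load
        evaluated.append(candidate)
    return evaluated


def search_with_count_budget(candidates, max_evaluations):
    """Evaluate a fixed number of candidates, independent of timing."""
    return list(candidates[:max_evaluations])


# The same seed always gives the same candidate order ...
order = list(range(1000))
random.Random(0).shuffle(order)

# ... but the number of candidates reached under a time contract can differ
# between runs, so two fits may end up with different results.
partial = search_with_time_contract(order, 0.001)
fixed = search_with_count_budget(order, 10)
print(len(partial), len(fixed))
```

The time-contracted result is always a prefix of the seeded order, but its length is not reproducible, which is the behaviour the idempotency check trips over.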

Might need to re-include them in the exclusion list, and review this particular check for future contract approaches.

mloning · 2020-07-10T15:02:43Z

@ABostrom does this only apply to the contracted version? Or also the ShapeletTransformClassifier?

ABostrom · 2020-07-10T15:59:47Z

@mloning The shapelet transform classifier uses the contracted shapelets in a pipeline with a random forest.

mloning · 2020-07-11T16:22:24Z

Let's wait and see if @jasonlines has any idea how to fix this; otherwise we could try to write some code to skip/ignore the idempotency test for shapelets.

ABostrom · 2020-07-13T09:11:07Z

@mloning I agree, I will have a little look at making some changes to _config.py and test_all_estimators to have an exclusion list for certain estimators.

mloning · 2020-07-13T09:21:31Z

@ABostrom decorators may work and may be the cleanest solution, something along the following lines:

@skip_if(estimator=[ShapeletTransform])
def check_idempotency():
    ....

Pytest may even have decorators for that already
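One way the decorator idea could look in plain Python. `skip_if` here is a hypothetical helper written for this sketch, not an existing sktime or pytest function, and the two estimator classes are empty placeholders. pytest does ship `pytest.mark.skipif`, but that marker takes a condition known at collection time rather than the estimator passed to the check.

```python
import functools


def skip_if(estimators):
    """Hypothetical decorator: skip a check for the given estimator classes."""
    skipped = tuple(estimators)

    def decorator(check):
        @functools.wraps(check)
        def wrapper(Estimator):
            if issubclass(Estimator, skipped):
                return "skipped"
            return check(Estimator)

        return wrapper

    return decorator


class ShapeletTransform:  # placeholder class for the example
    pass


class OtherEstimator:  # placeholder class for the example
    pass


@skip_if(estimators=[ShapeletTransform])
def check_idempotency(Estimator):
    # A real check would fit twice and compare results.
    return "ran"


print(check_idempotency(ShapeletTransform), check_idempotency(OtherEstimator))
```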

ABostrom · 2020-07-13T09:44:14Z

My preference, and of course feel free to overrule me on this, is that all config for tests should stay in config. Decorators, whilst admittedly a clean solution, could make it difficult to track down or reconfigure tests in the future.

EXCLUDED_FROM_TESTS = {
    "ShapeletTransformClassifier": [check_fit_idempotent],
    "ContractedShapeletTransform": [check_fit_idempotent],
}
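A sketch of how a test runner might consume such a config. The check functions are empty stand-ins and `checks_for` is a hypothetical helper. Storing check names as strings, rather than the function objects shown above, is an assumption of this sketch that keeps the config free of imports.

```python
# Stand-ins for real estimator checks.
def check_fit_idempotent(Estimator):
    pass


def check_fit_returns_self(Estimator):
    pass


ALL_CHECKS = [check_fit_idempotent, check_fit_returns_self]

# Estimator name -> names of checks that should not run for it.
EXCLUDED_TESTS = {
    "ShapeletTransformClassifier": ["check_fit_idempotent"],
    "ContractedShapeletTransform": ["check_fit_idempotent"],
}


def checks_for(estimator_name):
    """Return the checks that still apply to the named estimator."""
    excluded = EXCLUDED_TESTS.get(estimator_name, [])
    return [check for check in ALL_CHECKS if check.__name__ not in excluded]


print([c.__name__ for c in checks_for("ShapeletTransformClassifier")])
```

Estimators that are absent from the dictionary fall back to an empty exclusion list and run every check.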

mloning · 2020-07-13T11:07:41Z

Sounds good to me!

ABostrom · 2020-07-13T11:13:23Z

@mloning I'm not sure how this works. If I change the config / pytests, would Travis etc. run my version of the tests or the old versions?

mloning · 2020-07-13T11:16:06Z

It'll run whatever is on the PR, I think. Of course you can run them locally first to see if everything is working, so feel free to change things and push to the same branch. You also need to sync with the latest changes on dev.

ABostrom · 2020-07-13T11:17:46Z

Yeah, they have all been passing locally. Might be that I'm being stupid. Will debug it.

ABostrom · 2020-07-13T17:05:53Z

Changed the config params for ST to make sure it can actually finish in a timely manner suitable for the build/CI server.

ABostrom · 2020-07-14T08:00:25Z

@mloning if you can check it over Markus. I think it's looking pretty good. Turned into a bit more than some "simple" shapelet fix. As always these things do :)

mloning

Hi @ABostrom thanks for all the work! Looks good, left a few minor comments! Happy to merge afterwards :)

mloning · 2020-07-14T08:11:29Z

sktime/classification/shapelet_based/_stc.py

+
+        # if y is a pd.series then convert to array.
+        if isinstance(_y, pd.Series):
+            _y = _y.to_numpy()


why not also simply have y instead of _y?

mloning · 2020-07-14T08:12:29Z

sktime/classification/shapelet_based/tests/test_stc.py

+# from sktime.datasets import load_italy_power_demand
+
+
+# def test_stc_with_pd():


please remove the file if no longer needed - if we want to support y as a np.array we should probably add this to our collection of estimator checks and run it on all

mloning · 2020-07-14T08:13:47Z

sktime/transformers/series_as_features/shapelets.py

            This estimator
        """
-        X = check_X(X, enforce_univariate=True)
+        _X, _y = check_X_y(X, y, enforce_univariate=True)


same here, having X and y should be fine, no (without the underscores)?

just makes it consistent with most other code in sklearn and sktime

mloning · 2020-07-14T08:34:50Z

sktime/utils/_testing/estimator_checks.py



-def check_estimator(Estimator):
+def check_estimator(Estimator, Check_Exclusions=None):


for args please use lower case, so check_exclusions or simply exclude

mloning · 2020-07-14T08:37:04Z

sktime/utils/_testing/estimator_checks.py

-        check(Estimator)
+
+        # check if associated test is not included in the exclusion list
+        if not Check_Exclusions.__contains__(check.__name__):


Why not this? :)

Suggested change

if not Check_Exclusions.__contains__(check.__name__):

if check.__name__ not in Check_Exclusions:
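One detail worth noting with a `None` default: both `None.__contains__(...)` and `name not in None` raise an error, so the default has to be normalised before the membership test. The diff excerpt does not show how the PR handles this. The sketch below is one possible shape, using the lower-case `exclude` name suggested in the review and two made-up checks.

```python
def check_has_fit(Estimator):
    assert hasattr(Estimator, "fit")


def check_has_predict(Estimator):
    assert hasattr(Estimator, "predict")


ALL_CHECKS = [check_has_fit, check_has_predict]


def check_estimator(Estimator, exclude=None):
    """Run all checks on Estimator except those named in exclude."""
    # Normalise the default so the membership test below never sees None.
    exclude = () if exclude is None else exclude
    ran = []
    for check in ALL_CHECKS:
        if check.__name__ not in exclude:
            check(Estimator)
            ran.append(check.__name__)
    return ran


class FitOnly:  # toy estimator without predict
    def fit(self, X, y=None):
        return self


print(check_estimator(FitOnly, exclude=["check_has_predict"]))
```

Calling `check_estimator(FitOnly)` without exclusions fails on `check_has_predict`, as expected.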

mloning · 2020-07-14T08:37:56Z

sktime/tests/_config.py

-from sktime.transformers.series_as_features.interpolate import TSInterpolator

 # TODO fix estimators to pass all tests
 EXCLUDED = [


maybe rename to EXCLUDED_ESTIMATORS

mloning · 2020-07-14T08:38:11Z

sktime/tests/_config.py

-    'ShapeletTransformClassifier',
 ]

+EXCLUDED_FROM_TESTS = {


and this one to EXCLUDED_TESTS

ABostrom · 2020-07-14T18:10:27Z

Will do these now. Was just sorting a PR for sktime-dl, but now that's done, I'll tidy this up.

mloning · 2020-07-15T07:39:45Z

@ABostrom looks all good to me now! Thanks for the clean up of shapelets! :) Ready to merge when you are.

ABostrom · 2020-07-15T08:57:36Z

good to go Markus. Think it looks pretty comprehensive.

added two shapelet tests to demonstrate pd/np fix.

aed356f

Added fix to STC to check if pd.Series and convert

ABostrom changed the title ~~added two shapelet tests to demonstrate pd/np fix.~~ Fixed STC Bug Jul 2, 2020

Merge branch 'dev' into shapelet_bug

22eb167

ABostrom added 5 commits July 9, 2020 16:18

fixed seeding issue in STC.

8a7c05e

Added tests for verifying correctness of STC and underlying rotf/contracted transform.

Merge branch 'dev' into shapelet_bug

a586d01

input arg checking to conform with new style

1c6c26a

tidy up

236115f

added self.check_is_fitted() to base.py for baseclassifier

7974b24

linting error fix

ABostrom added 2 commits July 10, 2020 09:09

removed shapelets from the exclusions list.

0dbf8e5

Fixed various issues with the checks. Idempotency being a particular sticking point.

fixed idempotency issue for STC. Needed to construct pipeline in fit …

69dfdf9

…to avoid seeding issue.

linting error

b1ff99d

mloning reviewed Jul 10, 2020

ABostrom added 3 commits July 10, 2020 12:21

tidy up

864eced

changed the runtime to be slightly longer when testing. so it's 10sec…

3e4be2c

…onds, rather than 6 seconds. This gives slightly less variability in contracting. Also added a 25% of the runtime of the max shapelet runtime to give some leeway on multiple runtime in case of CPU discrepancies

updated timing to be more concise

694c26c

additional changes to allow config to exclude certain estimators from…

8db084d

… certain tests.

ABostrom added 4 commits July 13, 2020 12:33

checking against wrong types in the dict

0cb321d

Merge branch 'dev' into shapelet_bug

3399691

_stc tests taking too long to complete

fd597f4

changed shapelet transform params to ensure it finishes in a timely m…

1079b52

…anner

mloning reviewed Jul 14, 2020

ABostrom added 3 commits July 14, 2020 19:16

clean ups and consistency

03c79ef

missed one.

70f768e

missed a refactor

87487fb

ABostrom deleted the shapelet_bug branch July 15, 2020 11:06
