Conversation
Added a fix to STC to check for pd.Series input and convert it
|
Since you're on it, if you have time, it'd be great to make shapelet comply with our interface more generally, see #257 :) |
|
Okay. I'll figure out whatever that error is. I suspect my tests are playing up. I'll have a look at #257 but I'll probably do it tomorrow :) |
|
@mloning I will get round to this, I've suddenly got loads of work to juggle at the moment. I'll hopefully get a PR done by the end of the week |
|
Okay @mloning, I noticed another bug with random seeding which I'm also fixing to make sure STC is correct. Taking longer than expected. |
Added tests for verifying correctness of STC and underlying rotf/contracted transform.
linting error fix
|
@mloning I also noticed that in base.py the BaseClassifier's predict method doesn't call check_is_fitted(). See my newest commit |
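For illustration, a minimal hypothetical sketch of the guard being discussed (this is not sktime's actual BaseClassifier, and the class and method names are assumed): predict should refuse to run before fit has been called.

```python
# Hypothetical sketch of guarding predict() with a fitted check,
# in the spirit of sklearn's check_is_fitted. Names are illustrative.


class BaseClassifier:
    _is_fitted = False

    def fit(self, X, y):
        # ... real fitting logic would go here ...
        self._is_fitted = True
        return self

    def check_is_fitted(self):
        # raise early if predict is called on an unfitted estimator
        if not self._is_fitted:
            raise ValueError(
                f"{type(self).__name__} has not been fitted yet; "
                "call fit before predict."
            )

    def predict(self, X):
        self.check_is_fitted()  # guard against unfitted use
        # ... real prediction logic would go here ...
        return [0 for _ in X]
```

The point is simply that the check runs at the top of predict, so a forgotten fit fails with a clear error rather than an obscure attribute error later.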
|
@ABostrom if you're feeling brave you can remove all Shapelet related estimators from this list here, this will run all standard unit tests on them. You may have to change the test hyper-params to make it faster (e.g. contract time to 10 secs or so) in the same file in this dictionary. |
|
oooh. scary, will see what happens. |
Fixed various issues with the checks. Idempotency being a particular sticking point.
…to avoid seeding issue.
|
So I think I've resolved the testing issues across the whole shapelet family of algorithms. Learned a lot about pytest, so thanks for that @mloning :) |
| X, y = check_X_y(X, y, enforce_univariate=True) | ||
| self.n_classes = np.unique(y).shape[0] | ||
| self.classes_ = class_distribution(np.asarray(y).reshape(-1, 1))[0][0] | ||
| _X, _y = check_X_y(X, y, enforce_univariate=True) |
Why do you create the underscore objects? Should work without the underscores, no?
I created the underscores because of a suggestion you left me before about not overwriting input args.
I'm happy to remove that.
actually there was also a weird error, I think caused by X getting shuffled, so I was getting an idempotency failure because the shapes were mismatched. |
ah sorry, I meant not overwriting hyper-parameters set in the constructor; input args can be overwritten |
| _y = _y.to_numpy() | ||
| # generate pipeline in fit so that random state can be propagated properly. | ||
| self.classifier = Pipeline([ |
I'd call it self.classifier_, by sklearn convention, attributes changed in fit have a trailing underscore
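To illustrate the convention being suggested, here is a minimal sketch (the class name and pipeline steps are assumed, not sktime's actual code): the constructor only stores hyper-parameters, and the pipeline is built inside fit and stored with a trailing underscore.

```python
# Sketch of the sklearn convention: attributes created during fit get a
# trailing underscore (classifier_), while the constructor only records
# hyper-parameters. Class name and pipeline steps are illustrative.
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


class SketchClassifier:
    def __init__(self, random_state=None):
        # constructor stores hyper-parameters only; fit must not change them
        self.random_state = random_state

    def fit(self, X, y):
        # build the pipeline here so the random state set in the
        # constructor can be propagated properly
        self.classifier_ = Pipeline([
            ("scale", StandardScaler()),
            ("rf", RandomForestClassifier(random_state=self.random_state)),
        ])
        self.classifier_.fit(X, y)
        return self

    def predict(self, X):
        return self.classifier_.predict(X)
```

Keeping fit-time state in underscored attributes also makes it easy to detect whether an estimator has been fitted (e.g. by checking for the attribute's existence).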
sktime/tests/_config.py
| # TODO fix estimators to pass all tests | ||
| EXCLUDED = [ | ||
| 'ContractedShapeletTransform', | ||
| # 'ContractedShapeletTransform', |
please remove all of the commented lines, including the # 'MrSEQLClassifier' line, to clean this up a bit
…onds, rather than 6 seconds. This gives slightly less variability in contracting. Also added 25% of the max shapelet runtime as leeway in case of CPU discrepancies
|
@mloning @jasonlines So I think the current build errors showcase the current differences between my tsml implementation and the sktime one. Between two runs on the same dataset you cannot guarantee that you will evaluate the same number of shapelets in the same time frame, especially on different architectures etc., which results in idempotency failures and inconsistent outputs for the same seeds/datasets. Might need to re-include them in the exclusion list, and review this particular check for future contract approaches. |
|
@ABostrom does this only apply to the contracted version? Or also the |
|
@mloning the shapelet transform classifier uses the contracted shapelets in a pipeline with a random forest |
|
Let's wait and see if @jasonlines has any idea how to fix this, otherwise we could try to write some code to skip/ignore the idempotency test for shapelets |
|
@mloning I agree, I will have a little look at making some changes to _config.py and test_all_estimators to have an exclusion list for certain estimators. |
|
@ABostrom decorators may work and may be the cleanest solution, something along the following lines:

@skip_if(estimator=[ShapeletTransform])
def check_idempotency():
    ...

Pytest may even have decorators for that already |
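A hedged sketch of how such a decorator could work (the skip_if name and signature come from the suggestion above; the implementation is assumed, not pytest's built-in API, though pytest.mark.skipif covers similar ground):

```python
# Illustrative implementation of the suggested skip_if decorator: turn a
# check into a no-op when the estimator's class name is in the exclusion
# list. Names here follow the suggestion, not an existing API.
import functools


def skip_if(estimator):
    """Decorator factory: skip the check for the listed estimator names."""
    excluded = set(estimator)

    def decorator(check):
        @functools.wraps(check)
        def wrapper(est, *args, **kwargs):
            if type(est).__name__ in excluded:
                return None  # check skipped for this estimator
            return check(est, *args, **kwargs)
        return wrapper
    return decorator


@skip_if(estimator=["ShapeletTransform"])
def check_idempotency(est):
    # stand-in body; the real check would refit and compare outputs
    return "ran"
```

Note pytest's own mechanism, pytest.mark.skipif, skips based on a boolean condition rather than the estimator argument, so a small custom decorator like this (or a config-based lookup) would still be needed for per-estimator exclusions.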
|
My preference, and of course feel free to overrule on this, is that all config for tests should stay in config. It seems like decorators, whilst admittedly a clean solution, could make it difficult to track down or reconfigure tests in the future.
|
|
Sounds good to me! |
|
@mloning I'm not sure how this works. If I change the config / pytests, would that run my version of the tests or the old versions through travis etc? |
|
It'll run whatever is on the PR I think, of course you can run them locally first to see if everything is working, so feel free to change things and push to the same branch. You also need to sync with the latest changes on dev. |
|
Yeah, they have all been passing locally. Might be that I'm being stupid. Will debug it. |
|
Changed the config params for ST to make sure it can actually finish in a timely manner on the build/CI server |
|
@mloning if you can check it over, Markus, I think it's looking pretty good. Turned into a bit more than some "simple" shapelet fix. As always these things do :) |
|
| # if y is a pd.series then convert to array. | ||
| if isinstance(_y, pd.Series): | ||
| _y = _y.to_numpy() |
why not also simply have y instead of _y?
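The conversion under discussion is small enough to state as a sketch (the helper name coerce_y is hypothetical, added here only for illustration): accept either a pd.Series or an array-like and normalise to a numpy array.

```python
# Sketch of the y-handling discussed above: if y is a pd.Series, convert
# it to a numpy array; otherwise pass it through as an array. The helper
# name is illustrative, not part of sktime's API.
import numpy as np
import pandas as pd


def coerce_y(y):
    """Return y as a numpy array, converting pd.Series via to_numpy()."""
    if isinstance(y, pd.Series):
        return y.to_numpy()
    return np.asarray(y)
```

Either spelling (reassigning y in place, or using a throwaway variable) produces the same result; the review point is only about the name.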
| # from sktime.datasets import load_italy_power_demand | ||
| # def test_stc_with_pd(): |
please remove the file if no longer needed - if we want to support y as a np.array we should probably add this to our collection of estimator checks and run it on all
| This estimator | ||
| """ | ||
| X = check_X(X, enforce_univariate=True) | ||
| _X, _y = check_X_y(X, y, enforce_univariate=True) |
same here, having X and y should be fine no (without the underscores)?
just makes it consistent with most other code in sklearn and sktime
|
| def check_estimator(Estimator): | ||
| def check_estimator(Estimator, Check_Exclusions=None): |
for args please use lower case, so check_exclusions or simply exclude
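To make the naming suggestion concrete, a simplified sketch of the signature (the check functions and return value are stand-ins, not the real test harness): the keyword is lower-case, and checks whose names appear in the exclusion list are skipped.

```python
# Simplified sketch of check_estimator with a lower-case `exclude`
# keyword. The checks and the returned list of names are illustrative
# stand-ins for the real per-estimator test loop.
def check_estimator(estimator, exclude=None):
    """Run all checks on estimator, skipping any named in `exclude`."""
    exclude = exclude or []
    ran = []
    for check in (check_fit, check_idempotency):
        # skip checks listed in the exclusion list
        if check.__name__ not in exclude:
            check(estimator)
            ran.append(check.__name__)
    return ran


def check_fit(estimator):
    pass  # stand-in for a real fit check


def check_idempotency(estimator):
    pass  # stand-in for a real idempotency check
```

Defaulting to None (rather than a mutable list) and normalising inside the function is the usual Python idiom for optional list arguments.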
| check(Estimator) | ||
| # check if associated test is not included in the exclusion list | ||
| if not Check_Exclusions.__contains__(check.__name__): |
Why not this? :)
| if not Check_Exclusions.__contains__(check.__name__): | |
| if check.__name__ not in Check_Exclusions: |
sktime/tests/_config.py
| from sktime.transformers.series_as_features.interpolate import TSInterpolator | ||
| # TODO fix estimators to pass all tests | ||
| EXCLUDED = [ |
maybe rename to EXCLUDED_ESTIMATORS
sktime/tests/_config.py
| 'ShapeletTransformClassifier', | ||
| ] | ||
| EXCLUDED_FROM_TESTS = { |
and this one to EXCLUDED_TESTS
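Putting both renames together, the config could look something like this (entries and the lookup helper are illustrative, not the actual contents of sktime/tests/_config.py):

```python
# Possible shape of the renamed config: a flat list of estimators to
# exclude entirely, and a per-estimator mapping of individual tests to
# skip. Entries and helper name are illustrative.
EXCLUDED_ESTIMATORS = [
    "ContractedShapeletTransform",
]

EXCLUDED_TESTS = {
    "ShapeletTransformClassifier": ["check_idempotency"],
}


def excluded_checks(estimator_name):
    """Look up the list of checks to skip for a given estimator name."""
    return EXCLUDED_TESTS.get(estimator_name, [])
```

Keeping both structures in one config module matches the earlier preference that all test configuration stay in _config.py rather than in decorators scattered across the test files.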
|
Will do these now. Was just sorting a PR for sktime-dl, but now that's done, I'll tidy this up. |
|
@ABostrom looks all good to me now! Thanks for the clean up of shapelets! :) Ready to merge when you are. |
|
Good to go, Markus. Think it looks pretty comprehensive. |
Reference Issues/PRs
Fixes #321
What does this implement/fix? Explain your changes.
Checks if a pd.Series is passed and converts it to numpy for internal use.
Any other comments?
Added some additional tests for STC, although they may be unnecessary