[BUG] Fix bug when predicting segments from clasp change point annotator#6756
Conversation
fkiraly
left a comment
There was a problem hiding this comment.
we should add a test - at least for clasp, better for all annotators
|
I am having a go at writing a test for this but there is another issue. When making the big changes to >>> from sktime.annotation.clasp import ClaSPSegmentation
>>> from sktime.utils._testing.annotation import make_annotation_problem
>>> X_train = make_annotation_problem(n_timepoints=10)
>>> print(X_train)
2000-01-01 1.508174
2000-01-02 2.079181
2000-01-03 1.923736
2000-01-04 2.180592
2000-01-05 1.000000
2000-01-06 1.957757
2000-01-07 2.144369
2000-01-08 3.097359
2000-01-09 2.145177
2000-01-10 4.231551
Freq: D, dtype: float64
>>> clasp = ClaSPSegmentation(n_cps=1)
>>> clasp.fit(X_train)
>>> points = clasp.predict_points(X_train)
>>> print(points)
Freq: D, dtype: float64
0 0
>>> clasp.change_points_to_segments(points, start=X_train.index[0], end=X_train.index[-1])
Traceback (most recent call last):
File "/home/alex/documents/sktime/main.py", line 13, in <module>
clasp.change_points_to_segments(points, X_train.index[0], X_train.index[-1])
File "/home/alex/documents/sktime/sktime/annotation/base/_base.py", line 644, in change_points_to_segments
index = pd.IntervalIndex.from_breaks(breaks, copy=True, closed="left")
File "/home/alex/documents/sktime/.venv/lib/python3.10/site-packages/pandas/core/indexes/interval.py", line 274, in from_breaks
array = IntervalArray.from_breaks(
File "/home/alex/documents/sktime/.venv/lib/python3.10/site-packages/pandas/core/arrays/interval.py", line 463, in from_breaks
return cls.from_arrays(breaks[:-1], breaks[1:], closed, copy=copy, dtype=dtype)
File "/home/alex/documents/sktime/.venv/lib/python3.10/site-packages/pandas/core/arrays/interval.py", line 544, in from_arrays
left, right, dtype = cls._ensure_simple_new_inputs(
File "/home/alex/documents/sktime/.venv/lib/python3.10/site-packages/pandas/core/arrays/interval.py", line 352, in _ensure_simple_new_inputs
raise TypeError(msg)
TypeError: category, object, and string subtypes are not supported for IntervalIndexThe output from Possible Solutions
I prefer option 1. |
|
what is the current behaviour? I would suggest to keep it downwards compatible. I also did not think about the loc/iloc issue here, loc indeed seems more general. Are you saying that currently the return is always iloc? Either way, I think we should start thinking about a test suite for systematic testing, the devil is in the details. |
|
The current behaviour is that clasp specifically returns Here is an example of >>> import pandas as pd
>>> from sktime.annotation.clasp import ClaSPSegmentation
>>> index = [i * 5 for i in range(100)]
>>> x = pd.Series([1] * 50 + [5] * 50, index=index)
>>> clasp = ClaSPSegmentation(n_cps=1)
>>> clasp.fit(x)
>>> points = clasp.predict_points(x)
>>> print(points)
0 52
dtype: int64The returned index of the change point should be a multiple of 5 if Do you think we should start the rigorous test suite in this MR? I think it would be better in a separate issue and MR. I will write a simple test just for clasp for the time being to see if that is sufficient to satisfty this issue with the aim to start of more rigorous test suite in a different issue and MR. |
Agree, better in a separate MR - let's start. |
I can't do this Friday. Can we do next Friday? |
|
sure! |
* upstream/main: [ENH] Allow object dtype in series (sktime#5886) [ENH] `check_pdmultiindex_panel` to return names of invalid `object` columns if there are any (sktime#6797) [ENH] added feature_kind metadata in datatype checks (sktime#6490) [ENH] Adding tag for categorical support in `X` (sktime#6704) [BUG] Fix bug when predicting segments from clasp change point annotator (sktime#6756) [ENH] Adding support for GluonTS' PandasDataset object (sktime#6668)
commit 9f95d89 Merge: 02bd3fb 6caeb0d Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Tue Jul 30 22:01:03 2024 +0530 Merge branch 'main' into step3-4_categorical_support commit 02bd3fb Merge: 53ee977 a21d97f Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Mon Jul 29 21:47:41 2024 +0530 Merge branch 'main' into step3-4_categorical_support commit 53ee977 Merge: 74fd3f8 9b066f2 Author: Franz Király <f.kiraly@ucl.ac.uk> Date: Sun Jul 28 15:19:13 2024 +0100 Merge branch 'main' into pr/6732 commit 74fd3f8 Merge: e9f5e2b 116faa2 Author: Franz Király <f.kiraly@ucl.ac.uk> Date: Sun Jul 28 15:18:26 2024 +0100 Merge branch 'main' into pr/6732 commit e9f5e2b Author: Abhay-Lejith <Abhay-Lejith@users.noreply.github.com> Date: Sun Jul 28 10:32:15 2024 +0000 [AUTOMATED] update CONTRIBUTORS.md commit f3e212b Merge: 95c213b 81d2d08 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sun Jul 28 16:01:12 2024 +0530 Merge branch 'main' into step3-4_categorical_support commit 95c213b Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sun Jul 28 15:59:02 2024 +0530 added est.update to test_categorical commit 8d14bde Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sat Jul 27 09:35:51 2024 +0530 typo, docs commit 81b4862 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Thu Jul 25 22:27:00 2024 +0530 separate test for categorical in y commit 23b4279 Merge: a0d950d 3a55ca5 Author: Anirban Ray <39331844+yarnabrina@users.noreply.github.com> Date: Sat Jul 20 09:29:27 2024 +0530 Merge remote-tracking branch 'upstream/main' into pr/Abhay-Lejith/6732 * upstream/main: [ENH] Allow object dtype in series (sktime#5886) [ENH] `check_pdmultiindex_panel` to return names of invalid `object` columns if there are any (sktime#6797) [ENH] added feature_kind metadata in datatype checks (sktime#6490) [ENH] Adding tag for categorical support in `X` (sktime#6704) [BUG] Fix bug when predicting segments from clasp change point annotator (sktime#6756) [ENH] Adding support for GluonTS' PandasDataset object (sktime#6668) commit a0d950d Merge: 2d77fd6 1fa8cfc Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Wed Jul 17 22:03:27 2024 +0530 Merge branch 'main' into step3-4_categorical_support commit 2d77fd6 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sat Jul 13 18:25:44 2024 +0530 test error fix commit 194ec28 Merge: 143a176 23bbb9b Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sat Jul 13 17:16:00 2024 +0530 Merge updated branch 'feature_kind' with timedelta dtype added commit 143a176 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sat Jul 13 17:11:59 2024 +0530 added tests for categorical commit 23bbb9b Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Fri Jul 12 12:31:52 2024 +0530 added timedelta to DATETIME category commit f926c58 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Fri Jul 12 10:40:29 2024 +0530 added ignores-exog-X check to test for invalid X input commit 460caba Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Fri Jul 12 09:32:09 2024 +0530 added checks in transformers commit 08aefa2 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Thu Jul 11 13:16:09 2024 +0530 added categorical checks in other base classes commit 0812220 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Tue Jul 9 13:29:22 2024 +0530 linting blank line fix commit 5f3be53 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Tue Jul 9 13:16:07 2024 +0530 added feature_kind to req metadata in early classifiers commit 1d7e443 Merge: 7377837 b1f6167 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Tue Jul 9 13:11:34 2024 +0530 Merge branch updated 'feature_kind' with xarray and dask metadata support commit b1f6167 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Tue Jul 9 13:00:36 2024 +0530 added metadata for xarray and dask with tests commit 0905bbb Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Mon Jul 8 13:23:36 2024 +0530 dataloader test error workaround commit 7377837 Merge: d36e05a 8652ba9 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Mon Jul 8 12:33:23 2024 +0530 Merge branch updated 'feature_kind' with new bool mapping commit 8652ba9 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Mon Jul 8 12:32:22 2024 +0530 changed mapping of boolean dtypekind to float commit d36e05a Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sun Jul 7 16:56:21 2024 +0530 moved handle_categorical call to after check_is_error commit 403ee6e Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sun Jul 7 16:52:39 2024 +0530 indexing fix commit a006004 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sun Jul 7 16:52:39 2024 +0530 indexing fix commit 3696f7b Author: Abhay-Lejith <Abhay-Lejith@users.noreply.github.com> Date: Sun Jul 7 10:17:26 2024 +0000 [AUTOMATED] update CONTRIBUTORS.md commit 05c452c Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sun Jul 7 15:46:37 2024 +0530 updated tag name commit 4b557c6 Merge: 8cd48ed 978263e Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sun Jul 7 15:43:14 2024 +0530 Merge updated branch 'categorical_tag' with class for new tag commit 978263e Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sun Jul 7 15:41:19 2024 +0530 added class for tag commit 8cd48ed Merge: 90c9f31 c1af183 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sun Jul 7 13:19:34 2024 +0530 Merge updated branch 'feature_kind' with table support commit c1af183 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sun Jul 7 13:07:56 2024 +0530 extended to table scitype along with tests commit 90c9f31 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sun Jul 7 11:19:35 2024 +0530 calling _handle_categorical in forecasting, classification, regression commit 655f6e0 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sun Jul 7 11:19:08 2024 +0530 added categorical handling logic commit db94559 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Thu Jul 4 10:29:48 2024 +0530 added docstring and modified code acc to suggestion commit 1321e1e Merge: de6c92b 0eb9231 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Tue Jul 2 13:11:12 2024 +0530 Merge pr sktime#5886 into step3-4_categorical_support commit de6c92b Merge: e376768 3f62d98 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Tue Jul 2 13:09:50 2024 +0530 Merge branch 'feature_kind' into step3-4_categorical_support commit e376768 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Tue Jul 2 11:06:07 2024 +0530 added categorical tag commit 3f62d98 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Mon Jul 1 11:08:47 2024 +0530 added separate tests for testing with categorical data commit e806cc1 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Mon Jun 24 11:09:08 2024 +0530 renamed fields and rmeoved use of pandas func for numpy based mtypes commit 7a58087 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Wed Jun 12 18:33:18 2024 +0530 added simple_feature_kind and reverted changes to examples commit ee0bc1c Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Mon Jun 3 12:02:30 2024 +0530 changed dtype to float in panel examples which I missed earlier commit 2f84b24 Merge: 1a64d2f e22d8dc Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Mon Jun 3 10:11:30 2024 +0530 Merge branch 'main' into feature_kind commit 1a64d2f Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Mon Jun 3 10:07:19 2024 +0530 changed dtype to float in panel test examples commit dd81851 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Sat Jun 1 16:20:17 2024 +0530 added dtypekind metadata for panel and hierarchical commit 83b6fe8 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Fri May 31 10:25:28 2024 +0530 using pandas api functions to detect dtype commit c0962f9 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Thu May 30 12:17:51 2024 +0530 added function for getting dtypekind of series commit 1e2110a Merge: 898b839 6f42855 Author: Franz Király <f.kiraly@ucl.ac.uk> Date: Wed May 29 13:54:43 2024 +0100 Merge branch 'main' into pr/6490 commit 898b839 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Wed May 29 11:04:28 2024 +0530 using dtypekind enum values commit ee8e0c4 Author: Abhay-Lejith <abhaylejith@gmail.com> Date: Tue May 28 14:50:43 2024 +0530 added feature_kind metadata in series check commit 0eb9231 Author: Anirban Ray <39331844+yarnabrina@users.noreply.github.com> Date: Fri Feb 16 23:37:36 2024 +0530 @fkiraly suggestion: add rule to differentiate nested_univ and Panel commit 0e28842 Author: Anirban Ray <39331844+yarnabrina@users.noreply.github.com> Date: Fri Feb 16 10:46:16 2024 +0530 support "object" dtype in Panel (cherry picked from commit 1477012) commit 25eb6f5 Author: Anirban Ray <39331844+yarnabrina@users.noreply.github.com> Date: Fri Feb 16 10:42:17 2024 +0530 @fkiraly edit: add rule to differentiate nested_univ and pd.DataFrame Ref. sktime#5886 (review) (cherry picked from commit 2cecd6b) commit a729843 Author: Anirban Ray <39331844+yarnabrina@users.noreply.github.com> Date: Sat Feb 3 18:29:15 2024 +0530 added test (cherry picked from commit 2de7d8c) commit c624e42 Author: Anirban Ray <39331844+yarnabrina@users.noreply.github.com> Date: Sat Feb 3 18:28:47 2024 +0530 support "object" dtype in Series (cherry picked from commit cc3d056)
Reference Issues/PRs
Closes #6741.
What does this implement/fix? Explain your changes.
The
predict_segmentsmethod was failing for change point detection algorithm because it was callingsegments_to_change_pointsnotchange_points_to_segments.Does your contribution introduce a new dependency? If yes, which one?
No
Did you add any tests for the change?
No but it highlights the need for better annotation module tests. I will open a related issue.
Any other comments?
The original issue contained a minimal example demonstrating the bug. This is now fixed.
Here is the output from
main.py.PR checklist
For all contributions
How to: add yourself to the all-contributors file in the
sktimeroot directory (not theCONTRIBUTORS.md). Common badges:code- fixing a bug, or adding code logic.doc- writing or improving documentation or docstrings.bug- reporting or diagnosing a bug (get this pluscodeif you also fixed the bug in the PR).maintenance- CI, test framework, release.See here for full badge reference
maintainerstag - do this if you want to become the owner or maintainer of an estimator you added.See here for further details on the algorithm maintainer role.
For new estimators
docs/source/api_reference/taskname.rst, follow the pattern.Examplessection.python_dependenciestag and ensureddependency isolation, see the estimator dependencies guide.