Skip to content

[BUG] Fix bug when predicting segments from clasp change point annotator#6756

Merged
alex-gregory-ds merged 2 commits intosktime:mainfrom
alex-gregory-ds:6741-fix-predict_segments_bug
Jul 18, 2024
Merged

[BUG] Fix bug when predicting segments from clasp change point annotator#6756
alex-gregory-ds merged 2 commits intosktime:mainfrom
alex-gregory-ds:6741-fix-predict_segments_bug

Conversation

@alex-gregory-ds
Copy link
Copy Markdown
Contributor

@alex-gregory-ds alex-gregory-ds commented Jul 12, 2024

Reference Issues/PRs

Closes #6741.

What does this implement/fix? Explain your changes.

The predict_segments method was failing for change point detection algorithm because it was calling segments_to_change_points not change_points_to_segments.

Does your contribution introduce a new dependency? If yes, which one?

No

Did you add any tests for the change?

No but it highlights the need for better annotation module tests. I will open a related issue.

Any other comments?

The original issue contained a minimal example demonstrating the bug. This is now fixed.

# main.py

from sktime.annotation.clasp import ClaSPSegmentation
from sktime.datasets import load_gun_point_segmentation                                                                                                                                   from sktime.tests.test_switch import run_test_for_class
 
ts, period_size, cps = load_gun_point_segmentation()

clasp = ClaSPSegmentation(period_size, n_cps=1)
clasp.fit(ts)

segments = clasp.predict_segments(ts)                                                                                                                                                      print(segments)

Here is the output from main.py.

$ python main.py
[1, 893)      -1
[893, 1875)    1
dtype: int64

PR checklist

For all contributions
  • I've added myself to the list of contributors with any new badges I've earned :-)
    How to: add yourself to the all-contributors file in the sktime root directory (not the CONTRIBUTORS.md). Common badges: code - fixing a bug, or adding code logic. doc - writing or improving documentation or docstrings. bug - reporting or diagnosing a bug (get this plus code if you also fixed the bug in the PR).maintenance - CI, test framework, release.
    See here for full badge reference
  • Optionally, for added estimators: I've added myself and possibly to the maintainers tag - do this if you want to become the owner or maintainer of an estimator you added.
    See here for further details on the algorithm maintainer role.
  • The PR title starts with either [ENH], [MNT], [DOC], or [BUG]. [BUG] - bugfix, [MNT] - CI, test framework, [ENH] - adding or improving code, [DOC] - writing or improving documentation or docstrings.
For new estimators
  • I've added the estimator to the API reference - in docs/source/api_reference/taskname.rst, follow the pattern.
  • I've added one or more illustrative usage examples to the docstring, in a pydocstyle compliant Examples section.
  • If the estimator relies on a soft dependency, I've set the python_dependencies tag and ensured
    dependency isolation, see the estimator dependencies guide.

@alex-gregory-ds alex-gregory-ds added module:detection detectors module: outliers, change points, segmentation bugfix Fixes a known bug or removes unintended behavior labels Jul 12, 2024
@alex-gregory-ds alex-gregory-ds self-assigned this Jul 12, 2024
Copy link
Copy Markdown
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should add a test - at least for clasp, better for all annotators

@alex-gregory-ds
Copy link
Copy Markdown
Contributor Author

I am having a go at writing a test for this but there is another issue. When making the big changes to BaseSeriesAnnotator, I decided not to change the output format of predict for backwards compatability. So, we have the following:

>>> from sktime.annotation.clasp import ClaSPSegmentation
>>> from sktime.utils._testing.annotation import make_annotation_problem
>>> X_train = make_annotation_problem(n_timepoints=10)
>>> print(X_train)
2000-01-01    1.508174
2000-01-02    2.079181
2000-01-03    1.923736
2000-01-04    2.180592
2000-01-05    1.000000
2000-01-06    1.957757
2000-01-07    2.144369
2000-01-08    3.097359
2000-01-09    2.145177
2000-01-10    4.231551
Freq: D, dtype: float64
>>> clasp = ClaSPSegmentation(n_cps=1)
>>> clasp.fit(X_train)
>>> points = clasp.predict_points(X_train)
>>> print(points)
Freq: D, dtype: float64
0    0                                                                                                                                                                        
>>> clasp.change_points_to_segments(points, start=X_train.index[0], end=X_train.index[-1])
Traceback (most recent call last):
  File "/home/alex/documents/sktime/main.py", line 13, in <module>
    clasp.change_points_to_segments(points, X_train.index[0], X_train.index[-1])
  File "/home/alex/documents/sktime/sktime/annotation/base/_base.py", line 644, in change_points_to_segments
    index = pd.IntervalIndex.from_breaks(breaks, copy=True, closed="left")
  File "/home/alex/documents/sktime/.venv/lib/python3.10/site-packages/pandas/core/indexes/interval.py", line 274, in from_breaks
    array = IntervalArray.from_breaks(
  File "/home/alex/documents/sktime/.venv/lib/python3.10/site-packages/pandas/core/arrays/interval.py", line 463, in from_breaks
    return cls.from_arrays(breaks[:-1], breaks[1:], closed, copy=copy, dtype=dtype)
  File "/home/alex/documents/sktime/.venv/lib/python3.10/site-packages/pandas/core/arrays/interval.py", line 544, in from_arrays
    left, right, dtype = cls._ensure_simple_new_inputs(
  File "/home/alex/documents/sktime/.venv/lib/python3.10/site-packages/pandas/core/arrays/interval.py", line 352, in _ensure_simple_new_inputs
    raise TypeError(msg)
TypeError: category, object, and string subtypes are not supported for IntervalIndex

The output from predict_points is a list of change points in terms of iloc, the change points should be datetimes not integers. Since the predicted change points are integers we cannot use change_points_to_segments since this method expects integers and datetimes cannot be compared.

Possible Solutions

  1. Change the output type of predict to match the type of the index of X. Would this require a deprecation warning?
  2. Only support integer change points for clasp so skip raise an error in clasp if X has in index data type other than integer.

I prefer option 1.

@fkiraly
Copy link
Copy Markdown
Collaborator

fkiraly commented Jul 14, 2024

what is the current behaviour? I would suggest to keep it downwards compatible. I also did not think about the loc/iloc issue here, loc indeed seems more general. Are you saying that currently the return is always iloc?

Either way, I think we should start thinking about a test suite for systematic testing, the devil is in the details.

@alex-gregory-ds
Copy link
Copy Markdown
Contributor Author

The current behaviour is that clasp specifically returns iloc rather than loc. That was done to keep to keep its previous behaviour when making the changes in #6265. Eventually, I would like to move all annotators away from using iloc to loc but agree that doing proper deprication is essential.

Here is an example of iloc predictions from clasp.

>>> import pandas as pd
>>> from sktime.annotation.clasp import ClaSPSegmentation
>>> index = [i * 5 for i in range(100)]
>>> x = pd.Series([1] * 50 + [5] * 50, index=index)
>>> clasp = ClaSPSegmentation(n_cps=1)
>>> clasp.fit(x)
>>> points = clasp.predict_points(x)
>>> print(points)
0    52
dtype: int64

The returned index of the change point should be a multiple of 5 if loc were used.

Do you think we should start the rigorous test suite in this MR? I think it would be better in a separate issue and MR.

I will write a simple test just for clasp for the time being to see if that is sufficient to satisfty this issue with the aim to start of more rigorous test suite in a different issue and MR.

@fkiraly
Copy link
Copy Markdown
Collaborator

fkiraly commented Jul 17, 2024

Do you think we should start the rigorous test suite in this MR? I think it would be better in a separate issue and MR.

Agree, better in a separate MR - let's start.
Shall we chat in the Fri meeutp?

@alex-gregory-ds
Copy link
Copy Markdown
Contributor Author

Shall we chat in the Fri meeutp?

I can't do this Friday. Can we do next Friday?

@alex-gregory-ds alex-gregory-ds merged commit 14e312e into sktime:main Jul 18, 2024
@fkiraly
Copy link
Copy Markdown
Collaborator

fkiraly commented Jul 18, 2024

sure!

yarnabrina added a commit to Abhay-Lejith/sktime that referenced this pull request Jul 20, 2024
* upstream/main:
  [ENH] Allow object dtype in series (sktime#5886)
  [ENH] `check_pdmultiindex_panel` to return names of invalid `object` columns if there are any (sktime#6797)
  [ENH] added feature_kind metadata in datatype checks (sktime#6490)
  [ENH] Adding tag for categorical support in `X` (sktime#6704)
  [BUG] Fix bug when predicting segments from clasp change point annotator (sktime#6756)
  [ENH] Adding support for GluonTS' PandasDataset object (sktime#6668)
Abhay-Lejith added a commit to Abhay-Lejith/sktime that referenced this pull request Jul 31, 2024
commit 9f95d89
Merge: 02bd3fb 6caeb0d
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Tue Jul 30 22:01:03 2024 +0530

    Merge branch 'main' into step3-4_categorical_support

commit 02bd3fb
Merge: 53ee977 a21d97f
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Mon Jul 29 21:47:41 2024 +0530

    Merge branch 'main' into step3-4_categorical_support

commit 53ee977
Merge: 74fd3f8 9b066f2
Author: Franz Király <f.kiraly@ucl.ac.uk>
Date:   Sun Jul 28 15:19:13 2024 +0100

    Merge branch 'main' into pr/6732

commit 74fd3f8
Merge: e9f5e2b 116faa2
Author: Franz Király <f.kiraly@ucl.ac.uk>
Date:   Sun Jul 28 15:18:26 2024 +0100

    Merge branch 'main' into pr/6732

commit e9f5e2b
Author: Abhay-Lejith <Abhay-Lejith@users.noreply.github.com>
Date:   Sun Jul 28 10:32:15 2024 +0000

    [AUTOMATED] update CONTRIBUTORS.md

commit f3e212b
Merge: 95c213b 81d2d08
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sun Jul 28 16:01:12 2024 +0530

    Merge branch 'main' into step3-4_categorical_support

commit 95c213b
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sun Jul 28 15:59:02 2024 +0530

    added est.update to test_categorical

commit 8d14bde
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sat Jul 27 09:35:51 2024 +0530

    typo, docs

commit 81b4862
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Thu Jul 25 22:27:00 2024 +0530

    separate test for categorical in y

commit 23b4279
Merge: a0d950d 3a55ca5
Author: Anirban Ray <39331844+yarnabrina@users.noreply.github.com>
Date:   Sat Jul 20 09:29:27 2024 +0530

    Merge remote-tracking branch 'upstream/main' into pr/Abhay-Lejith/6732

    * upstream/main:
      [ENH] Allow object dtype in series (sktime#5886)
      [ENH] `check_pdmultiindex_panel` to return names of invalid `object` columns if there are any (sktime#6797)
      [ENH] added feature_kind metadata in datatype checks (sktime#6490)
      [ENH] Adding tag for categorical support in `X` (sktime#6704)
      [BUG] Fix bug when predicting segments from clasp change point annotator (sktime#6756)
      [ENH] Adding support for GluonTS' PandasDataset object (sktime#6668)

commit a0d950d
Merge: 2d77fd6 1fa8cfc
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Wed Jul 17 22:03:27 2024 +0530

    Merge branch 'main' into step3-4_categorical_support

commit 2d77fd6
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sat Jul 13 18:25:44 2024 +0530

    test error fix

commit 194ec28
Merge: 143a176 23bbb9b
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sat Jul 13 17:16:00 2024 +0530

    Merge updated branch 'feature_kind' with timedelta dtype added

commit 143a176
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sat Jul 13 17:11:59 2024 +0530

    added tests for categorical

commit 23bbb9b
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Fri Jul 12 12:31:52 2024 +0530

    added timedelta to DATETIME category

commit f926c58
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Fri Jul 12 10:40:29 2024 +0530

    added ignores-exog-X check to test for invalid X input

commit 460caba
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Fri Jul 12 09:32:09 2024 +0530

    added checks in transformers

commit 08aefa2
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Thu Jul 11 13:16:09 2024 +0530

    added categorical checks in other base classes

commit 0812220
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Tue Jul 9 13:29:22 2024 +0530

    linting blank line fix

commit 5f3be53
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Tue Jul 9 13:16:07 2024 +0530

    added feature_kind to req metadata in early classifiers

commit 1d7e443
Merge: 7377837 b1f6167
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Tue Jul 9 13:11:34 2024 +0530

    Merge branch updated 'feature_kind' with xarray and dask metadata support

commit b1f6167
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Tue Jul 9 13:00:36 2024 +0530

    added metadata for xarray and dask with tests

commit 0905bbb
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Mon Jul 8 13:23:36 2024 +0530

    dataloader test error workaround

commit 7377837
Merge: d36e05a 8652ba9
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Mon Jul 8 12:33:23 2024 +0530

    Merge branch updated 'feature_kind' with new bool mapping

commit 8652ba9
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Mon Jul 8 12:32:22 2024 +0530

    changed mapping of boolean dtypekind to float

commit d36e05a
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sun Jul 7 16:56:21 2024 +0530

    moved handle_categorical call to after check_is_error

commit 403ee6e
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sun Jul 7 16:52:39 2024 +0530

    indexing fix

commit a006004
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sun Jul 7 16:52:39 2024 +0530

    indexing fix

commit 3696f7b
Author: Abhay-Lejith <Abhay-Lejith@users.noreply.github.com>
Date:   Sun Jul 7 10:17:26 2024 +0000

    [AUTOMATED] update CONTRIBUTORS.md

commit 05c452c
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sun Jul 7 15:46:37 2024 +0530

    updated tag name

commit 4b557c6
Merge: 8cd48ed 978263e
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sun Jul 7 15:43:14 2024 +0530

    Merge updated branch 'categorical_tag' with class for new tag

commit 978263e
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sun Jul 7 15:41:19 2024 +0530

    added class for tag

commit 8cd48ed
Merge: 90c9f31 c1af183
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sun Jul 7 13:19:34 2024 +0530

    Merge updated branch 'feature_kind' with table support

commit c1af183
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sun Jul 7 13:07:56 2024 +0530

    extended to table scitype along with tests

commit 90c9f31
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sun Jul 7 11:19:35 2024 +0530

    calling _handle_categorical in forecasting, classification, regression

commit 655f6e0
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sun Jul 7 11:19:08 2024 +0530

    added categorical handling logic

commit db94559
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Thu Jul 4 10:29:48 2024 +0530

    added docstring and modified code acc to suggestion

commit 1321e1e
Merge: de6c92b 0eb9231
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Tue Jul 2 13:11:12 2024 +0530

    Merge pr sktime#5886 into step3-4_categorical_support

commit de6c92b
Merge: e376768 3f62d98
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Tue Jul 2 13:09:50 2024 +0530

    Merge branch 'feature_kind' into step3-4_categorical_support

commit e376768
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Tue Jul 2 11:06:07 2024 +0530

    added categorical tag

commit 3f62d98
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Mon Jul 1 11:08:47 2024 +0530

    added separate tests for testing with categorical data

commit e806cc1
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Mon Jun 24 11:09:08 2024 +0530

    renamed fields and rmeoved use of pandas func for numpy based mtypes

commit 7a58087
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Wed Jun 12 18:33:18 2024 +0530

    added simple_feature_kind and reverted changes to examples

commit ee0bc1c
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Mon Jun 3 12:02:30 2024 +0530

    changed dtype to float in panel examples which I missed earlier

commit 2f84b24
Merge: 1a64d2f e22d8dc
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Mon Jun 3 10:11:30 2024 +0530

    Merge branch 'main' into feature_kind

commit 1a64d2f
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Mon Jun 3 10:07:19 2024 +0530

    changed dtype to float in panel test examples

commit dd81851
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Sat Jun 1 16:20:17 2024 +0530

    added dtypekind metadata for panel and hierarchical

commit 83b6fe8
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Fri May 31 10:25:28 2024 +0530

    using pandas api functions to detect dtype

commit c0962f9
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Thu May 30 12:17:51 2024 +0530

    added function for getting dtypekind of series

commit 1e2110a
Merge: 898b839 6f42855
Author: Franz Király <f.kiraly@ucl.ac.uk>
Date:   Wed May 29 13:54:43 2024 +0100

    Merge branch 'main' into pr/6490

commit 898b839
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Wed May 29 11:04:28 2024 +0530

    using dtypekind enum values

commit ee8e0c4
Author: Abhay-Lejith <abhaylejith@gmail.com>
Date:   Tue May 28 14:50:43 2024 +0530

    added feature_kind metadata in series check

commit 0eb9231
Author: Anirban Ray <39331844+yarnabrina@users.noreply.github.com>
Date:   Fri Feb 16 23:37:36 2024 +0530

    @fkiraly suggestion: add rule to differentiate nested_univ and Panel

commit 0e28842
Author: Anirban Ray <39331844+yarnabrina@users.noreply.github.com>
Date:   Fri Feb 16 10:46:16 2024 +0530

    support "object" dtype in Panel

    (cherry picked from commit 1477012)

commit 25eb6f5
Author: Anirban Ray <39331844+yarnabrina@users.noreply.github.com>
Date:   Fri Feb 16 10:42:17 2024 +0530

    @fkiraly edit: add rule to differentiate nested_univ and pd.DataFrame

    Ref. sktime#5886 (review)

    (cherry picked from commit 2cecd6b)

commit a729843
Author: Anirban Ray <39331844+yarnabrina@users.noreply.github.com>
Date:   Sat Feb 3 18:29:15 2024 +0530

    added test

    (cherry picked from commit 2de7d8c)

commit c624e42
Author: Anirban Ray <39331844+yarnabrina@users.noreply.github.com>
Date:   Sat Feb 3 18:28:47 2024 +0530

    support "object" dtype in Series

    (cherry picked from commit cc3d056)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bugfix Fixes a known bug or removes unintended behavior module:detection detectors module: outliers, change points, segmentation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[BUG] segmentation fails for change point detectors

2 participants