Replies: 8 comments 6 replies
-
Hello everyone, good idea! Okay, here are some initial thoughts for discussion.

In the context of output types, I'm considering anomalies and segments as the same, except that segments must partition the index. Some specific examples that come to mind:

Example 1. A binary anomaly indicator

Question 1: would we consider a class of standardised outputs? We could have these both defined as standardised Outputs() with helper functions to map between them (inheriting the index information from somewhere).

Example 2. Multiple anomaly types

Example 3. Rolling windows
Some algorithms have rolling windows as inputs, or we can input a feature (like an EWMA) to any anomaly algorithm, so I guess with a window size.

Example 4. Reducing the outputs of the features
Some algorithms look at a multivariate time series and output a combined assessment of the system (like intrinsic dimensionality). Input would be

Any more examples? Thoughts on the above? Cheers!
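A minimal pandas sketch of Example 1 and the "segments partition the index" idea. The helper `anomalies_to_segments` is hypothetical, not an existing sktime function; it just illustrates how one standardised output could be mapped to the other while inheriting the index:

```python
import pandas as pd

X = pd.Series([1, 2, 3, 100, 2], index=pd.RangeIndex(5))

# Example 1: a binary anomaly indicator, sharing X's index
anomalies = pd.Series([False, False, False, True, False], index=X.index)

def anomalies_to_segments(anomalies: pd.Series) -> pd.Series:
    """Map a boolean anomaly indicator to integer segment labels.

    Hypothetical helper: each run of identical values becomes one
    segment, so the segments partition the index by construction.
    """
    change = anomalies.ne(anomalies.shift(fill_value=anomalies.iloc[0]))
    return change.cumsum()

segments = anomalies_to_segments(anomalies)
# segments: [0, 0, 0, 1, 2] -- three segments partitioning the index
```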
-
I think Katie's examples are great! Here are a few other cases I wanted to add in addition to hers:

One could imagine a similar case if you had k change points that you were trying to find, i.e. you could try to return a probability of where they were. Here is a one-change-point example of what that might look like:

In the multiple-change-point case this might of course lead to accidentally having overlapping probabilities for separate change points, something you might want to restrict in the algorithm itself. I wonder if it might be convenient to have most algorithms return something that looks like a pandas
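The one-change-point probability idea could be sketched like this, with made-up numbers purely for illustration (not the output of any real algorithm):

```python
import numpy as np
import pandas as pd

index = pd.RangeIndex(5)

# One change point: a probability, over the index, of where it lies.
# E.g. the algorithm is fairly confident the change happens at position 3.
cp_proba = pd.Series([0.02, 0.03, 0.15, 0.70, 0.10], index=index)
assert np.isclose(cp_proba.sum(), 1.0)  # mass over the index sums to one

# In the k-change-point case one might return a DataFrame with one
# column per change point; overlapping mass between columns is the
# issue mentioned above, which the algorithm might need to restrict.
cp_proba_k = pd.DataFrame(
    {"cp_0": cp_proba, "cp_1": [0.0, 0.0, 0.0, 0.1, 0.9]}
)
```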
-
Open Questions:
-
|
Current data signature of the GGS implementation is:
-
|
Similarly, the current implementation of HMM is:

Should be easy to extend to multivariate. I like the numpy array input because I have a lot of numpy math optimization under the hood (inspired by some work done by @conniesaur on a different project).
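This is not the actual implementation referenced above, just a sketch of the kind of vectorized numpy math a numpy-array input makes easy: computing Gaussian emission log-likelihoods for all states at once via broadcasting, with no Python-level loops. Extending to multivariate would mainly mean giving `X` a second (feature) axis:

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 100.0, 102.0])   # (n_timepoints,)
means = np.array([2.0, 101.0])                 # (n_states,)
stds = np.array([1.0, 1.0])                    # (n_states,)

# Broadcasting gives an (n_timepoints, n_states) matrix in one expression.
log_lik = (
    -0.5 * ((X[:, None] - means[None, :]) / stds) ** 2
    - np.log(stds * np.sqrt(2 * np.pi))
)
states = log_lik.argmax(axis=1)  # most likely state per timepoint
# states: [0, 0, 0, 1, 1]
```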
-
In advance of the base class discussion beginning tomorrow, some thoughts/suggestions I had:

Looking forward to the discussion tomorrow!
-
That sounds really reasonable ^^

In the first case, it's clear to me how boolean/float values describe outliers/anomalies, e.g. X = [1, 2, 3, 100, 2] -> output = [False, False, False, True, False], or as floats (scores). But how would we describe segments? E.g. X = [1, 2, 3, 100, 102] -> output = [0, 0, 0, 1, 1] (incrementing labels, up to K?). (Lazy notation: I don't mean they are literally lists, imagine them as pd.DataFrame or np.array.)
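One way to make the dense-label segment output concrete, plus a hypothetical helper (assumed names, not an existing sktime API) converting it into an equivalent interval representation:

```python
import pandas as pd

X = pd.Series([1, 2, 3, 100, 102])
# Dense segment labels that partition the index, as in the example above
segments = pd.Series([0, 0, 0, 1, 1], index=X.index)

def segments_to_intervals(segments: pd.Series) -> pd.IntervalIndex:
    """Convert dense segment labels into left-closed intervals.

    Hypothetical helper: one interval per run of identical labels.
    """
    # True at the start of each new segment (NaN from shift() compares unequal)
    mask = segments.ne(segments.shift())
    starts = segments.index[mask]
    ends = list(starts[1:]) + [segments.index[-1] + 1]
    return pd.IntervalIndex.from_arrays(starts, ends, closed="left")

intervals = segments_to_intervals(segments)
# intervals: [[0, 3), [3, 5)] -- the two segments as index ranges
```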
-
(Copying my comment (and Mirae's reply) here because I used the incorrect account for my contribution.) I think the first thing to do is to pin down (in writing) what we want the input and output formats to be, and what learning tasks we intend to cover, before deciding on the nitty-gritty of implementation details and where input checks will live. My preferences for input/output types:
-
Hi All,
As we are ramping up our efforts around time series annotation, it might be worth opening up the discussion about proposed interfaces, data structures, and formats for the learning tasks in scope of annotation. Hopefully we will be able to converge on a proposal for a standard for sktime to adopt.
Please share freely any thoughts, examples, suggestions that might inform our discussion.
FYI @miraep8 @KatieBuc @fkiraly @Lovkush-A @NoaBenAmi @lielleravid