
[BUG] KNN with DTW #413

@manu-torres

Description


Describe the bug

KNeighborsClassifier(n_neighbors=1, metric="euclidean") always gives exactly the same result as KNeighborsTimeSeriesClassifier(n_neighbors=1, metric="dtw").

To Reproduce

Go to the example jupyter notebook 02_classification_univariate, section "K-nearest-neighbours classifier for time series".

The last two code cells before the "Other classifiers" section show the difference between using regular Euclidean distance and dynamic time warping for a K-neighbors classifier. When executing those two cells, both models always give the same score. The results only change when repeating the train/test split, but both models still get the same score on the same samples.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sklearn.pipeline import make_pipeline
from sktime.datasets import load_basic_motions
from sktime.transformers.series_as_features.reduce import Tabularizer

X, y = load_basic_motions(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(X.iloc[:, [0]], y)

# sklearn version
from sklearn.neighbors import KNeighborsClassifier

knn = make_pipeline(
    Tabularizer(),
    KNeighborsClassifier(n_neighbors=1, metric="euclidean"),
)

knn.fit(x_train, y_train)
print("sklearn model performance (euclidean distance): " + str(knn.score(x_test, y_test)))
print(classification_report(y_test, knn.predict(x_test)))

# sktime adaptation (with the metric changed)
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier

knn = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric="dtw")

knn.fit(x_train, y_train)
print("sktime model performance (with dynamic time warping): " + str(knn.score(x_test, y_test)))
print(classification_report(y_test, knn.predict(x_test)))

Expected behavior

The sktime version with dtw should have different (presumably better) performance — at the very least, not exactly the same score.
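Not part of the original report, but a quick sanity check of the expectation above: a plain dynamic-programming DTW (written from scratch here, not sktime's implementation) and squared Euclidean distance do disagree on time-shifted series, so identical 1-NN scores under both metrics would be surprising. Everything in this snippet is illustrative; the function name and test series are made up.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic O(n*m) DTW with squared pointwise cost and no window constraint."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            # Best of: insertion, deletion, match on the warping path
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
b = np.array([0.0, 0.0, 1.0, 2.0, 1.0])  # same shape as a, shifted by one step

euclid = np.sum((a - b) ** 2)  # squared Euclidean: 4.0
dtw = dtw_distance(a, b)       # DTW absorbs the shift: 1.0
print(euclid, dtw)
```

Since the two metrics rank neighbors differently on data like this, a 1-NN classifier using DTW should not systematically reproduce the Euclidean classifier's predictions.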

Additional context

I was following the tutorial at PyData Amsterdam 2020, and all the output from my local installation mirrored the one in the video, except for these two cells. In the video the sklearn classifier scored 0.65 while the sktime implementation scored 1.0. Different results are of course expected when re-running the code, since randomness is involved, but getting identical scores from both models every time is still odd.

I thought I just made a mistake somewhere while copying the code, or maybe there was a problem with my installation (version 0.4.1), so I went to the notebook hosted on Binder and saw the same behaviour there.

Versions

Details

Local installation version of sktime: 0.4.1

Also experienced the issue on the Binder notebook, with version 0.4.2
