Describe the bug
KNeighborsClassifier(n_neighbors=1, metric="euclidean") always gives exactly the same result as KNeighborsTimeSeriesClassifier(n_neighbors=1, metric="dtw").
To Reproduce
Go to the example jupyter notebook 02_classification_univariate, section "K-nearest-neighbours classifier for time series".
The last two code cells before the "Other classifiers" section show the difference between using the regular Euclidean distance and dynamic time warping for a K-neighbors classifier. When executing those two cells, both models always give the same score. The results only change when repeating the train/test split, but both models still get the same score on the same samples.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from sktime.datasets import load_basic_motions
from sklearn.pipeline import make_pipeline
from sktime.transformers.series_as_features.reduce import Tabularizer
X, y = load_basic_motions(return_X_y = True)
x_train, x_test, y_train, y_test = train_test_split(X.iloc[:, [0]], y)
#sklearn version
from sklearn.neighbors import KNeighborsClassifier
knn = make_pipeline(
    Tabularizer(),
    KNeighborsClassifier(n_neighbors=1, metric="euclidean"))
knn.fit(x_train, y_train)
print("sklearn model performance (euclidean distance): " + str(knn.score(x_test, y_test)))
print(classification_report(y_test, knn.predict(x_test)))
#sktime adaptation (with the metric changed)
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
knn = KNeighborsTimeSeriesClassifier(n_neighbors=1, metric="dtw")
knn.fit(x_train, y_train)
print("sktime model performance (with dynamic time warping): " + str(knn.score(x_test, y_test)))
print(classification_report(y_test, knn.predict(x_test)))

Expected behavior
The sktime version with dtw should have better performance, or at the very least not produce exactly the same score on every split.
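To illustrate why the two metrics should normally disagree, here is a minimal pure-NumPy DTW sketch (an illustrative helper written for this report, not sktime's implementation) compared against the Euclidean distance on a pair of toy series where one is a time-shifted copy of the other:

```python
import numpy as np

def dtw_distance(a, b):
    # Classic O(n*m) dynamic-programming DTW with squared pointwise cost.
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return np.sqrt(D[n, m])

a = np.array([0.0, 1.0, 2.0, 1.0, 0.0])  # a "bump"
b = np.array([0.0, 0.0, 1.0, 2.0, 1.0])  # the same bump shifted by one step

euclid = np.sqrt(np.sum((a - b) ** 2))
print(euclid)              # 2.0 - Euclidean penalises the shift
print(dtw_distance(a, b))  # 1.0 - DTW aligns the shift, giving a smaller distance
```

Since the two metrics give different distances on shifted data like this, a 1-NN classifier using them should not score identically on every split of a motion dataset.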
Additional context
I was following the tutorial at PyData Amsterdam 2020 and all the output from my local installation mirrored the one in the video. However, those two cells of code behaved strangely. In the video the sklearn classifier had a score of 0.65, while the sktime implementation had a score of 1.0. Different results are of course expected when re-running the code, since randomness is involved, but both models landing on exactly the same score every time is still odd.
I thought I just made a mistake somewhere while copying the code, or maybe there was a problem with my installation (version 0.4.1), so I went to the notebook hosted on Binder and saw the same behaviour there.
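A generic way to check whether a nearest-neighbour classifier actually honours its metric argument (sketched here with plain sklearn, not sktime) is to fit the same data under two different metrics and compare the distances reported by kneighbors; if the argument is respected, the reported distances change:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

X_train = np.array([[0.0, 0.0], [3.0, 4.0]])
y_train = np.array([0, 1])
query = np.array([[1.0, 1.0]])

for metric in ("euclidean", "manhattan"):
    knn = KNeighborsClassifier(n_neighbors=1, metric=metric)
    knn.fit(X_train, y_train)
    dist, _ = knn.kneighbors(query)  # distance to the nearest neighbour
    print(metric, dist[0, 0])
# euclidean reports sqrt(2) ~= 1.414, manhattan reports 2.0;
# identical distances under both metrics would suggest the
# metric argument is being ignored
```

Applying the same idea to the two classifiers in this issue (comparing their prediction arrays rather than just the aggregate score) would confirm whether they are returning identical outputs.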
Versions
Local installation version of sktime: 0.4.1
Also experienced the issue on the Binder notebook, with version 0.4.2