Description
Describe the bug
Using scikit-learn cross validation functions on sktime's KNN classifier to set the distance measure hyperparameters always produces the same score.
To Reproduce
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sktime.datasets import load_basic_motions
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
import pandas as pd

X, y = load_basic_motions(return_X_y=True)
# Keep only the first dimension of the multivariate dataset
x_train, x_test, y_train, y_test = train_test_split(X.iloc[:, [0]], y)

# Search over DTW window sizes w = 0.00, 0.01, ..., 0.99
param_matrix = {
    "distance_params": [{"w": x / 100} for x in range(0, 100)]
}
grid = GridSearchCV(
    estimator=KNeighborsTimeSeriesClassifier(
        distance="dtw", n_neighbors=1
    ),
    param_grid=param_matrix,
    cv=10,
    scoring="accuracy",
)
grid.fit(x_train, y_train)
print(pd.DataFrame(grid.cv_results_))
mean_fit_time std_fit_time mean_score_time std_score_time param_distance_params ... split8_test_score split9_test_score mean_test_score std_test_score rank_test_score
0 0.003862 0.000397 0.033888 0.001333 {'w': 0.0} ... 1.0 1.0 1.0 0.0 1
1 0.003609 0.000302 0.033037 0.000653 {'w': 0.01} ... 1.0 1.0 1.0 0.0 1
2 0.003663 0.000329 0.033945 0.002118 {'w': 0.02} ... 1.0 1.0 1.0 0.0 1
3 0.003961 0.000472 0.034648 0.001769 {'w': 0.03} ... 1.0 1.0 1.0 0.0 1
4 0.003810 0.000332 0.033339 0.000514 {'w': 0.04} ... 1.0 1.0 1.0 0.0 1
.. ... ... ... ... ... ... ... ... ... ... ...
95 0.003410 0.000301 0.033238 0.001309 {'w': 0.95} ... 1.0 1.0 1.0 0.0 1
96 0.003910 0.000375 0.033840 0.001541 {'w': 0.96} ... 1.0 1.0 1.0 0.0 1
97 0.003459 0.000270 0.032236 0.000451 {'w': 0.97} ... 1.0 1.0 1.0 0.0 1
98 0.003660 0.000230 0.032135 0.000351 {'w': 0.98} ... 1.0 1.0 1.0 0.0 1
99 0.003659 0.000230 0.032436 0.000898 {'w': 0.99} ... 1.0 1.0 1.0 0.0 1
[100 rows x 19 columns]
Expected behavior
Using different parameters should produce different scores.
Additional context
I've tested this bug using multiple different datasets and distance measures, and all produced the same issue. I believe it is caused by the use of both "distance_params" and "metric_params" to store the distance measure parameters. Only the value in metric_params is actually passed to the distance measures. During cross validation, however, metric_params is never updated: the constructor is not called with the candidate value; instead the initial classifier is cloned and the new value is set afterwards, so the default value is always used.
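To illustrate the mechanism I suspect, here is a minimal sketch in plain Python. `ToyKNN` and `clone_and_configure` are hypothetical stand-ins written for this issue, not sktime or scikit-learn code; they only assume that the constructor copies `distance_params` into `metric_params` and that parameters are later set by plain attribute assignment.

```python
import copy


class ToyKNN:
    """Hypothetical estimator that stores one setting under two names."""

    def __init__(self, distance_params=None):
        self.distance_params = distance_params
        # Derived copy: this is the value the distance function would read.
        self.metric_params = distance_params

    def get_params(self):
        return {"distance_params": self.distance_params}

    def set_params(self, **params):
        # Plain attribute assignment: __init__ is not re-run,
        # so metric_params is not refreshed here.
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def effective_window(self):
        params = self.metric_params or {}
        return params.get("w", "default")


def clone_and_configure(estimator, **candidate):
    # Rough imitation of a grid search step (an assumption for this sketch):
    # rebuild from the current constructor params, then apply the candidate.
    fresh = type(estimator)(**copy.deepcopy(estimator.get_params()))
    return fresh.set_params(**candidate)


base = ToyKNN()
results = []
for w in (0.0, 0.5, 0.99):
    est = clone_and_configure(base, distance_params={"w": w})
    results.append((est.distance_params, est.effective_window()))
    print(est.distance_params, est.effective_window())
```

In this toy version `distance_params` changes for every candidate, but the window that would actually be used stays at the default, which matches the identical scores above.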
The problem could be fixed by removing "distance_params" entirely and replacing it with "metric_params", which does lead to different scores for different hyperparameters, as shown in the results below.
mean_fit_time std_fit_time mean_score_time std_score_time param_metric_params ... split8_test_score split9_test_score mean_test_score std_test_score rank_test_score
0 0.003861 0.000395 0.017496 0.000875 {'w': 0.0} ... 0.333333 0.833333 0.600000 0.213437 100
1 0.004018 0.000502 0.018389 0.000853 {'w': 0.01} ... 1.000000 0.833333 0.933333 0.110554 99
2 0.003804 0.000402 0.018915 0.000394 {'w': 0.02} ... 1.000000 0.833333 0.950000 0.076376 98
3 0.004018 0.000325 0.021227 0.003138 {'w': 0.03} ... 1.000000 0.833333 0.966667 0.066667 97
4 0.004812 0.001080 0.022917 0.002269 {'w': 0.04} ... 1.000000 0.833333 0.983333 0.050000 94
.. ... ... ... ... ... ... ... ... ... ... ...
95 0.003760 0.000336 0.033539 0.000612 {'w': 0.95} ... 1.000000 1.000000 1.000000 0.000000 1
96 0.003957 0.000815 0.035548 0.002654 {'w': 0.96} ... 1.000000 1.000000 1.000000 0.000000 1
97 0.003910 0.000301 0.033188 0.000201 {'w': 0.97} ... 1.000000 1.000000 1.000000 0.000000 1
98 0.003609 0.000201 0.033138 0.000351 {'w': 0.98} ... 1.000000 1.000000 1.000000 0.000000 1
99 0.003762 0.000248 0.033348 0.000860 {'w': 0.99} ... 1.000000 1.000000 1.000000 0.000000 1
[100 rows x 19 columns]
I am raising this as an issue as there may be better solutions whilst keeping the variable names clear.
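One possible alternative that keeps the `distance_params` name, sketched under assumptions: store only `distance_params` and derive the value handed to the distance measure at fit time, rather than in the constructor. `ToyKNNSingleSource` and its `metric_params_` attribute are hypothetical names for illustration only, not a proposal for the exact sktime implementation.

```python
class ToyKNNSingleSource:
    """Hypothetical estimator with a single stored copy of the setting."""

    def __init__(self, distance_params=None):
        # The only stored copy; nothing is derived in the constructor.
        self.distance_params = distance_params

    def set_params(self, **params):
        # Plain attribute assignment, as in the sketch of the bug.
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self):
        # Resolve the effective parameters when fitting, so values applied
        # through set_params after construction are picked up.
        self.metric_params_ = dict(self.distance_params or {})
        return self

    def effective_window(self):
        return self.metric_params_.get("w", "default")


windows = []
for w in (0.0, 0.5, 0.99):
    est = ToyKNNSingleSource().set_params(distance_params={"w": w}).fit()
    windows.append(est.effective_window())
print(windows)
```

With a single stored copy there is nothing to go stale, so each candidate value reaches the (toy) distance computation.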
Versions
System:
python: 3.7.9 (tags/v3.7.9:13c94747c7, Aug 17 2020, 18:58:18) [MSC v.1900 64 bit (AMD64)]
executable: C:\Users\XXX\AppData\Local\Programs\Python\Python37\python.exe
machine: Windows-10-10.0.16299-SP0
Python dependencies:
pip: 20.2.3
setuptools: 56.2.0
sklearn: 0.24.2
sktime: 0.6.1
statsmodels: 0.12.2
numpy: 1.20.3
scipy: 1.5.3
Cython: 0.29.14
pandas: 1.1.3
matplotlib: 3.3.2
joblib: 0.17.0
numba: 0.52.0rc3
pmdarima: 1.7.1
tsfresh: 0.17.0