[BUG] KNN bugfix to allow GridSearchCV and usage with column ensemble. #1903

Merged: TonyBagnall merged 4 commits into main from knn_changes on Jan 21, 2022

TonyBagnall commented Jan 21, 2022

Reference Issues/PRs

Fixes #1713
Fixes #578
Fixes #1057

What does this implement/fix? Explain your changes.

Simple fix to allow correct parameter setting with GridSearchCV. It removes a temporary measure that allowed KNN to pass tests and fixes the underlying problem thanks to the new distance factory. Note that we should do some work on making the valid parameters clearer for the different distance measures.

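The code below all works:

# Import paths assumed for the sktime version current at the time of this PR
import pandas as pd
from sklearn.model_selection import GridSearchCV
from sktime.classification.compose import ColumnEnsembleClassifier
from sktime.classification.distance_based import KNeighborsTimeSeriesClassifier
from sktime.datasets import load_basic_motions

# Load the data first so it is available to both the grid search and the ensemble
x_train, y_train = load_basic_motions(split="train")
x_test, y_test = load_basic_motions(split="test")

knn = KNeighborsTimeSeriesClassifier(
    distance="dtw", n_neighbors=1, distance_params={"window": 0.5}
)
knn2 = KNeighborsTimeSeriesClassifier(distance="dtw", n_neighbors=1)
knn4 = KNeighborsTimeSeriesClassifier(
    distance="dtw", n_neighbors=1, distance_params={"window": 0.0}
)

# Ten candidate windows: 0.0, 0.1, ..., 0.9
param_matrix = {"distance_params": [{"window": x / 10.0} for x in range(0, 10)]}

# CV parameter search
grid = GridSearchCV(knn2, param_grid=param_matrix, cv=2, scoring="accuracy")
grid.fit(x_train, y_train)
print(pd.DataFrame(grid.cv_results_))
print(grid)
# GridSearchCV fits clones, so knn2 itself keeps its original parameters
print(" params after CV = ", knn2.get_params())

# Each estimator is assigned a subset of the data columns
estimators = [
    ("knn", knn, [0, 1, 2]),
    ("knn2", knn2, [3, 4]),
    ("knn4", knn4, [5]),
]

# Train column ensemble
col_ens = ColumnEnsembleClassifier(estimators=estimators)
col_ens.fit(x_train, y_train)
print(" Score = ", col_ens.score(x_train, y_train))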
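To make the size of the search above concrete, here is a minimal standard-library sketch of how the `param_matrix` grid expands into candidates. `expand_grid` is a hypothetical helper written for illustration; it is not the scikit-learn implementation, only a stand-in for an exhaustive grid expansion.

```python
from itertools import product

# Same grid as in the PR description: one key, ten candidate dicts.
param_matrix = {"distance_params": [{"window": x / 10.0} for x in range(0, 10)]}


def expand_grid(grid):
    """Yield one parameter dict per candidate (hypothetical helper)."""
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


candidates = list(expand_grid(param_matrix))
n_folds = 2  # cv=2 in the PR description
n_fits = len(candidates) * n_folds  # cross-validation fits, excluding any final refit

print(len(candidates), n_fits)
print(candidates[0], candidates[-1])
```

Each candidate replaces the whole `distance_params` dict in one step, so every dict in the grid has to be a complete, valid set of parameters for the chosen distance.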

fkiraly commented Jan 21, 2022

stupid question - what metric do you tune a clustering algorithm for?

chrisholder commented Jan 21, 2022

It's a classifier, @fkiraly, and you tune over the various distance metrics, i.e. some distances are better for certain datasets than others.
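As a rough illustration of why the choice of distance and its `window` parameter can matter, here is a pure-Python DTW sketch with a Sakoe-Chiba band. It is not sktime's implementation: `dtw_distance` is a hypothetical function, and treating `window` as a fraction of the longer series length that sets the band radius is an assumption made for this example.

```python
import math


def dtw_distance(a, b, window=1.0):
    """Squared-cost DTW between two 1-D sequences with a Sakoe-Chiba band.

    `window` is taken as a fraction of the longer series length
    (an assumed convention, for illustration only).
    """
    n, m = len(a), len(b)
    # Band half-width in time steps; at least |n - m| so the end cell is reachable.
    radius = max(int(window * max(n, m)), abs(n - m))
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        lo = max(1, i - radius)
        hi = min(m, i + radius)
        for j in range(lo, hi + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


a = [0, 0, 1, 2, 1, 0, 0, 0]
b = [0, 0, 0, 1, 2, 1, 0, 0]  # same bump, shifted by one time step

narrow = dtw_distance(a, b, window=0.0)  # diagonal only: squared Euclidean distance
wide = dtw_distance(a, b, window=0.5)    # warping allowed within the band

print(narrow, wide)
```

With no warping the shifted bump is penalised at every misaligned point, while a wider band lets the two series align. Whether that flexibility helps or hurts depends on the dataset, which is what the grid search over `window` explores.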

TonyBagnall commented

> stupid question - what metric do you tune a clustering algorithm for?

number of clusters usually, but this is classification, just closing some very old issues

fkiraly commented Jan 21, 2022

ah, I misread knn for kmeans
