KNN+X is a proposed improvement to the well-known KNN algorithm.
To label a new sample, the k nearest samples from the training data are found,
and then a second classification algorithm, X, is trained on those k neighbors and used to predict the label.
This replaces the original majority vote in KNN and is meant to make a more informed choice in cases where the majority is slim.
We extend the proposed method with thresholds: if the majority label covers at least a fraction t of the k-neighborhood (e.g. 80% of the neighbors for t=0.8 in the code), we choose it directly.
So when there is a sufficient majority we skip training a new classifier, which saves runtime.
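As a minimal sketch of the idea (the function name is illustrative and a decision tree stands in for the classifier X; this is not the actual API of knn_plus_x.py):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.tree import DecisionTreeClassifier

def knn_plus_x_predict(X_train, y_train, x_new, k=5, t=0.8,
                       make_x=DecisionTreeClassifier):
    """Label x_new with KNN+X: take the majority vote if it reaches the
    threshold t; otherwise train a classifier X on the k neighbors."""
    nn = NearestNeighbors(n_neighbors=k).fit(X_train)
    _, idx = nn.kneighbors([x_new])
    neigh_X, neigh_y = X_train[idx[0]], y_train[idx[0]]

    labels, counts = np.unique(neigh_y, return_counts=True)
    if counts.max() / k >= t:
        # Sufficient majority: behave like plain KNN, no extra training.
        return labels[counts.argmax()]

    # Slim majority: train X on the neighborhood and let it decide.
    clf = make_x()
    clf.fit(neigh_X, neigh_y)
    return clf.predict([x_new])[0]
```

With t=1.0 the fallback classifier is trained whenever the neighborhood is not unanimous; with t=0 it is never trained and the method reduces to plain KNN.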
The experiment Python code requires:
- numpy
- pandas
- h5py
- sklearn
- joblib
The plotting notebook also requires Matplotlib.
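Assuming the standard PyPI package names (scikit-learn is the PyPI name for the sklearn import), all dependencies can be installed with:

```shell
pip install numpy pandas h5py scikit-learn joblib matplotlib
```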
When running experiments you can control various variables:
- Runs amount - how many experiment rounds to run.
- K values - which values of k to experiment with, e.g. 5, 13, 20, 100. These should be positive integers.
- t values - which values of t to experiment with, e.g. 0.2, 0.6, 0.9, 1. We require 0 <= t <= 1.
- Datasets - which datasets to experiment with (currently implemented: covertype, glass, mnist, skin, shuttle, usps, wine and yeast).
- KNN algo - which KNN algorithm to use (choose between brute, kd_tree and ball_tree).
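An experiment grid over these variables might look like the following (the names are illustrative; the actual variables in knn_plus_x.py's main function may differ):

```python
# Illustrative experiment configuration for KNN+X runs.
RUNS = 5                                # how many experiment rounds to run
K_VALUES = [5, 13, 20, 100]             # neighborhood sizes (positive integers)
T_VALUES = [0.2, 0.6, 0.9, 1.0]         # majority thresholds, 0 <= t <= 1
DATASETS = ["glass", "mnist", "wine"]   # any subset of the implemented datasets
KNN_ALGO = "kd_tree"                    # one of: "brute", "kd_tree", "ball_tree"

# Sanity checks matching the constraints stated above.
assert all(isinstance(k, int) and k > 0 for k in K_VALUES)
assert all(0 <= t <= 1 for t in T_VALUES)
assert KNN_ALGO in ("brute", "kd_tree", "ball_tree")
```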
To run the experiment code, optionally adjust the run amount, datasets, threshold values, k values, and KNN algorithm in the main function of knn_plus_x.py, then run:
python knn_plus_x.py
You can plot the results using plot_results.ipynb.