KMeans significantly slower on 0.23 #17208
Closed
Describe the bug
After upgrading to scikit-learn 0.23, KMeans is significantly slower on small datasets: computing the clusters takes roughly ten times longer than on 0.22.
Steps/Code to Reproduce
Timings (in seconds) with the following code:
scikit-learn 0.22: ~0.015
scikit-learn 0.23: ~0.15
import time
import sklearn.cluster
from sklearn import datasets
data = datasets.load_iris()['data']
t = time.time()
sklearn.cluster.KMeans(n_clusters=2).fit(data)
print(time.time() - t)
I also tried a bigger dataset with shape (300, 25): clustering took 3-4 s with the new version, whereas it previously finished in milliseconds.
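A single `time.time()` measurement can be noisy, so a more robust comparison might repeat the fit and take the median. The following is only a sketch: the helper names `median_runtime` and `time_kmeans` are hypothetical and not part of scikit-learn, and the repeat count is an arbitrary choice.

```python
import statistics
import time


def median_runtime(fn, repeats=5):
    """Call fn() `repeats` times and return the median wall-clock time in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()  # monotonic, high-resolution timer
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def time_kmeans(data, n_clusters=2, repeats=5):
    """Median KMeans fit time on `data`; scikit-learn is imported lazily."""
    import sklearn.cluster

    def fit():
        sklearn.cluster.KMeans(n_clusters=n_clusters).fit(data)

    return median_runtime(fit, repeats=repeats)
```

For example, `time_kmeans(datasets.load_iris()['data'])` could be run under both versions. For the larger case, `time_kmeans(numpy.random.rand(300, 25))` would be a synthetic stand-in of the same shape, not the original dataset.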
Expected Results
Clusters would be computed as fast as before.
Versions
System:
python: 3.7.6 | packaged by conda-forge | (default, Jan 7 2020, 22:05:27) [Clang 9.0.1 ]
executable: /Users/primoz/miniconda3/envs/orange/bin/python
machine: Darwin-19.0.0-x86_64-i386-64bit
Python dependencies:
pip: 20.1
setuptools: 46.1.3
sklearn: 0.23.0
numpy: 1.18.4
scipy: 1.4.1
Cython: None
pandas: 1.0.3
matplotlib: 3.2.1
joblib: 0.14.1
Built with OpenMP: True