KMeans significantly slower on 0.23 #17208

@PrimozGodec

Describe the bug

With the 0.23 release, KMeans is significantly slower on small datasets: computing the clusters takes around ten times longer than before.

Steps/Code to Reproduce

Times measured with the following code (in seconds):
scikit-learn 0.22: ~0.015
scikit-learn 0.23: ~0.15

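import time

import sklearn.cluster
from sklearn import datasets

# Iris feature matrix: 150 samples, 4 features
data = datasets.load_iris()['data']

# Time a single fit; the printed value is wall-clock time in seconds
t = time.time()
sklearn.cluster.KMeans(n_clusters=2).fit(data)
print(time.time() - t)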

I also tried a bigger dataset with shape (300, 25), where clustering with the new version took 3-4 s, while before it took milliseconds.
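A single `time.time()` measurement of a workload this short can be noisy, so it may help to repeat the run and report the best time. The following is only a minimal sketch: `best_of` is a hypothetical helper (not part of scikit-learn) built on the standard-library `timeit` module, and the stand-in workload is a placeholder for the KMeans fit.

```python
import timeit


def best_of(fn, repeat=5, number=1):
    """Return the best (minimum) per-call wall time of fn() in seconds.

    Hypothetical helper for illustration; not part of scikit-learn.
    """
    # timeit.repeat runs `repeat` rounds; each round calls fn() `number`
    # times and reports the total time for that round.
    totals = timeit.repeat(fn, repeat=repeat, number=number)
    return min(totals) / number


if __name__ == "__main__":
    # Stand-in workload; to time the reproduction above, pass instead:
    #   lambda: sklearn.cluster.KMeans(n_clusters=2).fit(data)
    elapsed = best_of(lambda: sum(range(10_000)))
    print(f"best of 5: {elapsed:.6f} s")
```

Taking the minimum over several rounds reduces the influence of one-off costs such as imports or cache warm-up, which makes the 0.22 and 0.23 numbers easier to compare.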

Expected Results

Clusters should be computed as fast as in 0.22.

Versions

System:
    python: 3.7.6 | packaged by conda-forge | (default, Jan  7 2020, 22:05:27)  [Clang 9.0.1 ]
executable: /Users/primoz/miniconda3/envs/orange/bin/python
   machine: Darwin-19.0.0-x86_64-i386-64bit
Python dependencies:
       pip: 20.1
setuptools: 46.1.3
   sklearn: 0.23.0
     numpy: 1.18.4
     scipy: 1.4.1
    Cython: None
    pandas: 1.0.3
matplotlib: 3.2.1
    joblib: 0.14.1
Built with OpenMP: True
