KMeans cluster_centers_ Occasionally Don't Match labels_ Results #12506

#### Description
Occasionally the cluster_centers_ attribute of KMeans does not agree with the labels_ attribute. That is, if cluster_centers_ is compared to the centroids computed manually from labels_, the two are occasionally different. Based on my understanding of Lloyd's algorithm, this looks like an issue. The difference looks larger than simple rounding error.

I stumbled on this issue when I was working on the abalone dataset available from the UCI machine learning repository: http://archive.ics.uci.edu/ml/machine-learning-databases/abalone/abalone.data

I tried to make the reproduction code as minimal as possible. However, I was unable to reproduce the issue on randomly generated data.

To run the code, first download the data and save it as: abalone.data

I chose specific arguments for the KMeans constructor, but it seems to happen for a lot of different argument combinations. 

#### Steps/Code to Reproduce
```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# %% Load data (abalone.data has no header row)
D = pd.read_csv('abalone.data', header=None)
D[0] = D[0].map({'F': -1, 'I': 0, 'M': 1})     # Map sexes to numbers
AR = D[D.columns[:-1]].values                  # Drop the last column (the target)
A = (AR - AR.mean(axis=0)) / AR.std(axis=0)    # Standardize each feature

# %% Compute clusters
kmc = KMeans(algorithm='full', n_clusters=8, precompute_distances=False,
             random_state=1, n_jobs=1)
kmc.fit(A)
CL = kmc.labels_

# %% Compare each fitted center with the mean of the points assigned to it
for i in range(kmc.n_clusters):
    CLi = CL == i
    AMi = A[CLi].mean(axis=0)
    if not np.isclose(AMi, kmc.cluster_centers_[i]).all():
        print('FAIL: {:f}'.format(np.linalg.norm(AMi - kmc.cluster_centers_[i])))
```

#### Expected Results
I would expect cluster_centers_[i] to be "close" to A[labels_ == i].mean(axis = 0) when the algorithm terminates. Minor differences due to rounding error are expected in numerical algorithms. What seems strange here is that the difference is usually almost 0 and then occasionally far from 0.
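
To make the expectation concrete, here is a minimal sketch of the comparison on hand-made toy data, using only NumPy. The helper name `centroid_mismatch` and the toy arrays are hypothetical and exist only for illustration; they are not part of scikit-learn.

```python
import numpy as np

def centroid_mismatch(X, labels, centers):
    """Per cluster, the Euclidean distance between the given center and
    the mean of the points assigned to that cluster (NaN if it is empty)."""
    diffs = np.full(len(centers), np.nan)
    for i in range(len(centers)):
        members = X[labels == i]
        if len(members) > 0:              # an empty cluster has no mean
            diffs[i] = np.linalg.norm(members.mean(axis=0) - centers[i])
    return diffs

# Toy data: two clusters of two points each
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]])
labels = np.array([0, 0, 1, 1])
# The first center is the exact mean; the second is deliberately offset
centers = np.array([[0.0, 1.0], [10.0, 1.5]])

diffs = centroid_mismatch(X, labels, centers)
print(diffs)
```

With a fitted estimator, the same helper could be called as centroid_mismatch(A, kmc.labels_, kmc.cluster_centers_), and I would expect every entry to be near 0.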

#### Actual Results
Occasionally the cluster centers computed manually using numpy.mean and the KMeans attribute labels_ do not match the attribute cluster_centers_.

#### Versions
Windows-2012ServerR2-6.3.9600-SP0
Python 3.6.6 |Anaconda custom (64-bit)| (default, Jun 28 2018, 11:27:44) [MSC v.1900 64 bit (AMD64)]
NumPy 1.15.1
SciPy 1.1.0
Scikit-Learn 0.19.1


I stepped through the code for a while and noticed that there is even what looks like a check to handle this condition at the bottom of "_kmeans_single_lloyd". I can potentially look into this more, but I wanted to file this to make sure it wasn't a known issue or something I was missing. I didn't see any similar past issue or pull request. Thanks!
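
For what it's worth, here is one way such a gap could arise in a Lloyd-style loop. This is an assumption on my part, not a diagnosis of the scikit-learn code: if iteration stops before full convergence and the labels are then recomputed against the final centers, the labels agree with the nearest center, but the centers are still the means of the previous assignment. The functions `assign` and `update` below are hypothetical stand-ins written for this sketch, and the 1-D data is made up.

```python
import numpy as np

def assign(X, centers):
    # E-step: index of the nearest center for every point
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def update(X, labels, k):
    # M-step: mean of the points in each cluster (assumes no empty cluster)
    return np.array([X[labels == i].mean(axis=0) for i in range(k)])

X = np.array([[0.0], [1.0], [2.5], [3.0], [10.0]])
centers = np.array([[0.0], [1.0]])          # deliberately poor starting centers

# One Lloyd iteration, then stop early (as if an iteration limit was hit)
labels = assign(X, centers)                 # E-step
centers = update(X, labels, 2)              # M-step

# At this point the centers are exactly the means of the labels
consistent_before = np.allclose(update(X, labels, 2), centers)

# Recompute labels against the final centers
final_labels = assign(X, centers)

# The centers are no longer the means of the recomputed labels
consistent_after = np.allclose(update(X, final_labels, 2), centers)

print(consistent_before, consistent_after)
```

If something like this is happening, the mismatch would only show up on runs that stop before the assignments are fully stable, which might be consistent with the difference being near 0 most of the time.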

