_EfficientKMeans.fit uses cluster centroids sum instead of intra-cluster errors for cluster reduction #2698

@Ebraheem1

🐞Describing the bug

  • In _efficient_kmeans.py, reduce_min_error is left unused; reduce_cluster_centers_ is summed instead to compute the rmse_error that drives cluster reduction.
  • This causes two problems:
  1. The computed rmse_error does not measure the clustering error, so comparing it with error_bnd is meaningless.
  2. If the data points pull the centroids negative, the sum on line 262 becomes negative and rmse_error is NaN, so the comparison against error_bnd is effectively skipped.
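
Both problems can be illustrated outside the library with plain Python. This is a minimal sketch, not the code in _efficient_kmeans.py: the helpers `sqrt_or_nan` and `rmse_pair` are hypothetical, the error is assumed to be sqrt(total / N), the per-point error is assumed to be the squared distance to the nearest center, and reduction is assumed to happen when the error is below error_bnd. `sqrt_or_nan` emulates `torch.sqrt`, which returns NaN for negative inputs, whereas `math.sqrt` would raise.

```python
import math


def sqrt_or_nan(x):
    """Hypothetical helper: mimic torch.sqrt, which yields NaN for negative input."""
    return math.sqrt(x) if x >= 0 else float("nan")


def rmse_pair(points, centers):
    """Return (value derived from the center sum, value derived from per-point errors).

    Assumes 1-D data and an error of the form sqrt(total / N).
    """
    n = len(points)
    # What the issue reports: the cluster centers themselves are summed.
    from_centers = sqrt_or_nan(sum(centers) / n)
    # What is intended: sum each point's squared distance to its nearest center.
    inertia = sum(min((p - c) ** 2 for c in centers) for p in points)
    from_errors = sqrt_or_nan(inertia / n)
    return from_centers, from_errors


error_bnd = 2.0
base_points = [1.0, 1.1, 0.9, 10.0, 10.1, 9.9]

# Case 1: all-negative data, so the center sum is negative and the result is NaN.
neg_points = [-p for p in base_points]
neg_from_centers, neg_from_errors = rmse_pair(neg_points, [-1.0, -10.0])

# Case 2: the same cluster shapes shifted by +100. The fit is equally tight,
# but the center-sum value grows with the offset.
shifted_points = [p + 100.0 for p in base_points]
shifted_from_centers, shifted_from_errors = rmse_pair(shifted_points, [101.0, 110.0])

print("negative data, center sum  :", neg_from_centers, neg_from_centers < error_bnd)
print("negative data, point errors:", neg_from_errors, neg_from_errors < error_bnd)
print("shifted data, center sum   :", shifted_from_centers, shifted_from_centers < error_bnd)
print("shifted data, point errors :", shifted_from_errors, shifted_from_errors < error_bnd)
```

Under these assumptions, any comparison against NaN evaluates to False, so the negative case never passes the bound, and the shifted case fails the bound purely because of where the centers sit rather than how well they fit the data.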

To Reproduce

import torch
from coremltools.optimize.torch.palettization._efficient_kmeans import _EfficientKMeans

# All-negative data: cluster centers sum to a negative number → sqrt(negative/N) = NaN
X = torch.tensor([[-1.0], [-1.1], [-0.9], [-10.0], [-10.1], [-9.9]], dtype=torch.float32)

init_centers = torch.tensor([[-1.0], [-10.0], [-50.0]], dtype=torch.float32)
labels = torch.tensor([0, 0, 0, 1, 1, 1], dtype=torch.int64)

kmeans = _EfficientKMeans(n_clusters=3, init=init_centers, labels=labels,
                          max_iter=5, tol=1e-4, error_bnd=2.0)
kmeans.fit(X)

print(f"Final n_clusters           : {kmeans.n_clusters}")

The final number of clusters is still 3; it should be reduced to 2.

System environment (please complete the following information):

  • coremltools version: 9.0

This is a platform-agnostic bug.

Additional context

Suggested Fix
Update line 262 to sum the actual minimum distance errors instead of the cluster centers:

reduce_inertia = reduce_min_error.sum()

I can create that in a separate PR.
