import torch
from coremltools.optimize.torch.palettization._efficient_kmeans import _EfficientKMeans
# All-negative data: cluster centers sum to a negative number → sqrt(negative/N) = NaN
X = torch.tensor([[-1.0], [-1.1], [-0.9], [-10.0], [-10.1], [-9.9]], dtype=torch.float32)
init_centers = torch.tensor([[-1.0], [-10.0], [-50.0]], dtype=torch.float32)
labels = torch.tensor([0, 0, 0, 1, 1, 1], dtype=torch.int64)
kmeans = _EfficientKMeans(n_clusters=3, init=init_centers, labels=labels,
                          max_iter=5, tol=1e-4, error_bnd=2.0)
kmeans.fit(X)
print(f"Final n_clusters : {kmeans.n_clusters}")
# Suggested fix for line 262 of _efficient_kmeans.py (not part of the reproduction script):
reduce_inertia = reduce_min_error.sum()
🐞 Describing the bug
In _efficient_kmeans.py you can see that reduce_min_error is left unused and reduce_cluster_centers_ is the one used for computing the rmse_error, which is responsible for cluster reduction. As a result, rmse_error is not correct, so its comparison with error_bnd is pointless. When the cluster centers sum to a negative number, rmse_error will be nan, so it will again skip the comparison against error_bnd.
To Reproduce
The final number of clusters is still 3; it should be reduced to 2.
System environment (please complete the following information):
This is a platform-agnostic bug.
Additional context
Suggested Fix
Update line 262 so that reduce_inertia sums the actual minimum distance errors (reduce_min_error) instead of the cluster centers (reduce_cluster_centers_).
I can create that in a separate PR.
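
To make the failure mode concrete, here is a minimal pure-Python sketch of the two sums. It is not the library code. It assumes that rmse_error is computed as sqrt(inertia / N) and that each point's error is its squared distance to the nearest center. The helpers sqrt_or_nan and rmse_from are hypothetical names introduced only for this illustration.

```python
import math

# Data and the two candidate centers from the reproduction, flattened to 1-D.
points = [-1.0, -1.1, -0.9, -10.0, -10.1, -9.9]
reduced_centers = [-1.0, -10.0]
error_bnd = 2.0


def sqrt_or_nan(value):
    # Hypothetical helper: mimics a tensor sqrt, which yields nan for
    # negative input instead of raising an exception.
    return math.sqrt(value) if value >= 0 else math.nan


def rmse_from(inertia, n_points):
    # Assumed form of the error: sqrt(inertia / N).
    return sqrt_or_nan(inertia / n_points)


# Assumed definition: squared distance from each point to its nearest center.
min_errors = [min((p - c) ** 2 for c in reduced_centers) for p in points]

# Reported behavior: the inertia is the sum of the cluster centers.
buggy_rmse = rmse_from(sum(reduced_centers), len(points))
# Suggested behavior: the inertia is the sum of the minimum errors.
fixed_rmse = rmse_from(sum(min_errors), len(points))

print(f"buggy rmse: {buggy_rmse}, below error_bnd: {buggy_rmse < error_bnd}")
print(f"fixed rmse: {fixed_rmse:.4f}, below error_bnd: {fixed_rmse < error_bnd}")
```

Under these assumptions, the sum of the centers is negative, so the first value is nan and any comparison with error_bnd evaluates to False. The sum of the minimum errors is small and non-negative, so the second value falls below error_bnd, which is the case where the cluster count would be expected to drop.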