-
-
Notifications
You must be signed in to change notification settings - Fork 26.9k
Closed
Description
I'm facing a problem with the davies bouldin measure.
This is the warning that I get:
.local/lib/python3.7/site-packages/sklearn/metrics/cluster/unsupervised.py:342: RuntimeWarning: divide by zero encountered in true_divide
score = (intra_dists[:, None] + intra_dists) / centroid_distances
This is the implementation in sklearn:
def davies_bouldin_score(X, labels):
X, labels = check_X_y(X, labels)
le = LabelEncoder()
labels = le.fit_transform(labels)
n_samples, _ = X.shape
n_labels = len(le.classes_)
check_number_of_labels(n_labels, n_samples)
intra_dists = np.zeros(n_labels)
centroids = np.zeros((n_labels, len(X[0])), dtype=np.float)
for k in range(n_labels):
cluster_k = safe_indexing(X, labels == k)
centroid = cluster_k.mean(axis=0)
centroids[k] = centroid
intra_dists[k] = np.average(pairwise_distances(
cluster_k, [centroid]))
centroid_distances = pairwise_distances(centroids)
if np.allclose(intra_dists, 0) or np.allclose(centroid_distances, 0):
return 0.0
score = (intra_dists[:, None] + intra_dists) / centroid_distances
score[score == np.inf] = np.nan
return np.mean(np.nanmax(score, axis=1))I found another implementation on stack overflow:
from scipy.spatial.distance import pdist, euclidean
def DaviesBouldin(X, labels):
n_cluster = len(np.bincount(labels))
cluster_k = [X[labels == k] for k in range(n_cluster)]
centroids = [np.mean(k, axis = 0) for k in cluster_k]
variances = [np.mean([euclidean(p, centroids[i]) for p in k]) for i, k in enumerate(cluster_k)]
db = []
for i in range(n_cluster):
for j in range(n_cluster):
if j != i:
db.append((variances[i] + variances[j]) / euclidean(centroids[i], centroids[j]))
return(np.max(db) / n_cluster)With this implementation I don't get any warnings, but the results differ:
Stack overflow implementation: 0.012955275662036738
/home/luca/.local/lib/python3.7/site-packages/sklearn/metrics/cluster/unsupervised.py:342: RuntimeWarning: divide by zero encountered in true_divide
score = (intra_dists[:, None] + intra_dists) / centroid_distances
Sklearn implementation: 2.1936185396772485
Reactions are currently unavailable
Metadata
Metadata
Assignees
Labels
No labels