python crashed when computing silhouette_score/ silhouette_samples of KMeans on large amounts of data #4701
Closed
First, here is the code:
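import joblib  # on older scikit-learn versions: from sklearn.externals import joblib
from sklearn.metrics import silhouette_score, silhouette_samples

# Load the previously trained KMeans model and assign each sample to a cluster
km = joblib.load(filename)
cluster_labels = km.predict(X)
# Compute the mean silhouette score over all samples
silhouette_avg = silhouette_score(X, cluster_labels)
# Compute the silhouette scores for each sample
sample_silhouette_values = silhouette_samples(X, cluster_labels)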
'km' is the model that I trained on the training data. It was created as follows:
km = KMeans(n_clusters=num_k, init='k-means++', max_iter=300, n_init=1, verbose=False)
km.fit(X)
When X has fewer than 30 thousand rows, both silhouette_score and silhouette_samples work and return the expected results. But when X has more than 100 thousand rows, the program crashes with "Segmentation fault (core dumped)". Here is the full error output:
Traceback (most recent call last):
File "test19_statistic_silhouette_score.py", line 87, in <module>
out()
File "test19_statistic_silhouette_score.py", line 63, in out
sample_silhouette_values = silhouette_samples(X, cluster_labels)
File "/home/supermicro/.local/lib/python2.7/site-packages/sklearn/metrics/cluster/unsupervised.py", line 153, in silhouette_samples
distances = pairwise_distances(X, metric=metric, **kwds)
File "/home/supermicro/.local/lib/python2.7/site-packages/sklearn/metrics/pairwise.py", line 1112, in pairwise_distances
return _parallel_pairwise(X, Y, func, n_jobs, **kwds)
File "/home/supermicro/.local/lib/python2.7/site-packages/sklearn/metrics/pairwise.py", line 962, in _parallel_pairwise
return func(X, Y, **kwds)
File "/home/supermicro/.local/lib/python2.7/site-packages/sklearn/metrics/pairwise.py", line 207, in euclidean_distances
distances = safe_sparse_dot(X, Y.T, dense_output=True)
File "/home/supermicro/.local/lib/python2.7/site-packages/sklearn/utils/extmath.py", line 178, in safe_sparse_dot
ret = a * b
File "/usr/lib/python2.7/dist-packages/scipy/sparse/base.py", line 303, in __mul__
return self._mul_sparse_matrix(other)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/compressed.py", line 528, in _mul_sparse_matrix
return self.__class__((data,indices,indptr),shape=(M,N))
File "/usr/lib/python2.7/dist-packages/scipy/sparse/compressed.py", line 84, in __init__
self.check_format(full_check=False)
File "/usr/lib/python2.7/dist-packages/scipy/sparse/compressed.py", line 144, in check_format
raise ValueError("Last value of index pointer should be less than "
ValueError: Last value of index pointer should be less than the size of index and data arrays
*** Error in `python': munmap_chunk(): invalid pointer: 0x00007f9249d68010 ***
Aborted (core dumped)
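The traceback shows silhouette_samples calling pairwise_distances on all of X, which builds an n x n matrix through a sparse product of X with its transpose. A rough sketch of how that matrix grows at the two sizes mentioned above may help narrow things down. The helper name pairwise_matrix_stats, the 8 bytes per value, and the idea that a 32-bit index limit is involved are all assumptions made for illustration, not something confirmed by the traceback:

```python
# Largest value a signed 32-bit index can hold (assumed to be relevant here).
INT32_MAX = 2**31 - 1


def pairwise_matrix_stats(n_samples, bytes_per_value=8):
    """Estimate the size of a full n_samples x n_samples distance matrix.

    Hypothetical helper: bytes_per_value=8 assumes float64 entries.
    """
    n_entries = n_samples * n_samples
    return {
        "entries": n_entries,
        # Size in GiB if every entry is stored
        "gib": n_entries * bytes_per_value / 2.0**30,
        # True if the entry count cannot be addressed with a 32-bit index
        "exceeds_int32": n_entries > INT32_MAX,
    }


if __name__ == "__main__":
    for n in (30000, 100000):
        print(n, pairwise_matrix_stats(n))
```

Under these assumptions the 30 thousand row case stays below the 32-bit limit while the 100 thousand row case exceeds it and would need tens of GiB if stored in full. If an approximate average is acceptable, silhouette_score also takes sample_size and random_state arguments, so the score can be computed on a random subset of X instead of on every row.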