Closed
dask/dask
#7391
Description
What happened:
Following the steps from the K-Means|| example raises IndexError("Too many indices for array") during fitting.
What you expected to happen:
The K-Means algorithm to successfully complete the fitting stage.
Minimal Complete Verifiable Example:
Reproducible via following the steps from the K-Means|| example.
import dask_ml.datasets
import dask_ml.cluster
X, y = dask_ml.datasets.make_blobs(n_samples=10000000,
                                   chunks=1000000,
                                   random_state=0,
                                   centers=3)
X = X.persist()
km = dask_ml.cluster.KMeans(n_clusters=3, init_max_iter=2, oversampling_factor=10)
km.fit(X)
Anything else we need to know?:
Traceback:
In []: km.fit(X)
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-4-e41873a7fbf2> in <module>
----> 1 km.fit(X)
~/anaconda3/lib/python3.8/site-packages/dask_ml/cluster/k_means.py in fit(self, X, y)
194 def fit(self, X, y=None):
195 X = self._check_array(X)
--> 196 labels, centroids, inertia, n_iter = k_means(
197 X,
198 self.n_clusters,
~/anaconda3/lib/python3.8/site-packages/dask_ml/cluster/k_means.py in k_means(X, n_clusters, init, precompute_distances, n_init, max_iter, verbose, tol, random_state, copy_x, n_jobs, algorithm, return_n_iter, oversampling_factor, init_max_iter)
266 * n_jobs=-1
267 """
--> 268 labels, inertia, centers, n_iter = _kmeans_single_lloyd(
269 X,
270 n_clusters,
~/anaconda3/lib/python3.8/site-packages/dask_ml/cluster/k_means.py in _kmeans_single_lloyd(X, n_clusters, max_iter, init, verbose, x_squared_norms, random_state, tol, precompute_distances, oversampling_factor, init_max_iter)
569 # Require at least one per bucket, to avoid division by 0.
570 counts = da.maximum(counts, 1)
--> 571 new_centers = new_centers / counts[:, None]
572 (new_centers,) = compute(new_centers)
573
~/anaconda3/lib/python3.8/site-packages/dask/array/core.py in __getitem__(self, index)
1694 )
1695
-> 1696 index2 = normalize_index(index, self.shape)
1697 dependencies = {self.name}
1698 for i in index2:
~/anaconda3/lib/python3.8/site-packages/dask/array/slicing.py in normalize_index(idx, shape)
895 idx = idx + (slice(None),) * (len(shape) - n_sliced_dims)
896 if len([i for i in idx if i is not None]) > len(shape):
--> 897 raise IndexError("Too many indices for array")
898
899 none_shape = []
IndexError: Too many indices for array
Inspection of values via ipdb:
In []: km.fit(X)
> /root/anaconda3/lib/python3.8/site-packages/dask/array/slicing.py(898)normalize_index()
897 import ipdb; ipdb.set_trace()
--> 898 raise IndexError("Too many indices for array")
899
ipdb> list
893 n_sliced_dims += 1
894
895 idx = idx + (slice(None),) * (len(shape) - n_sliced_dims)
896 if len([i for i in idx if i is not None]) > len(shape):
897 import ipdb; ipdb.set_trace()
--> 898 raise IndexError("Too many indices for array")
899
900 none_shape = []
901 i = 0
902 for ind in idx:
903 if ind is not None:
ipdb> pp idx
(slice(None, None, None), None)
ipdb> pp n_sliced_dims
1
ipdb> pp len([i for i in idx if i is not None])
1
ipdb> pp shape
()
Possibly related to #802.
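The ipdb values above can be replayed without dask to see why the check fires. The sketch below is a simplified, pure-Python stand-in for the lines of normalize_index shown in the listing, not dask's actual implementation: the name check_index is hypothetical, and the way n_sliced_dims is counted (one per non-None entry) is an assumption, since the listing only shows the increment. With an empty shape, the padding multiplier is negative, so no full slices are added, and the single non-None index already exceeds the zero available dimensions.

```python
def check_index(idx, shape):
    """Simplified replay of the check seen in the ipdb listing (hypothetical helper)."""
    # Assumption: each non-None entry counts as one sliced dimension.
    n_sliced_dims = sum(1 for i in idx if i is not None)
    # Pad with full slices; a negative multiplier yields an empty tuple.
    idx = idx + (slice(None),) * (len(shape) - n_sliced_dims)
    if len([i for i in idx if i is not None]) > len(shape):
        raise IndexError("Too many indices for array")
    return idx


# counts[:, None] on a 1-d array of shape (3,) passes the check.
ok = check_index((slice(None), None), (3,))

# The same index on a 0-d array (shape == ()), as observed in ipdb, fails.
try:
    check_index((slice(None), None), ())
    failed = False
except IndexError:
    failed = True
```

This suggests that counts reaches line 571 of k_means.py as a 0-dimensional array rather than a 1-d array of per-cluster counts, so the question is where upstream it loses its dimension.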
Environment:
- Dask version: 2021.03.0
- Dask_ml version: 1.8.0
- Python version: 3.8.5 (default, Sep 4 2020, 07:30:14), [GCC 7.3.0]
- Operating System: Debian Buster
- Install method (conda, pip, source): pip
If this is not reproducible on the maintainers' side, I'll be happy to provide a Dockerfile with a bit more concrete details about the underlying environment.
Also, please, let me know if this should be moved to the Dask repository instead.
Thanks!