IndexError("Too many indices for array") raised when attempting to run the K-Means|| example

**What happened**:
Following the steps from the [K-Means|| example](https://examples.dask.org/machine-learning.html#Training-on-Large-Datasets) raises `IndexError("Too many indices for array")` during `km.fit(X)`.

**What you expected to happen**:
The K-Means algorithm to successfully complete the fitting stage.

**Minimal Complete Verifiable Example**:

Reproducible by following the steps from the [K-Means|| example](https://examples.dask.org/machine-learning.html#Training-on-Large-Datasets):
```python
import dask_ml.datasets
import dask_ml.cluster


X, y = dask_ml.datasets.make_blobs(n_samples=10000000,
                                   chunks=1000000,
                                   random_state=0,
                                   centers=3)
X = X.persist()

km = dask_ml.cluster.KMeans(n_clusters=3, init_max_iter=2, oversampling_factor=10)
km.fit(X)
```

**Anything else we need to know?**:
Traceback:
```python
In []: km.fit(X)
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-4-e41873a7fbf2> in <module>
----> 1 km.fit(X)

~/anaconda3/lib/python3.8/site-packages/dask_ml/cluster/k_means.py in fit(self, X, y)
    194     def fit(self, X, y=None):
    195         X = self._check_array(X)
--> 196         labels, centroids, inertia, n_iter = k_means(
    197             X,
    198             self.n_clusters,

~/anaconda3/lib/python3.8/site-packages/dask_ml/cluster/k_means.py in k_means(X, n_clusters, init, precompute_distances, n_init, max_iter, verbose, tol, random_state, copy_x, n_jobs, algorithm, return_n_iter, oversampling_factor, init_max_iter)
    266     * n_jobs=-1
    267     """
--> 268     labels, inertia, centers, n_iter = _kmeans_single_lloyd(
    269         X,
    270         n_clusters,

~/anaconda3/lib/python3.8/site-packages/dask_ml/cluster/k_means.py in _kmeans_single_lloyd(X, n_clusters, max_iter, init, verbose, x_squared_norms, random_state, tol, precompute_distances, oversampling_factor, init_max_iter)
    569             # Require at least one per bucket, to avoid division by 0.
    570             counts = da.maximum(counts, 1)
--> 571             new_centers = new_centers / counts[:, None]
    572             (new_centers,) = compute(new_centers)
    573 

~/anaconda3/lib/python3.8/site-packages/dask/array/core.py in __getitem__(self, index)
   1694         )
   1695 
-> 1696         index2 = normalize_index(index, self.shape)
   1697         dependencies = {self.name}
   1698         for i in index2:

~/anaconda3/lib/python3.8/site-packages/dask/array/slicing.py in normalize_index(idx, shape)
    895     idx = idx + (slice(None),) * (len(shape) - n_sliced_dims)
    896     if len([i for i in idx if i is not None]) > len(shape):
--> 897         raise IndexError("Too many indices for array")
    898 
    899     none_shape = []

IndexError: Too many indices for array
```
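For reference, here is a minimal NumPy sketch of what the failing line (`new_centers / counts[:, None]`) appears to expect. The array values and the interpretation of `new_centers` as per-cluster sums are my assumptions for illustration, not taken from dask-ml; the point is only that the expression works when `counts` is 1-D and fails with the same kind of `IndexError` when `counts` is 0-d.

```python
import numpy as np

# Assumed shapes for illustration: 3 clusters, 2 features.
new_centers = np.array([[2.0, 4.0], [9.0, 3.0], [0.0, 0.0]])  # shape (3, 2)
counts = np.array([2, 3, 0])                                   # shape (3,)

# Same guard as in the traceback: at least one per bucket, to avoid division by 0.
counts = np.maximum(counts, 1)

# counts[:, None] has shape (3, 1) and broadcasts across the feature axis.
means = new_centers / counts[:, None]

# If counts ends up 0-dimensional, the same indexing expression cannot work.
scalar_counts = np.maximum(np.array(5), 1)  # shape ()
try:
    scalar_counts[:, None]
    raised = False
except IndexError:
    raised = True
```

So the question seems to be why `counts` is 0-d by the time it reaches that line.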

Inspection of values via `ipdb`:
```python
In []: km.fit(X)
> /root/anaconda3/lib/python3.8/site-packages/dask/array/slicing.py(898)normalize_index()
    897         import ipdb; ipdb.set_trace()
--> 898         raise IndexError("Too many indices for array")
    899 

ipdb> list
    893             n_sliced_dims += 1
    894 
    895     idx = idx + (slice(None),) * (len(shape) - n_sliced_dims)
    896     if len([i for i in idx if i is not None]) > len(shape):
    897         import ipdb; ipdb.set_trace()
--> 898         raise IndexError("Too many indices for array")
    899 
    900     none_shape = []
    901     i = 0
    902     for ind in idx:
    903         if ind is not None:

ipdb> pp idx
(slice(None, None, None), None)
ipdb> pp n_sliced_dims
1
ipdb> pp len([i for i in idx if i is not None])
1
ipdb> pp shape
()
```
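To make the inspected values easier to follow, here is a pure-Python sketch of the check shown in the `list` output above. `check_index` is a hypothetical helper, not a dask function, and the way it computes `n_sliced_dims` is a simplification I am assuming (one dimension per non-`None` index); only the padding line and the `if` condition are copied from the listing.

```python
def check_index(idx, shape):
    """Hypothetical stand-in for the check in normalize_index (simplified)."""
    # Assumption: every non-None index consumes exactly one dimension.
    n_sliced_dims = len([i for i in idx if i is not None])

    # Copied from the listing: pad with full slices for unindexed dimensions.
    # A negative multiplier yields an empty tuple, so nothing is added here
    # when shape is ().
    idx = idx + (slice(None),) * (len(shape) - n_sliced_dims)

    # Copied from the listing: more real indices than dimensions is an error.
    if len([i for i in idx if i is not None]) > len(shape):
        raise IndexError("Too many indices for array")
    return idx


# counts[:, None] on a 1-D array: one real index, one dimension, accepted.
ok = check_index((slice(None), None), (3,))

# counts[:, None] on a 0-d array (shape == ()), as seen in ipdb: rejected.
try:
    check_index((slice(None), None), ())
    error = None
except IndexError as exc:
    error = str(exc)
```

With `shape == ()` the single `slice(None)` already exceeds the number of dimensions, which matches the values printed in the session above.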

Possibly related to https://github.com/dask/dask-ml/issues/802.


**Environment**:

- Dask version: 2021.03.0
- Dask_ml version: 1.8.0
- Python version: 3.8.5 (default, Sep  4 2020, 07:30:14), [GCC 7.3.0]
- Operating System: Debian Buster
- Install method (conda, pip, source): pip

If this is not reproducible on the maintainers' side, I'll be happy to provide a `Dockerfile` with a bit more concrete details about the underlying environment.

Also, please, let me know if this should be moved to the Dask repository instead.

Thanks!

