mean of sparse array: AxisError: axis 0 is out of bounds for array of dimension 0 #2842
ogrisel wants to merge 2 commits into dask:master
dask/array/reductions.py
        x_ones[:] = 1
    else:
        x_ones = np.ones_like(x)
    return chunk.sum(x_ones, **kwargs)
Instead perhaps we replace np.ones_like with np.ones(shape=x.shape, dtype='u1') ?
I amended my commit with your suggestion. It can still allocate a lot of unnecessary memory if the arrays are very sparse and the chunk dimensions comparatively large.
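One way to sidestep the allocation entirely would be to derive the count from the chunk's shape alone. The following is only a rough sketch of that idea: `numel_from_shape` is a hypothetical helper name, not something in dask, and its handling of `axis` and `keepdims` is an assumption about what the reduction needs. It only allocates an array of the reduced output shape.

```python
import numpy as np


def numel_from_shape(x, axis=None, keepdims=False, dtype="u8"):
    """Hypothetical helper: count the elements reduced along `axis` using only x.shape."""
    ndim = len(x.shape)
    if axis is None:
        axes = tuple(range(ndim))
    elif isinstance(axis, (int, np.integer)):
        axes = (int(axis),)
    else:
        axes = tuple(axis)
    # Normalize negative axes, e.g. -1 becomes ndim - 1.
    axes = tuple(a % ndim for a in axes)

    # Number of input elements collapsed into each output cell.
    count = 1
    for a in axes:
        count *= x.shape[a]

    if keepdims:
        out_shape = tuple(1 if i in axes else n for i, n in enumerate(x.shape))
    else:
        out_shape = tuple(n for i, n in enumerate(x.shape) if i not in axes)
    # Only the (small) reduced shape is allocated, never a full chunk of ones.
    return np.full(out_shape, count, dtype=dtype)


x = np.zeros((4, 6))
counts = numel_from_shape(x, axis=0)
```

Because only `x.shape` is read, the same function would apply to any chunk type that exposes a shape, sparse or dense, as long as the reduction passes nothing beyond `axis` and `keepdims`.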
Actually using u1 is wrong. We should probably use u8 to be able to count large dimensions.
Yes, ideally we would re-implement the axis and keepdims logic. We've been lazy so far.
U1 seems to work for me?
In [1]: import numpy as np
In [2]: np.ones(shape=(1000, 2000), dtype='u1').sum()
Out[2]: 2000000
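A small check of why `u1` holds up here, as far as NumPy's documented `sum` behaviour goes: when the input is an unsigned integer narrower than the platform integer, the default accumulator is widened to the platform's unsigned integer. The count would only wrap if a caller forced a `u1` accumulator through the `dtype` argument. Whether the reduction ever forwards such a `dtype` is not established in this thread, so the second case is purely illustrative.

```python
import numpy as np

ones = np.ones(shape=(1000, 2000), dtype="u1")

# Default: the accumulator is widened to a platform-sized unsigned integer,
# so the count does not overflow even though each element is one byte.
total = ones.sum()

# Forcing a one-byte accumulator wraps modulo 256 (300 % 256 == 44).
wrapped = np.ones(300, dtype="u1").sum(dtype="u1")
```

So `u1` keeps the temporary array small (one byte per element) without limiting the size of the count, provided the sum uses the default accumulator.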
There are still broken tests with masked arrays. Need to investigate.
Is this still an issue today @ogrisel? It looks like things might work currently:
In [1]: import dask.array as da
In [2]: import sparse
In [3]: x = da.random.random((100, 100), chunks=(10, 10))
In [4]: x[x < 0.95] = 0
In [5]: s = x.map_blocks(sparse.COO)
In [6]: s.mean(axis=0).compute()
Out[6]: <COO: shape=(100,), dtype=float64, nnz=99, fill_value=0.0>
Closing as the originally posted issue seems to be resolved. @ogrisel feel free to re-open if this is not the case.
s.mean(axis=0).compute() triggers the following uncaught exception:

AxisError: axis 0 is out of bounds for array of dimension 0

This PR adds a non-regression test along with a naive fix that calls todense() on each chunk during the reduction. This is probably sub-optimal from a performance point of view, but I am not sure which kwargs should be supported in numel besides axis.

I will add an entry to the changelog if we agree on the correct fix.
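For illustration, the naive approach could look roughly like the sketch below. This is not the PR's actual diff: `numel_naive` and `ToySparseChunk` are hypothetical names, and the toy class merely stands in for a sparse chunk that exposes `todense()`.

```python
import numpy as np


class ToySparseChunk:
    """Hypothetical stand-in for a sparse chunk; exposes only shape and todense()."""

    def __init__(self, dense):
        self._dense = np.asarray(dense)
        self.shape = self._dense.shape

    def todense(self):
        return self._dense


def numel_naive(x, **kwargs):
    """Illustrative only: densify the chunk, then count elements by summing ones."""
    if hasattr(x, "todense"):
        # Naive step: memory use grows with the full chunk size, not with nnz.
        x = x.todense()
    return np.sum(np.ones_like(x), **kwargs)


chunk = ToySparseChunk([[0.0, 1.5, 0.0], [0.0, 0.0, 2.0]])
per_column = numel_naive(chunk, axis=0)  # one count per column
```

The densification is what makes this sub-optimal: the count depends only on the chunk's shape, yet the whole chunk is materialized to obtain it.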