Automatic rechunking and coercion of numpy.ndarrays #622
dask/array/core.py
Outdated
Note that this is not really the right logic here. The dtype signature is fixed independent of the number of dimensions except for 0d arrays, for which the dtype depends on the value according to some complex set of rules: numpy/numpy#6240
dask/array/core.py
Outdated
0 for np.array
10 for np.matrix
|
This ended up being a lot simpler than I anticipated. |
|
Looks like there are some errors on Travis CI. |
|
Ah, I missed that |
|
Yes, that seems reasonable. |
6e2cfaa to c84a184
Fixes GH290
The implementation of rechunking here is the most basic version possible: it only succeeds if at most one of the arrays has multiple chunks defined along an axis.
This suffices for coercing ndarrays -- in the future we can consider more complete solutions.
Also includes a change to rechunking, where it doesn't bother if the chunk size is unchanged. We could remove this (and do it in unify_chunks instead), but it seemed like a general improvement.
|
Did another refactor. Decided to keep |
dask/array/tests/test_rechunk.py
Outdated
I'm curious, what's the type of this?
it's a numpy.ndarray
But actually, I think I meant to be checking y here, not x...
|
This all looks pretty sane to me. |
|
+1 |
|
Tweaked things a little bit. Now elemwise does not consider objects without a shape attribute to be ndarrays -- those arguments will be passed directly into the graph. |
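To illustrate the check described above, here is a minimal sketch of splitting arguments by whether they expose a `shape` attribute. The helper name `split_elemwise_args` and the `SimpleNamespace` stand-in are hypothetical and chosen only for illustration; this is not the code from this PR.

```python
from types import SimpleNamespace


def split_elemwise_args(args):
    """Separate array-like arguments from values passed through unchanged.

    Hypothetical helper: an argument counts as array-like only if it
    has a ``shape`` attribute; everything else is left untouched.
    """
    arrays, passthrough = [], []
    for arg in args:
        if hasattr(arg, "shape"):
            arrays.append(arg)
        else:
            passthrough.append(arg)
    return arrays, passthrough


# Stand-in for an ndarray-like object (assumption: only ``shape`` matters here).
fake_array = SimpleNamespace(shape=(3,))
arrays, passthrough = split_elemwise_args([fake_array, 2, [1, 2, 3]])
```

Under this rule a plain Python list is not treated as an array, since it has no `shape`, so it would go into the graph as-is rather than being coerced.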
The actual error includes the repr of a set object (for which the order of elements will differ on different Python versions), and doctests can't catch generic error messages.
Fixes #290
The implementation of automatic rechunking here is the most basic version possible: it only succeeds if at most one of the arrays has multiple chunks defined along an axis.
This suffices for coercing ndarrays -- in the future we can consider more complete solutions.
Also includes a change to rechunking: it is skipped when the chunk size is unchanged. We could remove this (and do it in
unify_chunks instead), but it seemed like a general improvement.
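The rule described above can be sketched in plain Python for a single axis. The helper names `unify_axis_chunks` and `rechunk_is_noop` are hypothetical, and the sketch assumes that arrays sharing an identical multi-chunk layout count as already unified; the actual `unify_chunks` in this PR may differ.

```python
def unify_axis_chunks(candidates):
    """Pick a common chunk layout for one axis shared by several arrays.

    Hypothetical sketch: ``candidates`` holds one chunk tuple per array,
    e.g. ``(10,)`` for a single block or ``(5, 5)`` for two blocks.
    """
    candidates = [tuple(c) for c in candidates]
    lengths = {sum(c) for c in candidates}
    if len(lengths) != 1:
        raise ValueError("arrays disagree on the axis length: %r" % sorted(lengths))
    # Layouts that really split the axis into more than one block.
    multi = {c for c in candidates if len(c) > 1}
    if len(multi) > 1:
        # The basic version gives up instead of computing a finer common layout.
        raise ValueError("more than one distinct multi-chunk layout on this axis")
    return multi.pop() if multi else candidates[0]


def rechunk_is_noop(current, target):
    """Return True when rechunking would leave the layout unchanged."""
    return tuple(current) == tuple(target)


# A single-block array (such as a coerced ndarray) adopts the chunked layout.
print(unify_axis_chunks([(10,), (5, 5)]))
print(rechunk_is_noop((5, 5), (5, 5)))
```

A single-block array can always adopt the other array's layout, which is why this basic rule is enough for coercing ndarrays.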