Conversation
sparse arrays change the fill_values in unpredictable ways, resulting in un-concatenatable intermediate results.
|
Okay, as I feared, this does indeed make cupy unhappy -- I'll work on pushing some of this into the dispatch system |
ian-r-rose
left a comment
There was a problem hiding this comment.
Okay, I think this is close-to-ready for some eyes.
I've added a new set of dispatches for numel and nannumel, since we want different behaviors for different backends, and doing tricky hasattr checks was feeling unsustainable. I haven't touched cupyx or scipy.sparse yet, they are not really covered by the existing test suite. But the GPU CI bot seems reasonably happy with this.
I'd especially appreciate some thoughts from anyone on the @dask/gpu team as to whether this is likely to cause any problems for cupy.
| ) | ||
| from dask.array.creation import arange, diagonal | ||
|
|
||
| # Keep empty_lookup here for backwards compatibility |
There was a problem hiding this comment.
empty_lookup didn't seem to be used anywhere, and that has been the case for over a year.
There was a problem hiding this comment.
Thanks for cleaning up @ian-r-rose. Seems reasonable to remove.
I didn't see any reference to empty_lookup over in the cudf repo but cc @pentschev @jakirkham just in case for visibility
There was a problem hiding this comment.
Am not aware of any usages of empty_lookup. Searching RAPIDS doesn't turn anything up either
| ) | ||
|
|
||
|
|
||
| @pytest.mark.xfail(reason="upstream change", strict=False) |
There was a problem hiding this comment.
These tests were xfailed over four years ago -- it doesn't seem that sparse.COO is going to support mixed concatenation with numpy ndarrays, so I don't see the point in keeping these around
There was a problem hiding this comment.
Yeah, fair point. If this isn't going to be supported I agree we can remove these.
@hameerabbasi, just checking, do you think sparse.COO will support mixed concatenation with NumPy arrays?
dask/array/backends.py
Outdated
| return _numel(x, coerce_ndarray=False, **kwargs) | ||
|
|
||
|
|
||
| def _numel(x, coerce_ndarray: bool, **kwargs): |
There was a problem hiding this comment.
This is mostly the same as the existing numel function, but has an additional argument in coerce_ndarray, which, if true, forces the result to be an ndarray. If false, it lets np.full_like determine the output type, which will usually be the same as what we give it. We want coerce_ndarray for sparse arrays, and we don't want it for masked arrays, cupy, etc.
This is all private, so I think it's fairly safe (though I suppose I could obfuscate the kwarg name a bit more)
|
Thanks for the ping @ian-r-rose . As you noted yourself, unfortunately CuPy Sparse is largely uncovered at the moment, I may be mistaken but I think it's usage is fairly limited today, which is also why it didn't get much attention lately. In briefly trying to increase coverage, I found that Dask arrays backed by |
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks @ian-r-rose! Overall this looks great.
@jakirkham do you have any thoughts on Peter's comment here #9342 (comment)?
It looks like this also closes #8280?
| ) | ||
|
|
||
|
|
||
| @pytest.mark.xfail(reason="upstream change", strict=False) |
There was a problem hiding this comment.
Yeah, fair point. If this isn't going to be supported I agree we can remove these.
@hameerabbasi, just checking, do you think sparse.COO will support mixed concatenation with NumPy arrays?
| ) | ||
| from dask.array.creation import arange, diagonal | ||
|
|
||
| # Keep empty_lookup here for backwards compatibility |
There was a problem hiding this comment.
Thanks for cleaning up @ian-r-rose. Seems reasonable to remove.
I didn't see any reference to empty_lookup over in the cudf repo but cc @pentschev @jakirkham just in case for visibility
I haven't really tackled that issue here -- it seems more involved and may require actual work on the algorithm (based on the WIP commit linked by the user) |
jrbourbeau
left a comment
There was a problem hiding this comment.
Thanks @ian-r-rose, I'll plan to merge this tomorrow if there are no further comments on #9342 (comment)
it seems more involved and may require actual work on the algorithm
I was just going off the reproducer in the OP passing with the changes in this PR
|
Just added one more commit adding some additional test coverage, in case you have a few minutes @jrbourbeau |
|
Sorry for the lack of reply here. Responded in one thread above where I was pinged. IIUC the other question is whether we want to support sparse CuPy matrices with |
This avoids creating dense auxiliary
sparse.COOarrays in reductions, which will in general not have the samefill_valueas actually sparse arrays, preventing basic arithmetic between them.This works, but I'm a bit worried that it will cause problems for other array implementations like
cupy, and it may be better to go through the dispatch mechanism.pre-commit run --all-files