Sparse array reductions by ian-r-rose · Pull Request #9342 · dask/dask

ian-r-rose · 2022-08-02T20:07:44Z

This avoids creating dense auxiliary sparse.COO arrays in reductions. Such arrays will in general not have the same fill_value as genuinely sparse arrays, which prevents basic arithmetic between them.

This works, but I'm a bit worried that it will cause problems for other array implementations like cupy, and it may be better to go through the dispatch mechanism.

Closes #7169 (Sparse .var() getting wrong fill values)
Tests added / passed
Passes pre-commit run --all-files
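
To make the fill_value problem in the description concrete, here is a toy model in plain Python. `ToySparse`, `map` and `concatenate` are hypothetical names invented for this sketch and are not the pydata/sparse API. The sketch only assumes that elementwise operations also transform the implicit background value, and that pieces can be joined only when their fill values agree.

```python
from dataclasses import dataclass, field


@dataclass
class ToySparse:
    """Hypothetical 1-D sparse vector: stored entries plus a background fill_value."""

    length: int
    data: dict = field(default_factory=dict)  # index -> explicitly stored value
    fill_value: float = 0.0

    def map(self, func):
        # Elementwise operations also transform the implicit background value.
        return ToySparse(
            self.length,
            {i: func(v) for i, v in self.data.items()},
            func(self.fill_value),
        )


def concatenate(a, b):
    # Pieces can only be joined when they agree on the background value.
    if a.fill_value != b.fill_value:
        raise ValueError("fill_value mismatch")
    data = dict(a.data)
    data.update({i + a.length: v for i, v in b.data.items()})
    return ToySparse(a.length + b.length, data, a.fill_value)


chunk = ToySparse(4, {1: 3.0})            # mostly zeros, fill_value 0.0
counts = ToySparse(4).map(lambda v: 4.0)  # "dense" helper: every element is 4.0
print(counts.fill_value)                  # 4.0, not 0.0

try:
    concatenate(chunk, counts)
except ValueError as err:
    print("cannot combine:", err)
```

In this toy, the helper array that holds element counts ends up with a non-zero fill value, so it can no longer be combined with ordinary sparse chunks. Returning such helpers as plain dense arrays sidesteps the mismatch.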

ian-r-rose · 2022-08-02T22:22:56Z

Okay, as I feared, this does indeed make cupy unhappy -- I'll work on pushing some of this into the dispatch system

ian-r-rose

Okay, I think this is close-to-ready for some eyes.

I've added a new set of dispatches for numel and nannumel, since we want different behaviors for different backends, and doing tricky hasattr checks was feeling unsustainable. I haven't touched cupyx or scipy.sparse yet, since they are not really covered by the existing test suite. But the GPU CI bot seems reasonably happy with this.
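
As a rough illustration of why type-based dispatch is tidier than hasattr checks, here is a minimal sketch. dask has its own `Dispatch` helper; this sketch uses `functools.singledispatch` from the standard library instead, and the chunk classes (`ToyDense`, `ToyGPU`, `ToySparse`) are hypothetical stand-ins, not real backends.

```python
from functools import singledispatch
from math import prod


class _Chunk:
    """Hypothetical base for the toy chunk types below."""

    def __init__(self, shape):
        self.shape = shape


class ToyDense(_Chunk):
    pass


class ToyGPU(_Chunk):
    pass


class ToySparse(_Chunk):
    pass


@singledispatch
def numel(x):
    # Fallback: keep the result in the same family as the input chunk.
    return (type(x).__name__, prod(x.shape))


@numel.register(ToySparse)
def _numel_sparse(x):
    # Sparse chunks are the exception: answer with a plain dense count.
    return ("ToyDense", prod(x.shape))


print(numel(ToyGPU((2, 3))))     # ('ToyGPU', 6)
print(numel(ToySparse((2, 3))))  # ('ToyDense', 6)
```

Each backend gets its own registered implementation, so adding a new array type means registering one more function rather than extending a chain of attribute checks.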

I'd especially appreciate some thoughts from anyone on the @dask/gpu team as to whether this is likely to cause any problems for cupy.

ian-r-rose · 2022-08-02T23:08:06Z

dask/array/reductions.py

 )
 from dask.array.creation import arange, diagonal
-
-# Keep empty_lookup here for backwards compatibility


empty_lookup didn't seem to be used anywhere, and that has been the case for over a year.

Thanks for cleaning up @ian-r-rose. Seems reasonable to remove.

I didn't see any reference to empty_lookup over in the cudf repo but cc @pentschev @jakirkham just in case for visibility

Am not aware of any usages of empty_lookup. Searching RAPIDS doesn't turn anything up either

ian-r-rose · 2022-08-02T23:10:38Z

dask/array/tests/test_sparse.py

    )


-@pytest.mark.xfail(reason="upstream change", strict=False)


These tests were xfailed over four years ago. It doesn't seem that sparse.COO is going to support mixed concatenation with numpy ndarrays, so I don't see the point in keeping them around.

Yeah, fair point. If this isn't going to be supported I agree we can remove these.

@hameerabbasi, just checking, do you think sparse.COO will support mixed concatenation with NumPy arrays?

ian-r-rose · 2022-08-02T23:13:22Z

dask/array/backends.py

+    return _numel(x, coerce_ndarray=False, **kwargs)
+
+
+def _numel(x, coerce_ndarray: bool, **kwargs):


This is mostly the same as the existing numel function, but takes an additional argument, coerce_ndarray. If true, it forces the result to be an ndarray. If false, it lets np.full_like determine the output type, which will usually be the same type as the input. We want coerce_ndarray for sparse arrays, and we don't want it for masked arrays, cupy, etc.

This is all private, so I think it's fairly safe (though I suppose I could obfuscate the kwarg name a bit more).
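
The behavior described above can be sketched roughly as follows. This is not the code from the PR: `numel_sketch` and `Tagged` are hypothetical names, mask handling is ignored, and the only point is the difference between `np.full` (always a plain ndarray) and `np.full_like` (follows the input type).

```python
import math

import numpy as np


def numel_sketch(x, coerce_ndarray, axis=None, keepdims=False, dtype=np.float64):
    """Count the elements reduced over `axis`, broadcast to the output shape."""
    if axis is None:
        axis = tuple(range(x.ndim))
    elif not isinstance(axis, tuple):
        axis = (axis,)
    axis = tuple(a % x.ndim for a in axis)  # normalize negative axes

    count = math.prod(x.shape[a] for a in axis)
    if keepdims:
        shape = tuple(1 if i in axis else s for i, s in enumerate(x.shape))
    else:
        shape = tuple(s for i, s in enumerate(x.shape) if i not in axis)

    if coerce_ndarray:
        # Always a plain numpy.ndarray, whatever type x is.
        return np.full(shape, count, dtype=dtype)
    # Let np.full_like pick the container type based on x.
    return np.full_like(x, count, shape=shape, dtype=dtype)


class Tagged(np.ndarray):
    """Hypothetical ndarray subclass standing in for a non-NumPy backend."""


x = np.ones((2, 3)).view(Tagged)
print(type(numel_sketch(x, coerce_ndarray=False, axis=0)).__name__)  # Tagged
print(type(numel_sketch(x, coerce_ndarray=True, axis=0)).__name__)   # ndarray
```

With `coerce_ndarray=True` the count array is dense regardless of the input, which is the behavior wanted for sparse chunks. With `coerce_ndarray=False` the input's own array type is preserved.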

pentschev · 2022-08-03T17:01:16Z

Thanks for the ping @ian-r-rose. As you noted yourself, CuPy Sparse is unfortunately largely uncovered at the moment. I may be mistaken, but I think its usage is fairly limited today, which is also why it hasn't gotten much attention lately.

In briefly trying to increase coverage, I found that Dask arrays backed by cupyx.scipy.sparse.coo_matrix don't respect the chunktype, even if we pass meta=cupyx.scipy.sparse.coo_matrix((0,0)), and CSR matrices don't seem to respect chunktypes for matrices with more than 2 dimensions. I don't have enough bandwidth now to look further. @jakirkham, is this something you would be interested in and have bandwidth to look at? If not, then I'd suggest this PR may go in even without further CuPy testing.

jrbourbeau

Thanks @ian-r-rose! Overall this looks great.

@jakirkham do you have any thoughts on Peter's comment here #9342 (comment)?

It looks like this also closes #8280?

jrbourbeau · 2022-08-08T20:40:30Z

Yeah, fair point. If this isn't going to be supported I agree we can remove these.

@hameerabbasi, just checking, do you think sparse.COO will support mixed concatenation with NumPy arrays?

jrbourbeau · 2022-08-08T20:55:04Z

Thanks for cleaning up @ian-r-rose. Seems reasonable to remove.

I didn't see any reference to empty_lookup over in the cudf repo but cc @pentschev @jakirkham just in case for visibility

ian-r-rose · 2022-08-08T21:43:41Z

> It looks like this also closes #8280?

I haven't really tackled that issue here -- it seems more involved and may require actual work on the algorithm (based on the WIP commit linked by the user)

jrbourbeau

Thanks @ian-r-rose, I'll plan to merge this tomorrow if there are no further comments on #9342 (comment)

> it seems more involved and may require actual work on the algorithm

I was just going off the reproducer in the OP passing with the changes in this PR

ian-r-rose · 2022-08-10T22:48:01Z

Just added one more commit adding some additional test coverage, in case you have a few minutes @jrbourbeau

jrbourbeau

Thanks @ian-r-rose!

jakirkham · 2022-08-12T06:44:02Z

Sorry for the lack of reply here. Responded in one thread above where I was pinged.

IIUC the other question is whether we want to support sparse CuPy matrices with numel? As CuPy sparse matrices are fairly similar to their SciPy sparse matrix counterparts, would ask whether SciPy sparse matrices are supported? If not, then wouldn't worry about CuPy. If someone asks about numel for sparse matrices, we can worry about it then :)

Ian Rose added 4 commits August 2, 2022 11:20

WIP ensure that numel don't result in sparse arrays, as aggregations on sparse arrays change the fill_values in unpredictable ways, resulting in un-concatenatable intermediate results.

aed711f

Un-xfail

7340023

Increase test coverage to include nan variants of reductions

e25d6da

Remove tests which have not been supported by sparse since 2018

686d9c2

github-actions bot added the array label Aug 2, 2022

Revert change to mask

549112d

Implement numel and nannumel as dispatch methods

1446132

github-actions bot added the dispatch Related to `Dispatch` extension objects label Aug 2, 2022

ian-r-rose commented Aug 2, 2022

ian-r-rose marked this pull request as ready for review August 2, 2022 23:16

Ian Rose added 2 commits August 2, 2022 18:09

Don't bind dispatch to tasks, we don't want to serialize it.

34402ae

Add some docstrings

5515155

jrbourbeau self-requested a review August 4, 2022 16:12

jrbourbeau reviewed Aug 8, 2022

Address review comments

ac46d82

Clarify that we are coercing to a numpy.ndarray for sparse numel

51d2126

jrbourbeau approved these changes Aug 10, 2022

Add more test coverage for nannumel

e96ccb0

jrbourbeau approved these changes Aug 12, 2022

jrbourbeau merged commit 8b95f98 into dask:main Aug 12, 2022


4 participants