improved the efficiency of matmul() by completely removing concatenation#8423

Merged
jsignell merged 7 commits into dask:main from ParticularMiner:matmul on Dec 14, 2021

Conversation

@ParticularMiner
Contributor

@ParticularMiner ParticularMiner commented Nov 27, 2021

This PR avoids concatenation in both blockwise() and reduction() within the implementation of matmul(). Previously, matmul() avoided concatenation only in blockwise().

Due to the special nature of the preceding blockwise contraction, all chunk-operands of a sum during the reduction phase are expected to have exactly the same shape, with a size of 1 along the reduction axis. This makes simple element-wise addition of the chunks possible. Furthermore, the summation result need only be reshaped to drop the reduction axis when requested. In principle, both of these operations (element-wise addition and reshaping) are performance boosters compared to the default concatenation-mediated reduction, which often involves data rearrangement.
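The reduction strategy described above can be sketched with plain numpy (illustrative only; the chunk values and shapes here are hypothetical):

```python
import numpy as np

# Chunk-operands produced by the blockwise contraction: all share the
# same shape, with size 1 along the reduction axis (here axis=1).
chunks = [np.full((2, 1, 3), v) for v in (1.0, 2.0, 3.0)]

# Element-wise addition is valid because every chunk has the same shape.
total = chunks[0] + chunks[1] + chunks[2]

# Dropping the size-1 reduction axis is a cheap reshape; unlike
# concatenation, no data rearrangement is required.
result = total.reshape(2, 3)
```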

The code of matmul() has also been shortened, making it a bit more readable. This is a direct consequence of ensuring that the position of the new-axis introduced in the output of chunk-function _matmul() [the first argument to blockwise()] is consistent with its expected position in the output of blockwise() (the position of the contraction-axis).

Performance is comparable to the original matmul() implementation in all cases except one, namely the case where all dimensions are chunked, in which this new implementation is approximately 20% faster. This outcome is consistent with the total absence of concatenation overhead in the new version. Performance results can be found in this Jupyter notebook.

See discussion with @ravwojdyla at #7000 (comment) for the motivation of this PR.

@GPUtester
Collaborator

Can one of the admins verify this patch?

@github-actions github-actions bot added the array label Nov 27, 2021
@ParticularMiner ParticularMiner force-pushed the matmul branch 4 times, most recently from 7b1f63f to 6a1516d on November 28, 2021 08:16
@ParticularMiner
Contributor Author

Attention: @dask, admins

Many CI tests are broken (but not because of this PR), as seen here (#8423) and elsewhere (#8424).

@ParticularMiner
Contributor Author

ParticularMiner commented Nov 28, 2021

Hi @dask

I need some help debugging this PR for the unit test dask/array/tests/test_array_function.py::test_unregistered_func[<lambda>6], which passes with numpy version 1.21.4 but not with versions ≤ 1.19.5.

The bug originates from passing a list a of arrays of type dask.array.tests.test_dispatch.EncapsulateNDArray to numpy.sum, which is implemented by numpy.add.reduce(). I have not been able to determine what changed in numpy.add.reduce() between versions 1.19.5 and 1.21.4 that eliminated this bug. The error thrown is

ValueError: setting an array element with a sequence.

EDIT:

This bug has been fixed. Basically, I changed the line

        out = chunk.sum(a, dtype=dtype, axis=0)

in _chunk_sum() to

        item_type = type(a[0])
        if item_type is np.ndarray:
            # plain numpy arrays: sum the list of chunks directly
            out = chunk.sum(a, dtype=dtype, axis=0)
        else:
            # duck arrays: materialise to numpy, sum, then re-wrap in the
            # original chunk type
            a = [item.__array__() for item in a]
            out = chunk.sum(a, dtype=dtype, axis=0)
            out = item_type(out)

A bit crude. But it passes the tests.

Unfortunately, many CI tests are still breaking after GitHub crashed 2 days ago.

@ParticularMiner
Contributor Author

Yay! CI tests are now operational. Thanks @dask admin!

@jsignell
Member

Sorry for the lack of response @ParticularMiner. I'll leave the review to @ravwojdyla or maybe @eriknw.

Contributor

@ravwojdyla ravwojdyla left a comment

great work @ParticularMiner, see some questions/comments below.

dtype = getattr(np.zeros(1, dtype=a.dtype).sum(), "dtype", object)

if a.shape[axis] == 1:
    return a.reshape(_shape_minus_axis(a.shape, axis))
Contributor

This reshape could probably be just a squeeze with an axis parameter; it might be a bit clearer that way?
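For an array with a size-1 axis, the two spellings are equivalent; squeeze simply names the axis being removed instead of inferring it from the target shape (a hypothetical example):

```python
import numpy as np

a = np.arange(6).reshape(2, 1, 3)  # size-1 axis at position 1

via_reshape = a.reshape(2, 3)    # infers intent from the target shape
via_squeeze = a.squeeze(axis=1)  # names the axis being dropped

assert np.array_equal(via_reshape, via_squeeze)
```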

Contributor Author

Aha! You're precisely right. I forgot all about squeeze. I will change that.

if a.shape[axis] == 1:
    return a.reshape(_shape_minus_axis(a.shape, axis))

result = reduction(
Contributor

Could probably be just return?

Contributor Author

Of course. Sorry for that.

return dot(a.conj().ravel(), b.ravel())


def _shape_minus_axis(shape, axis):
Contributor

I wonder if you even need this method and could instead just use squeeze more?

Contributor Author

Indeed. I'll be using squeeze instead. Thanks for this.

Comment on lines +352 to +358
item_type = type(a[0])
if item_type is np.ndarray:
    out = chunk.sum(a, dtype=dtype, axis=0)
else:
    a = [item.__array__() for item in a]
    out = chunk.sum(a, dtype=dtype, axis=0)
    out = item_type(out)
Contributor

I'm probably missing something, but why doesn't chunk.sum work here? Could we let numpy delegate to the array types? __array__ might lead to numpy array materialisation, right, which can be expensive? AFAIR CuPy doesn't support __array__.

Contributor Author

I agree that there's a problem here. But I'm not sure how to fix it properly.

It turns out that chunk.sum works for the latest version of numpy, in which case this conditional branch is unnecessary. But the CI tests revealed that earlier versions of numpy raise ValueError: setting an array element with a sequence.

I'll explore further. Thanks.

Contributor Author

@ParticularMiner ParticularMiner Dec 2, 2021

I forgot to mention: the specific chunk type that raises the above exception is dask.array.tests.test_dispatch.EncapsulateNDArray, defined in dask/array/tests/test_dispatch.py.

Contributor

I think you need to add squeeze to EncapsulateNDArray, see

astype = wrap("astype")
sum = wrap("sum")
prod = wrap("prod")
reshape = wrap("reshape")

If we fix that what else is missing?

Contributor Author

@ParticularMiner ParticularMiner Dec 2, 2021

I wanted to do that earlier but hesitated because I got the impression the creators wanted the EncapsulateNDArray interface to remain minimal as mentioned in the docstring for WrappedArray in dask/dask/array/tests/test_dispatch.py:

class WrappedArray(np.lib.mixins.NDArrayOperatorsMixin):
    """
    Another mock duck array class (like EncapsulateNDArray), but
    designed to be above Dask in the type casting hierarchy (that is,
    WrappedArray wraps Dask Array) and be even more minimal in API.
    Tests that Dask defers properly to upcast types.
    """

Perhaps you feel that's not critical?

Contributor

It is my belief that if we go forward with using squeeze in matmul (which arguably is a pretty common op), adding squeeze to that test WrappedArray is "justified".

Contributor Author

True. So should I add it and revert to using numpy.sum instead of numpy.add with functools.reduce? Is there anyone who'd be unhappy with that?

Contributor

I'm +1, I would be interested to see the impact there, also to the other comment: #8423 (comment)

@ParticularMiner
Contributor Author

ParticularMiner commented Dec 2, 2021

@ravwojdyla

Using squeeze() in _chunk_sum() would have been my preference.

But unfortunately squeeze() raised an error for numpy arrays whose shapes contained zeros, such as (0, 0, 0). This is because squeeze() requires the selected axis to have size exactly 1.

Besides, for the array type dask.array.tests.test_dispatch.EncapsulateNDArray (defined in dask/array/tests/test_dispatch.py), squeeze() raises AttributeError: 'EncapsulateNDArray' object has no attribute 'squeeze'.

So I reluctantly have to revert to using reshape() instead.

I retain the use of squeeze() everywhere else, however.
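The difference in behaviour for zero-size arrays can be reproduced directly (a minimal sketch):

```python
import numpy as np

a = np.empty((0, 0, 0))

# squeeze insists that the selected axis have size exactly 1 ...
try:
    a.squeeze(axis=0)  # axis 0 has size 0, not 1
    squeeze_failed = False
except ValueError:
    squeeze_failed = True

# ... whereas reshape only requires the total element count to match.
b = a.reshape((0, 0))
```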

@ParticularMiner
Contributor Author

@ravwojdyla

I have now replaced the following code in _chunk_sum():

        item_type = type(a[0])
        if item_type is np.ndarray:
            out = chunk.sum(a, dtype=dtype, axis=0)
        else:
            a = [item.__array__() for item in a]
            out = chunk.sum(a, dtype=dtype, axis=0)
            out = item_type(out)

with this

        xp = np

        if is_cupy_type(a[0]):
            import cupy

            xp = cupy

        # pairwise element-wise addition of the equally-shaped chunks
        out = reduce(partial(xp.add, dtype=dtype), a)

where reduce is imported from the functools package. Fortunately, all CI tests pass with these changes. Moreover, just like in _matmul(), I explicitly test for the existence of a cupy type.

The latter code snippet seems to give a slight performance edge over the former: it is more than 25% faster (compared to 20% for the former) than the original matmul in the case where all dimensions are chunked.
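The pairwise-reduce approach can be checked in isolation with plain numpy (a sketch; the chunk shapes are hypothetical):

```python
from functools import partial, reduce

import numpy as np

# Four equally-shaped chunk-operands, size 1 along the contraction axis.
chunks = [np.ones((2, 1, 3)) for _ in range(4)]

# Pairwise np.add lets each operand's own type drive dispatch (via
# __array_ufunc__), instead of np.sum's list-of-arrays path.
out = reduce(partial(np.add, dtype=np.float64), chunks)
```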

Comment on lines +360 to +356
else:
    if a.shape == (0,):
        return a
Contributor

What is this case? And does it need to be a separate branch?

Contributor Author

This branch turns out to be necessary, as _chunk_sum is often called with an empty array as the parameter a in order to compute meta data, a peculiarity of dask.array.reduction.
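The guard can be illustrated with a stripped-down stand-in for _chunk_sum (the name and the np.add.reduce fallback are illustrative, not the PR's exact code):

```python
import numpy as np

def _chunk_sum_sketch(a, axis=None, dtype=None, keepdims=False):
    # dask.array.reduction probes the chunk function with an empty
    # array to compute meta data; pass that probe straight through.
    if not isinstance(a, list) and a.shape == (0,):
        return a
    # normal path: element-wise sum of the equally-shaped chunks
    return np.add.reduce(a, axis=0, dtype=dtype)

meta = _chunk_sum_sketch(np.empty((0,), dtype=np.float64))
out = _chunk_sum_sketch([np.ones((2, 1)) for _ in range(3)])
```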

Contributor

@ParticularMiner if you remove that branch and let it go into the other branch, eventually into the reduce/add, what happens? How does it fail?

Contributor Author

I get an IndexError: too many indices for array: array is 1-dimensional, but 2 were indexed
at the reduce/add line when computing the meta data. Would you like the full traceback log?

Contributor Author

Here's the error log from one test:
tracebacklog.txt

Contributor Author

The problem is that the error occurs in dask outside any of the code that I wrote in this PR.

Contributor

@ParticularMiner thanks. Oh, so this error happens only for a specific case, [(7,), (7,), (), ()], which will reduce to a scalar. Are you sure there isn't something going on here? 🤔

Contributor Author

Oh no, it's not the only case. I just uploaded the error for one case. There are at least 15 other cases where the error occurs for that unit test, test_matmul().

Comment on lines +354 to +351
if is_cupy_type(a[0]):
    import cupy

    xp = cupy
Contributor

Why is this needed? Wouldn't numpy handle the dispatch to cupy below?

Contributor Author

Honestly, I do not know. But the exact same construct is being used in _matmul() (which you allowed to remain), which is why I copied it over here.

Contributor

@ParticularMiner that part actually came after my change, via #7563. @quasiben, we would appreciate your input here: when is that is_cupy_type needed?

Contributor Author

Sorry I assumed it was yours @ravwojdyla

Member

is_cupy_type is normally needed when we have to support the non-NumPy API part of CuPy, such as cupyx.scipy; this is why matmul requires that check. I think the reduction here may not need the same, or at least I can't think of a need for it right now. I would then suggest just replacing all this with np.add, and it should work. If gpuCI tests pass, we should be good with respect to what we know is supported today.

@quasiben please feel free to correct me.

Contributor Author

@ParticularMiner ParticularMiner Dec 14, 2021

Sorry @pentschev, I didn't see your comment above until just now. Nevertheless, I've already taken steps that agree with your comment. Thanks!

if keepdims:
    return out
else:
    return out.reshape(_shape_minus_axis(out.shape, axis[0]))
Contributor

We can probably make the squeeze work (see the other comment), but otherwise, if this method is used only once, it's probably not worth a complete new method.

Contributor Author

I agree. Let me know what you think after you read my response about adding a squeeze method to EncapsulateNDArray.

@ParticularMiner
Contributor Author

Thanks for your detailed review on this, @ravwojdyla! I really appreciate it.

It's getting to midnight here, so I'll likely respond to any more messages tomorrow.

@ParticularMiner
Contributor Author

@ravwojdyla

You will find that I added the squeeze() method to EncapsulateNDArray and discarded _shape_minus_axis together with one conditional branch of _chunk_sum. Everything seems to work well with these changes, and performance seems even somewhat faster than before.

I guess the only outstanding issue is the one regarding the cupy type array.

Thanks.

@ravwojdyla
Contributor

@ParticularMiner perfect, thank you. I suspect that cupy branch is not needed, but there's a chance I don't fully understand why it would be there, maybe let's give @quasiben some time to respond to the comment in #8423 (comment)

@ParticularMiner
Contributor Author

Sure @ravwojdyla. Thanks for the great suggestions.

I suspect that cupy branch is not needed, but there's a chance I don't fully understand why it would be there, maybe let's give @quasiben some time to respond to the comment in #8423 (comment)

It seems @jakirkham might also be able to address this question:

As dask.array is right now, is it necessary to test for the cupy chunk type (as @quasiben introduced:

dask/dask/array/routines.py

Lines 335 to 347 in 4324780

def _matmul(a, b):
    xp = np

    if is_cupy_type(a):
        import cupy

        xp = cupy

    chunk = xp.matmul(a, b)
    # Since we have performed the contraction via matmul
    # but blockwise expects all dimensions back, we need
    # to add one dummy dimension back
    return chunk[..., xp.newaxis]
for dask.array.matmul in #7563) whenever one needs to use the cupy implementation of some numpy function like numpy.add or numpy.sum?
Is there a reason it would be less convenient to use a more centralized approach, like register the particular cupy function implementation in a "lookup registry", as has been done, for example, with concatenate for various chunk types including cupy itself?

dask/dask/array/backends.py

Lines 113 to 129 in 4324780

@tensordot_lookup.register_lazy("cupy")
@concatenate_lookup.register_lazy("cupy")
def register_cupy():
    import cupy

    from dask.array.dispatch import percentile_lookup

    concatenate_lookup.register(cupy.ndarray, cupy.concatenate)
    tensordot_lookup.register(cupy.ndarray, cupy.tensordot)
    percentile_lookup.register(cupy.ndarray, percentile)


@einsum_lookup.register(cupy.ndarray)
def _cupy_einsum(*args, **kwargs):
    # NB: cupy does not accept `order` or `casting` kwargs - ignore
    kwargs.pop("casting", None)
    kwargs.pop("order", None)
    return cupy.einsum(*args, **kwargs)

Contributor

@ravwojdyla ravwojdyla left a comment

Approving*, great work @ParticularMiner

* pending resolution of the question about the need for cupy check ( #8423 (comment)). @ParticularMiner that said if you prefer to merge sooner, and open an issue to discuss that branch/question etc, that sounds good to me too.

FYI @jsignell (and thanks for the ping).

@ParticularMiner
Contributor Author

That's fine, @ravwojdyla . It's not urgent to merge now.

  • pending resolution of the question about the need for cupy check ( #8423 (comment)). @ParticularMiner that said if you prefer to merge sooner, and open an issue to discuss that branch/question etc, that sounds good to me too.

@jsignell
Member

@ravwojdyla thank you so much for reviewing this work and thank you @ParticularMiner for engaging! I'll broaden the ping on #8423 (comment) a bit to see if anyone on @dask/gpu knows if that branch is still necessary.

@ParticularMiner
Contributor Author

@ravwojdyla
Thanks again for clearing things up!

@jsignell
It appears all outstanding issues here have been addressed, and this PR now awaits your merge.
Many thanks for your effective oversight!

@ParticularMiner
Contributor Author

I believe the observed test-failures for Python 3.7 on Windows are not related to this PR. (Correct me if I'm wrong.)

@jsignell
Member

I agree that the failing tests are unrelated. They are also showing up on main.

@jsignell jsignell merged commit 33b2886 into dask:main Dec 14, 2021
@jsignell
Member

Thanks for this work! The end diff looks super clean :)
