Quick fix for unbounded memory usage in tensordot #7980
Conversation
|
Here's a teeny tiny example to show with the graph visualization where the problem is.

```python
import dask.array as da

x = da.random.random((10, 10), chunks=(10, 1))
y = da.random.random((10, 10), chunks=(1, 10))
z = da.tensordot(x, y, axes=(1, 0))
z.visualize()
```

After #6846 everything gets shoved into a single tensordot operation at the very end of the computation graph. This happens regardless of the size of the input arrays, and is the reason we can run out of memory. This is what it looks like before that change (and again with the fix for non-sparse arrays in this PR). The layers of sum aggregation really take the pressure off. |
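A minimal sketch (my own, not the PR's implementation) of why those sum layers help: the contraction in `tensordot` can be expressed as a blockwise product followed by a chunked `sum`, and it's the sum's tree reduction that keeps per-task memory bounded.

```python
import numpy as np
import dask.array as da

x = da.random.random((10, 10), chunks=(10, 1))
y = da.random.random((10, 10), chunks=(1, 10))

# Direct tensordot over the contracted axis.
z_direct = da.tensordot(x, y, axes=(1, 0))

# The same contraction written as an outer blockwise product followed
# by a sum over the contracted axis; the sum is built as a layered
# tree reduction, which is what bounds per-task memory.
z_summed = (x[:, :, None] * y[None, :, :]).sum(axis=1)

assert np.allclose(z_direct.compute(), z_summed.compute())
```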
|
I had been assuming that sparse arrays on the CPU would behave equivalently to sparse arrays on the GPU. Looking at the GPU CI (thanks for setting this up @quasiben & team), it appears that assumption is faulty. Unfortunately I haven't been able to get cupy working on my laptop again since the last time it broke. As I recall it was a pretty fragile installation to begin with, and following the (very detailed) notes I took about how we got it working earlier isn't producing the same result this time. I'm having problems selecting the right NVIDIA driver - I need it to (1) make cupy happy, and (2) still allow me to use my external monitor. |
|
Giving up on cupy for tonight - I have CUDA but not cupy working. I'm getting an incompatible version, not sure how to force it to be the correct version since it looks like I'm doing the right things to pin it already. I've posted on the RAPIDS slack to see if anyone has more advice & I'll come back to it tomorrow. (Would Coiled be useful as a workaround for these install issues @mrocklin? You suggested I consider using it for some of the tensordot work, but I think GPU compute was still a little experimental last time I looked.) |
@quasiben does NVIDIA have anyone who could look at restoring cupy sparse support here? |
|
@anaruse or @pentschev if you have time to comment on this it would be appreciated -- currently things are failing in cupy's `sparse/_index.py`. When running locally I get some more output: |
|
From a quick check, this seems to have been introduced in CuPy 9.0.0, there are no errors in 8.6.0. |
|
I must admit I'm a bit ignorant of the code change, but it seems to be due to a change in the input/output dimensions. @GenevieveBuckley could you help me understand what this change should mean w.r.t. dimensionality changes? |
|
Perhaps this test is suspect (though it does pass now). I say suspect because when I convert that test to numpy / scipy it also fails with a similar error:

```python
import numpy as np
import scipy.sparse
import pytest

import dask.array as da
from dask.array.utils import assert_eq


@pytest.mark.parametrize("sp_format", ["csr", "csc"])
def test_np_sparse_dot(sp_format):
    if sp_format == "csr":
        sp_matrix = scipy.sparse.csr_matrix
    elif sp_format == "csc":
        sp_matrix = scipy.sparse.csc_matrix
    dtype = "f"
    density = 0.3
    x_shape, x_chunks = (4, 8), (2, 4)
    y_shape, y_chunks = (8, 6), (4, 3)
    x = np.random.random(x_shape)
    y = np.random.random(y_shape)
    x[x < 1 - density] = 0
    y[y < 1 - density] = 0
    z = x.dot(y)
    da_x = da.from_array(x, chunks=x_chunks, asarray=False, fancy=False)
    da_y = da.from_array(y, chunks=y_chunks, asarray=False, fancy=False)
    da_x = da_x.map_blocks(sp_matrix, dtype=dtype)
    da_y = da_y.map_blocks(sp_matrix, dtype=dtype)
    da_z = da.dot(da_x, da_y).compute()
    assert scipy.sparse.isspmatrix(da_z)
    assert_eq(z, da_z.todense())
```
|
What do you mean by "though it does pass now"? Passes locally? (The gpuCI still marks a lot of failed tests.) I'll look more at […]

When I make a similar version using the pydata sparse library (not […]):

```python
import numpy as np
import sparse

import dask.array as da
from dask.array.utils import assert_eq


def test_pydata_sparse_dot():
    density = 0.3
    x_shape, x_chunks = (4, 8), (2, 4)
    y_shape, y_chunks = (8, 6), (4, 3)
    x = np.random.random(x_shape)
    y = np.random.random(y_shape)
    x[x < 1 - density] = 0
    y[y < 1 - density] = 0
    z = x.dot(y)
    da_x = da.from_array(x, chunks=x_chunks, asarray=False, fancy=False)
    da_y = da.from_array(y, chunks=y_chunks, asarray=False, fancy=False)
    da_x = da_x.map_blocks(sparse.COO.from_numpy, dtype=x.dtype)
    da_y = da_y.map_blocks(sparse.COO.from_numpy, dtype=y.dtype)
    da_z = da.dot(da_x, da_y).compute()
    assert isinstance(da_z, sparse.COO)
    assert_eq(z, da_z.todense())
```
|
I don't think there should be changes to the input/output dimensions, so that's a bit worrying. EDIT: It looks like that line is trying to do the equivalent of […] |
Sorry, I thought they were run again already. Those other tests should have been fixed when we upgraded to NumPy 1.21 in rapidsai/dask-build-environment#3 (which is the case for gpuCI now), but also resolved in #7982 by skipping them with NumPy < 1.20. I'll rerun gpuCI, those other failures should hopefully go away. |
|
rerun tests |
|
In the latest gpuCI run, all those other tests that were previously failing now pass; the only failures left are the […]

@GenevieveBuckley I'm not super familiar with the […] |
|
I think I've got it now - it's only failing when we input mixed array types into tensordot (e.g. a numpy and a sparse array). I need to pass in another argument to […]

Thanks for your help here @pentschev |
|
Here's some tests that can be run locally to check different input array types (you can uncomment the cupyx lines if you have cupy available).

```python
import pytest
import numpy as np
import sparse
import scipy.sparse
# import cupyx.scipy.sparse  # uncomment if cupy is available

import dask.array as da
from dask.array.utils import assert_eq, same_keys


@pytest.mark.parametrize("func, arraytype", [
    (lambda x: x, 'numpy'),
    (sparse.COO.from_numpy, 'sparse'),
    (scipy.sparse.csr_matrix, 'sparse'),
    (scipy.sparse.csc_matrix, 'sparse'),
    # (cupyx.scipy.sparse.csr_matrix, 'sparse'),
    # (cupyx.scipy.sparse.csc_matrix, 'sparse'),
])
def test_tensordot_arraytypes(func, arraytype):
    x = np.arange(400).reshape((20, 20))
    a = da.from_array(x, chunks=(5, 4))
    y = np.arange(200).reshape((20, 10))
    b = da.from_array(y, chunks=(4, 5))
    a = a.map_blocks(func)
    b = b.map_blocks(func)
    assert arraytype in str(type(a._meta))
    assert arraytype in str(type(b._meta))
    z = np.tensordot(x, y, axes=(1, 0))
    c = da.tensordot(a, b, axes=(1, 0))
    assert_eq(c, z, check_meta=False, check_type=False)
    for axes in [1, (1, 0)]:
        assert_eq(da.tensordot(a, b, axes=axes),
                  np.tensordot(x, y, axes=axes),
                  check_meta=False, check_type=False)
        assert_eq(da.tensordot(x, b, axes=axes),
                  np.tensordot(x, y, axes=axes),
                  check_meta=False, check_type=False)
        assert_eq(da.tensordot(a, y, axes=axes),
                  np.tensordot(x, y, axes=axes),
                  check_meta=False, check_type=False)
    assert same_keys(da.tensordot(a, b, axes=(1, 0)),
                     da.tensordot(a, b, axes=(1, 0)))
```
|
|
Ah, and the final assert statement in […] I can pass in […]

Separately, a lot of the sparse array stuff doesn't necessarily propagate the metadata for the array type properly through the whole dask computation. We might look at that more broadly later and try to preserve this information. |
|
Here's what the task graphs look like again.

```python
import sparse
import scipy.sparse
import dask.array as da

x = da.random.random((2, 3, 4), chunks=(1, 2, 2))
x[x < 0.8] = 0
y = da.random.random((4, 3, 2), chunks=(2, 2, 1))
y[y < 0.8] = 0

# dask arrays with numpy chunks
z = da.tensordot(x, y, axes=((1, 2), (1, 0)))

# dask arrays with pydata sparse chunks
x_sparse = x.map_blocks(sparse.COO.from_numpy)
y_sparse = y.map_blocks(sparse.COO.from_numpy)
z_sparse = da.tensordot(x_sparse, y_sparse, axes=((1, 2), (1, 0)))

# dask arrays with scipy sparse csr_matrix chunks
x_csrmatrix = x.map_blocks(scipy.sparse.csr_matrix, dtype=x.dtype)
y_csrmatrix = y.map_blocks(scipy.sparse.csr_matrix, dtype=y.dtype)
z_csrmatrix = da.tensordot(x_csrmatrix, y_csrmatrix, axes=((1, 2), (1, 0)))

# dask arrays with scipy sparse csc_matrix chunks
x_sparse_csc_matrix = x.map_blocks(scipy.sparse.csc_matrix, dtype=x.dtype)
y_sparse_csc_matrix = y.map_blocks(scipy.sparse.csc_matrix, dtype=y.dtype)
z_sparse_csc_matrix = da.tensordot(x_sparse_csc_matrix, y_sparse_csc_matrix, axes=((1, 2), (1, 0)))

z.visualize()                  # numpy chunks
z_sparse.visualize()           # pydata sparse chunks
z_csrmatrix.visualize()        # scipy sparse csr_matrix chunks
z_sparse_csc_matrix.visualize()  # scipy sparse csc_matrix chunks
```
|
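If graphviz isn't available, a lighter-weight way to compare the graphs (my suggestion, not from the thread) is to count tasks directly, which also shows whether the aggregation layers are present:

```python
import dask.array as da

x = da.random.random((10, 10), chunks=(10, 1))
y = da.random.random((10, 10), chunks=(1, 10))
z = da.tensordot(x, y, axes=(1, 0))

# __dask_graph__() returns the full task graph as a mapping;
# its size reflects how many intermediate tasks were generated.
print(len(dict(z.__dask_graph__())))
```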
|
Not the cleanest way to resolve the issue with […], but see dask/dask/array/tests/test_cupy.py, lines 1165 to 1166 at c59a509:

```python
da_x = da_x.map_blocks(sp_matrix, meta=sp_matrix(cupy.array([0], dtype=dtype)))
da_y = da_y.map_blocks(sp_matrix, meta=sp_matrix(cupy.array([0], dtype=dtype)))
```

Could you try that @GenevieveBuckley? |
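For anyone without a GPU, here is a hypothetical CPU analogue of that suggestion using scipy.sparse: passing an explicit sparse `meta` tells dask the chunk type up front instead of leaving it to inference.

```python
import numpy as np
import scipy.sparse
import dask.array as da

x = np.random.random((4, 8))
x[x < 0.7] = 0
d = da.from_array(x, chunks=(2, 4))

# Hand dask a tiny sparse matrix as ``meta`` so the chunk type is
# known when the graph is built (mirrors the CuPy snippet above).
ds = d.map_blocks(
    scipy.sparse.csr_matrix,
    meta=scipy.sparse.csr_matrix(np.array([[0.0]])),
)

assert isinstance(ds._meta, scipy.sparse.csr_matrix)
```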
|
It's worth noting that the pydata sparse arrays are being caught by my check for sparse arrays and forced down the less efficient code path - but there's no real technical reason we need to do this. Pydata sparse arrays work just as well as numpy arrays do, because they can be n-dimensional. Writing this out tells me I should go back and make my check for sparse arrays exclude the pydata sparse arrays. EDIT: Done |
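A quick illustration (mine, not from the PR) of why scipy chunks need the special code path while pydata sparse chunks don't: scipy sparse matrices are strictly 2-D, so they can't hold the n-dimensional blocks that tensordot's intermediate products produce.

```python
import numpy as np
import scipy.sparse

# scipy.sparse matrices are strictly 2-D; constructing one from a
# 3-D block fails, which is what forces the special sparse path.
x3 = np.zeros((2, 3, 4))
try:
    scipy.sparse.csr_matrix(x3)
    only_2d = False
except (ValueError, TypeError):
    only_2d = True

assert only_2d  # an n-dimensional container like sparse.COO would not fail
```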
|
I was able to confirm @pentschev's fixes for the test […], though I know that's not what's being discussed in your last few comments @GenevieveBuckley |
Thanks @quasiben |
|
It's possible we still need to check against this concern, too
EDIT: yes, this is still a thing with this PR #6907 |
|
Marking this PR as a draft, so it doesn't accidentally get merged. Needs a simultaneous fix for the unbounded memory usage while also not reintroducing this error #6907 EDIT: looked into 6907 and uncovered more weirdness, detailed here #6907 (comment) |
Co-authored-by: Klaus Zimmermann <klaus.zimmermann@smhi.se>
|
Thank you for the excellent explanation @zklaus I just tried your suggestions, but it seems the problem in issue 6907 still exists. I've added this example as a regression test in this PR (I figured it's helpful to have included in the CI runs). |
|
I think maybe it has to do with the way dask names the task keys? |
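For background on key naming (a general dask property, not specific to this PR): task-key names are derived by hashing the operation and its inputs, so an identical expression built twice reuses the same keys, while changing any input changes them.

```python
import numpy as np
import dask.array as da

x = da.from_array(np.arange(16.0).reshape(4, 4), chunks=(2, 2))

# The same expression built twice gets identical (deterministic) names.
a = da.tensordot(x, x, axes=(1, 0))
b = da.tensordot(x, x, axes=(1, 0))
assert a.name == b.name

# Changing an input changes the hash, and therefore the task keys.
c = da.tensordot(x, x + 1, axes=(1, 0))
assert c.name != a.name
```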
|
Since we've been blocked here for a while and it seems that the issue only happens for square matrices, if people (perhaps @mrocklin) would like to have the issue solved even if not completely for now, I propose doing the following: […]
Thoughts on the above? |
|
Thoughts: It's possible it isn't a square/non-square chunk thing. I was playing with very tiny arrays there, we might find that it's a case between "chunk spans a full row / chunk spans a full column" vs "chunk does not span a full row / a full column". I could do some more experimenting to find out if that's the case? |
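A sketch of that experiment (hypothetical shapes, not from the thread): run tensordot on a non-square array across chunkings that do and don't span a full row or column, and check each result against numpy.

```python
import numpy as np
import dask.array as da

x = np.arange(24.0).reshape(4, 6)
y = np.arange(30.0).reshape(6, 5)
expected = np.tensordot(x, y, axes=(1, 0))

# Chunkings chosen to cover: single chunk, chunks spanning a full
# row/column, and chunks spanning neither.
for x_chunks, y_chunks in [((4, 6), (6, 5)),
                           ((4, 1), (1, 5)),
                           ((2, 3), (3, 2)),
                           ((1, 2), (2, 5))]:
    a = da.from_array(x, chunks=x_chunks)
    b = da.from_array(y, chunks=y_chunks)
    result = da.tensordot(a, b, axes=(1, 0)).compute()
    np.testing.assert_allclose(result, expected)
```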
|
@GenevieveBuckley can you merge […] |
|
@dcherian , tests are running and looking good |
|
Wonderful. Thanks @martindurant ! |
|
Windows failure is about missing PDF backend for graphviz |
|
@dask/maintenance - I don't know much about the implementation here, but things seem to pass aside from something completely unrelated. Does anyone have a chance to look this over? |
|
To try and capture the potential breaking points we discussed in #7980 (comment) and #7980 (comment), I opened GenevieveBuckley#1 against @GenevieveBuckley 's branch. Locally the tests all pass, so it would be great if we could have them in before merging this, but otherwise I think it covers the issues we've seen with CuPy as well. Thanks everyone for the work in this very extensive issue! |
|
If @GenevieveBuckley is not available, we can either push to this branch (maintainers can), or start a new one; but only she can merge your PR into her branch. |
|
Thank you @pentschev @martindurant and @dcherian! I am going to pull in @pentschev's changes to this branch. |
|
Ok this is in!! |
|
Great thanks everyone |
|
Thanks for picking this up @pentschev |
This is a quick fix for issue #6916
It removes the memory problem for non-sparse array types. Note that tensordot with sparse arrays still needs a proper fix for the memory issues observed after #6846
- Tests added / passed
- Passes `black dask` / `flake8 dask` / `isort dask`