Improve single chunk array handling in svd #6612

@eric-czech

Description

The current SVD implementation has very unintuitive performance characteristics for small arrays:

```python
import dask.array as da

# Create a short-fat (small) single-chunk array
x = da.random.random((100, 100000), chunks=(-1, -1))
x.nbytes  # 80000000 = 80MB

# Despite x being only 80MB, this uses ~80 **GB** of RAM and
# I have no idea how long it would take to run (at least more than 5 minutes)
da.linalg.svd(x)[0].mean().compute()

# This, on the other hand, runs almost instantly and barely uses any memory
da.linalg.svd(x.T)[2].T.mean().compute()  # the U array is V.T when given x.T
```

With #6591 in place, it would make sense to detect single-chunk arrays and transpose them before factorization to avoid this. That still wouldn't stop a user from providing an array with pathological chunking, but I think a special case for single-chunk arrays is appropriate.
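The special case described above might look roughly like the following sketch. The helper name `svd_single_chunk` and the `numblocks` check are my own illustration, not code from #6591; the transpose trick relies on the identity that if `x.T = U S V`, then `x = V.T S U.T`:

```python
import dask.array as da


def svd_single_chunk(x):
    """Hypothetical sketch: route short-fat single-chunk arrays
    through the fast tall-skinny SVD path by transposing first."""
    if x.numblocks == (1, 1) and x.shape[0] < x.shape[1]:
        # SVD of the tall-skinny transpose is cheap; swap and
        # transpose the factors to recover the SVD of x itself.
        u, s, v = da.linalg.svd(x.T)
        return v.T, s, u.T
    return da.linalg.svd(x)
```

A usage sketch: `u, s, v = svd_single_chunk(x)` should then reconstruct `x` as `u @ da.diag(s) @ v` regardless of which branch was taken.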
