slow SpMV  #2790

@smarkesini

Hello,
I mentioned this issue earlier in #2365 as a performance issue; the bug was fixed, but performance is still a problem. A sparse matrix-vector multiply in CSR format is over 200 times slower the first time it is computed, and multiplying a sparse matrix in CSC format by a vector from the left-hand side is over 200 times slower every time. Internally, CUDA should perform the same operation in both cases, and there should be no need to sort the entries more than once.
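To illustrate why the two cases should cost the same, here is a minimal pure-Python sketch (not CuPy or cuSPARSE code; the function names `csr_matvec` and `csc_left_matvec` are hypothetical). The CSC arrays of a matrix A are exactly the CSR arrays of its transpose, and for a 1-D vector x, `x @ A` equals `A.T @ x`, so a left multiply on CSC data can reuse the CSR kernel without converting or re-sorting anything.

```python
def csr_matvec(n_rows, indptr, indices, data, x):
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * n_rows
    for i in range(n_rows):
        # Entries of row i live in data[indptr[i]:indptr[i + 1]].
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y


def csc_left_matvec(n_cols, indptr, indices, data, x):
    """Compute y = x @ A for a matrix A stored in CSC form.

    The CSC arrays of A are the CSR arrays of A.T, and x @ A == A.T @ x
    for a 1-D x, so the CSR kernel is reused unchanged.
    """
    return csr_matvec(n_cols, indptr, indices, data, x)


# A = [[1, 0, 2],
#      [0, 3, 0]]   (2 rows, 3 columns), stored in CSC form
csc_indptr = [0, 1, 2, 3]
csc_indices = [0, 1, 0]
csc_data = [1.0, 3.0, 2.0]
x = [10.0, 100.0]

y = csc_left_matvec(3, csc_indptr, csc_indices, csc_data, x)
print(y)  # [10.0, 300.0, 20.0]
```

This only shows the index arithmetic; whether CuPy dispatches the left multiply this way is not something the sketch claims.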

The attached sparse_test.py performs the same computation in several ways and compares the results against SciPy for both CSC and CSR formats, using a 10k-by-10k sparse matrix with 10M non-zero elements. It gives the following output:

all cpu-gpu close?: True True
all lm/rm close? : True True
all 1st/2nd close?: True True
time scipy lm/rm :0.00446,0.00461
wall time cupy lm (1st and 2nd time) :0.0686, 0.00024
wall time cupy rm (1st and 2nd time) :0.0594, 0.061
cuda time cupy lm (1st and 2nd time):0.0594, 0.000229
cuda time cupy lm (1st and 2nd time):0.0593, 0.061

The performance issue goes away if I modify cupyx/scipy/sparse/coo.py as attached, setting _has_canonical_format=True after sorting. I am not sure whether this has unintended consequences, but I have not observed any issues in my code.
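The idea behind that change can be sketched with a toy container. This is an assumption about the pattern, not the contents of the attached coo.py: `TinyCoo`, `sum_duplicates`, and `sort_count` here are hypothetical pure-Python stand-ins. Once the entries have been sorted and duplicates merged, a flag records that fact so later calls skip the expensive step.

```python
class TinyCoo:
    """Hypothetical COO container for illustration; not CuPy's coo_matrix."""

    def __init__(self, rows, cols, vals):
        self.rows, self.cols, self.vals = list(rows), list(cols), list(vals)
        self.has_canonical_format = False
        self.sort_count = 0  # instrumentation for this example only

    def sum_duplicates(self):
        """Sort entries by (row, col) and merge duplicates, at most once."""
        if self.has_canonical_format:
            return  # already sorted and merged; nothing to redo
        merged = {}
        for r, c, v in zip(self.rows, self.cols, self.vals):
            merged[(r, c)] = merged.get((r, c), 0.0) + v
        keys = sorted(merged)
        self.rows = [r for r, _ in keys]
        self.cols = [c for _, c in keys]
        self.vals = [merged[k] for k in keys]
        self.sort_count += 1
        # Remember the result so repeated calls skip the sort.
        self.has_canonical_format = True


m = TinyCoo([1, 0, 1], [0, 2, 0], [1.0, 2.0, 3.0])
m.sum_duplicates()
m.sum_duplicates()  # second call returns immediately
print(m.sort_count)  # 1
```

Any code path that mutates the index or value arrays afterwards would need to reset the flag, which is the kind of unintended consequence worth checking for.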

sparse_test.py.txt
coo.py.txt


Configuration: Ubuntu 18.04, CUDA 10.1, RTX Titan GPU

CuPy Version : 7.0.0
CUDA Root : /usr/local/cuda-10.1
CUDA Build Version : 10010
CUDA Driver Version : 10010
CUDA Runtime Version : 10010
cuBLAS Version : 10201
cuFFT Version : 10101
cuRAND Version : 10101
cuSOLVER Version : (10, 2, 0)
cuSPARSE Version : 10300
NVRTC Version : (10, 1)
cuDNN Build Version : 7605
cuDNN Version : 7605
NCCL Build Version : 2402
NCCL Runtime Version : 2402
