Antoine Levitt

Depends what you mean by everything, but this should work:

using CUDA

A = CUDA.randn(100, 5, 5)  # 100 systems, each 5×5
x = CUDA.randn(100, 5)     # 100 right-hand sides
y = CUDA.zeros(100, 5)     # solutions

for i = 1:100
    # solve A[i, :, :] * y[i, :] = x[i, :]
    @views y[i, :] .= A[i, :, :] \ x[i, :]
end

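This just launches a separate GPU solve for each of the 100 systems, which is not very efficient for ML use cases. The layout is also not ideal: Julia arrays are column-major, so the batch index should come last rather than first, otherwise each slice A[i, :, :] is strided in memory. I don't know how to do it with a single CUDA call (but I don't know anything about GPUs).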
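To make the layout point concrete, here is a minimal CPU-only sketch of the same loop with the batch index moved last. It uses plain `Array`s instead of `CuArray`s so it runs without a GPU. The names `n`, `batch`, and `maxres` are introduced here for illustration, and the slices are copied rather than viewed to keep the example simple. It does not address the one-call batched solve.

```julia
using LinearAlgebra

n, batch = 5, 100            # illustrative sizes matching the example above
A = randn(n, n, batch)       # batch index last: each A[:, :, i] is contiguous in memory
x = randn(n, batch)          # one right-hand side per column
y = zeros(n, batch)          # one solution per column

for i in 1:batch
    # Solve A[:, :, i] * y[:, i] = x[:, i] for the i-th system
    y[:, i] = A[:, :, i] \ x[:, i]
end

# Largest relative residual over the batch, as a sanity check
maxres = maximum(norm(A[:, :, i] * y[:, i] - x[:, i]) / norm(x[:, i]) for i in 1:batch)
```

With this layout each system occupies one contiguous block of 25 numbers, which is generally what you want in a column-major language.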