Depends what you mean by everything, but this should work:
using CUDA
A = CUDA.randn(100, 5, 5)
x = CUDA.randn(100, 5)
y = CUDA.zeros(100, 5)
for i = 1:100
    @views y[i, :] .= A[i, :, :] \ x[i, :]  # solve A[i] * y[i] = x[i]
end
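It launches separate GPU kernels for each of the 100 systems, which is not very efficient in ML use cases. The indexing is also the wrong way round for Julia: arrays are column-major, so with the batch index first each slice A[i, :, :] is non-contiguous in memory, and the batch index would be better placed last. I don't know how to do it with a single CUDA call (but I don't know much about GPUs).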
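To illustrate the layout point, here is a minimal CPU-only sketch with the batch index moved to the last dimension, so that each A[:, :, i] is one contiguous block. The sizes and the names n and batch are my own choices for illustration, and it uses plain Arrays rather than CuArrays, so it says nothing about how many kernels a GPU version would launch.

```julia
using LinearAlgebra

n, batch = 5, 100          # illustrative sizes, matching the example above
A = randn(n, n, batch)     # batch index last: A[:, :, i] is a contiguous n×n block
x = randn(n, batch)
y = zeros(n, batch)

for i in 1:batch
    # Solve A[:, :, i] * y[:, i] = x[:, i] for each system in turn
    @views y[:, i] .= A[:, :, i] \ x[:, i]
end
```

This is still a loop of 100 independent solves; it only fixes the memory layout, not the one-call-per-system problem.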