cuda : fix supports_op condition for get_rows when number of blocks is too large#15868
Conversation
In the CPU code:

```cpp
const int64_t i12 = i03 % ne12;
const int64_t i11 = i02 % ne11;
const int64_t i10 = i;
```

In the CUDA code:

```cpp
const int i10 = blockIdx.x;
const int i11 = blockIdx.z / ne12; // gridDim.z == ne11*ne12
const int i12 = blockIdx.z % ne12;
```

In the CUDA code the same values are used for […]
Ok, I didn't look at the implementation and assumed it was not implemented. So I will update the PR to fix the implementation.
The intention of the operator is that i10 queries rows from […]. So I think the CPU implementation is correct. Looking into this.
The CUDA implementation is correct. The problem is that in one of the new GET_ROWS tests, the number of blocks along the 3rd dimension of the kernel exceeds 65536:

llama.cpp/ggml/src/ggml-cuda/getrows.cu, line 134 in 2aee620

Here […]. For now, I updated the […]
…s too large (#15868)

* cuda : fix supports_op condition for get_rows when src1->ne2 > 1 ggml-ci
* ggml : add comment about ggml_get_rows ggml-ci
* cuda : add FIXME [no ci]
* cuda : update support condition ggml-ci
cont #15687
Mark this case as unsupported until actual support is implemented.