cuda : fix supports_op condition for get_rows when number of blocks is too large#15868
Conversation
In the CPU code:

```cpp
const int64_t i12 = i03 % ne12;
const int64_t i11 = i02 % ne11;
const int64_t i10 = i;
```

In the CUDA code:

```cpp
const int i10 = blockIdx.x;
const int i11 = blockIdx.z / ne12; // gridDim.z == ne11*ne12
const int i12 = blockIdx.z % ne12;
```

In the CUDA code the same values are used for […]
Ok, I didn't look at the implementation and assumed it was not implemented. So I will update the PR to fix the implementation.
The intention of the operator is that i10 queries rows from […]. So I think the CPU implementation is correct. Looking into this.
The CUDA implementation is correct. The problem is that in one of the new GET_ROWS tests, the number of blocks along the 3rd dimension of the kernel exceeds 65536:

llama.cpp/ggml/src/ggml-cuda/getrows.cu, line 134 in 2aee620

Here […]. For now, I updated the […]
…s too large (#15868)

* cuda : fix supports_op condition for get_rows when src1->ne2 > 1 ggml-ci
* ggml : add comment about ggml_get_rows ggml-ci
* cuda : add FIXME [no ci]
* cuda : update support condition ggml-ci
cont #15687
Mark this case as unsupported until actual support is implemented.