vulkan: fix top_k bug when there are ties in the input by jeffbolznv · Pull Request #17659 · ggml-org/llama.cpp

jeffbolznv · 2025-12-01T16:34:09Z

And update tests to exercise this case.

This is stacked on #17623.

- Compute row size for the temp buffer based on the output of the first pass.
- Update shader addressing math to use the output row size.
- Pass the output row size as "ncols_output"; what used to be "ncols_output" is now "k".

For the common case of K=40 and src0=(200000,1,1,1), this reduces the temporary buffer from about 3.2MB to 500KB.

I noticed by inspection a bug in the Vulkan top_k shader: if the least value in the top_k appears multiple times, we could end up writing those extra copies out rather than some larger values (if the larger values are on higher-numbered threads). I rewrote the test verification to handle this case, where the final index set is not necessarily the same.
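To illustrate why the verification cannot compare index sets directly when ties are present, here is a minimal sketch of a tie-tolerant check. It is not the code from tests/test-backend-ops.cpp; the helper name `is_valid_top_k` and its signature are hypothetical, and it assumes the row contains no NaNs and that the order of the returned indices does not matter. Instead of comparing indices against a reference, it compares the selected values against the k largest values of the row.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper (not the actual test code): returns true if `idx` is a
// valid top-k selection for `row`, regardless of which tied element was picked.
static bool is_valid_top_k(const std::vector<float> & row,
                           const std::vector<int32_t> & idx,
                           size_t k) {
    if (idx.size() != k || k > row.size()) {
        return false;
    }

    // Every index must be in range and must appear at most once.
    std::vector<bool>  seen(row.size(), false);
    std::vector<float> picked;
    picked.reserve(k);
    for (int32_t i : idx) {
        if (i < 0 || (size_t) i >= row.size() || seen[i]) {
            return false;
        }
        seen[i] = true;
        picked.push_back(row[i]);
    }

    // Reference: the k largest values of the row, in descending order.
    std::vector<float> expected(row);
    std::partial_sort(expected.begin(), expected.begin() + k, expected.end(),
                      std::greater<float>());
    expected.resize(k);

    // Compare the selected values as a multiset. Two different index sets
    // that differ only in which tied element they chose both pass.
    std::sort(picked.begin(), picked.end(), std::greater<float>());
    return picked == expected;
}
```

With `row = {5, 3, 3, 9, 3, 7}` and `k = 4`, both `{3, 5, 0, 1}` and `{3, 5, 0, 4}` are accepted because they differ only in which copy of the tied value 3 was chosen. `{3, 5, 1, 2}` is rejected: it emits a second copy of the smallest value in place of the larger value 5, which is the failure mode described above.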

ggerganov

Ack on the test-backend-ops changes

tests/test-backend-ops.cpp

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

* vulkan: Reduce temporary memory usage for TOP_K

  - Compute row size for the temp buffer based on the output of the first pass.
  - Update shader addressing math to use the output row size.
  - Pass the output row size as "ncols_output"; what used to be "ncols_output" is now "k".

  For the common case of K=40 and src0=(200000,1,1,1), this reduces the temporary buffer from about 3.2MB to 500KB.

* vulkan: fix top_k bug when there are ties in the input

  I noticed by inspection a bug in the vulkan top_k shader where if the least value in the top_k appears multiple times we could end up writing those extra copies out rather than some larger values (if the larger values are on higher numbered threads). I rewrote the test verification to handle this case, where the final index set is not necessarily the same.

* Update tests/test-backend-ops.cpp

  Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

---------

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

jeffbolznv added 2 commits November 30, 2025 09:49

jeffbolznv requested review from 0cc4m and ggerganov as code owners December 1, 2025 16:34

loci-dev mentioned this pull request Dec 1, 2025

UPSTREAM PR #17659: vulkan: fix top_k bug when there are ties in the input auroralabs-loci/llama.cpp#391

Open

github-actions bot added the testing (Everything test related), Vulkan (Issues specific to the Vulkan backend), and ggml (changes relating to the ggml tensor library for machine learning) labels Dec 1, 2025

ggerganov approved these changes Dec 2, 2025

Update tests/test-backend-ops.cpp

4c57520

Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>

0cc4m approved these changes Dec 5, 2025

0cc4m merged commit a0f3897 into ggml-org:master Dec 5, 2025
68 of 74 checks passed

gabe-l-hart mentioned this pull request Dec 10, 2025

feat: llama.cpp bump (17f7f4) for SSM performance improvements ollama/ollama#13408

Merged

0cc4m merged 3 commits into ggml-org:master from jeffbolznv:topk_ties
