CUDA: use registers instead of smem in topk-moe by am17an · Pull Request #16647 · ggml-org/llama.cpp

am17an · 2025-10-18T05:35:59Z

Uses the technique used in the vulkan PR #16641. Neat trick!
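The description only points to the Vulkan PR #16641 and does not spell out the trick. The sketch below is a rough, hypothetical illustration of what "registers instead of smem" usually means for a warp-per-row top-k. It is a host-side C++ model, not CUDA and not the code merged in this PR. It assumes the following:

- One 32-lane warp handles one row of expert scores.
- `n_experts` is a multiple of 32.
- Each lane keeps its strided slice of scores in a small private array that a GPU compiler could place in registers.
- Each lane also keeps the k-th selected result for every `k % 32 == lane` in a private array, instead of writing results to a shared-memory buffer.

Cross-lane communication is modeled as an XOR-butterfly argmax standing in for `__shfl_xor_sync`. The names `warp_topk_model` and `WARP_SIZE` and the array layout are illustrative assumptions. Softmax and normalization of the selected weights are omitted.

```cpp
#include <limits>

constexpr int WARP_SIZE = 32;

// Hypothetical host-side model of warp-per-row top-k selection.
// Lane l "owns" experts l, l + 32, l + 64, ... in a private array
// (registers on a GPU). Results are also kept per lane: lane k % 32
// holds the k-th selection, so no shared-memory buffer is modeled.
// Ties are broken toward the lower expert id.
template <int N_EXPERTS, int K>
void warp_topk_model(const float (&scores)[N_EXPERTS], int (&ids)[K], float (&weights)[K]) {
    static_assert(N_EXPERTS % WARP_SIZE == 0, "model assumes n_experts is a multiple of 32");
    static_assert(K >= 1 && K <= N_EXPERTS, "need 1 <= K <= n_experts");
    constexpr int EPL = N_EXPERTS / WARP_SIZE;            // experts per lane
    constexpr int KPL = (K + WARP_SIZE - 1) / WARP_SIZE;  // result slots per lane
    const float NEG_INF = -std::numeric_limits<float>::infinity();

    float wt[WARP_SIZE][EPL];     // wt[l][...] models lane l's registers
    float out_w[WARP_SIZE][KPL];  // per-lane result registers (weights)
    int   out_id[WARP_SIZE][KPL]; // per-lane result registers (expert ids)

    // Strided load: lane l reads experts l, l + 32, ...
    for (int l = 0; l < WARP_SIZE; ++l)
        for (int i = 0; i < EPL; ++i)
            wt[l][i] = scores[i * WARP_SIZE + l];

    for (int k = 0; k < K; ++k) {
        float best_v[WARP_SIZE];
        int   best_e[WARP_SIZE];
        // 1) Each lane scans its own registers for a local maximum.
        for (int l = 0; l < WARP_SIZE; ++l) {
            best_v[l] = wt[l][0];
            best_e[l] = l;
            for (int i = 1; i < EPL; ++i) {
                if (wt[l][i] > best_v[l]) {
                    best_v[l] = wt[l][i];
                    best_e[l] = i * WARP_SIZE + l;
                }
            }
        }
        // 2) Butterfly argmax modeling __shfl_xor_sync: in each step every
        //    lane reads its partner's value from before the step.
        for (int mask = WARP_SIZE / 2; mask > 0; mask /= 2) {
            float nv[WARP_SIZE];
            int   ne[WARP_SIZE];
            for (int l = 0; l < WARP_SIZE; ++l) {
                const int p = l ^ mask;
                const bool take = best_v[p] > best_v[l] ||
                                  (best_v[p] == best_v[l] && best_e[p] < best_e[l]);
                nv[l] = take ? best_v[p] : best_v[l];
                ne[l] = take ? best_e[p] : best_e[l];
            }
            for (int l = 0; l < WARP_SIZE; ++l) {
                best_v[l] = nv[l];
                best_e[l] = ne[l];
            }
        }
        // After the reduction every lane holds the same winner.
        const int e = best_e[0];
        // 3) The owning lane removes the winner from later rounds.
        wt[e % WARP_SIZE][e / WARP_SIZE] = NEG_INF;
        // 4) Lane k % 32 keeps the k-th result in its own registers.
        out_w[k % WARP_SIZE][k / WARP_SIZE]  = best_v[0];
        out_id[k % WARP_SIZE][k / WARP_SIZE] = e;
    }

    // 5) Each lane writes out the results it owns.
    for (int k = 0; k < K; ++k) {
        ids[k]     = out_id[k % WARP_SIZE][k / WARP_SIZE];
        weights[k] = out_w[k % WARP_SIZE][k / WARP_SIZE];
    }
}
```

Here is a worked example. With 64 scores where `scores[i] = (i * 37) % 64` and `K = 4`, the model selects experts 19, 38, 57 and 12, with weights 63, 62, 61 and 60.

The point being illustrated is that both the candidate scores and the selected results live in per-lane arrays, with values exchanged through shuffles. In this model nothing is staged in a shared buffer that needs block-level synchronization. On a real GPU, dynamically indexed per-thread arrays may be placed in local memory rather than registers, so an actual kernel has to be careful about how such arrays are indexed.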

am17an · 2025-10-18T07:15:24Z

I am not able to restart the CI tests. The nvidia-cuda test is failing with:

The self-hosted runner lost communication with the server. Verify the machine is running and has a healthy network connection. Anything in your workflow that terminates the runner process, starves it for CPU/Memory, or blocks its network access can cause this error.

I'd rather not push another commit to trigger the CI again. @ggerganov, is this a permission issue, and can it be changed?

JohannesGaessler · 2025-10-18T08:27:06Z

If one first clicks on a job and then on one of the buttons, it is possible to manually re-run it. Since permissions were recently tightened, I don't know whether you could do that yourself, though (I just did).

am17an · 2025-10-18T08:44:03Z

> If one first clicks on a job and then on one of the buttons, it is possible to manually re-run it. Since permissions were recently tightened, I don't know whether you could do that yourself, though (I just did).

Yes, I was able to do that earlier as well.

ggerganov · 2025-10-18T09:54:48Z

@am17an It seems that approving workflows requires write access, so I just loosened the approval requirement from "all external contributors" to "first-time contributors".

Your workflows should always run now without additional approval.

If this becomes too heavy for the CI or we think it can become a security concern, we might switch back to manual workflow approval for all PRs.

am17an · 2025-10-18T10:20:59Z

Thanks. It seems like "collaborator" is not a well-thought-out design by GitHub; my intuition would be that a collaborator should at least be able to run workflows on their own PRs.

am17an · 2025-10-18T10:38:30Z

Actually, I'm still not able to re-run jobs. This is the screen I see.

CUDA: use registers instead of smem in topk-moe

06cd6bd

github-actions bot added the labels Nvidia GPU (issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) on Oct 18, 2025

am17an mentioned this pull request Oct 18, 2025

CUDA: add a fused top-K MoE kernel #16130 (merged)

JohannesGaessler approved these changes Oct 18, 2025

JohannesGaessler merged commit 38355c6 into ggml-org:master Oct 18, 2025
125 of 126 checks passed

am17an deleted the cuda_topk_optimize branch October 18, 2025 10:21

pwilkin pushed a commit to pwilkin/llama.cpp that referenced this pull request Oct 23, 2025

CUDA: use registers instead of smem in topk-moe (ggml-org#16647)

ac040c3

Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026

CUDA: use registers instead of smem in topk-moe (ggml-org#16647)

d0ade76

blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026

CUDA: use registers instead of smem in topk-moe (#16647)

f2d0793

JohannesGaessler merged 1 commit into ggml-org:master from am17an:cuda_topk_optimize