vulkan: add more num_blocks instantiations in rms_norm #17701
Merged
0cc4m merged 1 commit into ggml-org:master on Dec 5, 2025
Conversation
0cc4m approved these changes on Dec 5, 2025
JayZenith pushed a commit to JayZenith/llama.cpp that referenced this pull request on Dec 7, 2025
0Marble pushed a commit to 0Marble/llama.cpp that referenced this pull request on Dec 18, 2025
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request on Jan 15, 2026
blime4 referenced this pull request in blime4/llama.cpp on Feb 5, 2026
I noticed that a couple of models had rms_norm running slower than others, because sizes from 8 to 16 were all doing 16 iterations. This change adds a couple more special cases to improve those sizes.
The perf_logger results are over a full run of tg128 with -r 10.
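For illustration, the sketch below shows the general idea behind adding more specialized instantiations: the host picks a shader variant compiled for an exact block count, so small rows do only as many loop iterations as they need instead of rounding up to a generic 16-iteration path. This is a simplified stand-in written in C++, not the actual ggml-vulkan dispatch or shader code; the `pipeline` struct, `variants` table, and `select_rms_norm_pipeline` function are hypothetical names, and the exact set of pre-existing variants is assumed.

```cpp
// Hedged illustration of per-block-count specialization (not ggml-vulkan code).
#include <cstdint>
#include <cstdio>

// Hypothetical handle for a shader variant compiled with a fixed block count.
struct pipeline { uint32_t num_blocks; };

// Variants specialized for an exact block count, sorted ascending.
static pipeline variants[] = {
    // assumed smaller special cases (details omitted)
    {1}, {2}, {4}, {8},
    // hypothetical additional instantiations so sizes between 8 and 16 no
    // longer all round up to the 16-iteration variant (per the PR description)
    {9}, {10}, {11}, {12}, {13}, {14}, {15}, {16},
};

// Pick the smallest specialized variant that covers num_blocks; anything
// larger would fall through to a generic variant with a runtime loop bound.
static const pipeline * select_rms_norm_pipeline(uint32_t num_blocks) {
    for (const pipeline & p : variants) {
        if (p.num_blocks >= num_blocks) {
            return &p;
        }
    }
    return nullptr; // generic fallback path (not shown)
}

int main() {
    // A row needing 9 blocks now gets a 9-block variant instead of the
    // 16-iteration fallback it would have used before the extra cases.
    const pipeline * p = select_rms_norm_pipeline(9);
    std::printf("selected variant with num_blocks = %u\n", (unsigned) p->num_blocks);
    return 0;
}
```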