vulkan: add more num_blocks instantiations in rms_norm #17701

Merged
0cc4m merged 1 commit into ggml-org:master from jeffbolznv:rms_norm_more_cases
Dec 5, 2025

@jeffbolznv
Collaborator

I noticed that rms_norm ran slower on a couple of models than on others, because the sizes from 8 to 16 were all doing 16 iterations. This change adds a couple more special cases to speed up these sizes:

DeepSeek-R1-Distill-Qwen-14B-Q4_K_M.gguf
before:
RMS_NORM_MUL RMS_NORM(5120,1,1,1): 124257 x 6.172 us
| qwen2 14B Q4_K - Medium        |   8.37 GiB |    14.77 B | Vulkan     |  99 |  1 |           tg128 |        136.93 ± 0.09 |

after: 
RMS_NORM_MUL RMS_NORM(5120,1,1,1): 124257 x 4.274 us
| qwen2 14B Q4_K - Medium        |   8.37 GiB |    14.77 B | Vulkan     |  99 |  1 |           tg128 |        139.89 ± 0.15 |

Mistral-22B-v0.2-Q4_K_M.gguf
before:
RMS_NORM_MUL RMS_NORM(6144,1,1,1): 144753 x 6.215 us
| llama ?B Q4_K - Medium         |  12.42 GiB |    22.24 B | Vulkan     |  99 |  1 |           tg128 |         95.67 ± 0.14 |

after:
RMS_NORM_MUL RMS_NORM(6144,1,1,1): 144753 x 4.766 us
| llama ?B Q4_K - Medium         |  12.42 GiB |    22.24 B | Vulkan     |  99 |  1 |           tg128 |         97.10 ± 0.08 |
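To put the before/after figures above in relative terms, here is a small sketch that computes the per-call speedup, the total kernel time saved, and the tg128 throughput gain for each model. The numbers are copied from the lines above; the dictionary layout and the `summarize` helper are illustrative only, and every timing is taken to be in microseconds.

```python
# Figures copied from the perf_logger lines and benchmark rows above.
# Assumption: every per-call timing is in microseconds.
RESULTS = {
    "DeepSeek-R1-Distill-Qwen-14B-Q4_K_M": {
        "calls": 124257, "before_us": 6.172, "after_us": 4.274,
        "tg128_before": 136.93, "tg128_after": 139.89,
    },
    "Mistral-22B-v0.2-Q4_K_M": {
        "calls": 144753, "before_us": 6.215, "after_us": 4.766,
        "tg128_before": 95.67, "tg128_after": 97.10,
    },
}


def summarize(r):
    """Return (per-call speedup, total ms saved, tg128 gain in percent)."""
    speedup = r["before_us"] / r["after_us"]
    saved_ms = r["calls"] * (r["before_us"] - r["after_us"]) / 1000.0
    tg_gain_pct = (r["tg128_after"] / r["tg128_before"] - 1.0) * 100.0
    return speedup, saved_ms, tg_gain_pct


if __name__ == "__main__":
    for name, r in RESULTS.items():
        speedup, saved_ms, tg_gain_pct = summarize(r)
        print(f"{name}: {speedup:.2f}x per call, "
              f"{saved_ms:.1f} ms saved, tg128 +{tg_gain_pct:.2f}%")
```

The per-call improvement is much larger than the end-to-end gain, which is expected: RMS_NORM_MUL is only one of many ops in each generated token.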

The perf_logger results are over a full run of tg128 with -r 10.
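The effect described at the top of this comment, where every size from 8 to 16 ran 16 iterations, can be modeled with a short sketch. This is not the shader code: the block size of 512, the bucket lists, and the helper names are all assumptions chosen for illustration. The PR only states that sizes from 8 to 16 all ran 16 iterations and that a couple more special cases were added.

```python
# Hypothetical model of picking a fixed-iteration shader variant.
# BLOCK_SIZE and both bucket lists are assumptions, not from the PR.
BLOCK_SIZE = 512  # assumed number of elements processed per iteration

# Assumed set of unrolled variants before the change.
OLD_BUCKETS = (1, 2, 4, 8, 16, 32)
# Assumed set after the change; 10 and 12 stand in for the
# "couple more special cases" and are illustrative guesses.
NEW_BUCKETS = (1, 2, 4, 8, 10, 12, 16, 32)


def num_blocks(ncols, block_size=BLOCK_SIZE):
    """Number of block-sized chunks needed to cover one row."""
    return (ncols + block_size - 1) // block_size


def pick_iterations(blocks, buckets):
    """Smallest instantiated iteration count that covers `blocks`.

    Falls back to `blocks` itself (a generic, non-unrolled loop)
    when no bucket is large enough.
    """
    for b in sorted(buckets):
        if blocks <= b:
            return b
    return blocks


if __name__ == "__main__":
    for ncols in (5120, 6144):
        blocks = num_blocks(ncols)
        print(ncols, blocks,
              pick_iterations(blocks, OLD_BUCKETS),
              pick_iterations(blocks, NEW_BUCKETS))
```

Under these assumptions, the row sizes 5120 and 6144 from the benchmarks need 10 and 12 blocks. Both round up to 16 iterations with the coarser bucket list, and to an exact fit once finer buckets exist.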

@jeffbolznv jeffbolznv requested a review from 0cc4m as a code owner December 2, 2025 19:38
@github-actions github-actions bot added the Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Dec 2, 2025
@0cc4m 0cc4m merged commit 933414c into ggml-org:master Dec 5, 2025
58 of 63 checks passed
JayZenith pushed a commit to JayZenith/llama.cpp that referenced this pull request Dec 7, 2025
0Marble pushed a commit to 0Marble/llama.cpp that referenced this pull request Dec 18, 2025
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
