metal : fuse NORM + MUL + ADD, support non-multiples of 4 #16220

Merged
ggerganov merged 3 commits into master from gg/metal-norm-generic on Sep 25, 2025

@ggerganov
Member

Unify the RMS_NORM and NORM implementations and extend support for more shapes.
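
As a rough illustration of what the title describes, here is a minimal Python sketch (not the Metal kernel from this PR). It assumes that "fuse NORM + MUL + ADD" means computing `norm(x) * weight + bias` in a single output pass, that NORM and RMS_NORM differ only in whether the mean is subtracted, and that "non-multiples of 4" refers to row lengths that a 4-wide loop cannot cover without a remainder. The function names, the `rms` flag, and the scalar tail loop are hypothetical choices made for this example.

```python
import math

def norm_mul_add_unfused(row, weight, bias, eps=1e-5, rms=False):
    """Reference: three separate passes (NORM or RMS_NORM, then MUL, then ADD)."""
    n = len(row)
    mean = 0.0 if rms else sum(row) / n              # RMS_NORM skips mean subtraction
    var = sum((v - mean) ** 2 for v in row) / n
    scale = 1.0 / math.sqrt(var + eps)
    normed = [(v - mean) * scale for v in row]        # NORM / RMS_NORM
    scaled = [a * w for a, w in zip(normed, weight)]  # MUL
    return [a + b for a, b in zip(scaled, bias)]      # ADD

def norm_mul_add_fused(row, weight, bias, eps=1e-5, rms=False):
    """Fused sketch: one shared code path for both norms, one output pass.

    Reductions walk the row 4 elements at a time (standing in for a
    4-wide vector load) and finish with a scalar tail, so the row
    length does not have to be a multiple of 4.
    """
    n = len(row)
    n4 = n - n % 4                                    # largest multiple of 4 <= n

    total = 0.0
    if not rms:
        for i in range(0, n4, 4):                     # 4-wide chunks
            total += row[i] + row[i + 1] + row[i + 2] + row[i + 3]
        for i in range(n4, n):                        # scalar tail
            total += row[i]
    mean = total / n

    sq = 0.0
    for i in range(0, n4, 4):                         # 4-wide chunks
        sq += sum((row[i + k] - mean) ** 2 for k in range(4))
    for i in range(n4, n):                            # scalar tail
        sq += (row[i] - mean) ** 2
    scale = 1.0 / math.sqrt(sq / n + eps)

    # Normalization, multiply, and add happen together; no intermediate tensors.
    return [(row[i] - mean) * scale * weight[i] + bias[i] for i in range(n)]
```

Calling both functions on a row of length 7 (not a multiple of 4) should give matching results up to floating-point rounding, for `rms=False` and `rms=True` alike.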

@ggerganov ggerganov requested a review from slaren as a code owner September 24, 2025 10:36
@github-actions github-actions bot added the testing, ggml, and Apple Metal labels Sep 24, 2025
@ggerganov ggerganov merged commit dfcd53f into master Sep 25, 2025
1 check passed
@ggerganov ggerganov deleted the gg/metal-norm-generic branch September 25, 2025 08:30
@joseph777111

joseph777111 commented Sep 26, 2025

Inference quality on Metal is noticeably better after updating to what was then the current version of llama.cpp (835b2b9). The difference is night and day, and it is especially noticeable when running quantized versions of gpt-oss-20B. The improvements seem to benefit all quantized models run with Metal. Thank you, @ggerganov! This is the smartest gpt-oss-20B has ever been on my M1 MacBook Pro. I appreciate all that you and the llama.cpp team do. You guys are the best! 😋

struct pushed a commit to struct/llama.cpp that referenced this pull request Sep 26, 2025
…6220)

* metal : fuse NORM + MUL + ADD

* metal : support norms of non-multiple of 4

* cont : fix comment [no ci]
pwilkin pushed a commit to pwilkin/llama.cpp that referenced this pull request Oct 23, 2025
…6220)

* metal : fuse NORM + MUL + ADD

* metal : support norms of non-multiple of 4

* cont : fix comment [no ci]
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
…6220)

* metal : fuse NORM + MUL + ADD

* metal : support norms of non-multiple of 4

* cont : fix comment [no ci]
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026
* metal : fuse NORM + MUL + ADD

* metal : support norms of non-multiple of 4

* cont : fix comment [no ci]