metal : consolidate bin kernels #19390

Merged: ggerganov merged 3 commits into master from gg/metal-bin-opt on Feb 7, 2026

ggerganov commented on Feb 6, 2026

Refactor and consolidate the implementation of the binary Metal kernels.

| Model | Test | t/s master | t/s gg/metal-bin-opt | Speedup |
| --- | --- | ---: | ---: | ---: |
| deepseek2 30B.A3B Q8_0 | pp1 | 61.70 | 62.56 | 1.01 |
| deepseek2 30B.A3B Q8_0 | pp2 | 93.80 | 98.81 | 1.05 |
| deepseek2 30B.A3B Q8_0 | pp3 | 110.17 | 114.29 | 1.04 |
| deepseek2 30B.A3B Q8_0 | pp4 | 128.77 | 132.67 | 1.03 |
| deepseek2 30B.A3B Q8_0 | pp5 | 136.85 | 140.46 | 1.03 |
| deepseek2 30B.A3B Q8_0 | pp6 | 158.44 | 161.87 | 1.02 |
| deepseek2 30B.A3B Q8_0 | pp7 | 165.29 | 169.24 | 1.02 |
| deepseek2 30B.A3B Q8_0 | pp8 | 173.63 | 177.12 | 1.02 |
| deepseek2 30B.A3B Q8_0 | pp16 | 185.31 | 186.85 | 1.01 |
| deepseek2 30B.A3B Q8_0 | pp32 | 310.21 | 313.34 | 1.01 |
| deepseek2 30B.A3B Q8_0 | pp64 | 531.47 | 534.13 | 1.01 |
| deepseek2 30B.A3B Q8_0 | pp128 | 838.32 | 838.40 | 1.00 |
| deepseek2 30B.A3B Q8_0 | pp512 | 1477.68 | 1464.33 | 0.99 |
| deepseek2 30B.A3B Q8_0 | tg32 | 63.24 | 64.34 | 1.02 |
| gemma3 1B Q4_0 | pp1 | 215.76 | 233.76 | 1.08 |
| gemma3 1B Q4_0 | pp2 | 404.34 | 405.30 | 1.00 |
| gemma3 1B Q4_0 | pp3 | 532.60 | 533.82 | 1.00 |
| gemma3 1B Q4_0 | pp4 | 672.13 | 683.52 | 1.02 |
| gemma3 1B Q4_0 | pp5 | 744.98 | 748.97 | 1.01 |
| gemma3 1B Q4_0 | pp6 | 984.54 | 986.85 | 1.00 |
| gemma3 1B Q4_0 | pp7 | 1071.51 | 1080.29 | 1.01 |
| gemma3 1B Q4_0 | pp8 | 1220.31 | 1226.52 | 1.01 |
| gemma3 1B Q4_0 | pp16 | 1204.52 | 1207.10 | 1.00 |
| gemma3 1B Q4_0 | pp32 | 2369.32 | 2394.92 | 1.01 |
| gemma3 1B Q4_0 | pp64 | 3682.32 | 3704.24 | 1.01 |
| gemma3 1B Q4_0 | pp128 | 6337.92 | 6347.59 | 1.00 |
| gemma3 1B Q4_0 | pp512 | 11088.88 | 11104.15 | 1.00 |
| gemma3 1B Q4_0 | tg32 | 227.64 | 230.65 | 1.01 |
| gemma3 4B Q4_0 | pp1 | 139.20 | 138.42 | 0.99 |
| gemma3 4B Q4_0 | pp2 | 219.07 | 217.34 | 0.99 |
| gemma3 4B Q4_0 | pp3 | 262.66 | 263.25 | 1.00 |
| gemma3 4B Q4_0 | pp4 | 326.40 | 325.99 | 1.00 |
| gemma3 4B Q4_0 | pp5 | 347.80 | 346.83 | 1.00 |
| gemma3 4B Q4_0 | pp6 | 440.49 | 440.66 | 1.00 |
| gemma3 4B Q4_0 | pp7 | 449.09 | 449.09 | 1.00 |
| gemma3 4B Q4_0 | pp8 | 513.63 | 511.34 | 1.00 |
| gemma3 4B Q4_0 | pp16 | 460.24 | 459.71 | 1.00 |
| gemma3 4B Q4_0 | pp32 | 905.19 | 905.03 | 1.00 |
| gemma3 4B Q4_0 | pp64 | 1480.61 | 1479.00 | 1.00 |
| gemma3 4B Q4_0 | pp128 | 2152.00 | 2151.60 | 1.00 |
| gemma3 4B Q4_0 | pp512 | 2789.92 | 2796.25 | 1.00 |
| gemma3 4B Q4_0 | tg32 | 140.40 | 140.53 | 1.00 |
| gpt-oss 120B MXFP4 MoE | pp1 | 86.98 | 87.67 | 1.01 |
| gpt-oss 120B MXFP4 MoE | pp2 | 127.00 | 128.31 | 1.01 |
| gpt-oss 120B MXFP4 MoE | pp3 | 152.04 | 152.74 | 1.00 |
| gpt-oss 120B MXFP4 MoE | pp4 | 170.54 | 171.17 | 1.00 |
| gpt-oss 120B MXFP4 MoE | pp5 | 182.20 | 181.12 | 0.99 |
| gpt-oss 120B MXFP4 MoE | pp6 | 194.97 | 194.08 | 1.00 |
| gpt-oss 120B MXFP4 MoE | pp7 | 198.97 | 198.88 | 1.00 |
| gpt-oss 120B MXFP4 MoE | pp8 | 206.27 | 205.57 | 1.00 |
| gpt-oss 120B MXFP4 MoE | pp16 | 220.65 | 220.98 | 1.00 |
| gpt-oss 120B MXFP4 MoE | pp32 | 282.04 | 283.57 | 1.01 |
| gpt-oss 120B MXFP4 MoE | pp64 | 438.21 | 439.24 | 1.00 |
| gpt-oss 120B MXFP4 MoE | pp128 | 647.15 | 650.00 | 1.00 |
| gpt-oss 120B MXFP4 MoE | pp512 | 1212.28 | 1220.79 | 1.01 |
| gpt-oss 120B MXFP4 MoE | tg32 | 89.86 | 89.47 | 1.00 |
| gpt-oss 20B MXFP4 MoE | pp1 | 131.70 | 130.78 | 0.99 |
| gpt-oss 20B MXFP4 MoE | pp2 | 189.90 | 191.19 | 1.01 |
| gpt-oss 20B MXFP4 MoE | pp3 | 229.75 | 232.11 | 1.01 |
| gpt-oss 20B MXFP4 MoE | pp4 | 258.55 | 259.00 | 1.00 |
| gpt-oss 20B MXFP4 MoE | pp5 | 274.97 | 274.30 | 1.00 |
| gpt-oss 20B MXFP4 MoE | pp6 | 295.82 | 293.64 | 0.99 |
| gpt-oss 20B MXFP4 MoE | pp7 | 303.55 | 301.10 | 0.99 |
| gpt-oss 20B MXFP4 MoE | pp8 | 314.36 | 311.21 | 0.99 |
| gpt-oss 20B MXFP4 MoE | pp16 | 339.29 | 337.97 | 1.00 |
| gpt-oss 20B MXFP4 MoE | pp32 | 573.34 | 573.44 | 1.00 |
| gpt-oss 20B MXFP4 MoE | pp64 | 925.40 | 923.69 | 1.00 |
| gpt-oss 20B MXFP4 MoE | pp128 | 1430.89 | 1433.18 | 1.00 |
| gpt-oss 20B MXFP4 MoE | pp512 | 2408.36 | 2412.68 | 1.00 |
| gpt-oss 20B MXFP4 MoE | tg32 | 134.61 | 133.25 | 0.99 |
| qwen3 0.6B Q4_0 | pp1 | 274.60 | 325.97 | 1.19 |
| qwen3 0.6B Q4_0 | pp2 | 510.21 | 561.30 | 1.10 |
| qwen3 0.6B Q4_0 | pp3 | 650.74 | 705.36 | 1.08 |
| qwen3 0.6B Q4_0 | pp4 | 811.53 | 881.11 | 1.09 |
| qwen3 0.6B Q4_0 | pp5 | 891.22 | 949.36 | 1.07 |
| qwen3 0.6B Q4_0 | pp6 | 1171.68 | 1255.99 | 1.07 |
| qwen3 0.6B Q4_0 | pp7 | 1296.29 | 1380.00 | 1.06 |
| qwen3 0.6B Q4_0 | pp8 | 1472.37 | 1568.10 | 1.07 |
| qwen3 0.6B Q4_0 | pp16 | 1382.63 | 1420.85 | 1.03 |
| qwen3 0.6B Q4_0 | pp32 | 2780.44 | 2872.22 | 1.03 |
| qwen3 0.6B Q4_0 | pp64 | 5477.53 | 5612.91 | 1.02 |
| qwen3 0.6B Q4_0 | pp128 | 8979.15 | 9224.22 | 1.03 |
| qwen3 0.6B Q4_0 | pp512 | 14241.52 | 14392.58 | 1.01 |
| qwen3 0.6B Q4_0 | tg32 | 340.66 | 343.08 | 1.01 |
| qwen3 0.6B Q8_0 | pp1 | 235.74 | 251.39 | 1.07 |
| qwen3 0.6B Q8_0 | pp2 | 478.24 | 525.30 | 1.10 |
| qwen3 0.6B Q8_0 | pp3 | 606.86 | 636.12 | 1.05 |
| qwen3 0.6B Q8_0 | pp4 | 762.67 | 823.37 | 1.08 |
| qwen3 0.6B Q8_0 | pp5 | 846.17 | 898.53 | 1.06 |
| qwen3 0.6B Q8_0 | pp6 | 1103.55 | 1177.57 | 1.07 |
| qwen3 0.6B Q8_0 | pp7 | 1238.70 | 1313.67 | 1.06 |
| qwen3 0.6B Q8_0 | pp8 | 1408.13 | 1495.18 | 1.06 |
| qwen3 0.6B Q8_0 | pp16 | 1298.25 | 1351.78 | 1.04 |
| qwen3 0.6B Q8_0 | pp32 | 2683.00 | 2765.04 | 1.03 |
| qwen3 0.6B Q8_0 | pp64 | 5196.85 | 5371.99 | 1.03 |
| qwen3 0.6B Q8_0 | pp128 | 8662.60 | 8832.74 | 1.02 |
| qwen3 0.6B Q8_0 | pp512 | 14232.47 | 14336.05 | 1.01 |
| qwen3 0.6B Q8_0 | tg32 | 280.66 | 280.13 | 1.00 |
| qwen3 4B Q8_0 | pp1 | 109.69 | 109.88 | 1.00 |
| qwen3 4B Q8_0 | pp2 | 194.47 | 198.33 | 1.02 |
| qwen3 4B Q8_0 | pp3 | 242.96 | 247.54 | 1.02 |
| qwen3 4B Q8_0 | pp4 | 308.48 | 314.26 | 1.02 |
| qwen3 4B Q8_0 | pp5 | 320.32 | 323.35 | 1.01 |
| qwen3 4B Q8_0 | pp6 | 389.99 | 394.27 | 1.01 |
| qwen3 4B Q8_0 | pp7 | 414.83 | 417.14 | 1.01 |
| qwen3 4B Q8_0 | pp8 | 467.82 | 472.08 | 1.01 |
| qwen3 4B Q8_0 | pp16 | 382.28 | 383.92 | 1.00 |
| qwen3 4B Q8_0 | pp32 | 778.42 | 780.46 | 1.00 |
| qwen3 4B Q8_0 | pp64 | 1402.92 | 1411.66 | 1.01 |
| qwen3 4B Q8_0 | pp128 | 1875.56 | 1879.44 | 1.00 |
| qwen3 4B Q8_0 | pp512 | 2460.31 | 2468.32 | 1.00 |
| qwen3 4B Q8_0 | tg32 | 113.84 | 114.32 | 1.00 |
| qwen3moe 30B.A3B Q4_0 | pp1 | 100.00 | 101.00 | 1.01 |
| qwen3moe 30B.A3B Q4_0 | pp2 | 147.48 | 158.79 | 1.08 |
| qwen3moe 30B.A3B Q4_0 | pp3 | 185.62 | 195.54 | 1.05 |
| qwen3moe 30B.A3B Q4_0 | pp4 | 226.62 | 236.66 | 1.04 |
| qwen3moe 30B.A3B Q4_0 | pp5 | 245.20 | 256.96 | 1.05 |
| qwen3moe 30B.A3B Q4_0 | pp6 | 292.85 | 304.86 | 1.04 |
| qwen3moe 30B.A3B Q4_0 | pp7 | 310.21 | 324.98 | 1.05 |
| qwen3moe 30B.A3B Q4_0 | pp8 | 337.95 | 352.13 | 1.04 |
| qwen3moe 30B.A3B Q4_0 | pp16 | 348.34 | 354.88 | 1.02 |
| qwen3moe 30B.A3B Q4_0 | pp32 | 412.62 | 415.49 | 1.01 |
| qwen3moe 30B.A3B Q4_0 | pp64 | 710.67 | 707.49 | 1.00 |
| qwen3moe 30B.A3B Q4_0 | pp128 | 1148.62 | 1144.82 | 1.00 |
| qwen3moe 30B.A3B Q4_0 | pp512 | 2155.19 | 2143.51 | 0.99 |
| qwen3moe 30B.A3B Q4_0 | tg32 | 106.99 | 109.20 | 1.02 |
| qwen3next 80B.A3B Q4_K_M | pp1 | 35.86 | 37.73 | 1.05 |
| qwen3next 80B.A3B Q4_K_M | pp2 | 43.00 | 44.27 | 1.03 |
| qwen3next 80B.A3B Q4_K_M | pp3 | 58.85 | 60.34 | 1.03 |
| qwen3next 80B.A3B Q4_K_M | pp4 | 71.19 | 72.69 | 1.02 |
| qwen3next 80B.A3B Q4_K_M | pp5 | 84.65 | 86.38 | 1.02 |
| qwen3next 80B.A3B Q4_K_M | pp6 | 100.78 | 102.30 | 1.02 |
| qwen3next 80B.A3B Q4_K_M | pp7 | 110.15 | 111.78 | 1.01 |
| qwen3next 80B.A3B Q4_K_M | pp8 | 121.85 | 124.00 | 1.02 |
| qwen3next 80B.A3B Q4_K_M | pp16 | 154.53 | 154.77 | 1.00 |
| qwen3next 80B.A3B Q4_K_M | pp32 | 223.17 | 222.97 | 1.00 |
| qwen3next 80B.A3B Q4_K_M | pp64 | 361.65 | 354.28 | 0.98 |
| qwen3next 80B.A3B Q4_K_M | pp128 | 527.05 | 516.57 | 0.98 |
| qwen3next 80B.A3B Q4_K_M | pp512 | 841.18 | 845.43 | 1.01 |
| qwen3next 80B.A3B Q4_K_M | tg32 | 36.48 | 37.39 | 1.03 |
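
The Speedup column appears to be the branch throughput divided by the master throughput, rounded to two decimals. A minimal sketch of that arithmetic, assuming this reading of the column; the helper name `speedup` is hypothetical and is not part of llama.cpp or its benchmarking scripts:

```python
def speedup(master_tps: float, branch_tps: float) -> float:
    """Ratio of branch throughput to master throughput (tokens per second).

    Assumption: values above 1.00 mean the branch is faster than master.
    """
    if master_tps <= 0:
        raise ValueError("master throughput must be positive")
    return round(branch_tps / master_tps, 2)


# Rows taken from the table above.
print(speedup(274.60, 325.97))    # qwen3 0.6B Q4_0, pp1       -> 1.19
print(speedup(1477.68, 1464.33))  # deepseek2 30B.A3B Q8_0, pp512 -> 0.99
```

Read this way, the largest gains are on the smallest model at small batch sizes, while most larger configurations stay within about 1% of master.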

github-actions bot added the labels ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) on Feb 6, 2026
ggerganov merged commit 8872ad2 into master on Feb 7, 2026
76 of 78 checks passed
ggerganov deleted the gg/metal-bin-opt branch on February 7, 2026 08:35
liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026
* metal : refactor bin kernels

* cont

* cont : fix cv
bartowski1182 pushed a commit to bartowski1182/llama.cpp that referenced this pull request Mar 2, 2026
* metal : refactor bin kernels

* cont

* cont : fix cv