metal : adaptive CPU/GPU interleave based on number of nodes #19369

Merged
ggerganov merged 1 commit into master from gg/metal-adaptive-cpu-interleave on Feb 5, 2026

@ggerganov
Member

Put a bit more work on the main thread when encoding the graph. This improves the interleaving of CPU and GPU work, especially for larger graphs.
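To illustrate the idea of scaling the main thread's share with the graph size, here is a minimal sketch. It is not the code from this PR: the helper names, the floor of 64 nodes, and the 10% share are all hypothetical values chosen for illustration. It assumes the main thread encodes the first part of the graph while the remaining nodes are split evenly across worker command buffers.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: number of graph nodes the main thread encodes itself.
// min_main (64) and pct (10%) are illustrative assumptions, not values from the PR.
static int n_nodes_main_thread(int n_nodes, int min_main = 64, int pct = 10) {
    assert(n_nodes >= 0);
    const int scaled = n_nodes * pct / 100;             // share that grows with graph size
    return std::min(n_nodes, std::max(min_main, scaled)); // never exceed the graph itself
}

// Hypothetical helper: nodes per worker command buffer for the remaining nodes
// (ceiling division so that every node is assigned).
static int n_nodes_per_worker(int n_nodes, int n_main, int n_workers) {
    assert(n_main >= 0 && n_main <= n_nodes);
    const int rest = n_nodes - n_main;
    return n_workers > 0 ? (rest + n_workers - 1) / n_workers : 0;
}
```

Under these assumed constants, a 500-node graph keeps 64 nodes on the main thread, while a 2000-node graph keeps 200, so larger graphs give the main thread proportionally more to encode while the workers prepare the rest.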

| Model | Test | t/s master | t/s gg/metal-adaptive-cpu-interleave | Speedup |
| --- | --- | ---: | ---: | ---: |
| deepseek2 30B.A3B Q8_0 | pp512 | 1643.86 | 1650.41 | 1.00 |
| deepseek2 30B.A3B Q8_0 | tg32 | 60.99 | 64.10 | 1.05 |
| gemma3 1B Q4_0 | pp512 | 11084.12 | 11096.19 | 1.00 |
| gemma3 1B Q4_0 | tg32 | 221.23 | 229.51 | 1.04 |
| gemma3 4B Q4_0 | pp512 | 2801.46 | 2801.59 | 1.00 |
| gemma3 4B Q4_0 | tg32 | 136.59 | 140.30 | 1.03 |
| gpt-oss 120B MXFP4 MoE | pp512 | 1216.02 | 1216.82 | 1.00 |
| gpt-oss 120B MXFP4 MoE | tg32 | 87.14 | 89.91 | 1.03 |
| gpt-oss 20B MXFP4 MoE | pp512 | 2407.47 | 2407.12 | 1.00 |
| gpt-oss 20B MXFP4 MoE | tg32 | 131.83 | 134.63 | 1.02 |
| qwen3 0.6B Q4_0 | pp512 | 14327.76 | 14248.38 | 0.99 |
| qwen3 0.6B Q4_0 | tg32 | 329.70 | 341.94 | 1.04 |
| qwen3 0.6B Q8_0 | pp512 | 14197.99 | 14173.35 | 1.00 |
| qwen3 0.6B Q8_0 | tg32 | 269.92 | 279.47 | 1.04 |
| qwen3 4B Q8_0 | pp512 | 2468.90 | 2463.35 | 1.00 |
| qwen3 4B Q8_0 | tg32 | 113.25 | 114.03 | 1.01 |
| qwen3moe 30B.A3B Q4_0 | pp512 | 2140.49 | 2158.52 | 1.01 |
| qwen3moe 30B.A3B Q4_0 | tg32 | 100.31 | 106.69 | 1.06 |
| qwen3next 80B.A3B Q4_K_M | pp512 | 840.10 | 844.97 | 1.01 |
| qwen3next 80B.A3B Q4_K_M | tg32 | 34.61 | 36.55 | 1.06 |
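The Speedup column appears to be the branch throughput divided by the master throughput, rounded to two decimals. The sketch below reproduces that arithmetic under this assumption; the function names are hypothetical and are not part of any benchmark tool.

```cpp
#include <cassert>
#include <cmath>

// Assumed definition: speedup = tokens/s on the branch divided by tokens/s on master.
static double speedup(double ts_master, double ts_branch) {
    assert(ts_master > 0.0);
    return ts_branch / ts_master;
}

// Round to two decimal places, matching the precision shown in the table.
static double round2(double x) {
    return std::round(x * 100.0) / 100.0;
}
```

For example, the deepseek2 tg32 row gives 64.10 / 60.99, which rounds to 1.05, and the qwen3 0.6B Q4_0 pp512 row gives 14248.38 / 14327.76, which rounds to 0.99.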

github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and Apple Metal (https://en.wikipedia.org/wiki/Metal_(API)) labels on Feb 5, 2026
ggerganov merged commit 22cae83 into master on Feb 5, 2026
69 of 75 checks passed
ggerganov deleted the gg/metal-adaptive-cpu-interleave branch on February 5, 2026 17:07
liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request on Feb 23, 2026