metal : add Metal backend for GGML_OP_GATED_DELTA_NET by arkavo-com · Pull Request #20244 · ggml-org/llama.cpp

arkavo-com · 2026-03-08T18:04:24Z

Summary

- Add a fused Metal kernel for GGML_OP_GATED_DELTA_NET (the op added in "ggml: add GATED_DELTA_NET op" #19504), enabling GPU-accelerated DeltaNet recurrence on Apple Silicon
- Supports both GDA (scalar gate, G=1) and KDA (per-row gate, G=S) modes
- Supports head_size 64 and 128 (covers the Qwen3.5 family); unsupported sizes fall back to CPU
- Non-contiguous (permuted) tensors fall back to CPU
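
For readers unfamiliar with the recurrence, here is a minimal CPU sketch of one gated delta rule step for a single head. It is not the Metal kernel from this PR: the function name `gated_delta_step`, the row-major state layout (rows indexed by key channel), and the choice to pass the gate as a plain multiplicative factor are all assumptions made for illustration. The real op may take the gate in log space, scale the query, or lay out the state differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One step of a gated delta rule for a single head (illustrative only).
// state: S x S matrix, row-major; rows = key channel, columns = value channel (assumed layout).
// decay: multiplicative gate; size 1 = one scalar for the whole state (GDA-like),
//        size S = one factor per state row (KDA-like).
// Updates `state` in place and returns the output vector for this step.
std::vector<float> gated_delta_step(std::vector<float>&       state,
                                    const std::vector<float>& q,
                                    const std::vector<float>& k,
                                    const std::vector<float>& v,
                                    const std::vector<float>& decay,
                                    float                     beta) {
    const std::size_t S = k.size();
    assert(state.size() == S * S && q.size() == S && v.size() == S);
    assert(decay.size() == 1 || decay.size() == S);

    // 1. Gate: decay the previous state with a scalar or per-row factor.
    for (std::size_t r = 0; r < S; ++r) {
        const float a = decay.size() == 1 ? decay[0] : decay[r];
        for (std::size_t c = 0; c < S; ++c) {
            state[r * S + c] *= a;
        }
    }

    // 2. Delta: difference between the target v and what the state predicts for key k.
    std::vector<float> delta(S);
    for (std::size_t c = 0; c < S; ++c) {
        float pred = 0.0f;
        for (std::size_t r = 0; r < S; ++r) {
            pred += state[r * S + c] * k[r];
        }
        delta[c] = beta * (v[c] - pred);
    }

    // 3. Rank-1 update: state += k * delta^T.
    for (std::size_t r = 0; r < S; ++r) {
        for (std::size_t c = 0; c < S; ++c) {
            state[r * S + c] += k[r] * delta[c];
        }
    }

    // 4. Read out with the query: out = state^T * q.
    std::vector<float> out(S, 0.0f);
    for (std::size_t r = 0; r < S; ++r) {
        for (std::size_t c = 0; c < S; ++c) {
            out[c] += state[r * S + c] * q[r];
        }
    }
    return out;
}
```

Each step depends on the state left by the previous one, so the token loop is sequential. A fused kernel presumably helps by keeping that state on-chip across the four sub-steps instead of launching one op per sub-step.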

Performance (Apple M4 Max, Qwen3.5-0.8B Q4_K_M)

| Test | Before | After | Speedup |
| --- | --- | --- | --- |
| tg128 | 170 t/s | 213 t/s | +25% |
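
As a quick check of the figure in the table, the relative speedup can be computed from the two throughput numbers. The helper name `speedup_percent` is hypothetical and only illustrates the arithmetic: 213 / 170 is about 1.253, i.e. roughly +25%.

```cpp
// Relative throughput gain in percent: (after / before - 1) * 100.
double speedup_percent(double before_tps, double after_tps) {
    return (after_tps / before_tps - 1.0) * 100.0;
}
```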

Backend test results

All 7 supported test configurations pass on Metal (GDA and KDA, head_size 64/128, contiguous). The 6 unsupported configurations (head_size=32 or permuted tensors) correctly report "not supported" and fall back to CPU.
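
The support decision described above can be sketched as a small predicate. This is not the code in `ggml-metal-device.m`: the struct `GdnOpInfo` and the function `metal_supports_gated_delta_net` are hypothetical stand-ins, and how the real check reads head_size and contiguity from the tensors is not shown in this PR description.

```cpp
#include <cstdint>

// Hypothetical summary of the properties the support check looks at.
struct GdnOpInfo {
    std::int64_t head_size;             // size of one attention head
    bool         all_inputs_contiguous; // false if any input tensor is permuted
};

// Returns true when the Metal kernel would handle the op.
// Returning false means the op is reported as "not supported" and runs on CPU.
bool metal_supports_gated_delta_net(const GdnOpInfo& op) {
    const bool head_ok = op.head_size == 64 || op.head_size == 128;
    return head_ok && op.all_inputs_contiguous;
}
```

Under these assumptions, head_size=32 and permuted inputs are rejected, which matches the unsupported configurations reported by the backend tests.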

Files changed (Metal only)

| File | Description |
| --- | --- |
| `ggml-metal.metal` | `kernel_gated_delta_net_f32` kernel |
| `ggml-metal-ops.cpp` | Op dispatch function |
| `ggml-metal-device.cpp` | Pipeline getter |
| `ggml-metal-device.h` | Pipeline declaration |
| `ggml-metal-ops.h` | Op function declaration |
| `ggml-metal-device.m` | `supports_op` with head_size + contiguity checks |

Test plan

- `test-backend-ops -o GATED_DELTA_NET`: 7/7 pass on Metal, 3/3 backends pass
- `llama-bench` with Qwen3.5-0.8B confirms +25% tg speedup
- CI

AI usage: yes. Claude Opus 4.6 assisted with implementation.

🤖 Generated with Claude Code

Add a fused Metal kernel for the gated delta net recurrence op (ggml-org#19504), enabling GPU-accelerated inference for DeltaNet-based models (Qwen3.5, etc.) on Apple Silicon.

Supports both GDA (scalar gate) and KDA (per-row gate) modes with head_size 64 and 128. Unsupported configurations (head_size 32, non-contiguous tensors) gracefully fall back to CPU.

Performance: Qwen3.5-0.8B Q4_K_M on M4 Max, tg128: 170 -> 213 t/s (+25%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

ggerganov · 2026-03-10T19:35:03Z

Superseded by #20361

arkavo-com requested a review from ggerganov as a code owner March 8, 2026 18:04

arkavo-com and others added 2 commits March 8, 2026 14:08

metal : validate contiguity of all input tensors in supports_op

34ac6fc

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

metal : add algorithm equivalence comment for GDA decay path

8efb55f

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

github-actions bot added the ggml and Apple Metal labels Mar 8, 2026

appwa9 mentioned this pull request Mar 10, 2026

metal: fused_gdn_ch (chunked prompt processing) needed for Qwen3.5 on older Apple GPUs #20342

Open

ggerganov mentioned this pull request Mar 10, 2026

metal : add GDN kernel #20361

Merged

ggerganov closed this Mar 10, 2026

ggerganov mentioned this pull request Mar 11, 2026

vulkan: add GATED_DELTA_NET op support #20334

Merged

arkavo-com wants to merge 3 commits into ggml-org:master from arkavo-ai:metal-gated-delta-net
