
metal : add Metal backend for GGML_OP_GATED_DELTA_NET #20244

Closed
arkavo-com wants to merge 3 commits into ggml-org:master from arkavo-ai:metal-gated-delta-net


@arkavo-com
Contributor

Summary

  • Add fused Metal kernel for GGML_OP_GATED_DELTA_NET (ggml: add GATED_DELTA_NET op #19504), enabling GPU-accelerated DeltaNet recurrence on Apple Silicon
  • Supports both GDA (scalar gate, G=1) and KDA (per-row gate, G=S) modes
  • Supports head_size 64 and 128 (covers Qwen3.5 family); unsupported sizes fall back to CPU
  • Non-contiguous (permuted) tensors gracefully fall back to CPU
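
To make the two gate modes concrete, here is a minimal CPU-side sketch of one recurrence step for a single head, assuming the commonly described gated delta rule (decay the state, apply a delta-rule correction, then read out with the query). The function name `gated_delta_step`, the row-major state layout, and the log-space gate are assumptions for illustration. They are not taken from the ggml op or the Metal kernel, which may use a different layout or scaling.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical reference for ONE head and ONE token; not the ggml kernel.
// state is S x S, row-major: state[i * S + j], i = key dim, j = value dim.
// gate holds 1 value (GDA, G = 1) or S values (KDA, G = S).
// Assumption: the gate is given in log space and applied as exp(gate).
std::vector<float> gated_delta_step(std::vector<float> & state,
                                    const std::vector<float> & q,
                                    const std::vector<float> & k,
                                    const std::vector<float> & v,
                                    const std::vector<float> & gate,
                                    float beta) {
    const std::size_t S = q.size();
    const std::size_t G = gate.size();

    // 1. Decay: a scalar gate scales the whole state, a per-row gate scales row i.
    for (std::size_t i = 0; i < S; ++i) {
        const float decay = std::exp(gate[G == 1 ? 0 : i]);
        for (std::size_t j = 0; j < S; ++j) {
            state[i * S + j] *= decay;
        }
    }

    // 2. Delta rule: predict v from the decayed state, scale the error by beta.
    std::vector<float> delta(S);
    for (std::size_t j = 0; j < S; ++j) {
        float pred = 0.0f;
        for (std::size_t i = 0; i < S; ++i) {
            pred += state[i * S + j] * k[i];
        }
        delta[j] = beta * (v[j] - pred);
    }

    // 3. Rank-1 update: state += outer(k, delta).
    for (std::size_t i = 0; i < S; ++i) {
        for (std::size_t j = 0; j < S; ++j) {
            state[i * S + j] += k[i] * delta[j];
        }
    }

    // 4. Readout: out[j] = sum_i state[i][j] * q[i].
    std::vector<float> out(S, 0.0f);
    for (std::size_t j = 0; j < S; ++j) {
        for (std::size_t i = 0; i < S; ++i) {
            out[j] += state[i * S + j] * q[i];
        }
    }
    return out;
}
```

In this sketch the only difference between GDA and KDA is the index used to read the gate in step 1. Because each token's state depends on the previous token's, a fused kernel can keep the state in fast memory across steps instead of writing it back between separate ops.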

Performance (Apple M4 Max, Qwen3.5-0.8B Q4_K_M)

| Test  | Before  | After   | Speedup |
|-------|---------|---------|---------|
| tg128 | 170 t/s | 213 t/s | +25%    |

Backend test results

All 7 supported test configurations pass on Metal (GDA + KDA, head_size 64/128, contiguous). 6 unsupported configs (head_size=32, permuted) correctly report "not supported" and fall back to CPU.
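
The support rule described above can be sketched as a small predicate. This is an illustrative stand-in, not the actual `supports_op` code in `ggml-metal-device.m`: the `GdnConfig` struct and the `metal_supports_gdn` function are hypothetical names, and the real check operates on ggml tensors rather than a plain struct.

```cpp
#include <cstdint>

// Hypothetical description of one GATED_DELTA_NET call; not a ggml type.
struct GdnConfig {
    std::int64_t head_size;   // S
    std::int64_t gate_size;   // G: 1 for GDA, S for KDA
    bool         contiguous;  // false for permuted (non-contiguous) inputs
};

// Returns true when the Metal kernel would accept the op,
// false when the op should fall back to the CPU backend.
inline bool metal_supports_gdn(const GdnConfig & c) {
    const bool size_ok = c.head_size == 64 || c.head_size == 128;
    const bool gate_ok = c.gate_size == 1 || c.gate_size == c.head_size;
    return size_ok && gate_ok && c.contiguous;
}
```

Under these assumptions, a head_size of 32 or a permuted input returns `false`, which corresponds to the "not supported" results reported by the backend tests.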

Files changed (Metal only)

| File                  | Description                                       |
|-----------------------|---------------------------------------------------|
| ggml-metal.metal      | kernel_gated_delta_net_f32 kernel                 |
| ggml-metal-ops.cpp    | Op dispatch function                              |
| ggml-metal-device.cpp | Pipeline getter                                   |
| ggml-metal-device.h   | Pipeline declaration                              |
| ggml-metal-ops.h      | Op function declaration                           |
| ggml-metal-device.m   | supports_op with head_size + contiguity checks    |

Test plan

  • test-backend-ops -o GATED_DELTA_NET -- 7/7 pass on Metal, 3/3 backends pass
  • llama-bench with Qwen3.5-0.8B confirms +25% tg speedup
  • CI

AI usage: yes. Claude Opus 4.6 assisted with implementation.

🤖 Generated with Claude Code

Add a fused Metal kernel for the gated delta net recurrence op
(ggml-org#19504), enabling GPU-accelerated inference for DeltaNet-based
models (Qwen3.5, etc.) on Apple Silicon.

Supports both GDA (scalar gate) and KDA (per-row gate) modes
with head_size 64 and 128. Unsupported configurations (head_size
32, non-contiguous tensors) gracefully fall back to CPU.

Performance: Qwen3.5-0.8B Q4_K_M on M4 Max
  tg128: 170 -> 213 t/s (+25%)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
arkavo-com requested a review from ggerganov as a code owner on March 8, 2026 18:04
arkavo-com and others added 2 commits March 8, 2026 14:08
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-actions bot added the ggml and Apple Metal labels on Mar 8, 2026
@ggerganov
Member

Superseded by #20361


Labels: Apple Metal, ggml


2 participants