
metal: accelerated conv2d #17175

Merged

ggerganov merged 2 commits into ggml-org:master from bghira:feature/metal-conv2d on Nov 13, 2025

Conversation

@bghira (Contributor) commented Nov 11, 2025

This pulls ggml-org/ggml#1384 into the llama.cpp repository for review and subsequent sync back to ggml, since I'm mostly unfamiliar with the contribution process.

I noted a lack of Metal-accelerated ops in GGML and thought Conv2d would be a simple target for my first contribution.

The performance-test results on an M3 Max (the only hardware I have for testing) show a substantial boost from leveraging simdgroups:

| Shape | Metal (GFLOPS) | CPU (GFLOPS) |
| --- | --- | --- |
| 19x19, Cin=256, Cout=4096, fp32 | 191.6 | 17.1 |
| 224x224, Cin=3, Cout=8, fp32 | 103.0 | 1.5 |
| 58x58, Cin=32, Cout=64, fp32 | 159.3 | 7.0 |

Copilot-generated summary:

This pull request adds support for 2D convolution (CONV_2D) operations in the Metal backend of GGML, enabling hardware-accelerated execution of this operation on supported Apple devices. The changes include the implementation of the Metal kernel, integration into the operation pipeline, and updates to device capability checks and argument structures.

2D Convolution (CONV_2D) Support:

  • Added a new Metal kernel kernel_conv_2d in ggml-metal.metal for efficient 2D convolution, with template instantiations for both float and half.
  • Introduced the ggml_metal_kargs_conv_2d argument struct in ggml-metal-impl.h to pass necessary parameters to the Metal kernel.
  • Implemented the ggml_metal_op_conv_2d function in ggml-metal-ops.cpp to encode and dispatch the 2D convolution operation.
  • Registered the new operation in the Metal operation pipeline and header files (ggml-metal-ops.cpp, ggml-metal-ops.h).
  • Added the pipeline getter for CONV_2D in ggml-metal-device.cpp and declared it in the header.
  • Updated device capability checks to recognize CONV_2D support in ggml-metal-device.m.

Other Minor Changes:

  • Updated tensor API enablement logic for device compatibility, removing checks for some device models.
  • Fixed type consistency in argument passing for the concat operation.
  • Minor code cleanup and header includes.

These changes collectively allow GGML to offload 2D convolution operations to the GPU via Metal, improving performance for models that use this operation.
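For context, CONV_2D computes a standard 2D cross-correlation over channel-major tensors. The sketch below is a minimal CPU reference of that semantics, useful for checking a GPU kernel's output against. It is illustrative only: `conv2d_ref`, its stride-1/no-padding simplification, and its memory layout are assumptions for this sketch, not the actual ggml or Metal implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Naive 2D convolution (cross-correlation), stride 1, no padding.
// Layouts: input [Cin, H, W], kernel [Cout, Cin, Kh, Kw],
// output [Cout, H-Kh+1, W-Kw+1], all row-major.
std::vector<float> conv2d_ref(const std::vector<float>& x, int cin, int h, int w,
                              const std::vector<float>& k, int cout, int kh, int kw) {
    const int oh = h - kh + 1;
    const int ow = w - kw + 1;
    std::vector<float> y(static_cast<size_t>(cout) * oh * ow, 0.0f);
    for (int oc = 0; oc < cout; ++oc) {
        for (int oy = 0; oy < oh; ++oy) {
            for (int ox = 0; ox < ow; ++ox) {
                float acc = 0.0f;
                // Accumulate over all input channels and kernel taps.
                for (int ic = 0; ic < cin; ++ic) {
                    for (int ky = 0; ky < kh; ++ky) {
                        for (int kx = 0; kx < kw; ++kx) {
                            acc += x[(ic * h + oy + ky) * w + (ox + kx)] *
                                   k[((oc * cin + ic) * kh + ky) * kw + kx];
                        }
                    }
                }
                y[(oc * oh + oy) * ow + ox] = acc;
            }
        }
    }
    return y;
}
```

The Metal kernel tiles this loop nest across threadgroups and simdgroups rather than running it serially, which is where the GFLOPS gains in the table above come from.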

@bghira bghira requested a review from ggerganov as a code owner November 11, 2025 19:03
github-actions bot added the `ggml` (changes relating to the ggml tensor library for machine learning) and `Apple Metal` labels Nov 11, 2025
@bghira force-pushed the feature/metal-conv2d branch from 20acc63 to c24a58f on November 12, 2025 18:27
@bghira (Contributor, Author) commented Nov 12, 2025:

thank you

@bghira force-pushed the feature/metal-conv2d branch from c24a58f to d565e66 on November 12, 2025 19:19
@ggerganov ggerganov merged commit 0cfb191 into ggml-org:master Nov 13, 2025
1 check passed
@bghira bghira deleted the feature/metal-conv2d branch November 13, 2025 12:58
Anico2 added a commit to Anico2/llama.cpp that referenced this pull request Jan 15, 2026
* metal: accelerated conv2d

* cont : cleanup

---------

Co-authored-by: bghira <bghira@users.github.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
blime4 referenced this pull request in blime4/llama.cpp Feb 5, 2026

Labels

Apple Metal · ggml


2 participants