Metal backend crashes on macOS arm64 with tdt-0.6b-v3-q8_0.gguf

Platform: Apple Silicon, GGML Metal enabled
Model: mudler/parakeet-cpp-gguf/tdt-0.6b-v3-q8_0.gguf

Crash during transcription:

ggml_metal_op_encode_impl: error: unsupported op 'CONV_2D_DW'
ggml-metal-ops.cpp:203: unsupported op

After locally replacing ggml_conv_2d_dw_direct with ggml_conv_2d_dw in the subsampling path, that crash goes away, but another Metal failure appears:

ggml_metal_library_compile_pipeline: failed to compile pipeline: base = 'kernel_mul_mv_f32_f16_short'
Error: Function kernel_mul_mv_f32_f16_short was not found in the library
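
For context on why a mul_mv kernel shows up here: as far as I understand it, the non-direct path lowers the depthwise convolution to an im2col step followed by a matrix-vector style product per channel. The sketch below is a plain C++ toy model of that lowering, entirely in F32, and not ggml code. The names Shape, dw_conv_direct and dw_conv_im2col are hypothetical, and it assumes stride 1 with no padding or dilation.

```cpp
#include <cassert>
#include <vector>

// Toy shapes: C channels, H x W input, KH x KW kernel per channel.
// Assumes stride 1, no padding, no dilation.
struct Shape { int C, H, W, KH, KW; };

// Direct depthwise convolution: each channel is convolved with its own kernel.
std::vector<float> dw_conv_direct(const std::vector<float>& x,
                                  const std::vector<float>& k, Shape s) {
    const int OH = s.H - s.KH + 1, OW = s.W - s.KW + 1;
    std::vector<float> y(s.C * OH * OW, 0.0f);
    for (int c = 0; c < s.C; ++c)
        for (int oy = 0; oy < OH; ++oy)
            for (int ox = 0; ox < OW; ++ox) {
                float acc = 0.0f;
                for (int ky = 0; ky < s.KH; ++ky)
                    for (int kx = 0; kx < s.KW; ++kx)
                        acc += x[(c * s.H + oy + ky) * s.W + ox + kx] *
                               k[(c * s.KH + ky) * s.KW + kx];
                y[(c * OH + oy) * OW + ox] = acc;
            }
    return y;
}

// Lowered depthwise convolution: im2col per channel, then a matrix-vector product.
std::vector<float> dw_conv_im2col(const std::vector<float>& x,
                                  const std::vector<float>& k, Shape s) {
    const int OH = s.H - s.KH + 1, OW = s.W - s.KW + 1, K = s.KH * s.KW;
    std::vector<float> y(s.C * OH * OW, 0.0f);
    std::vector<float> cols(OH * OW * K);  // one row per output pixel
    for (int c = 0; c < s.C; ++c) {
        // im2col: copy each receptive field into a row of `cols` (kept in F32).
        for (int oy = 0; oy < OH; ++oy)
            for (int ox = 0; ox < OW; ++ox)
                for (int ky = 0; ky < s.KH; ++ky)
                    for (int kx = 0; kx < s.KW; ++kx)
                        cols[(oy * OW + ox) * K + ky * s.KW + kx] =
                            x[(c * s.H + oy + ky) * s.W + ox + kx];
        // mul_mv: multiply the [OH*OW, K] matrix by this channel's K-element kernel.
        for (int r = 0; r < OH * OW; ++r) {
            float acc = 0.0f;
            for (int j = 0; j < K; ++j) acc += cols[r * K + j] * k[c * K + j];
            y[c * OH * OW + r] = acc;
        }
    }
    return y;
}
```

Both functions produce the same output for the same input. In the real lowered path the im2col buffer and the kernel can have different types, which would be where a mixed f32 x f16 kernel gets requested.
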

I got the model running locally with two changes:
1. using ggml_backend_sched with [Metal, CPU] instead of a single Metal backend, so ops that Metal does not support can fall back to CPU
2. using an F32 im2col in the lowered depthwise conv path, which avoids the f32 x f16 short mul_mv case
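
The fallback idea in step 1 can be sketched as a toy model: walk the graph and give each op to the first backend in priority order that supports it. This is plain C++ for illustration only and not the ggml API. ToyBackend and assign_ops are hypothetical names, and the real ggml_backend_sched does more than this (for example, it splits the graph and moves tensors between backends).

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a backend: a name plus the ops it cannot run.
struct ToyBackend {
    std::string name;
    std::set<std::string> unsupported;
    bool supports(const std::string& op) const {
        return unsupported.count(op) == 0;
    }
};

// For each op in the graph, return the index of the first backend (in
// priority order) that supports it, or -1 if no backend does.
std::vector<int> assign_ops(const std::vector<ToyBackend>& backends,
                            const std::vector<std::string>& graph) {
    std::vector<int> assignment;
    assignment.reserve(graph.size());
    for (const std::string& op : graph) {
        int chosen = -1;
        for (int i = 0; i < static_cast<int>(backends.size()); ++i) {
            if (backends[i].supports(op)) {
                chosen = i;
                break;
            }
        }
        assignment.push_back(chosen);
    }
    return assignment;
}
```

With backends ordered [Metal, CPU] and Metal marked as not supporting CONV_2D_DW, that op lands on the CPU while the rest stay on Metal. With Metal alone, the same op has nowhere to go, which corresponds to the "unsupported op" abort above.
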

It would be great if this worked out of the box on macOS. Thanks for your work, this is brilliant. It beats ONNX on Apple by 2x.
