[BUG] Vulkan backend crash on MTT X300: Shared memory size too small for matrix multiplication

### Name and Version

cuda: INFO: probing library /home/sbzoutianxia/.llamafile/v/0.10.3/ggml-cuda.so (app directory)
cuda: INFO: failed to load library /home/sbzoutianxia/.llamafile/v/0.10.3/ggml-cuda.so: /home/sbzoutianxia/.llamafile/v/0.10.3/ggml-cuda.so: cannot open shared object file: No such file or directory
cuda: INFO: no pre-built GPU library found
cuda: INFO: to enable GPU support, build with:
cuda: INFO:     llamafile/cuda.sh   (for NVIDIA)
cuda: INFO:     llamafile/rocm.sh   (for AMD)
vulkan: INFO: probing library /home/sbzoutianxia/.llamafile/v/0.10.3/ggml-vulkan.so (bundled)
vulkan: INFO: loaded library /home/sbzoutianxia/.llamafile/v/0.10.3/ggml-vulkan.so from bundled
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = MTT X300 (MT driver) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 128 | shared memory: 16384 | int dot: 0 | matrix cores: none
register backend: registered backend CPU (1 devices)
register device: registered device CPU (CPU)
register backend: registered backend Vulkan (1 devices)
register device: registered device Vulkan0 (MTT X300)
vulkan: INFO: Vulkan backend registered with GGML
vulkan: INFO: Vulkan GPU support successfully loaded
vulkan: INFO: found 1 GPU device(s)
0.00.001.959 I common_params_print_info: build 1780673245 (dbe9c0c8c) with cosmocc for cosmopolitan
0.00.001.971 I log_info: verbosity = 2147483647 (adjust with the `-lv N` CLI arg)
0.00.001.971 I device info:
0.00.001.984 I - CPU    : CPU (64192 MiB, 64192 MiB free)
0.00.002.021 I - Vulkan0: MTT X300 (16295 MiB, 15480 MiB free)
0.00.002.066 I system_info: n_threads = 8 (n_threads_batch = 8) / 8 | CPU : NEON = 1 | ARM_FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
0.00.002.074 I srv       server_main: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
0.00.002.145 I           init: using 8 threads for HTTP server
0.00.002.386 I srv       start: binding port with default address family
0.00.003.544 I srv       server_main: loading model
0.00.003.560 I           load_model: loading model '/zip/Qwen3.5-0.8B-Q8_0.gguf'
ggml_vulkan: Error: Shared memory size too small for matrix multiplication.
terminate called after throwing an instance of 'std::runtime_error'
  what():  Shared memory size too small for matrix multiplication.


### Operating systems

Linux

### GGML backends

Vulkan

### Hardware

ARM CPU + MTT X300 GPU (Moore Threads)

### Models

Qwen3.5-0.8B-Q8_0.gguf

### Problem description & steps to reproduce

**Describe the bug**
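I am trying to run a Qwen model using the Vulkan backend, but it crashes during model loading with a shared memory error. My GPU is an MTT X300 (Moore Threads), which the backend reports with `shared memory: 16384` and `warp size: 128`. The Vulkan device is detected and registered successfully; the process then terminates as soon as model loading starts, with `Shared memory size too small for matrix multiplication`.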
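
To illustrate why a 16384-byte limit can be tight, here is a rough sketch of the shared memory a generic tiled matrix multiplication needs when it stages one tile of each input matrix. This is not the check that ggml's Vulkan backend performs: the function name, the formula, and the tile shapes are all hypothetical and chosen only for illustration, and it assumes the `shared memory` field in the log is a size in bytes.

```python
def tile_shared_memory_bytes(bm: int, bn: int, bk: int, bytes_per_element: int) -> int:
    """Bytes needed to stage one BM x BK tile of A and one BK x BN tile of B.

    Hypothetical model of a generic tiled matmul, not ggml's actual formula.
    """
    return (bm * bk + bk * bn) * bytes_per_element


# Assumed to be bytes, taken from the 'shared memory: 16384' field in the log.
DEVICE_SHARED_MEMORY = 16384

# Hypothetical tile shapes (BM, BN, BK, bytes per element), for illustration only.
TILE_CONFIGS = [(64, 64, 16, 4), (128, 128, 32, 2), (128, 128, 32, 4)]

for bm, bn, bk, elem in TILE_CONFIGS:
    need = tile_shared_memory_bytes(bm, bn, bk, elem)
    fits = need <= DEVICE_SHARED_MEMORY
    print(f"{bm}x{bn}x{bk} @ {elem} B/elem: {need} bytes, fits={fits}")
```

Under this simplified model, larger tiles or wider element types quickly exceed 16384 bytes, which is the kind of condition the error message points to.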

**To Reproduce**
Steps to reproduce the behavior:
1. Run llamafile 0.10.3 on an ARM Linux machine with an MTT X300, with the bundled Vulkan backend enabled.
2. Load model `Qwen3.5-0.8B-Q8_0.gguf`.
3. The program crashes with `terminate called after throwing an instance of 'std::runtime_error'`.


cuda: INFO: probing library /home/sbzoutianxia/.llamafile/v/0.10.3/ggml-cuda.so (app directory)
cuda: INFO: failed to load library /home/sbzoutianxia/.llamafile/v/0.10.3/ggml-cuda.so: /home/sbzoutianxia/.llamafile/v/0.10.3/ggml-cuda.so: cannot open shared object file: No such file or directory
cuda: INFO: no pre-built GPU library found
cuda: INFO: to enable GPU support, build with:
cuda: INFO:     llamafile/cuda.sh   (for NVIDIA)
cuda: INFO:     llamafile/rocm.sh   (for AMD)
vulkan: INFO: probing library /home/sbzoutianxia/.llamafile/v/0.10.3/ggml-vulkan.so (bundled)
vulkan: INFO: loaded library /home/sbzoutianxia/.llamafile/v/0.10.3/ggml-vulkan.so from bundled
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = MTT X300 (MT driver) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 128 | shared memory: 16384 | int dot: 0 | matrix cores: none
register backend: registered backend CPU (1 devices)
register device: registered device CPU (CPU)
register backend: registered backend Vulkan (1 devices)
register device: registered device Vulkan0 (MTT X300)
vulkan: INFO: Vulkan backend registered with GGML
vulkan: INFO: Vulkan GPU support successfully loaded
vulkan: INFO: found 1 GPU device(s)
0.00.001.959 I common_params_print_info: build 1780673245 (dbe9c0c8c) with cosmocc for cosmopolitan
0.00.001.971 I log_info: verbosity = 2147483647 (adjust with the `-lv N` CLI arg)
0.00.001.971 I device info:
0.00.001.984 I - CPU    : CPU (64192 MiB, 64192 MiB free)
0.00.002.021 I - Vulkan0: MTT X300 (16295 MiB, 15480 MiB free)
0.00.002.066 I system_info: n_threads = 8 (n_threads_batch = 8) / 8 | CPU : NEON = 1 | ARM_FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
0.00.002.074 I srv       server_main: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
0.00.002.145 I           init: using 8 threads for HTTP server
0.00.002.386 I srv       start: binding port with default address family
0.00.003.544 I srv       server_main: loading model
0.00.003.560 I           load_model: loading model 'Qwen3.5-0.8B-Q8_0.gguf'
ggml_vulkan: Error: Shared memory size too small for matrix multiplication.
terminate called after throwing an instance of 'std::runtime_error'
  what():  Shared memory size too small for matrix multiplication.
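
For triage, the device capabilities line from the log above can be pulled into a structured form. The snippet below is a minimal sketch that assumes the `ggml_vulkan: N = NAME | key: value | ...` layout seen in this log; the helper name `parse_vulkan_device_line` is hypothetical and not part of ggml or llamafile.

```python
import re


def parse_vulkan_device_line(line: str) -> dict:
    """Parse a 'ggml_vulkan: N = NAME (DRIVER) | key: value | ...' log line."""
    match = re.match(r"ggml_vulkan:\s*(\d+)\s*=\s*(.+)", line.strip())
    if match is None:
        raise ValueError("not a ggml_vulkan device line")
    index, rest = match.groups()
    # The first '|' separated part is the device name, the rest are key: value pairs.
    name, *fields = [part.strip() for part in rest.split("|")]
    props = {"index": int(index), "name": name}
    for field in fields:
        key, _, value = field.partition(":")
        value = value.strip()
        # Keep numeric fields as integers and everything else as text.
        props[key.strip()] = int(value) if value.isdigit() else value
    return props


# Device line copied from the log in this report.
LINE = (
    "ggml_vulkan: 0 = MTT X300 (MT driver) | uma: 0 | fp16: 1 | bf16: 0 | "
    "warp size: 128 | shared memory: 16384 | int dot: 0 | matrix cores: none"
)

device = parse_vulkan_device_line(LINE)
print(device["name"], device["shared memory"], device["warp size"])
```

This makes it easy to compare the reported `shared memory` and `warp size` values against logs from devices where the Vulkan backend loads the same model without error.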


### First Bad Commit

_No response_

