[BUG] Vulkan backend crash on MTT X300: Shared memory size too small for matrix multiplication #24284

@sbzoutianxia

Name and Version

cuda: INFO: probing library /home/sbzoutianxia/.llamafile/v/0.10.3/ggml-cuda.so (app directory)
cuda: INFO: failed to load library /home/sbzoutianxia/.llamafile/v/0.10.3/ggml-cuda.so: /home/sbzoutianxia/.llamafile/v/0.10.3/ggml-cuda.so: cannot open shared object file: No such file or directory
cuda: INFO: no pre-built GPU library found
cuda: INFO: to enable GPU support, build with:
cuda: INFO: llamafile/cuda.sh (for NVIDIA)
cuda: INFO: llamafile/rocm.sh (for AMD)
vulkan: INFO: probing library /home/sbzoutianxia/.llamafile/v/0.10.3/ggml-vulkan.so (bundled)
vulkan: INFO: loaded library /home/sbzoutianxia/.llamafile/v/0.10.3/ggml-vulkan.so from bundled
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = MTT X300 (MT driver) | uma: 0 | fp16: 1 | bf16: 0 | warp size: 128 | shared memory: 16384 | int dot: 0 | matrix cores: none
register backend: registered backend CPU (1 devices)
register device: registered device CPU (CPU)
register backend: registered backend Vulkan (1 devices)
register device: registered device Vulkan0 (MTT X300)
vulkan: INFO: Vulkan backend registered with GGML
vulkan: INFO: Vulkan GPU support successfully loaded
vulkan: INFO: found 1 GPU device(s)
0.00.001.959 I common_params_print_info: build 1780673245 (dbe9c0c) with cosmocc for cosmopolitan
0.00.001.971 I log_info: verbosity = 2147483647 (adjust with the -lv N CLI arg)
0.00.001.971 I device info:
0.00.001.984 I - CPU : CPU (64192 MiB, 64192 MiB free)
0.00.002.021 I - Vulkan0: MTT X300 (16295 MiB, 15480 MiB free)
0.00.002.066 I system_info: n_threads = 8 (n_threads_batch = 8) / 8 | CPU : NEON = 1 | ARM_FMA = 1 | LLAMAFILE = 1 | OPENMP = 1 | REPACK = 1 |
0.00.002.074 I srv server_main: n_parallel is set to auto, using n_parallel = 4 and kv_unified = true
0.00.002.145 I init: using 8 threads for HTTP server
0.00.002.386 I srv start: binding port with default address family
0.00.003.544 I srv server_main: loading model
0.00.003.560 I load_model: loading model '/zip/Qwen3.5-0.8B-Q8_0.gguf'
ggml_vulkan: Error: Shared memory size too small for matrix multiplication.
terminate called after throwing an instance of 'std::runtime_error'
what(): Shared memory size too small for matrix multiplication.
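For triage, the device limits can be pulled out of the `ggml_vulkan:` device line in the log above. The following is a minimal sketch, assuming the line keeps the `index = name | key: value | ...` layout shown here; `parse_device_line` is a hypothetical helper name, not part of llamafile or ggml.

```python
import re

# Matches lines such as:
#   ggml_vulkan: 0 = MTT X300 (MT driver) | uma: 0 | ... | shared memory: 16384 | ...
# The layout is assumed from the log in this report and may differ between builds.
DEVICE_LINE = re.compile(r"^ggml_vulkan:\s*(\d+)\s*=\s*(.+?)\s*\|\s*(.*)$")


def parse_device_line(line):
    """Return a dict of device properties, or None if the line is not a device line."""
    match = DEVICE_LINE.match(line.strip())
    if match is None:
        return None
    info = {"index": int(match.group(1)), "name": match.group(2)}
    # Remaining fields are "key: value" pairs separated by "|".
    for field in match.group(3).split("|"):
        key, sep, value = field.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


line = (
    "ggml_vulkan: 0 = MTT X300 (MT driver) | uma: 0 | fp16: 1 | bf16: 0 | "
    "warp size: 128 | shared memory: 16384 | int dot: 0 | matrix cores: none"
)
device = parse_device_line(line)
print(device["name"], device["warp size"], device["shared memory"])
```

Values are kept as strings because some fields (for example `matrix cores: none`) are not numeric.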

Operating systems

Linux

GGML backends

Vulkan

Hardware

ARM + MTT X300

Models

Qwen3.5-0.8B-Q8_0.gguf

Problem description & steps to reproduce

Describe the bug
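I am trying to run a Qwen model using the Vulkan backend, but it crashes during model loading with a shared memory error. My GPU is an MTT X300 (Moore Threads), which the Vulkan backend reports with a warp size of 128 and 16384 bytes of shared memory. The Vulkan device is detected and registered successfully; the application terminates right after model loading starts, with the error "Shared memory size too small for matrix multiplication."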
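To illustrate why a 16384-byte limit can be too small, here is a rough sketch of the shared-memory budget of a generic tiled matrix multiplication that stages one tile of A and one tile of B per workgroup. This is not the actual check in ggml-vulkan; the function names, tile sizes, and padding parameter are hypothetical and only show the arithmetic.

```python
# Shared memory limit taken from the device line in the log (bytes).
SHARED_MEMORY_LIMIT = 16384


def tile_shared_bytes(bm, bn, bk, elem_bytes, pad=0):
    """Bytes needed to stage a bm x bk tile of A and a bn x bk tile of B.

    `pad` is an optional per-row padding (in elements), as is sometimes
    used to avoid bank conflicts. All parameters are illustrative.
    """
    a_tile = bm * (bk + pad)
    b_tile = bn * (bk + pad)
    return (a_tile + b_tile) * elem_bytes


def fits(bm, bn, bk, elem_bytes, limit_bytes=SHARED_MEMORY_LIMIT, pad=0):
    """True if the staged tiles fit within the shared memory limit."""
    return tile_shared_bytes(bm, bn, bk, elem_bytes, pad) <= limit_bytes


# Hypothetical tile shapes: (bm, bn, bk, bytes per element)
for shape in [(64, 64, 16, 2), (128, 128, 32, 2), (128, 128, 32, 4)]:
    print(shape, tile_shared_bytes(*shape), fits(*shape))
```

If even the smallest tile configuration a backend offers exceeds the device limit, failing at load time (as seen here) is the expected outcome of such a check.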

To Reproduce
Steps to reproduce the behavior:

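  1. Run llamafile 0.10.3 with the bundled Vulkan backend enabled.
  2. Load the model Qwen3.5-0.8B-Q8_0.gguf.
  3. The program crashes during model loading with: terminate called after throwing an instance of 'std::runtime_error', what(): Shared memory size too small for matrix multiplication.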

First Bad Commit

No response
