vulkan: reduce iq1 shared memory usage for mul_mm by jeffbolznv · Pull Request #24287 · ggml-org/llama.cpp

jeffbolznv · 2026-06-08T03:16:34Z

Overview

Wrap iq1s_grid_gpu in an ifdef so it's only used in mmvq. This keeps the shared memory usage under 16KB for mul_mm.
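
To make the budget arithmetic concrete, here is a minimal C++ sketch of the idea. It is not the shader change itself. Only the 16KB limit comes from this PR; every other byte count is an illustrative assumption, and `shared_bytes`, `fits`, and the constants are hypothetical names. The `needs_grid_gpu` flag stands in for whatever preprocessor define selects the mmvq-only table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 16 KB shared memory budget mentioned in the PR description.
constexpr std::size_t kLimitBytes = 16 * 1024;

// ASSUMED sizes, chosen only to make the arithmetic concrete.
constexpr std::size_t kTileBytes    = 10 * 1024;                    // mul_mm tile staging buffers
constexpr std::size_t kGridBytes    = 2048 * sizeof(std::uint16_t); // decode grid, 4 KB
constexpr std::size_t kGridGpuBytes = 2048 * sizeof(std::uint32_t); // mmvq-only grid, 8 KB

// Shared memory a shader variant would declare. `needs_grid_gpu` models
// the ifdef: the extra table only counts when the variant asks for it.
constexpr std::size_t shared_bytes(std::size_t base, bool needs_grid_gpu) {
    return base + kGridBytes + (needs_grid_gpu ? kGridGpuBytes : 0);
}

// True when the variant fits in the budget.
constexpr bool fits(std::size_t bytes) {
    return bytes <= kLimitBytes;
}

// Unconditional declaration: mul_mm pays for a table it never reads.
static_assert(!fits(shared_bytes(kTileBytes, true)), "22 KB exceeds the budget");

// Guarded declaration: mul_mm drops the table and fits again.
static_assert(fits(shared_bytes(kTileBytes, false)), "14 KB fits the budget");
```

Under these assumed sizes, the unconditional table pushes the mul_mm variant to 22 KB, while guarding it brings the variant back to 14 KB.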

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: Yes, used codex to implement the changes.

0cc4m · 2026-06-09T11:03:55Z

@ggml-org/maintainers Another approval needed.

vulkan: reduce iq1 shared memory usage for mul_mm

30ded60

jeffbolznv requested a review from a team as a code owner June 8, 2026 03:16

jeffbolznv mentioned this pull request Jun 8, 2026

[BUG] Vulkan backend crash on MTT X300: Shared memory size too small for matrix multiplication #24284

Closed

github-actions bot added the Vulkan (issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) labels Jun 8, 2026

0cc4m approved these changes Jun 8, 2026

ngxson approved these changes Jun 9, 2026

0cc4m merged commit d6d0ce8 into ggml-org:master Jun 9, 2026
25 of 27 checks passed

0cc4m merged 1 commit into ggml-org:master from jeffbolznv:mul_mm_iq1_shmem
