
vulkan: reduce iq1 shared memory usage for mul_mm#24287

Merged

0cc4m merged 1 commit into ggml-org:master from jeffbolznv:mul_mm_iq1_shmem on Jun 9, 2026

Conversation

@jeffbolznv

Contributor

Overview

Wrap `iq1s_grid_gpu` in an `#ifdef` so that it is only used by the mmvq shaders. This keeps shared memory usage for mul_mm under 16 KB.

Fixes #24284.
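The fix works because shared memory is sized per compiled shader variant: an array that the preprocessor compiles out is never declared and consumes no shared memory. The Vulkan specification only guarantees 16384 bytes for `maxComputeSharedMemorySize`, so staying under that figure keeps a shader runnable on any conforming device. The Python sketch below models this budget. It is illustrative only: the 8 KiB table size (assumed to be 2048 entries of 32 bits) and the 12 KiB mul_mm tile are assumed numbers, not measurements of the real shaders, and `shared_mem_bytes` and `fits` are hypothetical helpers.

```python
# Illustrative shared-memory budget check for a compute shader variant.
# All sizes are hypothetical; they are NOT measured from the real
# llama.cpp Vulkan shaders.

# Vulkan's guaranteed minimum value for maxComputeSharedMemorySize.
VK_MIN_SHARED_MEMORY = 16 * 1024  # 16384 bytes

# Assumption: a 2048-entry lookup table of 32-bit words (8 KiB).
IQ1S_GRID_BYTES = 2048 * 4


def shared_mem_bytes(tile_bytes: int, include_iq1s_grid: bool) -> int:
    """Total shared memory a shader variant would declare.

    include_iq1s_grid models the #ifdef: when the guard is not defined
    for a variant, the table is never declared and costs nothing.
    """
    return tile_bytes + (IQ1S_GRID_BYTES if include_iq1s_grid else 0)


def fits(tile_bytes: int, include_iq1s_grid: bool,
         limit: int = VK_MIN_SHARED_MEMORY) -> bool:
    """True if the variant stays within the shared memory limit."""
    return shared_mem_bytes(tile_bytes, include_iq1s_grid) <= limit


# Hypothetical mul_mm tile footprint of 12 KiB.
MUL_MM_TILE = 12 * 1024

before = shared_mem_bytes(MUL_MM_TILE, include_iq1s_grid=True)
after = shared_mem_bytes(MUL_MM_TILE, include_iq1s_grid=False)
print(before, fits(MUL_MM_TILE, True))   # 20480 False
print(after, fits(MUL_MM_TILE, False))   # 12288 True
```

With the table declared unconditionally, the hypothetical mul_mm variant needs 20480 bytes and exceeds the 16384-byte minimum; guarding the table so only mmvq declares it brings mul_mm back under the limit.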

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: Yes, Codex was used to implement the changes.

@jeffbolznv jeffbolznv requested a review from a team as a code owner June 8, 2026 03:16
@github-actions (bot) added the labels Vulkan (Issues specific to the Vulkan backend) and ggml (changes relating to the ggml tensor library for machine learning) on Jun 8, 2026
@0cc4m

0cc4m commented Jun 9, 2026

Contributor

@ggml-org/maintainers Another approval needed.

@0cc4m 0cc4m merged commit d6d0ce8 into ggml-org:master Jun 9, 2026
25 of 27 checks passed
Jcfunk added a commit to Jcfunk/llama.cpp that referenced this pull request on Jun 11, 2026
* upstream/HEAD: (329 commits)
  vendor : update LibreSSL to 4.3.2 (ggml-org#24397)
  Remove padding and multiple D2D copies for MTP (ggml-org#24086)
  chat: fix LFM2/LFM2.5 ignoring json_schema (ggml-org#24377)
  CUDA: Fix ssm_scan_f32 data-races (ggml-org#24360)
  ci : bump komac version (ggml-org#24396)
  speculative : fix "ngram-map-k4v" name in logging (ggml-org#24253)
  webui: implement pinned conversations support (ggml-org#21387)
  graph: Fix granite speech model inference by applying embedding scale when deepstack is not used (ggml-org#24357)
  ci : fix windows release (ggml-org#24369)
  ui: add opt-in run_javascript frontend tool (ggml-org#24244)
  mtmd: build_vit batching (ggml-org#24352)
  vulkan: reduce iq1 shared memory usage for mul_mm (ggml-org#24287)
  vulkan: add `v_dot2_f32_f16` support in matrix-matrix multiplication and Flash Attention (ggml-org#24123)
  ui: Fix excessive style recalculation on hover (ggml-org#24243)
  mtmd: refactor video subproc handling (ggml-org#24316)
  server: log prompts to directory (ggml-org#22031)
  ui: fix mobile chat form overflow and bust stale bundle cache (ggml-org#24158)
  ggml : add GGML_OP_COL2IM_1D (ggml-org#24206)
  server : do not clear slots without unified KV cache (ggml-org#24190)
  models : fix plamo2 attention_key/value_length regression (ggml-org#24317)
  ...

Labels

ggml: changes relating to the ggml tensor library for machine learning
Vulkan: Issues specific to the Vulkan backend

Development

Successfully merging this pull request may close these issues.

[BUG] Vulkan backend crash on MTT X300: Shared memory size too small for matrix multiplication

3 participants