opencl: add MoE support for q4_k, q5_k, q6_k on Adreno #23303

Merged
lhez merged 4 commits into ggml-org:master from qualcomm:sq/opencl-q4_k-q5_k-q6_k-moe on May 19, 2026

@shaofeiqi

Contributor

Overview

Add OpenCL MoE support for the Q4_K, Q5_K, and Q6_K quantization types on Adreno GPUs.

Requirements

@shaofeiqi shaofeiqi requested a review from a team as a code owner May 18, 2026 23:11
@github-actions github-actions Bot added the ggml (changes relating to the ggml tensor library for machine learning) and OpenCL (issues specific to the OpenCL backend) labels May 18, 2026
@lhez lhez merged commit b28a2f3 into ggml-org:master May 19, 2026
47 of 58 checks passed
fhnmor21 pushed a commit to fhnmor21/llama-cpp-turboquant that referenced this pull request May 19, 2026
* opencl: add q4_k moe support

* opencl: add q5_k moe support

* opencl: add q6_k moe support

* opencl: adjust format

---------

Co-authored-by: Li He <lih@qti.qualcomm.com>
dbrain pushed a commit to dbrain/hbd-llama-cpp-turboquant that referenced this pull request May 21, 2026
baramofme pushed a commit to baramofme/llama-cpp-turboquant that referenced this pull request May 23, 2026
srossitto79 pushed a commit to srossitto79/llama.cpp that referenced this pull request May 23, 2026
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026
turbo-tan pushed a commit to turbo-tan/llama.cpp-tq3 that referenced this pull request Jun 2, 2026
Jcfunk added a commit to Jcfunk/llama.cpp that referenced this pull request Jun 11, 2026
* upstream/HEAD: (25 commits)
  metal : optimize pad + cpy (ggml-org#23354)
  snapdragon: update toolchain to v0.6 (ggml-org#23369)
  ggml-cuda: tune RDNA3 Q6_K MMVQ nwarps (ggml-org#23349)
  opencl: add MoE support for q4_k, q5_k, q6_k on Adreno (ggml-org#23303)
  hexagon: add MROPE and IMROPE support in HTP rope op (ggml-org#23317)
  refactor: Chat Screen UI rendering (ggml-org#23333)
  github: mention --log-file in issue templates (ggml-org#23277)
  common: fix --help for --verbosity (ggml-org#23278)
  common: fix --fit verbosity with --verbosity 4 (ggml-org#23282)
  convert : update mtp related help (ggml-org#23334)
  hexagon: enable support for NORM op (ggml-org#23319)
  model : clarify MTP layer comment in qwen35.cpp [no ci] (ggml-org#23338)
  llama : MTP clean-up (ggml-org#23269)
  ui: Bump packages + address build warnings (ggml-org#23300)
  ci : install libssl-dev (ggml-org#23325)
  ci : install server kleidiai runner dependencies (ggml-org#23259)
  server-context: guarantee there is at least 1 token to decode (ggml-org#23280)
  server : print graphs reused in slot timings (ggml-org#23279)
  save-load-state : refactor tests and improve readability (ggml-org#23196)
  llama-eval : add per-task summary stats (ggml-org#23151)
  ...