Merged
lhez (Contributor) approved these changes on Mar 11, 2026:

> Failures are irrelevant. Will merge shortly.
ProgenyAlpha pushed a commit to ProgenyAlpha/llama.cpp that referenced this pull request on Mar 12, 2026:
* OpenCL: add CUMSUM op support
* remove unused argument
* opencl: refactor cumsum
* opencl: refactor
* opencl: refactor tmp buffer
* opencl: adjust max number of subgroups
* opencl: fix whitespace
* opencl: fix global size when cumsum the tmp buffer

Co-authored-by: Li He <lih@qti.qualcomm.com>
tekintian added a commit to tekintian/llama.cpp that referenced this pull request on Mar 12, 2026:
* 'master' of github.com:ggml-org/llama.cpp: (33 commits)
  * convert : better mtp check and fix return [no ci] (ggml-org#20419)
  * vulkan: fix SSM_CONV PP scaling with large ubatch sizes (ggml-org#20379)
  * New conversations now auto-select the first loaded model (ggml-org#20403)
  * ggml-virtgpu: Fix some build commands (ggml-org#20341)
  * metal : avoid divisions in bin kernel (ggml-org#20426)
  * ci: Setup self-hosted CI for Intel Linux Vulkan backend (ggml-org#20154)
  * vulkan: fix l2_norm epsilon handling (ggml-org#20350)
  * vulkan: fix OOB check in flash_attn_mask_opt (ggml-org#20296)
  * vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large models with --no-mmap (ggml-org#20059)
  * opencl: use larger workgroup size for get_rows (ggml-org#20316)
  * opencl: add cumsum op (ggml-org#18981)
  * hip: compile debug builds with -O2 on hip to avoid a compiler bug (ggml-org#20392)
  * common/parser: add GigaChatV3/3.1 models support (ggml-org#19931)
  * model : add support for Phi4ForCausalLMV (ggml-org#20168)
  * graph : add optional scale parameter to build_lora_mm [no ci] (ggml-org#20427)
  * common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up (ggml-org#20416)
  * ggml-webgpu: Add supports for `GGML_OP_REPEAT` (ggml-org#20230)
  * llama : enable chunked fused GDN path (ggml-org#20340)
  * llama : whitespace cleanup (ggml-org#20422)
  * ggml : add NVFP4 quantization type support (ggml-org#19769)
  * ...
This PR adds the cumsum op for the OpenCL backend.