opencl: add cumsum op #18981

Merged: lhez merged 8 commits into ggml-org:master from qualcomm:sq/opencl-cumsum-op on Mar 12, 2026

opencl: add cumsum op#18981
lhez merged 8 commits intoggml-org:masterfrom
qualcomm:sq/opencl-cumsum-op

Conversation

@shaofeiqi (Contributor)

This PR adds the cumsum op for the OpenCL backend.
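For context, cumsum computes an inclusive prefix sum along each row of a tensor: element i of the output is the sum of input elements 0 through i. A minimal serial C sketch of the per-row semantics (illustration only, not the actual OpenCL kernel added by this PR):

```c
#include <stddef.h>

/* Inclusive prefix sum over one row: out[i] = x[0] + ... + x[i].
 * Serial reference for what a cumsum op computes per row; the real
 * OpenCL backend parallelizes this across work-items. */
static void cumsum_row(const float *x, float *out, size_t n) {
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        acc += x[i];
        out[i] = acc;
    }
}
```

A backend implementation is typically validated against exactly this kind of serial reference in the ggml backend-op tests.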

@github-actions bot added the labels ggml (changes relating to the ggml tensor library for machine learning) and OpenCL (issues specific to the OpenCL backend) on Jan 21, 2026
@shaofeiqi force-pushed the sq/opencl-cumsum-op branch from 888f39a to 1c8211c on February 6, 2026 at 22:24
@CISC removed the request for review from taronaeo on February 6, 2026 at 22:27
@shaofeiqi force-pushed the sq/opencl-cumsum-op branch from c001573 to dd52e3f on February 6, 2026 at 22:58
@lhez force-pushed the sq/opencl-cumsum-op branch from dd52e3f to 435aad0 on March 6, 2026 at 07:26
@lhez force-pushed the sq/opencl-cumsum-op branch from 832ddc1 to cd6677b on March 7, 2026 at 07:57
@lhez (Contributor) commented Mar 12, 2026

The CI failures are unrelated to this change. Will merge shortly.

@lhez merged commit 3d9ab22 into ggml-org:master on Mar 12, 2026 (209 of 220 checks passed)
ProgenyAlpha pushed a commit to ProgenyAlpha/llama.cpp that referenced this pull request Mar 12, 2026
* OpenCL: add CUMSUM op support

* remove unused argument

* opencl: refactor cumsum

* opencl: refactor

* opencl: refactor tmp buffer

* opencl: adjust max number of subgroups

* opencl: fix whitespace

* opencl: fix global size when cumsum the tmp buffer

---------

Co-authored-by: Li He <lih@qti.qualcomm.com>
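The commit messages above ("refactor tmp buffer", "adjust max number of subgroups", "fix global size when cumsum the tmp buffer") point at the standard two-pass GPU scan: each workgroup scans its own tile and writes the tile total into a temporary buffer, that buffer is itself scanned, and the resulting per-block offsets are added back. A serial C sketch of that structure, with BLOCK as a hypothetical stand-in for the workgroup tile size (the real kernel does these passes with subgroup operations, not loops):

```c
#include <stddef.h>

#define BLOCK 4  /* hypothetical stand-in for the workgroup/subgroup tile size */

/* Two-pass inclusive scan over one row, serial illustration only:
 * pass 1: scan each block independently, record its total in tmp
 * pass 2: turn tmp into exclusive prefixes of the block totals
 * pass 3: add each block's prefix offset back into its elements
 * tmp must hold at least (n + BLOCK - 1) / BLOCK floats. */
static void scan_two_pass(const float *x, float *out, float *tmp, size_t n) {
    size_t nblocks = (n + BLOCK - 1) / BLOCK;

    /* pass 1: per-block inclusive scan + block totals */
    for (size_t b = 0; b < nblocks; ++b) {
        float acc = 0.0f;
        for (size_t i = b * BLOCK; i < n && i < (b + 1) * BLOCK; ++i) {
            acc += x[i];
            out[i] = acc;
        }
        tmp[b] = acc;
    }

    /* pass 2: exclusive scan of the tmp buffer of block totals */
    float run = 0.0f;
    for (size_t b = 0; b < nblocks; ++b) {
        float t = tmp[b];
        tmp[b] = run;
        run += t;
    }

    /* pass 3: add each block's prefix offset */
    for (size_t b = 0; b < nblocks; ++b) {
        for (size_t i = b * BLOCK; i < n && i < (b + 1) * BLOCK; ++i) {
            out[i] += tmp[b];
        }
    }
}
```

The "fix global size" commit makes sense in this shape: the second pass launches over the tmp buffer, whose length is the block count rather than the row length, so its NDRange must be sized accordingly.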
tekintian added a commit to tekintian/llama.cpp that referenced this pull request Mar 12, 2026
* 'master' of github.com:ggml-org/llama.cpp: (33 commits)
  convert : better mtp check and fix return [no ci] (ggml-org#20419)
  vulkan: fix SSM_CONV PP scaling with large ubatch sizes (ggml-org#20379)
  New conversations now auto-select the first loaded model (ggml-org#20403)
  ggml-virtgpu: Fix some build commands (ggml-org#20341)
  metal : avoid divisions in bin kernel (ggml-org#20426)
  ci: Setup self-hosted CI for Intel Linux Vulkan backend (ggml-org#20154)
  vulkan: fix l2_norm epsilon handling (ggml-org#20350)
  vulkan: fix OOB check in flash_attn_mask_opt (ggml-org#20296)
  vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large models with --no-mmap (ggml-org#20059)
  opencl: use larger workgroup size for get_rows (ggml-org#20316)
  opencl: add cumsum op (ggml-org#18981)
  hip: compile debug builds with -O2 on hip to avoid a compiler bug (ggml-org#20392)
  common/parser: add GigaChatV3/3.1 models support (ggml-org#19931)
  model : add support for Phi4ForCausalLMV (ggml-org#20168)
  graph : add optional scale parameter to build_lora_mm [no ci] (ggml-org#20427)
  common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up (ggml-org#20416)
  ggml-webgpu: Add supports for `GGML_OP_REPEAT` (ggml-org#20230)
  llama : enable chunked fused GDN path (ggml-org#20340)
  llama : whitespace cleanup (ggml-org#20422)
  ggml : add NVFP4 quantization type support (ggml-org#19769)
  ...
