opencl: refactor some ops, concat, repeat, tanh and scale #19226

Merged
max-krasnyansky merged 6 commits into ggml-org:master from qualcomm:lh/concat-refactor
Feb 2, 2026

@lhez (Contributor) commented Jan 31, 2026

Gemma-3n-E2B and Gemma-3n-E4B have been producing odd output (not really gibberish, but evidently incorrect). I ended up refactoring these ops (concat, repeat, tanh and scale), and the issue is now fixed. The refactor also improves performance a bit.

On X Elite,

gemma-3n-E2B-it-Q8_0,

before,

common_perf_print: prompt eval time =    2522.36 ms /   235 tokens (   10.73 ms per token,    93.17 tokens per second)
common_perf_print:        eval time =   24209.42 ms /   256 runs   (   94.57 ms per token,    10.57 tokens per second)

after,

common_perf_print: prompt eval time =    1473.28 ms /   235 tokens (    6.27 ms per token,   159.51 tokens per second)
common_perf_print:        eval time =   15944.91 ms /   256 runs   (   62.28 ms per token,    16.06 tokens per second)
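
To put the two runs side by side, here is a minimal sketch that parses the `common_perf_print` lines quoted above and computes the speedup per phase. The helper names (`parse`, `speedups`) and the regular expression are illustrative assumptions, not part of llama.cpp; the numbers are copied from the logs in this comment.

```python
import re

# Log lines copied verbatim from the PR description (X Elite, gemma-3n-E2B-it-Q8_0).
BEFORE = """\
common_perf_print: prompt eval time =    2522.36 ms /   235 tokens (   10.73 ms per token,    93.17 tokens per second)
common_perf_print:        eval time =   24209.42 ms /   256 runs   (   94.57 ms per token,    10.57 tokens per second)
"""

AFTER = """\
common_perf_print: prompt eval time =    1473.28 ms /   235 tokens (    6.27 ms per token,   159.51 tokens per second)
common_perf_print:        eval time =   15944.91 ms /   256 runs   (   62.28 ms per token,    16.06 tokens per second)
"""

# Hypothetical pattern for this log format: phase name, total time in ms, item count.
PATTERN = re.compile(
    r"common_perf_print:\s+(prompt eval|eval) time\s*=\s*([\d.]+) ms\s*/\s*(\d+)"
)


def parse(log):
    """Return {phase: (total_ms, count)} for each timing line in the log."""
    return {
        m.group(1): (float(m.group(2)), int(m.group(3)))
        for m in PATTERN.finditer(log)
    }


def speedups(before, after):
    """Return {phase: before_ms / after_ms}.

    The token and run counts are identical in both logs, so the ratio of
    total times equals the ratio of throughputs.
    """
    b, a = parse(before), parse(after)
    return {phase: b[phase][0] / a[phase][0] for phase in b}


if __name__ == "__main__":
    for phase, factor in speedups(BEFORE, AFTER).items():
        print(f"{phase}: {factor:.2f}x faster")
```

With the figures above, this works out to roughly 1.71x for prompt eval and 1.52x for eval.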

@github-actions bot added the ggml (changes relating to the ggml tensor library for machine learning) and OpenCL (issues specific to the OpenCL backend) labels Jan 31, 2026
@lhez marked this pull request as ready for review February 2, 2026 04:22
@lhez requested a review from max-krasnyansky as a code owner February 2, 2026 04:22
@max-krasnyansky merged commit 91ea44e into ggml-org:master Feb 2, 2026
210 of 220 checks passed
agent-enemy-2 pushed a commit to agent-enemy-2/llama.cpp that referenced this pull request Feb 4, 2026
…9226)

* opencl: refactor concat

* opencl: refactor repeat

* opencl: refactor tanh

* opencl: enable fp16 for tanh

* opencl: refactor scale

* opencl: fix unused variables
liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026