ggml-webgpu: Add support for GGML_OP_REPEAT (#20230)
reeselevine merged 3 commits into ggml-org:master

Conversation
Potentially could support
Sounds good on merging after #20173; I should merge that one soon.
There are a couple, but most recently (and notably) Qwen 3.5. |
I confirmed that DeepSeek-V2 also uses REPEAT, and DeepSeek-V3 may as well.
I see. As you mentioned, treating the repeated bytes as opaque would work. However, using llama.cpp/ggml/src/ggml-cpu/ops.cpp lines 1787 to 1792 (at c96f608)
Now that #20173 is merged, we just need to fix some conflicts and then we should be good to merge this!
Looks good to me! Do you know if any other operations or changes are necessary to get Qwen 3.5 running? I tried this model off this branch and I'm getting a segfault in the WebGPU backend. I can try to debug it unless you're already looking into it.
According to my experiments, the following operations appear not to be implemented yet for Qwen 3.5 in the WebGPU backend.
In my program (based on
On the other hand, unsloth/Qwen3.5-4B-Q4_0.gguf and unsloth/Qwen3.5-9B-Q4_0.gguf seem to work well with my program.
I haven't been working on Qwen 3.5 support myself, so it would be great if you could work on this!
* Add GGML_OP_REPEAT to webgpu backend.
* Add i16 support for GGML_OP_REPEAT.
* 'master' of github.com:ggml-org/llama.cpp: (33 commits)
  * convert : better mtp check and fix return [no ci] (ggml-org#20419)
  * vulkan: fix SSM_CONV PP scaling with large ubatch sizes (ggml-org#20379)
  * New conversations now auto-select the first loaded model (ggml-org#20403)
  * ggml-virtgpu: Fix some build commands (ggml-org#20341)
  * metal : avoid divisions in bin kernel (ggml-org#20426)
  * ci: Setup self-hosted CI for Intel Linux Vulkan backend (ggml-org#20154)
  * vulkan: fix l2_norm epsilon handling (ggml-org#20350)
  * vulkan: fix OOB check in flash_attn_mask_opt (ggml-org#20296)
  * vulkan: Fix ErrorOutOfHostMemory on Intel GPU when loading large models with --no-mmap (ggml-org#20059)
  * opencl: use larger workgroup size for get_rows (ggml-org#20316)
  * opencl: add cumsum op (ggml-org#18981)
  * hip: compile debug builds with -O2 on hip to avoid a compiler bug (ggml-org#20392)
  * common/parser: add GigaChatV3/3.1 models support (ggml-org#19931)
  * model : add support for Phi4ForCausalLMV (ggml-org#20168)
  * graph : add optional scale parameter to build_lora_mm [no ci] (ggml-org#20427)
  * common : fix --n-cpu-moe, --cpu-moe for models with fused gate + up (ggml-org#20416)
  * ggml-webgpu: Add supports for `GGML_OP_REPEAT` (ggml-org#20230)
  * llama : enable chunked fused GDN path (ggml-org#20340)
  * llama : whitespace cleanup (ggml-org#20422)
  * ggml : add NVFP4 quantization type support (ggml-org#19769)
  * ...

This PR adds support for `GGML_OP_REPEAT` in the WebGPU backend. The status of `REPEAT` for WebGPU in `docs/ops.md` is changed to "partially supported" because WebGPU doesn't seem to support `i16`.

Also, this PR includes formatting changes (clang-format) for the modified files. Since #20173 touches the same parts, this PR might need to be merged after that one.
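For reference, `GGML_OP_REPEAT` tiles a source tensor to fill a larger destination whose extents are integer multiples of the source extents; each destination element reads the source element at the same index modulo the source extent, per dimension. A minimal standalone sketch of this semantics (2D only, i16 elements; the function name and signature are illustrative, not ggml's API):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sketch of REPEAT semantics: tile src of shape (sr, sc) into a dst of
// shape (dr, dc), requiring dr % sr == 0 and dc % sc == 0 (analogous to
// ggml's can-repeat precondition). Row-major storage.
std::vector<int16_t> repeat_2d(const std::vector<int16_t> & src,
                               int sr, int sc, int dr, int dc) {
    assert(dr % sr == 0 && dc % sc == 0);
    std::vector<int16_t> dst(dr * dc);
    for (int r = 0; r < dr; ++r) {
        for (int c = 0; c < dc; ++c) {
            // Wrap each index back into the source extents.
            dst[r * dc + c] = src[(r % sr) * sc + (c % sc)];
        }
    }
    return dst;
}
```

For example, repeating a 2x2 block into a 2x4 destination duplicates each row side by side. A WebGPU shader would express the same per-element index arithmetic, which is where native `i16` storage support becomes relevant.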