Ggml/cuda col2im 1d by ServeurpersoCom · Pull Request #24417 · ggml-org/llama.cpp

ServeurpersoCom · 2026-06-10T14:36:09Z

Overview

cuda: add GGML_OP_COL2IM_1D

CUDA backend follow-up to the CPU op (#24206), using the same formulation: a gather kernel with one thread per output element, each reading only the ceil(K/s0) columns that scatter into it. Supports F32 / F16 / BF16 with an F32 accumulator.
The flat idx -> (channel, time) decomposition uses fast_div_modulo, which buys back time on the cache-resident F32 / F16 shapes where the kernel's ALU cost is exposed; on the DRAM-bound long shape it is a no-op, as expected.
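To make the gather formulation concrete, here is a minimal host-side C++ sketch of the per-output loop, checked against a naive scatter reference. It is not the kernel from this PR: the memory layouts, the output-length formula (the usual transposed-convolution one) and all function names are assumptions chosen for illustration. Each iteration of the outer loop plays the role of one CUDA thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed (hypothetical) row-major layouts, not taken from the PR:
//   col[(c * K + k) * T_in + i]   column i, kernel tap k, channel c
//   out[c * T_out + t]            output sample t of channel c
static int64_t col2im_1d_out_len(int64_t T_in, int64_t K, int64_t s0, int64_t p0) {
    return (T_in - 1) * s0 + K - 2 * p0;
}

// Gather form: every output element sums the columns that land on it.
std::vector<float> col2im_1d_gather(const std::vector<float>& col,
        int64_t C, int64_t K, int64_t T_in, int64_t s0, int64_t p0) {
    assert((int64_t) col.size() == C * K * T_in);
    const int64_t T_out = col2im_1d_out_len(T_in, K, s0, p0);
    std::vector<float> out(C * T_out);
    for (int64_t idx = 0; idx < C * T_out; ++idx) {  // one "thread" per output
        const int64_t c  = idx / T_out;              // flat idx -> (channel, time)
        const int64_t t  = idx % T_out;
        const int64_t tp = t + p0;                   // position before padding is cropped
        // Column i with tap k lands on tp when i*s0 + k == tp and 0 <= k < K,
        // so ceil((tp - K + 1) / s0) <= i <= floor(tp / s0): at most ceil(K/s0) columns.
        const int64_t i_lo = tp - K + 1 > 0 ? (tp - K + 1 + s0 - 1) / s0 : 0;
        const int64_t i_hi = std::min(T_in - 1, tp / s0);
        float acc = 0.0f;                            // F32 accumulator
        for (int64_t i = i_lo; i <= i_hi; ++i) {
            const int64_t k = tp - i * s0;
            acc += col[(c * K + k) * T_in + i];
        }
        out[idx] = acc;
    }
    return out;
}

// Naive scatter reference: every column element is added to its target.
std::vector<float> col2im_1d_scatter(const std::vector<float>& col,
        int64_t C, int64_t K, int64_t T_in, int64_t s0, int64_t p0) {
    const int64_t T_out = col2im_1d_out_len(T_in, K, s0, p0);
    std::vector<float> out(C * T_out, 0.0f);
    for (int64_t c = 0; c < C; ++c)
        for (int64_t i = 0; i < T_in; ++i)
            for (int64_t k = 0; k < K; ++k) {
                const int64_t t = i * s0 + k - p0;
                if (t >= 0 && t < T_out) out[c * T_out + t] += col[(c * K + k) * T_in + i];
            }
    return out;
}
```

The gather form needs no atomics because each output is written by exactly one thread, which is why it maps well to a one-thread-per-output launch.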

Additional information

Validated against the test-backend-ops grid that was merged with the CPU op, with zero additional test code: 33/33 pass on CUDA0 across the eight geometries and three types, plus the three perf entries. CMake globs the new .cu, so the only wiring is the dispatch case and the supports_op entry next to conv_transpose_1d.

Optimization (2nd commit):

Same fastdiv pattern as the Snake CUDA fusion (#22667); measured at around 10% faster on F32 and F16 on the cache-resident vocoder-stage shapes vs. a plain div + mod.
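For readers unfamiliar with the fastdiv idea, the following host-side C++ sketch shows the general technique: division by a divisor that is fixed for the whole launch is replaced by a precomputed multiply-high plus shift. The names `FastDiv`, `make_fastdiv` and `fast_div_mod` are hypothetical and are not the signatures of the ggml helpers; on the GPU the high half of the product would typically come from the `__umulhi` intrinsic.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical helper type: precomputed constants for dividing by d.
struct FastDiv {
    uint32_t mp;  // magic multiplier
    uint32_t L;   // shift amount, ceil(log2(d))
    uint32_t d;   // the divisor itself, kept for the remainder
};

// Computed once on the host, e.g. for d = T_out.
inline FastDiv make_fastdiv(uint32_t d) {
    assert(d != 0);
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) ++L;
    const uint32_t mp =
        (uint32_t) (((uint64_t{1} << 32) * ((uint64_t{1} << L) - d)) / d + 1);
    return {mp, L, d};
}

// Returns {n / d, n % d} without a hardware divide.
inline std::pair<uint32_t, uint32_t> fast_div_mod(uint32_t n, const FastDiv& f) {
    const uint64_t hi = ((uint64_t) n * f.mp) >> 32;        // high 32 bits of n * mp
    const uint32_t q  = (uint32_t) ((hi + n) >> f.L);       // quotient
    return {q, n - q * f.d};                                // remainder from one multiply
}
```

In a kernel like the one described here, the divisor would be the output length, so `fast_div_mod(idx, fd)` yields the (channel, time) pair from the flat index.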

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES Opus / MCP rootless container with Nvidia GPU

ServeurpersoCom added 2 commits June 10, 2026 15:46

cuda: add GGML_OP_COL2IM_1D, follow-up to the CPU op

1a6725a

cuda: col2im_1d use fast_div_modulo for the index decomposition

77d1dca

ServeurpersoCom requested a review from a team as a code owner June 10, 2026 14:36

github-actions Bot added the Nvidia GPU (issues specific to Nvidia GPUs) and ggml (changes relating to the ggml tensor library for machine learning) labels Jun 10, 2026

ServeurpersoCom requested review from JohannesGaessler and am17an June 10, 2026 14:37

ServeurpersoCom mentioned this pull request Jun 12, 2026

ggml : add GGML_OP_COL2IM_1D (CPU + CUDA) #23424

Closed

ServeurpersoCom wants to merge 2 commits into ggml-org:master from ServeurpersoCom:ggml/cuda-col2im_1d