Ggml/cuda col2im 1d#24417

Open
ServeurpersoCom wants to merge 2 commits into
ggml-org:masterfrom
ServeurpersoCom:ggml/cuda-col2im_1d
@ServeurpersoCom

Contributor

Overview

cuda: add GGML_OP_COL2IM_1D

CUDA backend follow-up to the CPU op (#24206), using the same formulation: a gather kernel with one thread per output element, each reading only the ceil(K/s0) columns that scatter into it. Supports F32 / F16 / BF16 with an F32 accumulator.
The flat idx -> (channel, time) decomposition uses fast_div_modulo, which buys back time on the cache-resident F32 / F16 shapes where the kernel is ALU-exposed; on the DRAM-bound long shape it has no effect, as expected.
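As a rough illustration of the gather formulation described above, here is a host-side C++ sketch (not the CUDA kernel from this PR). The memory layout (`col[(c*K + k)*T_in + t_in]`, `out[c*T_out + t]`), the absence of padding, and the function names are assumptions made for the example; the actual ggml tensor layout may differ. Each output element visits only the kernel taps `k` with `k % s0 == t % s0`, which is at most ceil(K/s0) reads, and a plain scatter version is included for comparison.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical layout (an assumption, not taken from the PR):
//   col[(c * K + k) * T_in + t_in]   columns, one row per (channel, tap)
//   out[c * T_out + t]               output, T_out = (T_in - 1) * s0 + K

// Gather form: one loop iteration per output element, mirroring
// "one thread per output" on the GPU.
std::vector<float> col2im_1d_gather(const std::vector<float>& col,
                                    int C, int K, int T_in, int s0) {
    assert(s0 > 0 && K > 0 && T_in > 0);
    const int T_out = (T_in - 1) * s0 + K;
    std::vector<float> out(static_cast<std::size_t>(C) * T_out);
    for (int idx = 0; idx < C * T_out; ++idx) {
        const int c = idx / T_out;   // flat idx -> channel
        const int t = idx % T_out;   // flat idx -> time
        float acc = 0.0f;            // F32 accumulator
        // Only taps congruent to t modulo s0 can land on t.
        for (int k = t % s0; k < K; k += s0) {
            const int t_in = (t - k) / s0;  // exact, since s0 divides t - k
            if (t_in < 0 || t_in >= T_in) continue;
            acc += col[(static_cast<std::size_t>(c) * K + k) * T_in + t_in];
        }
        out[idx] = acc;
    }
    return out;
}

// Scatter form, for comparison: every column adds into its target output.
std::vector<float> col2im_1d_scatter(const std::vector<float>& col,
                                     int C, int K, int T_in, int s0) {
    const int T_out = (T_in - 1) * s0 + K;
    std::vector<float> out(static_cast<std::size_t>(C) * T_out, 0.0f);
    for (int c = 0; c < C; ++c)
        for (int k = 0; k < K; ++k)
            for (int t_in = 0; t_in < T_in; ++t_in)
                out[static_cast<std::size_t>(c) * T_out + t_in * s0 + k] +=
                    col[(static_cast<std::size_t>(c) * K + k) * T_in + t_in];
    return out;
}
```

The gather form needs no atomics because each output is written exactly once, which is presumably why it maps well to one thread per output on the GPU.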

Additional information

Validated against the test-backend-ops grid that was merged with the CPU op, with no additional test code: 33/33 pass on CUDA0 across the eight geometries and three types, plus the three perf entries. CMake globs the new .cu file, so the only wiring is the dispatch case and the supports_op entry next to conv_transpose_1d.

Optimization (2nd commit):

Same fastdiv pattern as the Snake CUDA fusion (#22667), measured at around 10% faster on F32 and F16 on the cache-resident vocoder stage shapes vs a plain div + mod.
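For readers unfamiliar with the fastdiv idea, the following is a minimal host-side C++ sketch of division by a loop-invariant divisor using a precomputed multiplier and shift (the classic Granlund-Montgomery round-up method). It is an illustration under assumptions, not the helper used in this PR: the names `FastDiv`, `make_fastdiv`, `fast_div`, and `fast_div_mod` are hypothetical, and the 64-bit multiply stands in for what would be a `__umulhi` on the device.

```cpp
#include <cassert>
#include <cstdint>

// Precomputed constants for dividing by a fixed divisor d (hypothetical names).
struct FastDiv {
    uint32_t mp;  // magic multiplier
    uint32_t L;   // shift amount, ceil(log2(d))
    uint32_t d;   // original divisor, kept for the modulo
};

inline FastDiv make_fastdiv(uint32_t d) {
    assert(d != 0);
    uint32_t L = 0;
    while (L < 32 && (uint64_t{1} << L) < d) ++L;  // smallest L with 2^L >= d
    const uint64_t mp =
        ((uint64_t{1} << 32) * ((uint64_t{1} << L) - d)) / d + 1;
    return { static_cast<uint32_t>(mp), L, d };
}

// n / d without a hardware divide: one multiply-high, one add, one shift.
inline uint32_t fast_div(uint32_t n, const FastDiv& f) {
    // High 32 bits of n * mp; on a CUDA device this would be __umulhi(n, mp).
    const uint64_t hi = (static_cast<uint64_t>(n) * f.mp) >> 32;
    // The sum is kept in 64 bits so it cannot overflow for any 32-bit n.
    return static_cast<uint32_t>((hi + n) >> f.L);
}

struct DivMod {
    uint32_t q;  // quotient
    uint32_t r;  // remainder
};

// Quotient and remainder together, e.g. flat idx -> (channel, time).
inline DivMod fast_div_mod(uint32_t n, const FastDiv& f) {
    const uint32_t q = fast_div(n, f);
    return { q, n - q * f.d };
}
```

The constants depend only on the divisor (here, the output length), so they can be computed once on the host and passed to the kernel, leaving only cheap integer operations per thread.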

Requirements

  • I have read and agree with the contributing guidelines
  • AI usage disclosure: YES Opus / MCP rootless container with Nvidia GPU

@ServeurpersoCom ServeurpersoCom requested a review from a team as a code owner June 10, 2026 14:36
@github-actions github-actions Bot added Nvidia GPU Issues specific to Nvidia GPUs ggml changes relating to the ggml tensor library for machine learning labels Jun 10, 2026