ggml webgpu: shader library organization by reeselevine · Pull Request #19530 · ggml-org/llama.cpp

reeselevine · 2026-02-12T00:15:41Z

We've been converting many of the existing WGSL shaders into a format that allows efficient just-in-time compilation of the variants used in specific model graphs and sets them up for better performance tuning down the road. This PR makes a fairly large organizational change, moving shader preprocessing, compilation, and caching into a new ggml_webgpu_shader_lib structure. As part of this, the existing matrix multiplication shaders were also converted into the JIT compilation format (using the WGSL preprocessor), along with get_rows and scale.
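As a rough illustration of the preprocess, compile, and cache idea, here is a minimal sketch of a variant cache. It is not the actual ggml_webgpu_shader_lib code: the names `compiled_shader` and `shader_variant_cache` are hypothetical, and the "compilation" step is a placeholder that only prepends `#define` lines where a real backend would build a WebGPU pipeline.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a compiled pipeline. A real backend would hold a
// WebGPU pipeline handle here instead of the preprocessed source text.
struct compiled_shader {
    std::string source;  // fully preprocessed WGSL text
};

// Hypothetical variant cache: one compiled shader per (shader name, define list).
class shader_variant_cache {
  public:
    using defines_t = std::vector<std::string>;

    // Returns the cached variant, compiling it only on first use.
    const compiled_shader & get(const std::string & name,
                                const std::string & wgsl_template,
                                const defines_t &   defines) {
        assert(!name.empty());
        // Build the cache key from the name and the defines, in the order given.
        std::string key = name;
        for (const auto & d : defines) {
            key += "|" + d;
        }
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            ++compile_count_;
            it = cache_.emplace(key, compile(wgsl_template, defines)).first;
        }
        return it->second;
    }

    // Number of variants compiled so far (cache misses).
    int compile_count() const { return compile_count_; }

  private:
    // Placeholder "compilation": prepend one #define line per flag.
    static compiled_shader compile(const std::string & wgsl_template,
                                   const defines_t &   defines) {
        std::string src;
        for (const auto & d : defines) {
            src += "#define " + d + "\n";
        }
        return { src + wgsl_template };
    }

    std::map<std::string, compiled_shader> cache_;
    int                                    compile_count_ = 0;
};
```

Only the variants a model graph actually requests get compiled, and repeated requests for the same name and defines reuse the cached entry. In this sketch the key depends on the order of the defines, so callers would need to pass them in a consistent order.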

This new shader library class also opens up the opportunity for tons of interesting specialization in the WebGPU backend. For example, if you have a shader specialized for a particular GPU vendor/architecture in WGSL, it should be pretty easy to hook it into the logic for choosing the right shader/pipeline.
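One way such a hook could look is sketched below, assuming shaders are registered per operation with an optional vendor tag. The `shader_registry` class and the vendor strings are hypothetical and are not taken from the PR; the selection logic in the backend may be organized quite differently.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical registry: WGSL sources keyed by (operation, vendor).
// An empty vendor string marks the generic shader for that operation.
class shader_registry {
  public:
    void add(const std::string & op, const std::string & vendor, std::string wgsl) {
        assert(!op.empty());
        sources_[{ op, vendor }] = std::move(wgsl);
    }

    // Prefer a vendor-specific shader and fall back to the generic one.
    // Returns nullptr if the operation has no shader at all.
    const std::string * select(const std::string & op, const std::string & vendor) const {
        auto it = sources_.find({ op, vendor });
        if (it == sources_.end()) {
            it = sources_.find({ op, "" });
        }
        return it == sources_.end() ? nullptr : &it->second;
    }

  private:
    std::map<std::pair<std::string, std::string>, std::string> sources_;
};
```

With this shape, adding a tuned shader for one vendor is a single `add` call, and every other device keeps using the generic version.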

It's always nice to have a PR that removes more lines of code than it adds too :)

* scale jit working
* preliminary working jit for getrows and mulmat, needs refining
* simplified mul_mat preprocessing switch statement
* get_rows fixes, mul_mat refinement
* formatted + last edits
* removed some extraneous prints
* fixed get_rows, fixed workgroup dispatch in mul_mat. no gibberish
* small fix
* some changes, working
* get_rows and mul_mat jit fixed and working
* Update formatting
* formatting
* Add header

---------

Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

SharmaRithik · 2026-02-17T07:41:53Z

The changes look good to me! The new shader library structure and JIT variant setup make the design cleaner and more extensible, and the refactor looks solid overall.

* Basic JIT compilation for mul_mat, get_rows, and scale (ggml-org#17)
* scale jit working
* preliminary working jit for getrows and mulmat, needs refining
* simplified mul_mat preprocessing switch statement
* get_rows fixes, mul_mat refinement
* formatted + last edits
* removed some extraneous prints
* fixed get_rows, fixed workgroup dispatch in mul_mat. no gibberish
* small fix
* some changes, working
* get_rows and mul_mat jit fixed and working
* Update formatting
* formatting
* Add header

---------

Co-authored-by: Neha Abbas <nehaabbas@ReeseLevines-MacBook-Pro.local>
Co-authored-by: Reese Levine <reeselevine1@gmail.com>

* Start work on all-encompassing shader library
* refactor argmax, set_rows
* Refactor all but flashattention, mat mul
* flashattention and matrix multiplication moved to new format
* clean up preprocessing
* Formatting
* remove duplicate constants
* Split large shaders into multiple static strings

---------

Co-authored-by: neha-ha <137219201+neha-ha@users.noreply.github.com>


neha-ha and others added 8 commits February 10, 2026 19:27

Start work on all-encompassing shader library

75e66cb

refactor argmax, set_rows

8a13bbb

Refactor all but flashattention, mat mul

a4e9b45

flashattention and matrix multiplication moved to new format

ae6baf4

clean up preprocessing

d314669

Formatting

e3b214f

remove duplicate constants

e29c480

github-actions bot added the python (python script changes) and ggml (changes relating to the ggml tensor library for machine learning) labels Feb 12, 2026

ggerganov approved these changes Feb 17, 2026


Split large shaders into multiple static strings

59c114d

reeselevine merged commit 238856e into ggml-org:master Feb 18, 2026
79 of 80 checks passed

reeselevine merged 9 commits into ggml-org:master from reeselevine:master
