mtmd, model: add Gemma 4 "unified" variant by ngxson · Pull Request #24077 · ggml-org/llama.cpp

ngxson · 2026-06-03T14:36:45Z

Overview

More info about this PR will be added soon

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: no

ngxson · 2026-06-03T15:10:27Z

@CISC I need to merge this now, but I can push fixes in a follow-up PR if you spot any problems.

tha80 · 2026-06-03T16:45:47Z

For those of you waiting for more information:

https://developers.googleblog.com/gemma-4-12b-the-developer-guide/

LostRuins · 2026-06-04T02:47:51Z

            return ctx->model.mm_fc_w->ne[1];
        case PROJECTOR_TYPE_LFM2A:
            return ctx->model.position_embeddings->ne[0];
-        case PROJECTOR_TYPE_GEMMA4A:


Hello, why is case PROJECTOR_TYPE_GEMMA4A: removed? Now loading gemma E4B will give GGML_ABORT("Unknown projector type");

pilonull · 2026-06-05

Same issue here.

LostRuins · 2026-06-05

@pilonull it was fixed in #24091 a while after my earlier comment.
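The abort reported in this thread is what a switch over projector types does when one type has no case and the default branch aborts. A minimal sketch under stated assumptions: the enum, struct, and function names below are hypothetical stand-ins rather than llama.cpp's real definitions, the sizes are placeholders, and an exception replaces GGML_ABORT so the failure can be observed without killing the process.

```cpp
#include <stdexcept>

// Hypothetical, reduced stand-in for the projector type enum.
enum class projector_type { LFM2A, GEMMA4A, GEMMA4UV };

// Hypothetical embedding sizes; the values are placeholders, not real model dims.
struct embd_sizes {
    int lfm2a;
    int gemma4a;
    int gemma4uv;
};

// State described in the review comment: GEMMA4A has no case, so it falls
// through to the default branch and fails.
int n_embd_missing_case(const embd_sizes & s, projector_type t) {
    switch (t) {
        case projector_type::LFM2A:    return s.lfm2a;
        case projector_type::GEMMA4UV: return s.gemma4uv;
        default:
            // llama.cpp would call GGML_ABORT here; this sketch throws
            // instead so the caller can catch the failure.
            throw std::runtime_error("Unknown projector type");
    }
}

// Same switch with the GEMMA4A case present: every known type returns a size.
int n_embd_with_case(const embd_sizes & s, projector_type t) {
    switch (t) {
        case projector_type::LFM2A:    return s.lfm2a;
        case projector_type::GEMMA4A:  return s.gemma4a;
        case projector_type::GEMMA4UV: return s.gemma4uv;
        default:
            throw std::runtime_error("Unknown projector type");
    }
}
```

Calling `n_embd_missing_case` with `projector_type::GEMMA4A` throws, while `n_embd_with_case` returns the configured size for the same input.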

LostRuins · 2026-06-04T02:48:02Z

@ngxson I spotted a problem.

https://github.com/ggml-org/llama.cpp/pull/24077/files#r3353114670

CISC · 2026-06-04T07:50:44Z

                } break;
+            case PROJECTOR_TYPE_GEMMA4UV:
+                {
+                    model.mm_input_proj_w = get_tensor(TN_MM_INP_PROJ);


Why is this without weight?

ngxson · 2026-06-04

It's already included in the name, but I think that should be refactored at some point:

#define TN_MM_INP_PROJ "mm.input_projection.weight"
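To illustrate the distinction being discussed, here is a small sketch that contrasts a name constant with the suffix baked in against one that takes the suffix at lookup time. Only `TN_MM_INP_PROJ` comes from the thread; `TN_EXAMPLE_BASE` and `tensor_name` are hypothetical and do not reflect how llama.cpp actually builds tensor names.

```cpp
#include <string>

// From the thread: the suffix is already part of the constant.
#define TN_MM_INP_PROJ "mm.input_projection.weight"

// Hypothetical constant without a suffix, for comparison only.
#define TN_EXAMPLE_BASE "mm.example_projection"

// Hypothetical helper: appends ".<suffix>" only when a suffix is given.
std::string tensor_name(const std::string & base, const std::string & suffix = "") {
    return suffix.empty() ? base : base + "." + suffix;
}
```

With this helper, `tensor_name(TN_MM_INP_PROJ)` and `tensor_name(TN_EXAMPLE_BASE, "weight")` both yield a name ending in `.weight`, which is why the call in the diff needs no extra argument.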


#168)

* mtmd, model: allow skip build_vit() (ggml-org#24077)
  * add model
  * nits
  (cherry picked from commit a731805)

* mtmd: fix Gemma 4 unified FPE (ggml-org#24088)
  (cherry picked from commit 94a220c)

* mtmd: enable non-causal vision for gemma 4 unified (ggml-org#24082)
  (cherry picked from commit c8d6a00)

* fix(mtmd): handle Gemma 4 audio projector embedding size (ggml-org#24091)
  * mtmd: handle Gemma 4 audio projector embedding size
  * rm projection_dim from clip_n_mmproj_embd
  ---------
  Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
  (cherry picked from commit e3ba22d)

* convert: Fix Gemma 4 Unified conversion (ggml-org#24118)
  * Fix Gemma 4 Unified conversion
  * Set audio hidden size to audio_embed_dim
  (cherry picked from commit e802356)

* ggml-metal: fall back to CPU for im2col when KH*KW exceeds threadgroup limit

  The Metal im2col kernel launches KH*KW threads per threadgroup (one per kernel element). For large conv kernels, e.g. the Gemma 4 unified vision (gemma4uv) patch embedding, KH*KW exceeds the Apple GPU 1024-thread cap and the kernel hits a runtime GGML_ASSERT instead of producing a result.

  Guard supports_op so an oversized im2col is declined; the backend scheduler then runs that one op on CPU while the rest of the graph stays on the GPU.

  Fixes Gemma 4 12B vision on the Metal backend (verified end-to-end: loads mmproj + describes an image correctly on an M5 Max).

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Andrei <abetlen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
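The im2col guard described in the commit message above comes down to a thread-count check made before the GPU backend accepts the op. A minimal sketch, assuming the 1024-thread cap quoted in the message; the function names and the `backend` enum are hypothetical and are not the ggml-metal API, and the kernel sizes in the usage note are arbitrary.

```cpp
#include <cstdint>

// Per-threadgroup thread cap as quoted in the commit message (assumed here
// to be a fixed constant; real code would likely query the device).
constexpr int64_t MAX_THREADS_PER_THREADGROUP = 1024;

// Hypothetical predicate: one thread per kernel element, so the kernel only
// fits when KH*KW stays within the cap.
bool im2col_fits_threadgroup(int64_t kh, int64_t kw) {
    return kh > 0 && kw > 0 && kh * kw <= MAX_THREADS_PER_THREADGROUP;
}

enum class backend { GPU, CPU };

// Hypothetical stand-in for the scheduler's decision: an op the GPU backend
// declines is run on the CPU instead.
backend pick_backend_for_im2col(int64_t kh, int64_t kw) {
    return im2col_fits_threadgroup(kh, kw) ? backend::GPU : backend::CPU;
}
```

For example, a 16x16 kernel (256 threads) stays on the GPU in this sketch, while a 48x48 kernel (2304 threads) is routed to the CPU.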

…6.0-beta.7

Why this exists
---------------
EuLLM Engine 0.6.0-beta.{1..6} were pre-releases bumped specifically to ship Gemma 4 12B Unified vision via llama.cpp's mtmd path with the new `gemma4uv` projector (PRs ggml-org/llama.cpp#24077, #24082, #24091, merged 3-4 Jun 2026). The blocker turned out to be that the upstream Rust binding `llama-cpp-2` (utilityai/llama-cpp-rs) pins its llama.cpp submodule to a late-April commit and has not bumped since, so the new projector type is silently absent from bindgen output and Gemma 4 12B Unified cannot load on a stock 0.1.146 binary.

ysimonson opened utilityai/llama-cpp-rs#1034 on 5 Jun 2026 that does exactly the bump we need ("support-gemma4-12b" branch, c491763bcd42eb742287afd5612883d3b6e5e3a8). The PR has zero reviews so far. Rather than wait for an unbounded merge window we vendor the sources in-tree, becoming the first public adopter of that PR, which also lets us comment on it with end-to-end evidence (Linux CUDA, Windows CUDA, macOS Metal) once beta.7 builds run.

Vendor strategy
---------------
* engine/vendor/llama-cpp-rs/ holds llama-cpp-2 0.1.147 and llama-cpp-sys-2 0.1.147 copied verbatim from ysimonson's HEAD c491763b. No source edits: the manifests use `workspace = true` and resolve cleanly because the EuLLM root workspace mirrors upstream's `[workspace.dependencies]` / `[workspace.lints]` tables byte for byte.
* Both vendor crates become members of the EuLLM root workspace; cargo refused all the smaller scoping attempts (`[patch.crates-io]` path, direct path-dep with `[workspace] exclude`, `package.workspace = ".."` pin) because path-deps of a workspace member are unconditionally absorbed into the patching workspace. Merging the workspaces sidesteps this entirely; engine and hub don't opt into `[lints] workspace = true`, so the upstream pedantic lint set only applies to the vendor crates.
* llama.cpp is a git submodule of THIS repo (not nested inside the vendor tree) pinned to 7c158fbb4aec1bdc9c81d6ca0e785139f4826fae, the same SHA ysimonson chose and the first commit including the gemma4uv projector. .gitignore gets a `!engine/vendor/**` exception so the broad `vendor/` rule doesn't swallow the vendored crates.

Engine wiring
-------------
* engine/Cargo.toml: llama-cpp-2 = { path = "vendor/llama-cpp-rs/...", version = "0.1.147", features = ["sampler"] }. Version bump 0.1.146 -> 0.1.147 matches the vendor manifests.
* engine version 0.5.20 -> 0.6.0-beta.7. The 0.5.20 stable commit on this branch (81010eb) stays valid and can be tagged independently for the Latest release; this commit re-opens the pre-release line on top of it, this time actually capable of loading Gemma 4 12B Unified.

Cleanup plan when upstream lands
--------------------------------
When utilityai/llama-cpp-rs#1034 merges and llama-cpp-2 0.1.147+ is published to crates.io:
1. drop engine/vendor/llama-cpp-rs
2. drop the workspace members + workspace.dependencies + workspace.lints that this commit adds to the root Cargo.toml
3. drop the llama.cpp submodule entry and the .gitignore exception
4. restore engine/Cargo.toml to `llama-cpp-2 = { version = "...", features = ["sampler"] }` (registry, no path)

Known scope of impact
---------------------
* CI checkout steps now need `submodules: recursive` for engine-building jobs (covered by the previous commit).
* Two upstream API renames in 0.1.147 (`llama_memory_breakdown_print` -> `llama_rs_memory_breakdown_print`, `llama_params_fit` -> `llama_rs_params_fit`) do not affect the engine; neither symbol is referenced anywhere in our source.


…L stage

Brings gemma4-unified-vision (ggml-org#24077 skip build_vit + ggml-org#24082 non-causal vision for gemma4 unified) + remaining upstream.

Conflict: ggml.c GGML_OP_COUNT static_assert, set to the true merged enum count of 104 (100 upstream ops + our 4 GGML_OP_ML8_* ops, all present in merged ggml.h).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
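The conflict mentioned above arises because a compile-time check pins the op count, so a merge that adds ops on both sides has to update the expected number. A reduced sketch of that pattern, using a hypothetical three-entry enum rather than ggml's real op list:

```cpp
// Hypothetical op enum; the trailing COUNT entry doubles as the number of ops.
enum example_op {
    EXAMPLE_OP_ADD,
    EXAMPLE_OP_MUL,
    EXAMPLE_OP_CUSTOM,   // stands in for a fork-specific op
    EXAMPLE_OP_COUNT,    // must stay last
};

// Table that must have exactly one entry per op.
static const char * const EXAMPLE_OP_NAME[EXAMPLE_OP_COUNT] = {
    "ADD", "MUL", "CUSTOM",
};

// Compile-time tripwire: adding an op without revisiting the table (and this
// number) breaks the build instead of silently leaving a null entry.
static_assert(EXAMPLE_OP_COUNT == 3, "EXAMPLE_OP_COUNT != 3: update the tables and this check");

// Bounds-checked lookup into the name table.
const char * example_op_name(int op) {
    return (op >= 0 && op < EXAMPLE_OP_COUNT) ? EXAMPLE_OP_NAME[op] : "UNKNOWN";
}
```

When two branches each add enum entries, the literal in the assertion has to become the count of the merged enum, which is the resolution the commit message describes.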

add model

951fa5c

ngxson requested review from a team and CISC as code owners June 3, 2026 14:36

github-actions bot added the labels model (Model specific), examples, and python (python script changes) Jun 3, 2026

nits

5aa6625

danbev approved these changes Jun 3, 2026


ggerganov approved these changes Jun 3, 2026


ngxson merged commit a731805 into master Jun 3, 2026
28 checks passed

ngxson changed the title ~~mtmd, model: allow skip build_vit()~~ mtmd, model: add Gemma 4 "unified" variant Jun 3, 2026

abetlen mentioned this pull request Jun 3, 2026

fix(mtmd): handle Gemma 4 audio projector embedding size #24091

Merged

LostRuins reviewed Jun 4, 2026


LostRuins added a commit to LostRuins/koboldcpp that referenced this pull request Jun 4, 2026

e4b fix ggml-org#24077

da0bb97

CISC reviewed Jun 4, 2026


LostRuins added a commit to LostRuins/koboldcpp that referenced this pull request Jun 4, 2026

Revert "e4b fix ggml-org#24077"

94375fb

This reverts commit da0bb97.

TheTom pushed a commit to TheTom/llama-cpp-turboquant that referenced this pull request Jun 5, 2026

mtmd, model: allow skip build_vit() (ggml-org#24077)

27d622f

* add model * nits (cherry picked from commit a731805)

This was referenced Jun 5, 2026

mtmd: port gemma4uv/gemma4ua support — fixes Gemma 4 12B vision (#163) TheTom/llama-cpp-turboquant#168

Merged

fix(mtmd): guard clip d_head/kq_scale against n_head==0 (Gemma 4 12B SIGFPE) TheTom/llama-cpp-turboquant#166

Closed

hyeoktae pushed a commit to hyeoktae/llama.cpp that referenced this pull request Jun 11, 2026

mtmd, model: allow skip build_vit() (ggml-org#24077)

5b794e2

* add model * nits (cherry picked from commit a731805)

ngxson deleted the xsn/g4u branch June 13, 2026 12:00

ngxson merged 2 commits into master from xsn/g4u


7 participants
