@CISC I need to merge this now, but can push fixes in a follow-up PR if you spot any problems.

For those of you waiting for more information: https://developers.googleblog.com/gemma-4-12b-the-developer-guide/
```cpp
        return ctx->model.mm_fc_w->ne[1];
    case PROJECTOR_TYPE_LFM2A:
        return ctx->model.position_embeddings->ne[0];
    case PROJECTOR_TYPE_GEMMA4A:
```
Hello, why was `case PROJECTOR_TYPE_GEMMA4A:` removed? Loading Gemma E4B now triggers `GGML_ABORT("Unknown projector type");`.
Same issue here.
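The abort reported above follows from how this kind of lookup is written: each projector type needs its own `case`, and any type without one falls into the default branch. The hunk presumably comes from `clip_n_mmproj_embd` (the Gemma 4 audio size handling later landed in ggml-org#24091). The minimal C++ sketch below uses hypothetical stand-ins (`toy_model`, `n_mmproj_embd` and the field names are not clip.cpp's real types), and the value returned for `PROJECTOR_TYPE_GEMMA4A` is a placeholder. Where the real code calls `GGML_ABORT`, the sketch returns -1 so the failure path can be checked.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for the projector enum and model tensors; the real
// clip.cpp types are much larger and hold ggml tensors, not plain sizes.
enum projector_type {
    PROJECTOR_TYPE_LFM2A,
    PROJECTOR_TYPE_GEMMA4A,
    PROJECTOR_TYPE_UNKNOWN,
};

struct toy_model {
    int64_t position_embeddings_ne0;  // plays the role of position_embeddings->ne[0]
    int64_t audio_embd_dim;           // placeholder for the Gemma 4 audio output size
};

// Maps a projector type to its output embedding size. Every supported type
// needs an explicit case; an unlisted type falls through to the default.
int64_t n_mmproj_embd(const toy_model & model, projector_type type) {
    switch (type) {
        case PROJECTOR_TYPE_LFM2A:
            return model.position_embeddings_ne0;
        case PROJECTOR_TYPE_GEMMA4A:
            // Deleting this case sends Gemma 4 audio models to the default branch.
            return model.audio_embd_dim;
        default:
            // The real code aborts here with GGML_ABORT("Unknown projector type");
            // this sketch returns -1 so the path can be tested without aborting.
            return -1;
    }
}
```

Removing the `PROJECTOR_TYPE_GEMMA4A` case from this sketch makes that type return -1, which is the toy equivalent of the abort seen when loading Gemma E4B.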
@ngxson I spotted a problem: https://github.com/ggml-org/llama.cpp/pull/24077/files#r3353114670
```cpp
    } break;
case PROJECTOR_TYPE_GEMMA4UV:
    {
        model.mm_input_proj_w = get_tensor(TN_MM_INP_PROJ);
```
It's already included in the name, but I think that should be refactored at some point:

```cpp
#define TN_MM_INP_PROJ "mm.input_projection.weight"
```

This reverts commit da0bb97.
* add model
* nits

(cherry picked from commit a731805)
#168)

* mtmd, model: allow skip `build_vit()` (ggml-org#24077)

  * add model
  * nits

  (cherry picked from commit a731805)

* mtmd: fix Gemma 4 unified FPE (ggml-org#24088)

  (cherry picked from commit 94a220c)

* mtmd: enable non-causal vision for gemma 4 unified (ggml-org#24082)

  (cherry picked from commit c8d6a00)

* fix(mtmd): handle Gemma 4 audio projector embedding size (ggml-org#24091)

  * mtmd: handle Gemma 4 audio projector embedding size
  * rm `projection_dim` from `clip_n_mmproj_embd`

  ---------

  Co-authored-by: Xuan Son Nguyen <son@huggingface.co>
  (cherry picked from commit e3ba22d)

* convert: Fix Gemma 4 Unified conversion (ggml-org#24118)

  * Fix Gemma 4 Unified conversion
  * Set audio hidden size to `audio_embed_dim`

  (cherry picked from commit e802356)

* ggml-metal: fall back to CPU for im2col when `KH*KW` exceeds threadgroup limit

  The Metal im2col kernel launches `KH*KW` threads per threadgroup (one per kernel element). For large conv kernels, such as the Gemma 4 unified vision (gemma4uv) patch embedding, `KH*KW` exceeds the Apple GPU 1024-thread cap and the kernel hits a runtime `GGML_ASSERT` instead of producing a result. Guard `supports_op` so an oversized im2col is declined; the backend scheduler then runs that one op on CPU while the rest of the graph stays on the GPU.

  Fixes Gemma 4 12B vision on the Metal backend (verified end-to-end: loads mmproj + describes an image correctly on an M5 Max).

---------

Co-authored-by: Xuan-Son Nguyen <son@huggingface.co>
Co-authored-by: Andrei <abetlen@gmail.com>
Co-authored-by: Pedro Cuenca <pedro@huggingface.co>
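The `supports_op` guard described in the ggml-metal commit amounts to a capacity check on the kernel size. This is a minimal C++ illustration, not ggml-metal's actual code: `im2col_params`, `metal_can_run_im2col` and `pick_backend` are hypothetical names, and the 1024-thread cap is the figure quoted in the commit message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the im2col parameters that matter here:
// the convolution kernel height (KH) and width (KW).
struct im2col_params {
    int64_t kh;
    int64_t kw;
};

// supports_op-style guard: the kernel launches one thread per kernel element,
// so it can only run when KH*KW fits in a single threadgroup.
bool metal_can_run_im2col(const im2col_params & p, int64_t max_threads_per_tg) {
    return p.kh > 0 && p.kw > 0 && p.kh * p.kw <= max_threads_per_tg;
}

enum class backend { gpu, cpu };

// Effect of declining the op: only this op falls back to the CPU,
// while the rest of the graph stays on the GPU.
backend pick_backend(const im2col_params & p, int64_t max_threads_per_tg) {
    return metal_can_run_im2col(p, max_threads_per_tg) ? backend::gpu : backend::cpu;
}
```

With a 1024-thread cap, a 32x32 kernel (exactly 1024 threads) still runs on the GPU, while a 48x48 kernel (2304 threads) is declined and only that op moves to the CPU. Declining the op instead of asserting lets the existing scheduler fallback handle the oversized case.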
…6.0-beta.7
Why this exists
---------------
EuLLM Engine 0.6.0-beta.{1..6} were pre-releases bumped specifically to
ship Gemma 4 12B Unified vision via llama.cpp's mtmd path with the new
`gemma4uv` projector (PRs ggml-org/llama.cpp#24077, #24082, #24091,
merged 3-4 Jun 2026). The blocker turned out to be that the upstream
Rust binding `llama-cpp-2` (utilityai/llama-cpp-rs) pins its llama.cpp
submodule to a late-April commit and has not bumped since, so the new
projector type is silently absent from bindgen output and Gemma 4 12B
Unified cannot load on a stock 0.1.146 binary.
On 5 Jun 2026, ysimonson opened utilityai/llama-cpp-rs#1034, which does
exactly the bump we need ("support-gemma4-12b" branch,
c491763bcd42eb742287afd5612883d3b6e5e3a8). The PR has zero reviews so
far. Rather than wait for an unbounded merge window, we vendor the
sources in-tree and become the first public adopter of that PR, which
also lets us comment on it with end-to-end evidence (Linux CUDA,
Windows CUDA, macOS Metal) once beta.7 builds run.
Vendor strategy
---------------
* engine/vendor/llama-cpp-rs/ holds llama-cpp-2 0.1.147 and
llama-cpp-sys-2 0.1.147, copied verbatim from ysimonson's HEAD
c491763b. There are no source edits: the manifests use
`workspace = true` and resolve cleanly because the EuLLM root workspace
mirrors upstream's `[workspace.dependencies]` / `[workspace.lints]`
tables byte for byte.
* Both vendor crates become members of the EuLLM root workspace. Cargo
refused every narrower scoping attempt (`[patch.crates-io]` path,
direct path-dep with `[workspace] exclude`, `package.workspace = ".."`
pin) because path-deps of a workspace member are unconditionally
absorbed into the patching workspace. Merging the workspaces sidesteps
this entirely, and because engine and hub don't opt into
`[lints] workspace = true`, the upstream pedantic lint set only applies
to the vendor crates.
* llama.cpp is a git submodule of THIS repo (not nested inside the
vendor tree), pinned to 7c158fbb4aec1bdc9c81d6ca0e785139f4826fae:
the same SHA ysimonson chose, and the first commit that includes the
gemma4uv projector. .gitignore gets a `!engine/vendor/**` exception
so the broad `vendor/` rule doesn't swallow the vendored crates.
Engine wiring
-------------
* engine/Cargo.toml: `llama-cpp-2 = { path = "vendor/llama-cpp-rs/...",
version = "0.1.147", features = ["sampler"] }`. The version bump
0.1.146 -> 0.1.147 matches the vendor manifests.
* engine version 0.5.20 -> 0.6.0-beta.7. The 0.5.20 stable commit on
this branch (81010eb) stays valid and can be tagged independently for
the Latest release; this commit re-opens the pre-release line on top
of it, this time actually capable of loading Gemma 4 12B Unified.
Cleanup plan when upstream lands
--------------------------------
When utilityai/llama-cpp-rs#1034 merges and llama-cpp-2 0.1.147+ is
published to crates.io:
1. drop engine/vendor/llama-cpp-rs
2. drop the workspace members + workspace.dependencies + workspace.lints
that this commit adds to the root Cargo.toml
3. drop the llama.cpp submodule entry and the .gitignore exception
4. restore engine/Cargo.toml to `llama-cpp-2 = { version = "...",
features = ["sampler"] }` (registry, no path)
Known scope of impact
---------------------
* CI checkout steps now need `submodules: recursive` for engine-building
jobs; this is covered by the previous commit.
* Two upstream API renames in 0.1.147 (`llama_memory_breakdown_print` ->
`llama_rs_memory_breakdown_print`, `llama_params_fit` ->
`llama_rs_params_fit`) do not affect the engine — neither symbol is
referenced anywhere in our source.
…L stage

Brings gemma4-unified-vision (ggml-org#24077 skip build_vit +
ggml-org#24082 non-causal vision for gemma4 unified) + remaining upstream.

Conflict: ggml.c `GGML_OP_COUNT` static_assert; set it to the true merged
enum count of 104 (100 upstream ops + our 4 `GGML_OP_ML8_*` ops, all
present in the merged ggml.h).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
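The conflict above sits in a `static_assert` on `GGML_OP_COUNT`, a check that pins the op enum's trailing sentinel to an expected number, so merging upstream's 100 ops with the fork's 4 `GGML_OP_ML8_*` ops means the expected value must become the merged total, 104. The miniature below sketches that pattern under assumed, hypothetical names (`toy_op`, `kOpNames`) and toy counts; the companion name table is an assumed typical use of the sentinel, not a copy of ggml.c.

```cpp
#include <cassert>

// Hypothetical miniature of an op enum: a few "upstream" ops, then fork-only
// ops (standing in for GGML_OP_ML8_*), then a COUNT sentinel.
enum toy_op {
    TOY_OP_NONE,
    TOY_OP_ADD,
    TOY_OP_MUL,
    TOY_OP_IM2COL,
    TOY_OP_FORK_A,
    TOY_OP_FORK_B,
    TOY_OP_COUNT,   // equals the number of real ops
};

constexpr int kUpstreamOps = 4;  // toy analogue of the 100 upstream ops
constexpr int kForkOps     = 2;  // toy analogue of the 4 fork-only ops

// The expected count must be the merged total; keeping either side's old
// number makes this assertion fail at compile time.
static_assert(TOY_OP_COUNT == kUpstreamOps + kForkOps,
              "op enum changed: update the expected op count");

// A per-op table indexed by the enum, checked against the same sentinel so a
// missing or extra entry is also a compile-time error.
const char * const kOpNames[] = {
    "NONE", "ADD", "MUL", "IM2COL", "FORK_A", "FORK_B",
};
static_assert(static_cast<int>(sizeof(kOpNames) / sizeof(kOpNames[0])) == TOY_OP_COUNT,
              "kOpNames must have one entry per op");
```

Because both checks run at compile time, a merge that adds ops on either side cannot build until the expected count and any per-op tables are updated together.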
Overview
More info about this PR will be added soon
Requirements