Sync to latest llama.cpp by ysimonson · Pull Request #1034 · utilityai/llama-cpp-rs

ysimonson · 2026-06-05T11:31:44Z

This will add support for Gemma4 12B's non-encoder multimodal functionality.

* Disable `LLAMA_BUILD_APP` since largely unneeded code was moved there
* Feature-gate fitting since it's been moved to the common library
* Link both `llama-common` and `llama-common-base`
* Recreate `msg_diff_to_json_oaicompat` since it was moved to the unlinked server tools
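
The first and third bullets translate into CMake cache flags and Cargo link directives emitted by a build script. What follows is a minimal sketch of that idea, not the code from this PR: the helper names `cmake_defines` and `link_directives` are hypothetical, and static linking is an assumption. Only the `LLAMA_BUILD_APP` option and the two library names come from the description above.

```rust
// Hypothetical build-script helpers; not the actual build.rs from this PR.

/// Render CMake cache options as `-D` arguments.
/// `LLAMA_BUILD_APP` is the option named in the PR description.
fn cmake_defines(options: &[(&str, bool)]) -> Vec<String> {
    options
        .iter()
        .map(|(name, on)| format!("-D{}={}", name, if *on { "ON" } else { "OFF" }))
        .collect()
}

/// Render Cargo link directives for the given libraries.
/// Static linking is an assumption made for this sketch.
fn link_directives(libs: &[&str]) -> Vec<String> {
    libs.iter()
        .map(|lib| format!("cargo:rustc-link-lib=static={}", lib))
        .collect()
}
```

In a real `build.rs`, each string returned by `link_directives` would be printed with `println!` so that Cargo picks it up, and the defines would be passed to whatever drives the CMake build.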

…6.0-beta.7

Why this exists
---------------
EuLLM Engine 0.6.0-beta.{1..6} were pre-releases bumped specifically to ship Gemma 4 12B Unified vision via llama.cpp's mtmd path with the new `gemma4uv` projector (PRs ggml-org/llama.cpp#24077, #24082, #24091, merged 3-4 Jun 2026). The blocker turned out to be that the upstream Rust binding `llama-cpp-2` (utilityai/llama-cpp-rs) pins its llama.cpp submodule to a late-April commit and has not bumped since, so the new projector type is silently absent from bindgen output and Gemma 4 12B Unified cannot load on a stock 0.1.146 binary.

ysimonson opened utilityai/llama-cpp-rs#1034 on 5 Jun 2026 that does exactly the bump we need ("support-gemma4-12b" branch, c491763bcd42eb742287afd5612883d3b6e5e3a8). The PR has zero reviews so far. Rather than wait for an unbounded merge window we vendor the sources in-tree, becoming the first public adopter of that PR, which also lets us comment on it with end-to-end evidence (Linux CUDA, Windows CUDA, macOS Metal) once beta.7 builds run.

Vendor strategy
---------------
* engine/vendor/llama-cpp-rs/ holds llama-cpp-2 0.1.147 and llama-cpp-sys-2 0.1.147 copied verbatim from ysimonson's HEAD c491763b. No source edits: the manifests use `workspace = true` and resolve cleanly because the EuLLM root workspace mirrors upstream's `[workspace.dependencies]` / `[workspace.lints]` tables byte for byte.
* Both vendor crates become members of the EuLLM root workspace; cargo refused all the smaller scoping attempts (`[patch.crates-io]` path, direct path-dep with `[workspace] exclude`, `package.workspace = ".."` pin) because path-deps of a workspace member are unconditionally absorbed into the patching workspace. Merging the workspaces sidesteps this entirely: engine and hub don't opt into `[lints] workspace = true`, so the upstream pedantic lint set only applies to the vendor crates.
* llama.cpp is a git submodule of THIS repo (not nested inside the vendor tree) pinned to 7c158fbb4aec1bdc9c81d6ca0e785139f4826fae, the same SHA ysimonson chose and the first commit including the gemma4uv projector. .gitignore gets a `!engine/vendor/**` exception so the broad `vendor/` rule doesn't swallow the vendored crates.

Engine wiring
-------------
* engine/Cargo.toml: llama-cpp-2 = { path = "vendor/llama-cpp-rs/...", version = "0.1.147", features = ["sampler"] }. Version bump 0.1.146 -> 0.1.147 matches the vendor manifests.
* engine version 0.5.20 -> 0.6.0-beta.7. The 0.5.20 stable commit on this branch (81010eb) stays valid and can be tagged independently for the Latest release; this commit re-opens the pre-release line on top of it, this time actually capable of loading Gemma 4 12B Unified.

Cleanup plan when upstream lands
--------------------------------
When utilityai/llama-cpp-rs#1034 merges and llama-cpp-2 0.1.147+ is published to crates.io:
1. drop engine/vendor/llama-cpp-rs
2. drop the workspace members + workspace.dependencies + workspace.lints that this commit adds to the root Cargo.toml
3. drop the llama.cpp submodule entry and the .gitignore exception
4. restore engine/Cargo.toml to `llama-cpp-2 = { version = "...", features = ["sampler"] }` (registry, no path)

Known scope of impact
---------------------
* CI checkout steps now need `submodules: recursive` for engine-building jobs; this is covered by the previous commit.
* Two upstream API renames in 0.1.147 (`llama_memory_breakdown_print` -> `llama_rs_memory_breakdown_print`, `llama_params_fit` -> `llama_rs_params_fit`) do not affect the engine: neither symbol is referenced anywhere in our source.
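
Because the llama.cpp sources now arrive through a git submodule, a checkout without `submodules: recursive` would leave that directory empty. One way to surface this early is a guard in a build script, sketched below. This is an illustration only and not part of the commit: the function name `check_submodule` is hypothetical, and probing for `CMakeLists.txt` is an assumed heuristic for "the submodule has been checked out".

```rust
use std::path::Path;

/// Hypothetical guard: report a readable error when the llama.cpp
/// submodule directory has not been initialised.
/// Probing for `CMakeLists.txt` is an assumption made for this sketch.
fn check_submodule(dir: &Path) -> Result<(), String> {
    if dir.join("CMakeLists.txt").is_file() {
        Ok(())
    } else {
        Err(format!(
            "{} has no CMakeLists.txt; try `git submodule update --init --recursive`",
            dir.display()
        ))
    }
}
```

A build script could call this on the submodule path and panic with the returned message, which fails the build with an actionable hint instead of a CMake error further down the line.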

primoco · 2026-06-06T19:56:01Z

Thanks for this. We adopted this branch ahead of merge and wanted to share end-to-end evidence in case it's useful for review.

We're EuLLM, an open-source engine (Rust + your bindings). We vendored llama-cpp-2 / llama-cpp-sys-2 at c491763 (this branch, llama.cpp submodule pinned to 7c158fbb4 — the first commit carrying the gemma4uv projector) specifically to run Gemma 4 12B with vision, which the current crates.io release can't load.

Builds — all green in our release pipeline:

| Target | Result |
| --- | --- |
| Linux x64 / arm64 | ✅ |
| macOS x64 / arm64 | ✅ |
| Windows x64 | ✅ |
| Linux x64 CUDA 12.8 | ✅ |
| Windows x64 CUDA 12.8 (Ninja generator) | ✅ |

No source changes to your crates were needed — they compiled as-is on every target.

Runtime-validated (Windows x64, CUDA 12.8, RTX 5070 Ti 16 GB, compute 12.0):

gemma4 architecture + F16 mmproj load cleanly; 49/49 layers offloaded to GPU; fused Gated Delta Net kernels (autoregressive + chunked) recognized and enabled.
Multimodal path via mtmd works: an input image is encoded by the projector

image_tokens->nx = 273
image slice encoded in 99 ms
decoding image batch 1/1, n_tokens_batch = 273
image decoded (batch 1/1) in 59 ms

and the model produces a coherent, detailed description of a pure (text-free) landscape photo — and accurate OCR on text-bearing images.

So from our side this branch is solid on Gemma 4 12B vision across CPU/CUDA/Metal builds. Happy to share more logs or test other configs if it helps land the PR. 🙏

MarcusDunn · 2026-06-12T16:06:32Z

I merged "Sync with latest llama.cpp: Remove OpenAI API support, add missing parameter to mtmd stuff, update build.rs" (#1037). Hope that covers this.

ysimonson added 3 commits June 4, 2026 18:16

Sync to latest to include gemma4 12b support

26e9fa1

Trim down comment

c491763

AsbjornOlling mentioned this pull request Jun 9, 2026

Sync with latest llama.cpp: Remove OpenAI API support, add missing parameter to mtmd stuff, update build.rs #1037

Merged

MarcusDunn closed this Jun 12, 2026

ysimonson wants to merge 3 commits into utilityai:main from ysimonson:support-gemma4-12b
