Sync to latest llama.cpp #1034
* Disable `LLAMA_BUILD_APP` since largely unneeded code was moved there
* Feature-gate fitting since it's been moved to the common library
* Link both `llama-common` and `llama-common-base`
* Recreate `msg_diff_to_json_oaicompat` since it was moved to the unlinked server tools
…6.0-beta.7
Why this exists
---------------
EuLLM Engine 0.6.0-beta.{1..6} were pre-releases bumped specifically to
ship Gemma 4 12B Unified vision via llama.cpp's mtmd path with the new
`gemma4uv` projector (PRs ggml-org/llama.cpp#24077, #24082, #24091,
merged 3-4 Jun 2026). The blocker turned out to be that the upstream
Rust binding `llama-cpp-2` (utilityai/llama-cpp-rs) pins its llama.cpp
submodule to a late-April commit and has not bumped since, so the new
projector type is silently absent from bindgen output and Gemma 4 12B
Unified cannot load on a stock 0.1.146 binary.
ysimonson opened utilityai/llama-cpp-rs#1034 on 5 Jun 2026, which does
exactly the bump we need ("support-gemma4-12b" branch,
c491763bcd42eb742287afd5612883d3b6e5e3a8). The PR has zero reviews so
far. Rather than wait for an unbounded merge window, we vendor the
sources in-tree, becoming the first public adopter of that PR. That
also lets us comment on it with end-to-end evidence (Linux CUDA,
Windows CUDA, macOS Metal) once beta.7 builds run.
Vendor strategy
---------------
* engine/vendor/llama-cpp-rs/ holds llama-cpp-2 0.1.147 and
llama-cpp-sys-2 0.1.147 copied verbatim from ysimonson's HEAD
c491763b. No source edits — the manifests use `workspace = true` and
resolve cleanly because the EuLLM root workspace mirrors upstream's
`[workspace.dependencies]` / `[workspace.lints]` tables byte for byte.
* Both vendor crates become members of the EuLLM root workspace. Cargo
refused all the smaller scoping attempts (`[patch.crates-io]` path,
direct path-dep with `[workspace] exclude`, `package.workspace = ".."`
pin) because path-deps of a workspace member are unconditionally
absorbed into the patching workspace. Merging the workspaces sidesteps
this entirely, and because engine and hub don't opt into
`[lints] workspace = true`, the upstream pedantic lint set only applies
to the vendor crates.
* llama.cpp is a git submodule of THIS repo (not nested inside the
vendor tree) pinned to 7c158fbb4aec1bdc9c81d6ca0e785139f4826fae —
the same SHA ysimonson chose, and the first commit including the
gemma4uv projector. .gitignore gets a `!engine/vendor/**` exception
so the broad `vendor/` rule doesn't swallow the vendored crates.
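
For orientation, the merged workspace described above might look roughly like the fragment below. This is a sketch under stated assumptions, not the manifest from this commit: the member names "engine" and "hub" are assumed directory names, the "..." in the vendor paths is carried over from the commit message and stands for the real crate directories, and the contents of the two mirrored tables are omitted rather than guessed.

```toml
# Root Cargo.toml (illustrative sketch only).
[workspace]
members = [
    "engine",                          # assumed directory name
    "hub",                             # assumed directory name
    "engine/vendor/llama-cpp-rs/...",  # llama-cpp-2 (path elided in the source)
    "engine/vendor/llama-cpp-rs/...",  # llama-cpp-sys-2 (path elided in the source)
]

# Mirrored from upstream llama-cpp-rs so that `workspace = true` entries in
# the vendored manifests resolve. Entries intentionally left out here.
[workspace.dependencies]

[workspace.lints]

# ---------------------------------------------------------------------------
# Separate file: a vendored crate's Cargo.toml opts into the shared lint set.
# engine and hub omit this table, so the upstream lints do not apply to them.
[lints]
workspace = true
```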
Engine wiring
-------------
* engine/Cargo.toml: llama-cpp-2 = { path = "vendor/llama-cpp-rs/...",
version = "0.1.147", features = ["sampler"] }. Version bump
0.1.146 -> 0.1.147 matches the vendor manifests.
* engine version 0.5.20 -> 0.6.0-beta.7. The 0.5.20 stable commit on
this branch (81010eb) stays valid and can be tagged independently for
the Latest release; this commit re-opens the pre-release line on top
of it, this time actually capable of loading Gemma 4 12B Unified.
Cleanup plan when upstream lands
--------------------------------
When utilityai/llama-cpp-rs#1034 merges and llama-cpp-2 0.1.147+ is
published to crates.io:
1. drop engine/vendor/llama-cpp-rs
2. drop the workspace members + workspace.dependencies + workspace.lints
that this commit adds to the root Cargo.toml
3. drop the llama.cpp submodule entry and the .gitignore exception
4. restore engine/Cargo.toml to `llama-cpp-2 = { version = "...",
features = ["sampler"] }` (registry, no path)
Known scope of impact
---------------------
* CI checkout steps now need `submodules: recursive` for engine-building
jobs — covered by the previous commit.
* Two upstream API renames in 0.1.147 (`llama_memory_breakdown_print` ->
`llama_rs_memory_breakdown_print`, `llama_params_fit` ->
`llama_rs_params_fit`) do not affect the engine — neither symbol is
referenced anywhere in our source.
Thanks for this. We adopted this branch ahead of merge and wanted to share end-to-end evidence in case it's useful for review. We're EuLLM, an open-source engine (Rust + your bindings). We vendored the crates from this branch in-tree.
Builds were all green in our release pipeline. No source changes to your crates were needed; they compiled as-is on every target.
Runtime-validated (Windows x64, CUDA 12.8, RTX 5070 Ti 16 GB, compute 12.0): `image_tokens->nx = 273`, and the model produces a coherent, detailed description of a pure (text-free) landscape photo as well as accurate OCR on text-bearing images.
So from our side this branch is solid on Gemma 4 12B vision across CPU/CUDA/Metal builds. Happy to share more logs or test other configs if it helps land the PR. 🙏
I merged in "Sync with latest llama.cpp: Remove OpenAI API support, add missing parameter to mtmd stuff, update build.rs" (#1037). Hope that covers this.
This will add support for Gemma4 12B's non-encoder multimodal functionality.