Sync with latest llama.cpp: Remove OpenAI API support, add missing parameter to mtmd stuff, update build.rs by AsbjornOlling · Pull Request #1037 · utilityai/llama-cpp-rs

AsbjornOlling · 2026-06-09T15:04:54Z

This is yet another PR that syncs llama.cpp to the latest commit.

Upstream changes to OpenAI API compat

It is very similar to #1034, but solves the problem with common_chat_msg_diff_to_json_oaicompat() differently.
The problem to address is that common_chat_msg_diff_to_json_oaicompat() was moved from common into server. I see three ways to solve this problem:

One approach is to link against server. This obviously sucks because it brings in a lot of build-time dependencies, and may bloat the built binary too (I'm not sure exactly what the tree-shaking behavior is here). Neither PR does this.

Another approach (implemented by #1034) is to copy-paste the implementation of common_chat_msg_diff_to_json_oaicompat() into this repo, so that we can include it without building llama-server. This avoids making a breaking change, but is shitty to maintain for obvious reasons. Do we really want this repo to contain a bunch of duplicated code from llama.cpp?

A third approach (implemented by this PR) is to remove all of the openai-api-compatibility stuff. This is obviously a breaking change, and the most drastic of the three approaches, but it has some advantages too. It's far simpler to maintain going forward, it fits better with the scope of this project (to be direct low-level bindings of the libllama.a API), and it lets us remove ~1000 lines of C++ code from this repo.

I feel that the OpenAI-compatible API stuff never fit that well into this project. If the goal is to be direct low-level bindings into the libllama.a API, then adding a bunch of stuff outside of libllama for handling specific shapes of JSON is significantly out of scope. IMO this project should support llama.cpp for inference, not as a webapp server.

Add placeholder parameter for MTMD functions

Upstream has added a new parameter to the mtmd functions for loading images and audio. When placeholder is set to true, the image is replaced with a dummy placeholder to make token-counting a bunch faster for multimodal use-cases.
This PR supports it. As far as I can see, #1034 does not. Dunno if that's because they're building without the mtmd feature, or if it was added in a more recent version of llama.cpp.
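
To make the shape of the flag concrete, here is a minimal sketch of threading a `placeholder` boolean through a loader. Everything in it is hypothetical: `Media` and `load_media` are stand-ins invented for illustration, not the actual llama-cpp-rs or mtmd API, and the real placeholder behavior lives upstream.

```rust
/// Hypothetical stand-in for a loaded media item (not a real llama-cpp-rs type).
#[derive(Debug, PartialEq)]
pub enum Media {
    /// The payload was kept, as needed for real inference.
    Decoded(Vec<u8>),
    /// The payload was skipped; intended for token-counting only.
    Placeholder,
}

/// Hypothetical loader that mirrors the idea of the new flag:
/// when `placeholder` is true, the payload is not copied or decoded.
pub fn load_media(bytes: &[u8], placeholder: bool) -> Media {
    if placeholder {
        Media::Placeholder
    } else {
        Media::Decoded(bytes.to_vec())
    }
}
```

A caller that only needs token counts would pass `true`, and a caller that actually runs inference would pass `false`.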

Build.rs fixes

common -> llama-common: Simple build.rs fix. The common module was renamed.
LLAMA_BUILD_APP=OFF: Set this to avoid building the unified binary.
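
For illustration, a rough sketch of what these two fixes amount to on the build-script side. The helper names (`cmake_defines`, `cmake_args`, `link_static`) are hypothetical, and static linking is an assumption of this sketch. How the real build.rs drives CMake is not shown here. Only the `llama-common` name and `LLAMA_BUILD_APP=OFF` come from this PR.

```rust
/// CMake cache entries this sketch would pass to the llama.cpp build.
/// Only LLAMA_BUILD_APP=OFF is taken from the PR description.
pub fn cmake_defines() -> Vec<(&'static str, &'static str)> {
    vec![("LLAMA_BUILD_APP", "OFF")]
}

/// Renders the defines as `-DKEY=VALUE` command-line arguments.
pub fn cmake_args() -> Vec<String> {
    cmake_defines()
        .iter()
        .map(|(key, value)| format!("-D{key}={value}"))
        .collect()
}

/// Formats a Cargo link directive for a library.
/// Assumption: the library is linked statically.
pub fn link_static(lib: &str) -> String {
    format!("cargo:rustc-link-lib=static={lib}")
}
```

A build script would print `link_static("llama-common")` to stdout, where it previously would have named `common`.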

Add shims for `params_fit` and `memory_breakdown_print`

These moved from llama.h into common, and now need shims and feature-gating.
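
A minimal sketch of the feature-gating pattern on the Rust side, under some loud assumptions: the feature name `common`, the error type `CommonDisabled`, and the function signature are all hypothetical, and the call into the C++ shim is left out entirely.

```rust
use std::fmt;

/// Hypothetical error returned when the gated functionality is compiled out.
#[derive(Debug, PartialEq)]
pub struct CommonDisabled;

impl fmt::Display for CommonDisabled {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "built without the `common` feature")
    }
}

/// Compiled only when the (hypothetical) `common` feature is enabled.
/// A real implementation would call into the C++ shim here; omitted in this sketch.
#[cfg(feature = "common")]
pub fn memory_breakdown_print() -> Result<(), CommonDisabled> {
    Ok(())
}

/// Fallback compiled when the feature is off, so callers get an error
/// instead of a link failure.
#[cfg(not(feature = "common"))]
pub fn memory_breakdown_print() -> Result<(), CommonDisabled> {
    Err(CommonDisabled)
}
```

Whether a fallback like this is preferable to hiding the function entirely when the feature is off is a design choice; this sketch picks the fallback only to keep the example callable either way.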

AsbjornOlling · 2026-06-12T10:59:41Z

@MarcusDunn Could we merge one of these PRs?

I prefer this approach over #1034, but am okay with either. We just need to get llama.cpp updated.

Marcus Dunn and others added 4 commits June 8, 2026 01:19

updated llama.cpp

e9885c6

remove openai-compatible api functionality

fed596a

new shims for params_fit and memory_breakdown_print

b16e8dd

support placeholder flag on multimodal ingestion

36c6311

AsbjornOlling changed the title ~~Remove OpenAI API support, Sync llama.cpp~~ Sync with latest llama.cpp: Remove OpenAI API support, add missing parameter to mtmd stuff, update build.rs Jun 9, 2026

MarcusDunn merged commit 59811eb into utilityai:main Jun 12, 2026
3 of 5 checks passed

MarcusDunn mentioned this pull request Jun 12, 2026

Sync to latest llama.cpp #1034

Closed

MarcusDunn merged 4 commits into utilityai:main from AsbjornOlling:remove-openai-api-support
