
Sync with latest llama.cpp: Remove OpenAI API support, add missing parameter to mtmd stuff, update build.rs#1037

Merged
MarcusDunn merged 4 commits into utilityai:main from AsbjornOlling:remove-openai-api-support on Jun 12, 2026

@AsbjornOlling (Contributor) commented on Jun 9, 2026

This is yet another PR that syncs llama.cpp to the latest commit.

Upstream changes to OpenAI API compat

It is very similar to #1034, but solves the problem with common_chat_msg_diff_to_json_oaicompat() differently.
The problem to address is that common_chat_msg_diff_to_json_oaicompat() was moved from common into server. I see three ways to solve this problem:

One approach is to link against server. This obviously sucks because it brings in a lot of build-time dependencies, and it may bloat the binary size too (I'm not sure exactly what the tree-shaking behavior is here). Neither PR does this.

Another approach (implemented by #1034) is to copy-paste the implementation of common_chat_msg_diff_to_json_oaicompat() into this repo, so that we can include it without building llama-server. This avoids making a breaking change, but is shitty to maintain, since the copy has to be kept in sync with upstream by hand. Do we really want this repo to contain a bunch of duplicated code from llama.cpp?

A third approach (implemented by this PR) is to remove all of the OpenAI API compatibility code. This is obviously a breaking change, and the most drastic of the three approaches, but it has some advantages too. It's far simpler to maintain going forward, it fits better with the scope of this project (to be direct low-level bindings to the libllama.a API), and it lets us remove ~1000 lines of C++ code from this repo.

I feel that the OpenAI-compatible API stuff never fit that well into this project. If the goal is to be direct low-level bindings to the libllama.a API, then adding a bunch of code outside of libllama for handling specific shapes of JSON is significantly out of scope. IMO this project should support using llama.cpp for inference, not as a web app server.
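With the bindings for common_chat_msg_diff_to_json_oaicompat() removed, downstream code that still wants OpenAI-style streaming output would have to build the JSON itself. Below is a minimal std-only Rust sketch of the idea, assuming a simplified diff that carries only a text delta and a reasoning delta. The ChatMsgDiff type, its fields, and the output field names are hypothetical illustrations loosely modeled on the upstream function; this is not a port of it, and tool-call deltas are left out.

```rust
/// Hypothetical, simplified streaming diff (not the llama.cpp struct).
pub struct ChatMsgDiff {
    pub reasoning_content_delta: String,
    pub content_delta: String,
}

/// Quote and escape a string as a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Build a streaming `delta` object, emitting only fields that changed.
/// The field names are assumptions about the target schema.
pub fn diff_to_delta_json(diff: &ChatMsgDiff) -> String {
    let mut fields = Vec::new();
    if !diff.reasoning_content_delta.is_empty() {
        fields.push(format!(
            "\"reasoning_content\":{}",
            json_escape(&diff.reasoning_content_delta)
        ));
    }
    if !diff.content_delta.is_empty() {
        fields.push(format!("\"content\":{}", json_escape(&diff.content_delta)));
    }
    format!("{{{}}}", fields.join(","))
}
```

A real replacement would also need to handle tool-call deltas, and would more likely use a JSON library such as serde_json than hand-rolled escaping.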

Add placeholder parameter for MTMD functions

Upstream has added a new placeholder parameter to the mtmd functions for loading images and audio. When placeholder is set to true, the image is replaced with a dummy placeholder, which makes token counting much faster for multimodal use cases.
This PR supports it. As far as I can see, #1034 does not. I'm not sure whether that's because they're building without the mtmd feature, or because the parameter was added in a more recent version of llama.cpp.
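Why a placeholder can speed up token counting: if the number of image tokens depends only on the image's dimensions, the pixel data never has to be decoded just to count tokens. The sketch below assumes a patch-based encoder that emits one token per patch-by-patch tile. ImageInput, make_image, and count_image_tokens are hypothetical names, not the mtmd API or this crate's API, and real models may resize or tile images differently.

```rust
/// Hypothetical image input: decoded RGB pixels, or a placeholder that
/// keeps only the dimensions.
pub enum ImageInput {
    Decoded { width: u32, height: u32, rgb: Vec<u8> },
    Placeholder { width: u32, height: u32 },
}

impl ImageInput {
    pub fn dims(&self) -> (u32, u32) {
        match self {
            ImageInput::Decoded { width, height, .. } => (*width, *height),
            ImageInput::Placeholder { width, height } => (*width, *height),
        }
    }
}

/// Hypothetical loader. With `placeholder == true` it skips allocating
/// pixel data; a real loader would decode a file here (zeros stand in).
pub fn make_image(width: u32, height: u32, placeholder: bool) -> ImageInput {
    if placeholder {
        ImageInput::Placeholder { width, height }
    } else {
        let rgb = vec![0u8; width as usize * height as usize * 3];
        ImageInput::Decoded { width, height, rgb }
    }
}

/// Assumed token count: one token per `patch` x `patch` tile, rounding
/// partial tiles up. It reads only the dimensions, never the pixels.
pub fn count_image_tokens(img: &ImageInput, patch: u32) -> u32 {
    let (w, h) = img.dims();
    ((w + patch - 1) / patch) * ((h + patch - 1) / patch)
}
```

Under this assumption both variants report the same count, but only the non-placeholder path pays for allocating (and, in practice, decoding) the pixels.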

Build.rs fixes

  • common -> llama-common: a simple build.rs fix, since the common module was renamed.
  • Set LLAMA_BUILD_APP=OFF to avoid building the unified binary.
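For the LLAMA_BUILD_APP change, the build script only needs to pass one extra CMake cache definition when configuring llama.cpp. A std-only sketch follows, assuming the build invokes cmake directly and that the renamed library links as llama-common. The function names and the source and build paths are hypothetical, and the actual build.rs may drive CMake through the cmake crate rather than building the command by hand.

```rust
use std::process::Command;

/// CMake cache definitions for configuring llama.cpp. LLAMA_BUILD_APP=OFF
/// comes from this PR; any other options the real build.rs sets are omitted.
const CMAKE_DEFINES: &[(&str, &str)] = &[("LLAMA_BUILD_APP", "OFF")];

/// Static libraries to link. "llama-common" assumes the renamed target
/// links under that name.
const STATIC_LIBS: &[&str] = &["llama", "llama-common"];

/// Build (but do not run) the CMake configure command.
pub fn configure_command(src_dir: &str, build_dir: &str) -> Command {
    let mut cmd = Command::new("cmake");
    cmd.arg("-S").arg(src_dir).arg("-B").arg(build_dir);
    for (key, value) in CMAKE_DEFINES {
        cmd.arg(format!("-D{key}={value}"));
    }
    cmd
}

/// Cargo directives that a build script would print to stdout.
pub fn link_directives() -> Vec<String> {
    STATIC_LIBS
        .iter()
        .map(|lib| format!("cargo:rustc-link-lib=static={lib}"))
        .collect()
}
```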

Add shims for params_fit and memory_breakdown_print

These functions moved from llama.h into common, so they now need shims and feature gating.

@AsbjornOlling changed the title from "Remove OpenAI API support, Sync llama.cpp" to "Sync with latest llama.cpp: Remove OpenAI API support, add missing parameter to mtmd stuff, update build.rs" on Jun 9, 2026
@AsbjornOlling (Contributor, Author) commented:

@MarcusDunn Could we merge one of these PRs?

I prefer this approach over #1034, but am okay with either. We just need to get llama.cpp updated.

@MarcusDunn merged commit 59811eb into utilityai:main on Jun 12, 2026
3 of 5 checks passed