Sync with latest llama.cpp: Remove OpenAI API support, add missing parameter to mtmd stuff, update build.rs #1037
Merged
MarcusDunn merged 4 commits intoJun 12, 2026
@MarcusDunn Could we merge one of these PRs? I prefer this approach over #1034, but am okay with either. We just need to get llama.cpp updated.
This is yet another PR that syncs llama.cpp to the latest commit.

Upstream changes to OpenAI API compat

It is very similar to #1034, but solves the problem with `common_chat_msg_diff_to_json_oaicompat()` differently.

The problem to address is that `common_chat_msg_diff_to_json_oaicompat()` was moved from common into server. I see three ways to solve this problem:

- One approach is to link against server. This obviously sucks because it brings on a lot of build-time dependencies, and may bloat the build binary size too (I'm not sure exactly what the tree-shaking behavior is here). Neither PR does this.
- Another approach (implemented by #1034) is to copy-paste the implementation of `common_chat_msg_diff_to_json_oaicompat()` into this repo, so that we can include it without building llama-server. This avoids making a breaking change, but is shitty to maintain for obvious reasons. Do we really want this repo to contain a bunch of duplicated code from llama.cpp?
- A third approach (implemented by this PR) is to remove all of the OpenAI API compatibility stuff. This is obviously a breaking change, and the most drastic of the three approaches, but it has some advantages too. It's far simpler to maintain going forward, it fits better with the scope of this project (to be direct low-level bindings of the libllama.a API), and it lets us remove ~1000 lines of C++ code from this repo.

I feel that the OpenAI-compatible API stuff never fit that well into this project. If the goal is to be direct low-level bindings into the libllama.a API, then adding a bunch of stuff outside of libllama for handling specific shapes of JSON is significantly out-of-scope. IMO this project should support llama.cpp for inference, not as a webapp server.
Add placeholder parameter for MTMD functions

Upstream has added a new parameter to the mtmd functions for loading images and audio. When `placeholder` is set to `true`, the image is replaced with a dummy placeholder to make token-counting a bunch faster for multimodal use-cases.

This PR supports it. As far as I can see, #1034 does not. Dunno if that's because they're building without the `mtmd` feature, or if it was added in a more recent version of llama.cpp.

Build.rs fixes

- `common` -> `llama-common`: Simple build.rs fix. The `common` module was renamed.
- `LLAMA_BUILD_APP=OFF` to avoid building the unified binary.
- Add shims for `params_fit` and `memory_breakdown_print`. These moved from llama.h into common, and now need shims and feature-gating.
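
For reviewers who want a concrete picture of the build.rs items above, here is a rough, std-only sketch of the kind of logic involved. It is not the code in this PR. The helper names (`cmake_defines`, `link_libs`, `link_directives`) and the `with_common` flag are made up for illustration, and linking `llama` statically is an assumption. Only `LLAMA_BUILD_APP=OFF`, the `llama-common` name, and the `cargo:rustc-link-lib` directive syntax are taken as given.

```rust
/// CMake cache entries to pass when configuring llama.cpp.
/// `LLAMA_BUILD_APP=OFF` is the option named in this PR; everything else
/// about this helper is a hypothetical illustration.
fn cmake_defines() -> Vec<(&'static str, &'static str)> {
    vec![("LLAMA_BUILD_APP", "OFF")]
}

/// Libraries to link. `llama-common` is the renamed `common` target.
/// `with_common` is a hypothetical stand-in for whatever Cargo feature
/// gates the shims that depend on it.
fn link_libs(with_common: bool) -> Vec<&'static str> {
    let mut libs = vec!["llama"];
    if with_common {
        libs.push("llama-common");
    }
    libs
}

/// Render the link list as Cargo build-script directives.
/// Static linking is an assumption made for this sketch.
fn link_directives(with_common: bool) -> Vec<String> {
    link_libs(with_common)
        .iter()
        .map(|lib| format!("cargo:rustc-link-lib=static={lib}"))
        .collect()
}
```

In a real build script, `main` would print each string from `link_directives(..)` with `println!` and hand the pairs from `cmake_defines()` to whatever drives CMake. Keeping `llama-common` behind a flag mirrors the feature-gating mentioned for the `params_fit` and `memory_breakdown_print` shims.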