Explicitly set cur to res->t_embd in PLaMo2 models#16766
Merged
CISC merged 1 commit intoggml-org:masterfrom Oct 25, 2025
Merged
Explicitly set cur to res->t_embd in PLaMo2 models#16766CISC merged 1 commit intoggml-org:masterfrom
CISC merged 1 commit intoggml-org:masterfrom
Conversation
wqerrewetw
added a commit
to wqerrewetw/llama.cpp
that referenced
this pull request
Oct 25, 2025
* model-conversion : add trust_remote_code for orig model run [no ci] (ggml-org#16751) This commit add the trust_remote_code=True argument when loading models using AutoConfig, AutoTokenizer, and AutoModelForCausalLM for the run original model script. The motivation for this is that some models require custom code to be loaded properly, and setting trust_remote_code=True avoids a prompt asking for user confirmation: ```console (venv) $ make causal-run-original-model The repository /path/to/model contains custom code which must be executed to correctly load the model. You can inspect the repository content at /path/to/model. Do you wish to run the custom code? [y/N] N ``` Having this as the default seems like a safe choice as we have to clone or download the models we convert and would be expecting to run any custom code they have. * webui: support q URL parameter (ggml-org#16728) * webui: support q URL parameter Fixes ggml-org#16722 I’ve checked that it works with Firefox’s AI tools * webui: apply suggestions from code review Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * chore: update webui static build --------- Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> * CUDA: use CUB for arbitary size argsort (ggml-org#16754) * ggml: fix CUDA grid launch condition for large block_nums.y in binbcast (ggml-org#16742) * Fix CUDA grid launch condition for large block_nums.y * add backend ops test * reduce test repetitions * convert : avoid dequantizing mxfp4 for GPT-OSS (ggml-org#16756) * vulkan: Optimize SSM_SCAN (ggml-org#16645) * vulkan: delete dead code (ggml-org#16732) ggml_vk_create_buffer_temp is not used anywhere, and it is the only caller for ggml_vk_pool_malloc. Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> * model : set res->t_embd in PLaMo2 models (ggml-org#16766) --------- Signed-off-by: Giuseppe Scrivano <gscrivan@redhat.com> Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com> Co-authored-by: Florian Badie <florianbadie@odrling.xyz> Co-authored-by: Aleksander Grygier <aleksander.grygier@gmail.com> Co-authored-by: Aman Gupta <amangupta052@gmail.com> Co-authored-by: leejet <leejet714@gmail.com> Co-authored-by: compilade <git@compilade.net> Co-authored-by: Jeff Bolz <jbolz@nvidia.com> Co-authored-by: Giuseppe Scrivano <gscrivan@redhat.com> Co-authored-by: Shunta Saito <shunta.saito@gmail.com>
Anico2
added a commit
to Anico2/llama.cpp
that referenced
this pull request
Jan 15, 2026
blime4
referenced
this pull request
in blime4/llama.cpp
Feb 5, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR adds a line to explicitly set
res->t_embdwithcurin the PLaMo2 model implementation.Without this fix, the ollama always crashes because params.embeddings is hard-coded with
trueat ollama/llama/llama.go:L120 and it results in a call of this assertion:https://github.com/ollama/ollama/blob/ad6f6a1d29f45a5c7266bcd7edb5671621e86810/llama/llama.cpp/src/llama-graph.cpp#L1894
then it fails because
res->t_embdhere:https://github.com/ollama/ollama/blob/ad6f6a1d29f45a5c7266bcd7edb5671621e86810/llama/llama.cpp/src/llama-graph.cpp#L1882
is null with the current code of PLaMo2 model.
This PR intends to fix this problem.