llama : deprecate llama_set_warmup #24009

Merged: ggerganov merged 2 commits into master from gg/llama-remove-warmup on Jun 2, 2026
@ggerganov ggerganov commented Jun 2, 2026

Member

Overview

cont #11571

Deprecate the functionality for pre-loading all MoE experts at the context/graph level. User code is now responsible for performing any warmup runs needed to guarantee that the weights are hot (if the application requires that).

Additional information

The cparams.warmup flag changes the tensor shapes in the FFN graph. Before #23861 this did not cause problems because we were over-allocating outputs in the compute buffer, which silently covered for the extra experts during warmup. Now that the output allocations are stricter, the issue shows up: https://github.com/ggml-org/llama.cpp/actions/runs/26794936619/job/78989134399#step:5:3668

Generally, we want to keep the graphs as static as possible. The goal is to reserve memory once at the start and never have to reserve again (unless some relevant context parameter is modified explicitly).
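To make the shape issue concrete, here is a simplified, self-contained model of it. This is a sketch and not the actual llama.cpp graph code: the struct, the function names, and the `[n_embd, n_selected, n_tokens]` layout are illustrative assumptions. It only shows that a graph selecting every expert per token needs more room than one reserved for the normal number of experts per token.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Simplified stand-ins for the relevant hyperparameters (illustrative names).
struct moe_params {
    int32_t n_embd;        // embedding size
    int32_t n_expert;      // total experts in the layer
    int32_t n_expert_used; // experts routed per token in normal inference
};

// Experts selected per token. In this model, warmup selects every expert
// so that all expert weights get touched.
int32_t experts_per_token(const moe_params & p, bool warmup) {
    return warmup ? p.n_expert : p.n_expert_used;
}

// Element count of an expert-output tensor assumed to be shaped
// [n_embd, n_selected, n_tokens].
size_t moe_out_elems(const moe_params & p, int32_t n_tokens, bool warmup) {
    assert(p.n_embd > 0 && n_tokens > 0);
    return (size_t) p.n_embd * experts_per_token(p, warmup) * n_tokens;
}

// True if a graph built with the given warmup flag fits into a buffer that
// was reserved for the non-warmup graph with the same number of tokens.
bool fits_reservation(const moe_params & p, int32_t n_tokens, bool warmup) {
    return moe_out_elems(p, n_tokens, warmup) <= moe_out_elems(p, n_tokens, false);
}
```

With, say, 64 experts of which 8 are used per token, `fits_reservation(p, n_tokens, true)` is false: the warmup graph only worked earlier because the buffer happened to be larger than what was strictly reserved.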


@ggerganov ggerganov requested a review from a team as a code owner June 2, 2026 06:06
@ggerganov ggerganov force-pushed the gg/llama-remove-warmup branch from 0665b8c to 537d4b4 Compare June 2, 2026 06:09
@github-actions github-actions Bot added the testing Everything test related label Jun 2, 2026
am17an commented Jun 2, 2026

Contributor

Shouldn't at least the server now do a warmup? Or will it be added separately?

@ggerganov

Member Author

> Shouldn't at least the server now do a warmup? Or will it be added separately?

It can be added to the warmup section of common_init_from_params: decode a large batch of random tokens. But it has to be opt-in. We can do this later if it is needed.
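A minimal sketch of what such an opt-in warmup could look like on the application side. The helper names (`make_warmup_tokens`, `run_warmup`) and the `enabled` switch are hypothetical and not part of llama.cpp. The decode step is passed in as a callback so the sketch stays self-contained; in a real program the callback would presumably wrap `llama_decode` on a batch built from these tokens, and the application would clear the context memory afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

// Stand-in for llama_token (a 32-bit integer id).
using token_t = int32_t;

// Build n_tokens pseudo-random token ids in [0, n_vocab).
// A fixed seed keeps the warmup run reproducible.
std::vector<token_t> make_warmup_tokens(int32_t n_vocab, size_t n_tokens, uint32_t seed = 1234) {
    assert(n_vocab > 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<token_t> dist(0, n_vocab - 1);
    std::vector<token_t> tokens(n_tokens);
    for (auto & t : tokens) {
        t = dist(rng);
    }
    return tokens;
}

// Hypothetical opt-in warmup helper.
// decode_fn is supplied by the application and is expected to return 0 on
// success. Returns true if the warmup was skipped or the decode succeeded.
bool run_warmup(bool enabled, int32_t n_vocab, size_t n_batch,
                const std::function<int(std::vector<token_t> &)> & decode_fn) {
    if (!enabled || n_vocab <= 0 || n_batch == 0) {
        return true; // warmup not requested or nothing to decode
    }
    std::vector<token_t> tokens = make_warmup_tokens(n_vocab, n_batch);
    return decode_fn(tokens) == 0;
}
```

Note that random tokens only touch the experts the router happens to select, so a large batch makes it likely, but does not guarantee, that every expert's weights are loaded.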

am17an commented Jun 2, 2026

Contributor

Also I think this will negatively impact llama-bench numbers because there will be no warmup

@ggerganov

Member Author

> Also I think this will negatively impact llama-bench numbers because there will be no warmup

No, llama-bench does its own warmup. It never used this functionality.

Comment thread include/llama.h Outdated
[no ci]

Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>
@ggerganov ggerganov merged commit 4f3a4be into master Jun 2, 2026
1 check passed
@ggerganov ggerganov deleted the gg/llama-remove-warmup branch June 2, 2026 07:30
3 participants