llama : deprecate `llama_set_warmup` by ggerganov · Pull Request #24009 · ggml-org/llama.cpp

ggerganov · 2026-06-02T06:06:46Z

Overview

Deprecate the functionality for pre-loading all MoE experts at the context/graph level. User code is now responsible for performing any warmup runs needed to guarantee that the weights are hot (if the application requires that).

Additional information

The cparams.warmup flag changes the tensor shapes in the FFN graph. Before #23861 this did not cause problems because we were over-allocating outputs in the compute buffer, which silently covered for the extra experts during warmup. Now that the output allocations are stricter, the issue shows up: https://github.com/ggml-org/llama.cpp/actions/runs/26794936619/job/78989134399#step:5:3668
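To make the shape problem concrete, here is a toy sketch, not the actual llama.cpp graph code. It assumes that warmup routes every token through all experts instead of only the selected ones. The `moe_params` struct and both helper functions are hypothetical names introduced for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical, simplified hyperparameters of an MoE model.
struct moe_params {
    size_t n_embd;         // embedding width
    size_t n_expert;       // total number of experts
    size_t n_expert_used;  // experts selected per token in normal operation
};

// Element count of the per-expert FFN activation tensor in this toy model.
// Assumption: with warmup enabled, all experts are active for every token.
size_t ffn_expert_elems(const moe_params & p, size_t n_tokens, bool warmup) {
    assert(p.n_expert_used <= p.n_expert);
    const size_t n_active = warmup ? p.n_expert : p.n_expert_used;
    return p.n_embd * n_active * n_tokens;
}

// True if a buffer reserved for the normal graph (plus some slack from
// over-allocation) is also large enough for the warmup graph.
bool reserve_covers_warmup(const moe_params & p, size_t n_tokens, size_t slack_elems) {
    const size_t reserved = ffn_expert_elems(p, n_tokens, false) + slack_elems;
    return ffn_expert_elems(p, n_tokens, true) <= reserved;
}
```

In this model, a reservation made for the normal graph only fits the warmup graph when there is enough slack, which mirrors how the earlier over-allocation hid the mismatch.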

Generally, we want to keep the graphs as static as possible. The goal is that, once we reserve memory at the start, we never have to reserve again (unless some relevant context parameter is modified explicitly).

Extra CI

https://github.com/ggml-org/llama.cpp/actions/runs/26802314741

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: NO

am17an · 2026-06-02T06:26:42Z

Shouldn't at least the server now do a warmup? Or will it be added separately?

ggerganov · 2026-06-02T06:39:44Z

> Shouldn't at least the server now do a warmup? Or will it be added separately?

It can be added to the warmup section of `common_init_from_params`: decode a large batch of random tokens. But it has to be opt-in. We can do this later if it is needed.
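A minimal sketch of what such an opt-in warmup could look like in user code. It is not the llama.cpp implementation: `decode_fn` is a hypothetical stand-in for a real decode call such as `llama_decode`, and `make_warmup_tokens` and `run_warmup` are illustrative names. Random tokens are not guaranteed to touch every expert, and a real application would likely also need to discard any state produced by the warmup batch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

// Hypothetical stand-in for a decode call: takes a batch of token ids
// and returns 0 on success.
using decode_fn = std::function<int(const std::vector<int32_t> &)>;

// Build a batch of uniformly random token ids in [0, n_vocab).
std::vector<int32_t> make_warmup_tokens(int32_t n_vocab, size_t n_tokens, uint32_t seed) {
    assert(n_vocab > 0);
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int32_t> dist(0, n_vocab - 1);
    std::vector<int32_t> tokens(n_tokens);
    for (auto & t : tokens) {
        t = dist(rng);
    }
    return tokens;
}

// Opt-in warmup: when enabled, decode one batch of random tokens.
// Returns true if warmup was skipped or the decode call succeeded.
bool run_warmup(bool enabled, int32_t n_vocab, size_t n_batch, const decode_fn & decode) {
    if (!enabled) {
        return true; // warmup is opt-in, so do nothing by default
    }
    const std::vector<int32_t> tokens = make_warmup_tokens(n_vocab, n_batch, 1234);
    return decode(tokens) == 0;
}
```

Passing the decode step as a callback keeps the sketch independent of any particular API. In an application, the callback would wrap the real batch construction and decode call.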

am17an · 2026-06-02T06:41:17Z

Also I think this will negatively impact llama-bench numbers because there will be no warmup

ggerganov · 2026-06-02T06:42:02Z

> Also I think this will negatively impact llama-bench numbers because there will be no warmup

No, llama-bench does its own warmup. It never used this functionality.

[no ci] Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>

ggerganov requested a review from a team as a code owner June 2, 2026 06:06

llama : deprecate llama_set_warmup

537d4b4

ggerganov force-pushed the gg/llama-remove-warmup branch from 0665b8c to 537d4b4 Compare June 2, 2026 06:09

github-actions Bot added the testing Everything test related label Jun 2, 2026

danbev approved these changes Jun 2, 2026


Comment thread include/llama.h Outdated

ggerganov mentioned this pull request Jun 2, 2026

speculative : fix n_outputs_max and remove draft-simple auto-enable #23988

Merged

1 task

cont : fix type

91e032a

[no ci] Co-authored-by: Daniel Bevenius <daniel.bevenius@gmail.com>

ggerganov merged commit 4f3a4be into master Jun 2, 2026
1 check passed

ggerganov deleted the gg/llama-remove-warmup branch June 2, 2026 07:30

ggerganov merged 2 commits into master from gg/llama-remove-warmup

3 participants
