
sampling : delegate input allocation to the scheduler #19266

Merged
ggerganov merged 2 commits into master from gg/backend-sampling-fix-inp-allocation on Feb 3, 2026


ggerganov commented Feb 2, 2026

fix #18622
alt #18636

  • Merge the sampler inputs into the main graph. This makes the backend scheduler responsible for allocating their memory, which makes backend sampling compatible with pipeline parallelism.
  • Use ggml_build_forward_select() in llm_graph_context::build_sampling() to avoid computing the samplers when they are not needed.
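
The two points above can be illustrated with a toy model. The sketch below does not use ggml at all, and it assumes nothing about the real signature of ggml_build_forward_select() or the internals of llm_graph_context::build_sampling(). Every name in it (toy_tensor, toy_graph, toy_sched_alloc, toy_build_graph) is hypothetical. It only shows the idea: when the sampler inputs are nodes of the same graph as the model, a single scheduler pass allocates them together with everything else, and the sampler nodes are only added when sampling is requested.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-in, NOT a ggml type: a tensor is a name, a size in bytes,
// and an offset into a buffer that only the scheduler assigns.
struct toy_tensor {
    std::string name;
    std::size_t size;
    std::size_t offset;
    bool        allocated;
};

// Toy stand-in for a compute graph: an ordered list of nodes.
struct toy_graph {
    std::vector<toy_tensor> nodes;

    // Append a node and return its index. Nodes start out unallocated.
    std::size_t add(const std::string & name, std::size_t size) {
        toy_tensor t;
        t.name      = name;
        t.size      = size;
        t.offset    = 0;
        t.allocated = false;
        nodes.push_back(t);
        return nodes.size() - 1;
    }
};

// Hypothetical scheduler: lays out every node of the graph back to back in
// one pass and returns the total number of bytes needed. Because the sampler
// inputs live in the same graph, they are allocated here as well, with no
// separate allocation path.
std::size_t toy_sched_alloc(toy_graph & g) {
    std::size_t offset = 0;
    for (auto & t : g.nodes) {
        t.offset    = offset;
        t.allocated = true;
        offset += t.size;
    }
    return offset;
}

// Build the (toy) model graph. The sampler input and output nodes are
// appended to the same graph, and only when sampling is actually requested,
// so an unused sampler costs neither memory nor compute.
toy_graph toy_build_graph(bool with_sampling) {
    toy_graph g;
    g.add("inp_tokens", 64);
    g.add("logits",     1024);
    if (with_sampling) {
        g.add("sampler_inp",   32);
        g.add("sampled_token", 4);
    }
    return g;
}
```

Calling toy_build_graph(true) followed by toy_sched_alloc() places the sampler input directly after the logits in the same layout, while toy_build_graph(false) produces a graph with no sampler nodes at all.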

ggerganov force-pushed the gg/backend-sampling-fix-inp-allocation branch from c4d5b0f to 7f58cca February 3, 2026 11:01
ggerganov mentioned this pull request Feb 3, 2026
ggerganov marked this pull request as ready for review February 3, 2026 14:12
ggerganov requested a review from CISC as a code owner February 3, 2026 14:12
ggerganov requested a review from danbev February 3, 2026 14:13
ggerganov merged commit faa1bc2 into master Feb 3, 2026
76 of 78 checks passed
ggerganov deleted the gg/backend-sampling-fix-inp-allocation branch February 3, 2026 20:16
agent-enemy-2 pushed a commit to agent-enemy-2/llama.cpp that referenced this pull request Feb 4, 2026
* sampling : delegate input allocation to the scheduler

* graph : compute backend samplers only if needed
liparetejas pushed a commit to liparetejas/llama.cpp that referenced this pull request Feb 23, 2026
* sampling : delegate input allocation to the scheduler

* graph : compute backend samplers only if needed

Development

Successfully merging this pull request may close these issues.

Eval bug: Segmentation fault with -bs with multiple GPUs

2 participants