common : add common_speculative_is_compat() by ggerganov · Pull Request #19270 · ggml-org/llama.cpp

ggerganov · 2026-02-02T13:20:21Z

fix #19267

Memory modules that do not support removing the last tokens from the context (such as recurrent modules) cannot perform speculative decoding. Add a new common_speculative_is_compat() helper to query this capability, and use it in llama-server to disable speculative decoding for such contexts.

ngxson

Just wondering if we can do this inside common_speculative_init instead.

For example, common_speculative_init can try to evaluate 2 tokens, then remove the first one. If llama_memory_seq_rm returns error, then we throw an error saying the model is not compatible.

Btw, I think it's better to throw an error and exit, rather than a warning.
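As a rough illustration of the probe idea above, here is a minimal, self-contained sketch. It does not use the llama.cpp API: `toy_memory`, `seq_rm` and `toy_is_compat` are hypothetical stand-ins invented for this example, and the recurrent case is modeled by a flag. The sketch trims the trailing position, following the "removing the last tokens" wording in the PR description.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a context memory; NOT the llama.cpp types.
struct toy_memory {
    bool             can_truncate; // false models a recurrent-style memory
    std::vector<int> cells;        // one entry per stored position

    void clear() { cells.clear(); }

    // Pretend to evaluate tokens by appending them to the memory.
    void eval(const std::vector<int> & tokens) {
        cells.insert(cells.end(), tokens.begin(), tokens.end());
    }

    // Remove positions [p0, end). Returns false if partial removal is unsupported.
    bool seq_rm(std::size_t p0) {
        if (p0 >= cells.size()) {
            return true; // nothing to remove
        }
        if (!can_truncate && p0 > 0) {
            return false; // this memory can only be dropped as a whole
        }
        cells.resize(p0);
        return true;
    }
};

// Probe: evaluate 2 dummy tokens, then try to drop the last one.
bool toy_is_compat(toy_memory & mem) {
    mem.clear();
    mem.eval({0, 0});
    const bool ok = mem.seq_rm(1);
    mem.clear(); // leave the memory clean regardless of the outcome
    return ok;
}
```

With this model, a memory constructed with `can_truncate = true` passes the probe and one with `can_truncate = false` fails it, which is the signal the caller would use to reject or disable speculative decoding.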

coder543 · 2026-02-03T16:37:08Z

I just ran into #19267, and it would be cool if there were a way to make this compatible rather than just disabling it, but disabling it is better than crashing. With Qwen3-Coder-Next, ngram-mod could provide large speedups during coding workflows.


ggerganov · 2026-02-04T11:22:49Z

@ngxson Implemented this idea in a new common_speculative_is_compat() helper function.

> Btw, I think it's better to throw an error and exit, rather than a warning.

Do you have something specific in mind? In my server config, I want to set a default ngram-based spec decoding and have it applied for all routed models. When a routed model does not support it, it still continues to work. So I think a warning is better.
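The warn-and-continue behavior argued for here can be sketched as follows. This is only an illustration under stated assumptions: `toy_model`, `spec_type` and `resolve_spec` are hypothetical names, not llama-server code, and `spec_compat` stands for the result of whatever compatibility probe is run for that model's context.

```cpp
#include <cstdio>
#include <string>

// Hypothetical description of one routed model; names are illustrative only.
struct toy_model {
    std::string name;
    bool        spec_compat; // assumed result of a compatibility probe
};

enum class spec_type { none, ngram };

// Returns the speculative type actually used for the model.
// An incompatible model is downgraded to `none` with a warning instead of aborting,
// so a server-wide default keeps working across all routed models.
spec_type resolve_spec(const toy_model & m, spec_type requested) {
    if (requested != spec_type::none && !m.spec_compat) {
        std::fprintf(stderr, "warn: speculative decoding not supported by '%s', disabling\n",
                     m.name.c_str());
        return spec_type::none;
    }
    return requested;
}
```

The trade-off is that a misconfiguration is only visible in the logs, whereas a hard error would stop the server for every model that cannot roll back its context.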

ggerganov · 2026-02-06T13:48:15Z

@ngxson Gentle ping

ngxson

Yeah sorry I missed the notif. LGTM!

Just wondering if we should also do the same check for draft model.

ggerganov · 2026-02-06T14:47:18Z

Yes, I think we can do that. Will follow up in next PR.

* llama : add llama_memory_can_rm_suffix()
* Revert "llama : add llama_memory_can_rm_suffix()"
  This reverts commit d30e59b.
* spec : check if the target context is compatible for spec decoding

llama : add llama_memory_can_rm_suffix()

d30e59b

ggerganov requested a review from ngxson as a code owner February 2, 2026 13:20

github-actions Bot added examples server labels Feb 2, 2026

danbev approved these changes Feb 2, 2026

loci-dev mentioned this pull request Feb 2, 2026

UPSTREAM PR #19270: llama : add llama_memory_can_rm_suffix() auroralabs-loci/llama.cpp#1136

Open

ngxson reviewed Feb 2, 2026

ggerganov added 2 commits February 4, 2026 13:11

Revert "llama : add llama_memory_can_rm_suffix()"

1f8d0c8

This reverts commit d30e59b.

spec : check if the target context is compatible for spec decoding

46c3bb1

ggerganov changed the title ~~llama : add llama_memory_can_rm_suffix()~~ common : add common_speculative_is_compat() Feb 4, 2026

ggerganov requested a review from ngxson February 5, 2026 08:08

ngxson approved these changes Feb 6, 2026

ggerganov merged commit dfde599 into master Feb 6, 2026
70 of 78 checks passed

ggerganov deleted the gg/spec-disable-for-recurrent branch February 6, 2026 14:47

srogmann mentioned this pull request Feb 10, 2026

server : speculative checkpointing #19493

Merged

treo mentioned this pull request Apr 15, 2026

Bug: Speculative Decoding with Qwen 3.5 creates corrupted outputs ikawrakow/ik_llama.cpp#1639

Closed

ggerganov merged 3 commits into master from gg/spec-disable-for-recurrent
