arg : add --spec-default by ggerganov · Pull Request #22223 · ggml-org/llama.cpp

ggerganov · 2026-04-21T16:32:52Z

Overview

Add a --spec-default flag that enables a default configuration for speculative decoding.
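As a rough illustration of how the new flag might be used, here is a minimal sketch. It assumes the flag is accepted by `llama-server` and that no further speculative-decoding options are required; neither is confirmed by this PR description. The model path is a hypothetical placeholder. The snippet only assembles and prints the command, so it can be inspected before being run against a real build.

```shell
#!/bin/sh
# Hypothetical model path; replace with a real GGUF file.
MODEL="models/model.gguf"

# Assumption: llama-server is the binary that accepts --spec-default.
# -m selects the model file; --spec-default is the flag added by this PR.
CMD="llama-server -m $MODEL --spec-default"

# Print the command for inspection instead of executing it.
echo "$CMD"
```

Which draft settings the flag actually selects is not stated here; check the help output of your build for the authoritative description.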

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: NO

Cherry-picks 4 upstream PRs to enable speculative decoding on hybrid MoE+SSM architectures (Qwen3.6-35B-A3B):

- ggml-org#19493: speculative checkpointing (save/restore recurrent state)
- ggml-org#22114: refactor "use checkpoint" logic
- ggml-org#22168: reset i_last on low acceptance streak
- ggml-org#22223: add --spec-default argument

Smoke tested on M5 Max with turbo4 KV, zero regression.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

arg : add --spec-default

6f44fe7

ggerganov requested a review from a team as a code owner April 21, 2026 16:32

pwilkin approved these changes Apr 21, 2026

danbev approved these changes Apr 21, 2026

ggerganov merged commit 84652b8 into master Apr 21, 2026
46 of 49 checks passed

ggerganov deleted the gg/arg-add-spec-default branch April 21, 2026 16:52

chad-loder mentioned this pull request Apr 21, 2026

Sync upstream: speculative checkpointing for hybrid models TheTom/llama-cpp-turboquant#100

Closed

ggerganov mentioned this pull request Apr 22, 2026

Feature request: enable speculative decoding by default ggml-org/LlamaBarn#83

Closed

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Apr 23, 2026

arg : add --spec-default (ggml-org#22223)

25ffaae

rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026

arg : add --spec-default (ggml-org#22223)

788084c

samuraieng pushed a commit to samuraieng/llama.cpp that referenced this pull request May 6, 2026

arg : add --spec-default (ggml-org#22223)

6f957ed

ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026

arg : add --spec-default (ggml-org#22223)

76048d8

ordokr mentioned this pull request May 13, 2026

Speculative decoding: ~130× decode regression on CUDA + turbo3 KV (RTX 5090, Qwen3.6-27B-Q6_K) despite 100% draft acceptance TheTom/llama-cpp-turboquant#143

Open

my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026

arg : add --spec-default (ggml-org#22223)

f5ebb5f

my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026

arg : add --spec-default (ggml-org#22223)

ce90220

baramofme pushed a commit to baramofme/llama-cpp-turboquant that referenced this pull request May 23, 2026

arg : add --spec-default (ggml-org#22223)

51b6960

fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026

arg : add --spec-default (ggml-org#22223)

3648951
