arg : add --spec-default #22223

Merged: ggerganov merged 1 commit into master from gg/arg-add-spec-default on Apr 21, 2026
Conversation

@ggerganov

Member

Overview

Add --spec-default flag for enabling default configuration for speculative decoding.
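
To illustrate the general idea of a preset flag, here is a minimal sketch in Python (not the llama.cpp implementation, which is C++). It assumes a single boolean `--spec-default` switch that fills in any speculative decoding options the user left unset, while explicit values still win. The option names `--draft-tokens` and `--min-accept` and the preset values are hypothetical and chosen only for illustration; the PR does not state which settings the flag applies.

```python
import argparse

# Hypothetical preset values, chosen only for illustration; they are NOT
# the values llama.cpp applies for --spec-default.
SPEC_PRESET = {"draft_tokens": 8, "min_accept": 0.5}


def build_parser():
    parser = argparse.ArgumentParser(prog="demo")
    parser.add_argument("--spec-default", action="store_true",
                        help="enable a default speculative decoding configuration")
    # Hypothetical tuning options; None means "not set by the user".
    parser.add_argument("--draft-tokens", type=int, default=None)
    parser.add_argument("--min-accept", type=float, default=None)
    return parser


def resolve(argv):
    """Parse argv and apply the preset to options the user did not set."""
    args = build_parser().parse_args(argv)
    config = {
        "enabled": args.spec_default,
        "draft_tokens": args.draft_tokens,
        "min_accept": args.min_accept,
    }
    if args.spec_default:
        for key, value in SPEC_PRESET.items():
            # Explicit command-line values take precedence over the preset.
            if config[key] is None:
                config[key] = value
    return config
```

With this sketch, `resolve(["--spec-default", "--draft-tokens", "4"])` keeps the explicit value 4 for `draft_tokens` and takes `min_accept` from the preset. Using `None` as the "unset" marker is one way to tell a user-supplied value apart from a default.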

Requirements

@ggerganov ggerganov requested a review from a team as a code owner April 21, 2026 16:32
@ggerganov ggerganov merged commit 84652b8 into master Apr 21, 2026
46 of 49 checks passed
@ggerganov ggerganov deleted the gg/arg-add-spec-default branch April 21, 2026 16:52
TheTom added a commit to TheTom/llama-cpp-turboquant that referenced this pull request Apr 22, 2026
Cherry-picks 4 upstream PRs to enable speculative decoding on hybrid
MoE+SSM architectures (Qwen3.6-35B-A3B):

- ggml-org#19493 — speculative checkpointing (save/restore recurrent state)
- ggml-org#22114 — refactor "use checkpoint" logic
- ggml-org#22168 — reset i_last on low acceptance streak
- ggml-org#22223 — add --spec-default argument

Smoke tested on M5 Max with turbo4 KV — zero regression.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
arthw pushed a commit to arthw/llama.cpp that referenced this pull request Apr 23, 2026
rsenthilkumar6 pushed a commit to rsenthilkumar6/llama.cpp that referenced this pull request May 1, 2026
samuraieng pushed a commit to samuraieng/llama.cpp that referenced this pull request May 6, 2026
ljubomirj pushed a commit to ljubomirj/llama.cpp that referenced this pull request May 6, 2026
meh pushed a commit to meh/llama.cpp that referenced this pull request May 10, 2026
Cherry-picks 4 upstream PRs to enable speculative decoding on hybrid
MoE+SSM architectures (Qwen3.6-35B-A3B):

- ggml-org#19493 — speculative checkpointing (save/restore recurrent state)
- ggml-org#22114 — refactor "use checkpoint" logic
- ggml-org#22168 — reset i_last on low acceptance streak
- ggml-org#22223 — add --spec-default argument

Smoke tested on M5 Max with turbo4 KV — zero regression.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Jcfunk pushed a commit to Jcfunk/llama.cpp that referenced this pull request May 13, 2026
Cherry-picks 4 upstream PRs to enable speculative decoding on hybrid
MoE+SSM architectures (Qwen3.6-35B-A3B):

- ggml-org#19493 — speculative checkpointing (save/restore recurrent state)
- ggml-org#22114 — refactor "use checkpoint" logic
- ggml-org#22168 — reset i_last on low acceptance streak
- ggml-org#22223 — add --spec-default argument

Smoke tested on M5 Max with turbo4 KV — zero regression.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
my-other-github-account pushed a commit to my-other-github-account/llama.cpp that referenced this pull request May 15, 2026
baramofme pushed a commit to baramofme/llama-cpp-turboquant that referenced this pull request May 23, 2026
fewtarius pushed a commit to fewtarius/llama.cpp that referenced this pull request May 30, 2026
wel97459 pushed a commit to wel97459/llama-cpp-turboquant that referenced this pull request Jun 4, 2026
Cherry-picks 4 upstream PRs to enable speculative decoding on hybrid
MoE+SSM architectures (Qwen3.6-35B-A3B):

- ggml-org#19493 — speculative checkpointing (save/restore recurrent state)
- ggml-org#22114 — refactor "use checkpoint" logic
- ggml-org#22168 — reset i_last on low acceptance streak
- ggml-org#22223 — add --spec-default argument

Smoke tested on M5 Max with turbo4 KV — zero regression.

Co-Authored-By: tturney@psyguard.ai
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>