[Spec][Ngram]: Add output-as-corpus and distractor corporas to benchmark dynamic spec tokens allocation by kpham-sgl · Pull Request #22569 · sgl-project/sglang

kpham-sgl · 2026-04-11T01:47:23Z

Motivation

Part of the Ngram series #21052.
Follows #22538. Verifies that dynamic spec token allocation across the Trie and SAMs brings a benefit.

Modifications

Add the benchmark

Added an end-to-end regression benchmark that compares three settings on the same prompt set:
- trieOnly: no external corpus is loaded, so draft tokens come only from the Trie.
- samOnly: load a strongly matching external suffix-automaton (SAM) corpus and rerun the same prompts.
- samPlusDistractors: keep that matching SAM, then add extra irrelevant SAM corpora.
The metric is avg_spec_accept_length, i.e. the average number of speculative tokens accepted per verify step.
On the benchmark workload, samOnly improves accept length from 2.13 to 6.64 (3.12x vs. trieOnly).
With 2 or 4 distractor SAM corpora, accept length stays at 5.90 / 5.92 (~89% of samOnly), showing that the new trie/SAM ranking preserves the strongly matching corpus instead of collapsing back toward the trie-only baseline.
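The ratios quoted above can be re-derived from the reported accept lengths. The snippet below is a minimal sketch, not the benchmark code from this PR: `avg_accept_length`, `reported`, and `ratio` are hypothetical names introduced for illustration, and the metric is assumed to be a plain mean over verify steps, as the description states.

```python
from statistics import mean


def avg_accept_length(accepted_per_step):
    """Mean number of speculative tokens accepted per verify step (assumed definition)."""
    if not accepted_per_step:
        raise ValueError("need at least one verify step")
    return mean(accepted_per_step)


# Accept lengths copied from the PR description; the keys are illustrative labels.
reported = {
    "trieOnly": 2.13,
    "samOnly": 6.64,
    "samPlusDistractors(2)": 5.90,
    "samPlusDistractors(4)": 5.92,
}


def ratio(numerator, denominator):
    """Ratio of two reported accept lengths, rounded to two decimals."""
    return round(reported[numerator] / reported[denominator], 2)


speedup = ratio("samOnly", "trieOnly")                  # 3.12x over the trie-only baseline
retention_2 = ratio("samPlusDistractors(2)", "samOnly")  # 0.89 with 2 distractor corpora
retention_4 = ratio("samPlusDistractors(4)", "samOnly")  # 0.89 with 4 distractor corpora

print(speedup, retention_2, retention_4)
```

Retention relative to samOnly, rather than relative to trieOnly, is the number that isolates the effect of the distractors, since it holds the matching corpus fixed.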

Accuracy Tests

This is the accuracy test for #22538

Speed Tests and Profiling

This is the speed test for #22538

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

gemini-code-assist · 2026-04-11T01:47:28Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

kpham-sgl · 2026-04-15T01:22:40Z

/rerun-test test/registered/spec/test_ngram_speculative_decoding.py

github-actions · 2026-04-15T01:23:10Z

✅ 1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

kpham-sgl · 2026-04-15T01:54:32Z

/rerun-test test/registered/spec/test_ngram_speculative_decoding.py

github-actions · 2026-04-15T01:54:58Z

✅ 1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

github-actions Bot added speculative-decoding jit-kernel labels Apr 11, 2026

kpham-sgl mentioned this pull request Apr 11, 2026

[Spec][Ngram] 7/N: Dynamically select draft token counts from SAMs and Trie #22538

Open

5 tasks

kpham-sgl changed the title ~~[Spec][Ngram]: Benchmark select draft token counts from SAMs and Trie~~ [Spec][Ngram]: Add output-as-corpus and distractor corporas to benchmark dynamic spec tokens allocation Apr 11, 2026

kpham-sgl marked this pull request as ready for review April 11, 2026 01:57

kpham-sgl requested review from BBuf, DarkSharpness, HydraQYH, Ying1123, celve, hnyls2002, merrymercy and yuan-luo as code owners April 11, 2026 01:57

kpham-sgl mentioned this pull request Apr 11, 2026

[Roadmap] Further Ngram Speculative Decoding Support #21052

Open

19 tasks

kpham-sgl added 10 commits April 13, 2026 05:55

initial commit for better sam trie merging

b3293d1

remove sam cap

54898bb

minor fix

6b8884e

clean up dead code

01f9cc4

remove hard cap for external sam budget

3edb388

nit

8ef4ab4

clean up dead code

3af36d4

remove hard cap for external sam budget

2ffe1d7

add benchmark

f19b0e9

nit comment fix

07a69a6

kpham-sgl force-pushed the kp/benchmark-trie-sam-matching-change branch from 7265a6f to 07a69a6 Compare April 13, 2026 06:17

kpham-sgl mentioned this pull request Apr 15, 2026

[Spec][Ngram] 8/N: Add support for Per-request Trie mode #22737

Open

5 tasks

new list API

3940b51

kpham-sgl wants to merge 11 commits into sgl-project:main from kpham-sgl:kp/benchmark-trie-sam-matching-change
