[Spec][Ngram] 7/N: Dynamically select draft token counts from SAMs and Trie by kpham-sgl · Pull Request #22538 · sgl-project/sglang

kpham-sgl · 2026-04-10T18:06:17Z

Motivation

Part of Ngram series #21052

This PR removes the old fixed trie/SAM draft-budget split. Instead of reserving draft tokens with external_sam_budget, the trie and each loaded SAM are treated as candidate sources, each scored by score = source_prior * (w_specificity * specificity + w_confidence * confidence), where w_specificity and w_confidence are normalized from the user-provided weights. Sources are merged in descending score order, and the final merged tree is capped only by num_draft_tokens.

This lets multiple external corpora participate in drafting without hard partitioning draft capacity across trie and SAMs.
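
The scoring rule above can be sketched in a few lines of Python. This is only an illustration of the formula, not the C++ code in this PR: the SourceMatch class, the helper names, the assumption that specificity and confidence lie in [0, 1], and the handling of a non-positive weight sum are all hypothetical. Only the formula and the knob names match_specificity_weight and match_confidence_weight come from the description.

```python
from dataclasses import dataclass


@dataclass
class SourceMatch:
    """Hypothetical per-source match summary (names are illustrative)."""
    name: str
    source_prior: float   # e.g. trie_source_prior for the trie
    specificity: float    # assumed to be in [0, 1]
    confidence: float     # assumed to be in [0, 1]


def normalize_weights(match_specificity_weight: float,
                      match_confidence_weight: float) -> tuple[float, float]:
    """Scale the two user-provided weights so they sum to 1."""
    total = match_specificity_weight + match_confidence_weight
    if total <= 0:
        # Assumption: the real implementation may validate this differently.
        raise ValueError("weights must have a positive sum")
    return match_specificity_weight / total, match_confidence_weight / total


def source_score(match: SourceMatch,
                 match_specificity_weight: float,
                 match_confidence_weight: float) -> float:
    """score = source_prior * (w_specificity * specificity + w_confidence * confidence)"""
    w_spec, w_conf = normalize_weights(match_specificity_weight,
                                       match_confidence_weight)
    return match.source_prior * (w_spec * match.specificity
                                 + w_conf * match.confidence)


trie = SourceMatch("trie", source_prior=1.0, specificity=0.5, confidence=0.25)
sam = SourceMatch("sam0", source_prior=2.0, specificity=0.5, confidence=0.25)
print(source_score(trie, 1.0, 3.0))  # 0.3125
print(source_score(sam, 1.0, 3.0))   # 0.625
```

With weights 1.0 and 3.0 the normalized weights are 0.25 and 0.75, so two sources with identical match quality differ only by their prior.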

Followed by #22569, which benchmarks this PR's effect on accept length across several experiments.

Modifications

- Add C++ external-corpus SAM support for Ngram, including suffix automaton matching, root-result merging, and multi-corpus management.
- Wire the feature through FFI / Python / worker plumbing so external corpora can be loaded and used during Ngram speculative decoding.
- Replace static trie/SAM budget allocation with score-ordered source merging.
- Remove external_sam_budget and min_trie_share; keep trie_source_prior, match_specificity_weight, and match_confidence_weight as source-ranking knobs.
- Update unit/spec coverage for external corpus loading, multi-SAM merge behavior, server args, and the output-as-corpus accept-length regression.
- Update design docs to match the non-budgeted merge model.
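
As a rough illustration of score-ordered merging without a per-source budget, the sketch below merges candidate continuations from several sources into one token tree, visiting sources from highest to lowest score and stopping node creation once num_draft_tokens nodes exist. The Candidate class, merge_by_score, the nested-dict tree, and the choice not to count the root against the cap are assumptions made for this example; the actual merge lives in the C++ ngram corpus code and may differ.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate source: a score plus proposed token paths."""
    name: str
    score: float
    paths: list[list[int]]  # each path is one proposed draft continuation


def merge_by_score(candidates: list[Candidate],
                   num_draft_tokens: int) -> tuple[dict, int]:
    """Merge all paths into one nested-dict token tree, best source first.

    No source has a reserved share; the only limit is the total number
    of tree nodes, capped at num_draft_tokens (root not counted here).
    """
    root: dict = {}
    used = 0
    for cand in sorted(candidates, key=lambda c: c.score, reverse=True):
        for path in cand.paths:
            node = root
            for tok in path:
                if tok not in node:
                    if used >= num_draft_tokens:
                        break  # cap reached: cannot extend this path further
                    node[tok] = {}
                    used += 1
                node = node[tok]
    return root, used


trie = Candidate("trie", score=0.9, paths=[[1, 2, 3]])
sam = Candidate("sam0", score=0.5, paths=[[1, 2, 4], [5, 6]])
tree, used = merge_by_score([sam, trie], num_draft_tokens=5)
print(tree)  # {1: {2: {3: {}, 4: {}}}, 5: {}}
print(used)  # 5
```

In this example the higher-scoring trie path is inserted first, the SAM path sharing the prefix 1, 2 costs only one extra node, and the last SAM path is truncated when the cap of 5 is hit.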

Accuracy Tests

Passed:

python3 -m pytest test/registered/unit/spec/test_ngram_corpus.py -k "TestNgramCorpusExternalSam or TestNgramCorpusMultiSam"
python3 -m pytest test/registered/unit/server_args/test_server_args.py -k "NgramExternalSamArgs"
python3 -m pytest test/registered/spec/test_ngram_speculative_decoding.py -k "TestNgramSpeculativeDecodingFlashinfer and output_as_corpus_boosts_accept_length"

See #22569 for extra benchmarks on accept length

Speed Tests and Profiling

See #22569 for extra benchmarks on accept length

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review and Merge Process

Ping Merge Oncalls to start the process. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

gemini-code-assist

Code Review

This pull request refactors the ngram corpus matching logic to introduce a weighted budget allocation system for speculative decoding. It now distributes draft tokens between the live Trie and external Suffix Automata based on match quality metrics like specificity and confidence. The changes include new helper functions for budget allocation, updated parameter validations, and refactored buildRecency and buildFrequency methods in Trie and SuffixAutomaton to separate anchor matching from result building. An unused batchMatch overload and its FFI binding were removed. Review comments suggest improving the readability and efficiency of a sorting lambda by caching source objects and simplifying a conditional expression for better clarity.

gemini-code-assist · 2026-04-10T23:07:33Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

kpham-sgl · 2026-04-10T23:07:47Z

/tag-and-rerun-ci

kpham-sgl · 2026-04-10T23:09:17Z

/rerun-test test/registered/unit/spec/test_ngram_corpus.py

kpham-sgl · 2026-04-10T23:09:34Z

/rerun-test test/registered/spec/test_ngram_speculative_decoding.py

github-actions · 2026-04-10T23:09:45Z

✅ ubuntu-latest (1 test): View workflow run

cd test/ && python3 registered/unit/spec/test_ngram_corpus.py

github-actions · 2026-04-10T23:09:59Z

✅ 1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

kpham-sgl · 2026-04-15T01:22:16Z

/rerun-test test/registered/spec/test_ngram_speculative_decoding.py

github-actions · 2026-04-15T01:22:47Z

✅ 1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

github-actions Bot added the jit-kernel label Apr 10, 2026

gemini-code-assist Bot reviewed Apr 10, 2026

Comment thread python/sglang/jit_kernel/csrc/ngram_corpus/ngram.cpp Outdated

Comment thread python/sglang/jit_kernel/csrc/ngram_corpus/ngram.cpp Outdated

github-actions Bot added the speculative-decoding label Apr 10, 2026

kpham-sgl force-pushed the kp/trie-sam-matching-change branch from 92da2fe to 790b6a0 Compare April 10, 2026 22:44

kpham-sgl changed the title ~~[WIP][Spec][Ngram] 7/N: Dynamically select draft token counts from SAMs and Trie~~ [Spec][Ngram] 7/N: Dynamically select draft token counts from SAMs and Trie Apr 10, 2026

kpham-sgl marked this pull request as ready for review April 10, 2026 23:07

kpham-sgl requested review from BBuf, DarkSharpness, HydraQYH, Ying1123, celve, hnyls2002, merrymercy and yuan-luo as code owners April 10, 2026 23:07

github-actions Bot added the run-ci label Apr 10, 2026

This was referenced Apr 10, 2026

[Roadmap] Further Ngram Speculative Decoding Support #21052

Open

[Spec][Ngram]: Add output-as-corpus and distractor corporas to benchmark dynamic spec tokens allocation #22569

Open

kpham-sgl added 6 commits April 13, 2026 05:55

initial commit for better sam trie merging

b3293d1

remove sam cap

54898bb

minor fix

6b8884e

clean up dead code

01f9cc4

remove hard cap for external sam budget

3edb388

nit

8ef4ab4

kpham-sgl force-pushed the kp/trie-sam-matching-change branch from 790b6a0 to 8ef4ab4 Compare April 13, 2026 05:56

kpham-sgl mentioned this pull request Apr 15, 2026

[Spec][Ngram] 8/N: Add support for Per-request Trie mode #22737

Open

5 tasks

kpham-sgl wants to merge 6 commits into sgl-project:main from kpham-sgl:kp/trie-sam-matching-change
