[Spec][Ngram] 7/N: Dynamically select draft token counts from SAMs and Trie #22538

Open
kpham-sgl wants to merge 6 commits into sgl-project:main from kpham-sgl:kp/trie-sam-matching-change

@kpham-sgl
Collaborator

@kpham-sgl kpham-sgl commented Apr 10, 2026

Motivation

Part of Ngram series #21052

This PR removes the old fixed trie/SAM draft-budget split. Instead of reserving draft tokens with external_sam_budget, the trie and each loaded SAM are treated as candidate sources, each scored by

score = source_prior * (w_specificity * specificity + w_confidence * confidence)

where w_specificity and w_confidence are normalized from the user-provided weights. Sources are merged in score order, and the final merged tree is capped only by num_draft_tokens.

This lets multiple external corpora participate in drafting without hard partitioning draft capacity across trie and SAMs.
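
A minimal Python sketch of the scoring and score-ordered merge described above, for illustration only. The actual implementation lives in the C++ ngram corpus code and may differ. The names SourceCandidate, normalize_weights, source_score, and merge_sources are hypothetical, and two simplifications are assumed here: the weights are normalized to sum to 1, and each source's result is modeled as a list of token paths whose distinct prefixes become nodes of the merged tree.

```python
from dataclasses import dataclass


@dataclass
class SourceCandidate:
    """Hypothetical stand-in for one draft source (the trie or one SAM)."""
    name: str
    source_prior: float
    specificity: float
    confidence: float
    paths: list[tuple[int, ...]]  # candidate draft continuations (token ids)


def normalize_weights(specificity_weight: float, confidence_weight: float):
    # Assumption: "normalized" means the two weights are scaled to sum to 1.
    total = specificity_weight + confidence_weight
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return specificity_weight / total, confidence_weight / total


def source_score(c: SourceCandidate, w_spec: float, w_conf: float) -> float:
    # score = source_prior * (w_specificity * specificity + w_confidence * confidence)
    return c.source_prior * (w_spec * c.specificity + w_conf * c.confidence)


def merge_sources(candidates, num_draft_tokens, specificity_weight, confidence_weight):
    """Merge sources in descending score order; cap only the total node count."""
    w_spec, w_conf = normalize_weights(specificity_weight, confidence_weight)
    ranked = sorted(
        candidates, key=lambda c: source_score(c, w_spec, w_conf), reverse=True
    )
    merged: list[tuple[int, ...]] = []  # each entry is one tree node (a prefix)
    seen: set[tuple[int, ...]] = set()
    for cand in ranked:
        for path in cand.paths:
            for end in range(1, len(path) + 1):
                prefix = path[:end]
                if prefix in seen:
                    continue  # node already contributed by a higher-ranked source
                if len(merged) >= num_draft_tokens:
                    return merged  # global cap reached; no per-source budget
                seen.add(prefix)
                merged.append(prefix)
    return merged


trie = SourceCandidate("trie", 1.0, 0.5, 0.5, [(1, 2, 3)])
sam0 = SourceCandidate("sam0", 1.0, 1.0, 0.8, [(1, 2, 4), (5,)])
merged_tree = merge_sources([trie, sam0], 5, 1.0, 1.0)
```

In this sketch no source has a reserved share: a high-scoring source can fill the whole tree, and a lower-scoring source only adds nodes while capacity remains.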

Followed by #22569, which benchmarks this PR's effect on accept length across several experiments.

Modifications

  • Add C++ external-corpus SAM support for Ngram, including suffix automaton
    matching, root-result merging, and multi-corpus management.
  • Wire the feature through FFI / Python / worker plumbing so external corpora
    can be loaded and used during Ngram speculative decoding.
  • Replace static trie/SAM budget allocation with score-ordered source merging.
    Remove external_sam_budget and min_trie_share; keep
    trie_source_prior, match_specificity_weight, and
    match_confidence_weight as source-ranking knobs.
  • Update unit/spec coverage for external corpus loading, multi-SAM merge
    behavior, server args, and the output-as-corpus accept-length regression.
  • Update design docs to match the non-budgeted merge model.

Accuracy Tests

Passed:

  • python3 -m pytest test/registered/unit/spec/test_ngram_corpus.py -k "TestNgramCorpusExternalSam or TestNgramCorpusMultiSam"
  • python3 -m pytest test/registered/unit/server_args/test_server_args.py -k "NgramExternalSamArgs"
  • python3 -m pytest test/registered/spec/test_ngram_speculative_decoding.py -k "TestNgramSpeculativeDecodingFlashinfer and output_as_corpus_boosts_accept_length"

See #22569 for extra benchmarks on accept length

Speed Tests and Profiling

See #22569 for extra benchmarks on accept length

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request refactors the ngram corpus matching logic to introduce a weighted budget allocation system for speculative decoding. It now distributes draft tokens between the live Trie and external Suffix Automata based on match quality metrics like specificity and confidence. The changes include new helper functions for budget allocation, updated parameter validations, and refactored buildRecency and buildFrequency methods in Trie and SuffixAutomaton to separate anchor matching from result building. An unused batchMatch overload and its FFI binding were removed. Review comments suggest improving the readability and efficiency of a sorting lambda by caching source objects and simplifying a conditional expression for better clarity.

Comment thread python/sglang/jit_kernel/csrc/ngram_corpus/ngram.cpp Outdated
Comment thread python/sglang/jit_kernel/csrc/ngram_corpus/ngram.cpp Outdated
@kpham-sgl kpham-sgl force-pushed the kp/trie-sam-matching-change branch from 92da2fe to 790b6a0 Compare April 10, 2026 22:44
@kpham-sgl kpham-sgl changed the title from "[WIP][Spec][Ngram] 7/N: Dynamically select draft token counts from SAMs and Trie" to "[Spec][Ngram] 7/N: Dynamically select draft token counts from SAMs and Trie" Apr 10, 2026
@kpham-sgl kpham-sgl marked this pull request as ready for review April 10, 2026 23:07
@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@kpham-sgl
Collaborator Author

/tag-and-rerun-ci

@kpham-sgl
Collaborator Author

/rerun-test test/registered/unit/spec/test_ngram_corpus.py

@kpham-sgl
Collaborator Author

/rerun-test test/registered/spec/test_ngram_speculative_decoding.py

@github-actions
Contributor

ubuntu-latest (1 test): View workflow run

cd test/ && python3 registered/unit/spec/test_ngram_corpus.py

@github-actions
Contributor

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

@kpham-sgl kpham-sgl force-pushed the kp/trie-sam-matching-change branch from 790b6a0 to 8ef4ab4 Compare April 13, 2026 05:56
@kpham-sgl
Collaborator Author

/rerun-test test/registered/spec/test_ngram_speculative_decoding.py

@github-actions
Contributor

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py
