[Spec][Ngram] 1/N: Reference based Speculative Decoding refactor #20393

Merged
hnyls2002 merged 12 commits into sgl-project:main from kpham-sgl:kp/ref-based-spec-dec-refactor on Mar 22, 2026

@kpham-sgl (Collaborator) commented Mar 12, 2026

Motivation

1/N: Refactor the monolithic C++ Ngram class into a template-based architecture that supports pluggable cache backends (e.g., a Suffix Automaton in 2+/N). No behavioural changes, except for the items tagged [NEW].

Modifications

  • Extract Result, Node, fillResult() into result.h/.cpp (shared across backends)

  • Extract TrieNode + Trie into trie.h/.cpp (trie data structure + tree building: insert, buildRecency, buildFrequency, squeeze, reset)

  • Convert Ngram into class Ngram, a thin concurrency wrapper that holds both a SAM (in later PRs) for the corpus and prefix, and a running Trie for decoded tokens

  • Add comprehensive test suite (test_ngram_corpus.py): golden-output tests for BFS/PROB modes, reset, squeeze/eviction, batch consistency, mask invariants, frequency boosting, recency ordering, etc.

  • [NEW] Wire match_type parameter through ngram_worker.py

  • [NEW] Update docs for the above parameter
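
The trie and its two expansion modes can be illustrated with a minimal Python sketch. It is not the PR's C++ code: the class layout, the `freq` and `last_seen` fields, and the `candidates` method are hypothetical names chosen for illustration, and the only behaviour taken from the PR is that `BFS` is recency-based and `PROB` is frequency-based.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrieNode:
    """One token in the trie (field names are hypothetical)."""
    children: dict[int, TrieNode] = field(default_factory=dict)
    freq: int = 0       # number of inserted windows that passed through this node
    last_seen: int = 0  # logical clock value of the most recent insertion


class Trie:
    def __init__(self, max_depth: int) -> None:
        self.root = TrieNode()
        self.max_depth = max_depth
        self.clock = 0  # incremented once per inserted window

    def insert(self, tokens: list[int]) -> None:
        """Insert every window of at most max_depth tokens."""
        for start in range(len(tokens)):
            self.clock += 1
            node = self.root
            for tok in tokens[start:start + self.max_depth]:
                node = node.children.setdefault(tok, TrieNode())
                node.freq += 1
                node.last_seen = self.clock

    def candidates(self, context: list[int], mode: str = "BFS") -> list[int]:
        """Return next-token candidates after `context`, best first."""
        node = self.root
        for tok in context:
            node = node.children.get(tok)
            if node is None:
                return []  # context is not in the trie
        if mode == "BFS":      # assumed: most recently seen first
            key = lambda item: item[1].last_seen
        elif mode == "PROB":   # assumed: most frequent first
            key = lambda item: item[1].freq
        else:
            raise ValueError(f"unknown mode: {mode}")
        ranked = sorted(node.children.items(), key=key, reverse=True)
        return [tok for tok, _ in ranked]
```

With this sketch, inserting `[1, 2, 3]` twice and then `[1, 2, 4]` makes `PROB` rank token 3 first after the context `[1, 2]`, while `BFS` ranks the more recent token 4 first.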

Future plans

#21052

Accuracy Tests

Pass

  • python -m pytest test/registered/spec/utils/test_ngram_cache.py -v
  • python -m pytest test/registered/spec/test_ngram_speculative_decoding.py::TestNgramSpeculativeDecodingTriton -xvs

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.


@kpham-sgl kpham-sgl changed the title [WIP][Spec] 1/N: kp/ref based spec dec refactor [WIP][Spec] 1/N: Reference based Speculative Decoding refactor Mar 12, 2026
@kpham-sgl kpham-sgl force-pushed the kp/ref-based-spec-dec-refactor branch from edc131f to e35bd8b Compare March 12, 2026 01:15
@kpham-sgl kpham-sgl force-pushed the kp/ref-based-spec-dec-refactor branch from 209d17d to 4ef7157 Compare March 13, 2026 18:38
@kpham-sgl kpham-sgl changed the title [WIP][Spec] 1/N: Reference based Speculative Decoding refactor [Spec] 1/N: Reference based Speculative Decoding refactor Mar 13, 2026
@github-actions github-actions Bot added the documentation, speculative-decoding, and npu labels Mar 13, 2026
@kpham-sgl kpham-sgl marked this pull request as ready for review March 13, 2026 19:09
@kpham-sgl

/tag-and-rerun-ci

  | `--speculative-ngram-min-bfs-breadth` | The minimum breadth for BFS (Breadth-First Search) in ngram speculative decoding. | `1` | Type: int |
  | `--speculative-ngram-max-bfs-breadth` | The maximum breadth for BFS (Breadth-First Search) in ngram speculative decoding. | `10` | Type: int |
- | `--speculative-ngram-match-type` | The match type for cache tree. | `BFS` | `BFS`, `PROB` |
+ | `--speculative-ngram-match-type` | Ngram tree-building mode. `BFS` selects recency-based expansion and `PROB` selects frequency-based expansion. This setting is forwarded to the ngram cache implementation. | `BFS` | `BFS`, `PROB` |
@kpham-sgl (Collaborator, Author) commented on this change:

Open to moving these doc changes into a separate PR.

Comment thread python/sglang/srt/speculative/cpp_ngram/ngram.cpp Outdated
Comment thread test/registered/spec/utils/test_ngram_cache.py Outdated
@hnyls2002 (Collaborator) commented:

@kpham-sgl This PR generally looks good; it only does migration and cleanup. For the next PRs:

  • Rename branch_length -> max_trie_depth.
  • Remove max_match_window_size and min_match_window_size. We will match as much as possible in the trie.
  • Do not do O(max_depth^2) matching. Maintain a list of matching states and update each one in O(1) per anchor.
  • There is a race condition: when the queue is empty, the insertion may not have finished yet. Please fix that.
  • The current spin-wait is wrong because it still consumes CPU. Use condition variables instead to avoid busy waiting.
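
One possible reading of the incremental-matching request is sketched below in Python. It is an interpretation, not the planned implementation: `build_trie`, `MatchState`, and the nested-dict trie are hypothetical. Instead of re-walking the trie from the root for every suffix length, the sketch keeps one live trie pointer per anchor (start position) and advances each pointer by a single child lookup when a new token arrives.

```python
def build_trie(sequences, max_depth):
    """Build a nested-dict trie (token -> child dict) of all windows."""
    root = {}
    for seq in sequences:
        for start in range(len(seq)):
            node = root
            for tok in seq[start:start + max_depth]:
                node = node.setdefault(tok, {})
    return root


class MatchState:
    """Tracks every suffix of the token stream that is still a trie path."""

    def __init__(self, root, max_depth):
        self.root = root
        self.max_depth = max_depth
        self.active = []  # (node, depth) pairs, longest match first

    def advance(self, token):
        """Consume one token: one dict lookup per live anchor."""
        next_active = []
        # Existing anchors try to extend; (root, 0) starts a new anchor.
        for node, depth in self.active + [(self.root, 0)]:
            child = node.get(token)
            if child is not None and depth < self.max_depth:
                next_active.append((child, depth + 1))
        self.active = next_active

    def longest_depth(self):
        """Length of the longest suffix currently matched (0 if none)."""
        return self.active[0][1] if self.active else 0
```

Because at most `max_depth` anchors can be alive at once, each new token costs O(max_depth) lookups in total, rather than a fresh O(max_depth^2) walk over all suffixes.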

@kpham-sgl kpham-sgl requested a review from hnyls2002 March 20, 2026 21:13
@kpham-sgl kpham-sgl changed the title [Spec] 1/N: Reference based Speculative Decoding refactor [Spec][Ngram] 1/N: Reference based Speculative Decoding refactor Mar 20, 2026
- Rename module/extension (ngram_corpus_cpp), sources, and TrieCache→Trie
- Update ngram_worker and registered tests

Made-with: Cursor
Align pybind name and Python attribute with actual semantics:
Ngram is a concurrency wrapper, not a trie.
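
A concurrency wrapper of this kind, combined with the review notes about busy waiting and the empty-queue race, could look roughly like the Python sketch below. The class `AsyncInserter` and its methods are hypothetical and do not mirror the C++ code. The sketch uses a condition variable instead of a spin-wait, and it counts in-flight items so that an empty queue alone is not treated as "insertion finished".

```python
import threading
from collections import deque


class AsyncInserter:
    """Hypothetical wrapper that inserts into a backend on a worker thread."""

    def __init__(self, backend_insert):
        self._insert = backend_insert
        self._queue = deque()
        self._in_flight = 0  # items popped from the queue but not yet inserted
        self._closed = False
        self._cond = threading.Condition()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, tokens):
        with self._cond:
            self._queue.append(tokens)
            self._cond.notify_all()

    def synchronize(self):
        """Block (without spinning) until every submitted item is inserted."""
        with self._cond:
            # Checking the queue alone would be racy: the worker may have
            # popped the last item but still be in the middle of inserting it.
            self._cond.wait_for(
                lambda: not self._queue and self._in_flight == 0
            )

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()
        self._worker.join()

    def _run(self):
        while True:
            with self._cond:
                self._cond.wait_for(lambda: self._queue or self._closed)
                if not self._queue:
                    return  # closed and fully drained
                tokens = self._queue.popleft()
                self._in_flight += 1
            try:
                self._insert(tokens)  # runs outside the lock
            finally:
                with self._cond:
                    self._in_flight -= 1
                    self._cond.notify_all()
```

A caller would `submit` token lists as they are decoded and call `synchronize()` before reading from the backend; the waiting thread sleeps inside `wait_for` until the worker notifies it.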
@hnyls2002 (Collaborator) commented Mar 22, 2026

/rerun-ut test_ngram_corpus.py

@hnyls2002 (Collaborator) commented Mar 22, 2026

/rerun-ut test_ngram_speculative_decoding.py

@sgl-project sgl-project deleted four comments from github-actions Bot Mar 22, 2026
@github-actions (Contributor) commented:

❌ No test file found matching test_ngram_corpus.py under test/registered/.

@github-actions (Contributor) commented:

✅ Triggered /rerun-ut on 1-gpu-h100 runner:

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

@hnyls2002 hnyls2002 merged commit 6d160b4 into sgl-project:main Mar 22, 2026
36 of 76 checks passed
JustinTong0323 pushed a commit to JustinTong0323/sglang that referenced this pull request Apr 7, 2026