[Spec][Ngram] 8/N: Add support for Per-request Trie mode #22737

Open
kpham-sgl wants to merge 20 commits into sgl-project:main from
kpham-sgl:kp/trie-per-request

Conversation

@kpham-sgl (Collaborator) commented Apr 14, 2026

Motivation

Part of the Ngram series (#21052).
Follows #22538, which is a hard dependency, and is stacked on #22569.

  • Add per-request trie support for NGRAM speculative decoding so request-local matches/inserts stay isolated instead of sharing one trie across requests.
  • Keep the new design robust by using one self-owned slab/free-list allocator per trie and exposing separate capacity controls for global vs per-request modes.
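
To make the allocator idea concrete, here is a minimal Python sketch of a trie that owns a fixed-size slab of nodes and hands them out from a free list. It is an illustration only, not the C++ code in this PR: the class name `SlabTrie`, its methods, and the choice to leave a partial prefix in place when the slab is full are all assumptions.

```python
from typing import Dict, List, Optional


class SlabTrie:
    """Hypothetical token trie whose nodes live in a slab owned by this trie."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1 (the root node)")
        # One child map per slot; slot 0 is always the root.
        self.children: List[Dict[int, int]] = [{} for _ in range(capacity)]
        # Free list of unused slots; popping from the end hands out low indices first.
        self.free: List[int] = list(range(capacity - 1, 0, -1))

    def insert(self, tokens: List[int]) -> bool:
        """Insert a token sequence; return False if the slab ran out of nodes."""
        node = 0
        for tok in tokens:
            nxt = self.children[node].get(tok)
            if nxt is None:
                if not self.free:
                    return False  # out of capacity; the partial prefix stays in place
                nxt = self.free.pop()
                self.children[node][tok] = nxt
            node = nxt
        return True

    def match(self, prefix: List[int]) -> Optional[List[int]]:
        """Return the tokens that can follow `prefix`, or None if it is absent."""
        node: Optional[int] = 0
        for tok in prefix:
            node = self.children[node].get(tok)
            if node is None:
                return None
        return sorted(self.children[node])

    def clear(self) -> None:
        """Empty every slot and return all non-root slots to the free list."""
        for slot in self.children:
            slot.clear()
        self.free = list(range(len(self.children) - 1, 0, -1))
```

Because each `SlabTrie` instance holds its own slab and free list, exhausting or clearing one trie cannot affect another, which is the isolation property the per-request mode relies on.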

Modifications

  • Added request-scoped trie management across the C++ and Python NGRAM corpus stack, including separate global/request trie handling, lookup-only request matching, unified async insert marshalling, and request cleanup on finish/abort.
  • Renamed the trie capacity knobs to speculative_ngram_trie_capacity and speculative_ngram_trie_capacity_per_request, lowered the per-request default, and updated user-facing docs/help text to match the new memory model.
  • Added focused regression coverage for allocator stability, request-mode validation, low-level insert contracts, and server-arg parsing for the new trie configuration.
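
The request-scoped flow described above (separate global and per-request state, lookup-only matching, and cleanup on finish or abort) can be sketched roughly as follows. This is a simplified stand-in under stated assumptions: it uses plain dictionaries instead of tries, and `RequestScopedCorpus`, `insert`, `match`, `release`, and the `window` parameter are hypothetical names, not the API added by this PR.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Optional, Set, Tuple

# Maps a fixed-length token context to the set of tokens seen right after it.
Store = DefaultDict[Tuple[int, ...], Set[int]]


def _new_store() -> Store:
    return defaultdict(set)


class RequestScopedCorpus:
    """Hypothetical corpus that keeps n-gram state globally or per request."""

    def __init__(self, per_request: bool, window: int = 2) -> None:
        self.per_request = per_request
        self.window = window
        self.global_store: Store = _new_store()
        self.request_stores: Dict[str, Store] = {}

    def insert(self, rid: str, tokens: List[int]) -> None:
        """Record every (context -> next token) pair from `tokens`."""
        if self.per_request:
            # Only inserts are allowed to create a request-local store.
            store = self.request_stores.setdefault(rid, _new_store())
        else:
            store = self.global_store
        for i in range(len(tokens) - self.window):
            ctx = tuple(tokens[i : i + self.window])
            store[ctx].add(tokens[i + self.window])

    def match(self, rid: str, context: List[int]) -> List[int]:
        """Lookup only: never creates a store or a key as a side effect."""
        store: Optional[Store]
        if self.per_request:
            store = self.request_stores.get(rid)
        else:
            store = self.global_store
        if store is None:
            return []
        key = tuple(context[-self.window :])
        return sorted(store.get(key, ()))

    def release(self, rid: str) -> None:
        """Drop request-local state when the request finishes or is aborted."""
        self.request_stores.pop(rid, None)
```

In per-request mode, a match for a request that never inserted anything returns an empty list and leaves no state behind, and `release` is safe to call more than once for the same request id.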

Accuracy Tests

Speed Tests and Profiling

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@github-actions (bot) added the speculative-decoding, jit-kernel, and documentation labels Apr 14, 2026
@kpham-sgl kpham-sgl marked this pull request as ready for review April 15, 2026 00:31
@kpham-sgl changed the title from "[WIP][Spec][Ngram] 8/N: Add support for Per-request Trie mode" to "[Spec][Ngram] 8/N: Add support for Per-request Trie mode" Apr 15, 2026
@kpham-sgl
Collaborator Author

/rerun-test test/registered/unit/spec/test_ngram_corpus.py

@kpham-sgl
Collaborator Author

/rerun-test test/registered/spec/test_ngram_speculative_decoding.py

@github-actions
Contributor

ubuntu-latest (1 test): View workflow run

cd test/ && python3 registered/unit/spec/test_ngram_corpus.py

@kpham-sgl
Collaborator Author

/rerun-test test/registered/unit/server_args/test_server_args.py

@github-actions
Contributor

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

@github-actions
Contributor

ubuntu-latest (1 test): View workflow run

cd test/ && python3 registered/unit/server_args/test_server_args.py

@kpham-sgl
Collaborator Author

/rerun-test test/registered/spec/test_ngram_speculative_decoding.py

@github-actions
Contributor

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

Labels

documentation, jit-kernel, speculative-decoding
