
[Spec][Ngram]: Add output-as-corpus and distractor corporas to benchmark dynamic spec tokens allocation #22569

Open
kpham-sgl wants to merge 11 commits into sgl-project:main from kpham-sgl:kp/benchmark-trie-sam-matching-change

@kpham-sgl
Collaborator

Motivation

Part of Ngram series #21052
Follow-up to #22538: verify that allowing dynamic spec token allocation across the Trie and SAMs brings a benefit.

Modifications

Add the benchmark

  • Added an end-to-end regression benchmark that compares three settings on the same prompt set: trieOnly (no external corpus loaded; draft tokens come only from the Trie), samOnly (load a strongly matching external suffix-automaton (SAM) corpus and rerun the same prompts), and samPlusDistractors (keep that matching SAM, then add extra irrelevant SAM corpora). The metric is avg_spec_accept_length, i.e. the average number of speculative tokens accepted per verify step.
  • On the benchmark workload, samOnly raises avg_spec_accept_length from 2.13 (trieOnly) to 6.64, a 3.12x improvement.
  • With 2 or 4 distractor SAM corpora, accept length stays at 5.90 / 5.92 (~89% of samOnly), showing that the new trie/SAM ranking preserves the strong matching corpus instead of collapsing back toward the trie-only baseline.
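
For readers who want to check the arithmetic behind the figures above, here is a minimal sketch. It is not the benchmark code from this PR: the helper names, the dictionary keys, and the assumption that 5.90 and 5.92 correspond to 2 and 4 distractors in that order are all illustrative. It only restates the metric definition (mean accepted speculative tokens per verify step) and recomputes the two ratios quoted in the description.

```python
from statistics import mean


def avg_spec_accept_length(accepted_per_step):
    """Mean number of speculative tokens accepted per verify step.

    `accepted_per_step` is a hypothetical list with one entry per verify
    step; the real benchmark may collect this differently.
    """
    if not accepted_per_step:
        raise ValueError("need at least one verify step")
    return mean(accepted_per_step)


# Accept lengths as reported in the PR description. The key names are
# illustrative labels, and the 2-vs-4 distractor ordering is an assumption.
REPORTED = {
    "trieOnly": 2.13,
    "samOnly": 6.64,
    "samPlusDistractors_2": 5.90,
    "samPlusDistractors_4": 5.92,
}


def ratio(numerator, denominator):
    """Ratio of two reported accept lengths."""
    return REPORTED[numerator] / REPORTED[denominator]


def summarize():
    """Recompute the speedup and retention figures quoted above."""
    return {
        "samOnly_vs_trieOnly": round(ratio("samOnly", "trieOnly"), 2),
        "distractors2_retention": round(ratio("samPlusDistractors_2", "samOnly"), 3),
        "distractors4_retention": round(ratio("samPlusDistractors_4", "samOnly"), 3),
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value}")
```

Both retention ratios land close to 0.89, which matches the "~89% of samOnly" figure, and the samOnly/trieOnly ratio rounds to 3.12.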

Accuracy Tests

This is the accuracy test for #22538

Speed Tests and Profiling

This is the speed test for #22538

Checklist

Review and Merge Process

  1. Ping Merge Oncalls to start the process. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • Common commands include /tag-and-rerun-ci, /tag-run-ci-label, /rerun-failed-ci
  4. After green CI and required approvals, ask Merge Oncalls or people with Write permission to merge the PR.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@kpham-sgl kpham-sgl changed the title from "[Spec][Ngram]: Benchmark select draft token counts from SAMs and Trie" to "[Spec][Ngram]: Add output-as-corpus and distractor corporas to benchmark dynamic spec tokens allocation" on Apr 11, 2026
@kpham-sgl kpham-sgl marked this pull request as ready for review April 11, 2026 01:57
@kpham-sgl kpham-sgl force-pushed the kp/benchmark-trie-sam-matching-change branch from 7265a6f to 07a69a6 April 13, 2026 06:17
@kpham-sgl
Collaborator Author

/rerun-test test/registered/spec/test_ngram_speculative_decoding.py

@github-actions
Contributor

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py

@kpham-sgl
Collaborator Author

/rerun-test test/registered/spec/test_ngram_speculative_decoding.py

@github-actions
Contributor

1-gpu-h100 (1 test): View workflow run

cd test/ && python3 registered/spec/test_ngram_speculative_decoding.py
