Skip to content

[Perf] Make EAGLE bigram key an O(1) view on RadixKey#23106

Merged
hnyls2002 merged 12 commits intomainfrom
lsyin/bigram-optimize
Apr 20, 2026
Merged

[Perf] Make EAGLE bigram key an O(1) view on RadixKey#23106
hnyls2002 merged 12 commits intomainfrom
lsyin/bigram-optimize

Conversation

@hnyls2002
Copy link
Copy Markdown
Collaborator

@hnyls2002 hnyls2002 commented Apr 17, 2026

Summary

  • Replace the O(N) List[Tuple[int, int]] materialization with a bigram view over raw token_ids controlled by a new is_bigram flag on RadixKey.
  • convert_to_bigram_key is no longer called on the cache_finished_req / cache_unfinished_req hot path; key construction and slicing are now O(1).

Background

  • EAGLE radix keys have historically been stored as List[Tuple[int, int]] produced by convert_to_bigram_key; every insert/match allocated N−1 Python tuples.
  • At 1M context in SWARadixCache.cache_unfinished_req, convert_to_bigram_key alone took ~138 ms (inside insert ~84 ms + inside the follow-up match_prefix ~54 ms), dominating the step.

Change

  • RadixKey stores raw token_ids: List[int] in both plain and bigram modes. is_bigram=True makes __len__, __iter__, __getitem__, and new page_aligned expose bigram semantics over the raw list; a slice over bigrams [a, b) returns a view over raw tokens [a, b+1] so adjacent slices share one boundary token.
  • _key_match_page_size1 / _key_match_paged compare raw ints position-by-position and convert matched-tokens → matched-bigrams with max(0, i - 1) (then round to page size for paged mode).
  • get_child_key builds a single bigram tuple (or page_size tuples) as the dict key — unchanged in behavior, but no longer per-token allocation.
  • compute_node_hash_values hashes overlapping (t_i, t_{i+1}) byte pairs directly from raw tokens, producing the same SHA256 stream as before for backwards-compatible hashes.
  • page_align_keys gains an is_bigram flag to keep one extra boundary token when aligning on bigram count.
  • Call sites in radix_cache.py, swa_radix_cache.py, and unified_radix_cache.py drop the tuple-materialization path; maybe_to_bigram_view becomes the O(1) entry normalizer for externally-constructed keys. hiradix_cache.py storage path keeps convert_to_bigram_key (external hash-key compatibility).

Numbers (CPU microbench, 1M tokens)

  • convert step: ~47 ms → ~0 ms (> 100,000× for this sub-operation).
  • match (full prefix): 113 ms → 16 ms (~7×).
  • Full cache_unfinished_req-like cycle: 70 ms → 23 ms (~3×).

Test plan

  • test/registered/unit/mem_cache/test_swa_unittest.py
  • test/registered/unit/mem_cache/test_radix_cache_unit.py
  • test/registered/unit/mem_cache/test_unified_radix_cache_unittest.py
  • EAGLE + chunked prefill + long-context e2e on GPU

@gemini-code-assist
Copy link
Copy Markdown
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@hnyls2002 hnyls2002 changed the title Optimize bigram handling in radix cache [Perf] Make EAGLE bigram key an O(1) view on RadixKey Apr 20, 2026
@hnyls2002
Copy link
Copy Markdown
Collaborator Author

hnyls2002 commented Apr 20, 2026

/rerun-test test_swa_unittest.py test_radix_cache_unit.py test_unified_radix_cache_unittest.py test_mamba_unittest.py test_radix_cache_slru_accuracy.py test_swa_eviction_boundary.py test_unified_radix_cache_bench.py test_eagle3_basic.py test_eagle_infer_a.py test_eagle_infer_beta.py
(4 tries)

@sgl-project sgl-project deleted a comment from github-actions Bot Apr 20, 2026
@sgl-project sgl-project deleted a comment from github-actions Bot Apr 20, 2026
@sgl-project sgl-project deleted a comment from github-actions Bot Apr 20, 2026
@sgl-project sgl-project deleted a comment from github-actions Bot Apr 20, 2026
@sgl-project sgl-project deleted a comment from github-actions Bot Apr 20, 2026
@sgl-project sgl-project deleted a comment from github-actions Bot Apr 20, 2026
@sgl-project sgl-project deleted a comment from github-actions Bot Apr 20, 2026
@github-actions
Copy link
Copy Markdown
Contributor

1-gpu-h100 (3 tests): View workflow run

cd test/ && python3 registered/unit/mem_cache/test_swa_unittest.py
cd test/ && python3 registered/unit/mem_cache/test_swa_eviction_boundary.py
cd test/ && python3 registered/spec/eagle/test_eagle_infer_a.py

1-gpu-5090 (7 tests): View workflow run

cd test/ && python3 registered/unit/mem_cache/test_radix_cache_unit.py
cd test/ && python3 registered/unit/mem_cache/test_unified_radix_cache_unittest.py
cd test/ && python3 registered/unit/mem_cache/test_mamba_unittest.py
cd test/ && python3 registered/unit/mem_cache/test_radix_cache_slru_accuracy.py
cd test/ && python3 registered/unit/mem_cache/test_unified_radix_cache_bench.py
cd test/ && python3 registered/spec/eagle/test_eagle3_basic.py
cd test/ && python3 registered/spec/eagle/test_eagle_infer_beta.py

@hnyls2002 hnyls2002 merged commit 8cb957c into main Apr 20, 2026
139 of 171 checks passed
@hnyls2002 hnyls2002 deleted the lsyin/bigram-optimize branch April 20, 2026 19:01
zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request Apr 23, 2026
kyx1999 pushed a commit to KMSorSMS/sglang that referenced this pull request Apr 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants