[Perf] Make EAGLE bigram key an O(1) view on RadixKey#23106
Merged
Conversation
Contributor
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
5 tasks
ispobock
approved these changes
Apr 19, 2026
RadixKey
Collaborator
Author
|
/rerun-test test_swa_unittest.py test_radix_cache_unit.py test_unified_radix_cache_unittest.py test_mamba_unittest.py test_radix_cache_slru_accuracy.py test_swa_eviction_boundary.py test_unified_radix_cache_bench.py test_eagle3_basic.py test_eagle_infer_a.py test_eagle_infer_beta.py |
… into lsyin/bigram-optimize
Contributor
|
✅ ✅ |
zhangying098
pushed a commit
to zhangying098/sglang
that referenced
this pull request
Apr 23, 2026
kyx1999
pushed a commit
to KMSorSMS/sglang
that referenced
this pull request
Apr 27, 2026
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
List[Tuple[int, int]]materialization with a bigram view over rawtoken_idscontrolled by a newis_bigramflag onRadixKey.convert_to_bigram_keyis no longer called on thecache_finished_req/cache_unfinished_reqhot path; key construction and slicing are now O(1).Background
List[Tuple[int, int]]produced byconvert_to_bigram_key; every insert/match allocated N−1 Python tuples.SWARadixCache.cache_unfinished_req,convert_to_bigram_keyalone took ~138 ms (insideinsert~84 ms + inside the follow-upmatch_prefix~54 ms), dominating the step.Change
RadixKeystores rawtoken_ids: List[int]in both plain and bigram modes.is_bigram=Truemakes__len__,__iter__,__getitem__, and newpage_alignedexpose bigram semantics over the raw list; a slice over bigrams[a, b)returns a view over raw tokens[a, b+1]so adjacent slices share one boundary token._key_match_page_size1/_key_match_pagedcompare raw ints position-by-position and convert matched-tokens → matched-bigrams withmax(0, i - 1)(then round to page size for paged mode).get_child_keybuilds a single bigram tuple (orpage_sizetuples) as the dict key — unchanged in behavior, but no longer per-token allocation.compute_node_hash_valueshashes overlapping(t_i, t_{i+1})byte pairs directly from raw tokens, producing the same SHA256 stream as before for backwards-compatible hashes.page_align_keysgains anis_bigramflag to keep one extra boundary token when aligning on bigram count.radix_cache.py,swa_radix_cache.py, andunified_radix_cache.pydrop the tuple-materialization path;maybe_to_bigram_viewbecomes the O(1) entry normalizer for externally-constructed keys.hiradix_cache.pystorage path keepsconvert_to_bigram_key(external hash-key compatibility).Numbers (CPU microbench, 1M tokens)
convertstep: ~47 ms → ~0 ms (> 100,000× for this sub-operation).match(full prefix): 113 ms → 16 ms (~7×).cache_unfinished_req-like cycle: 70 ms → 23 ms (~3×).Test plan
test/registered/unit/mem_cache/test_swa_unittest.pytest/registered/unit/mem_cache/test_radix_cache_unit.pytest/registered/unit/mem_cache/test_unified_radix_cache_unittest.py