[model-gateway] Fix tokenizer L0 cache key collision on add_special_tokens#17189
[model-gateway] Fix tokenizer L0 cache key collision on add_special_tokens#17189ppraneth wants to merge 9 commits intosgl-project:mainfrom
Conversation
|
Warning You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again! |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request correctly fixes a cache key collision bug in the L0 tokenizer cache by including the add_special_tokens flag in the cache key. The changes are logical and well-tested, with a new integration test that effectively validates the fix. My main feedback is a performance concern regarding the implementation of the new cache key. The current approach introduces a string allocation on every cache lookup, which could be a bottleneck. I've left a detailed comment with suggestions on how to address this to maintain the cache's high performance.
|
@slin1237 Can you take a look at this pr? |
Motivation
The
CachedTokenizer's L0 cache layer was previously using only the raw input string as a lookup key. This caused a key collision when the same string was tokenized with different values for theadd_special_tokensflag (e.g., once for an embedding request where special tokens are required, and once for a chat length check where they are omitted). The second call would incorrectly return the cached result of the first, leading to incorrect token IDs and counts.Modifications
src/tokenizer/cache/l0.rs: Updated the internalDashMapto use a composite key of(String, bool)representing the input text and theadd_special_tokensflag. Updatedget,insert, andinsert_arcmethods to handle this new key format.src/tokenizer/cache/mod.rs: Modified theencodemethod to pass theadd_special_tokensflag into the L0 cache lookup and insertion logic. Updated documentation comments to reflect that the cache is now parameter-aware.l0.rsto reflect the new method signatures.Accuracy Tests
Created a new integration test
tests/tokenizer/cache_collision_test.rsto verify the fix. The test confirms:add_special_tokens=truefollowed byadd_special_tokens=falseresults in two distinct cache misses, proving they no longer collide.true) results in a cache hit.Test Result:
test tokenizer::cache_collision_test::test_l0_cache_key_collision ... okBenchmarking and Profiling
Checklist
Review Process
/tag-run-ci-label,/rerun-failed-ci,/tag-and-rerun-ci