
[model-gateway] Fix tokenizer L0 cache key collision on add_special_tokens#17189

Closed
ppraneth wants to merge 9 commits into sgl-project:main from ppraneth:collision

Conversation

@ppraneth
Contributor

Motivation

The CachedTokenizer's L0 cache layer was previously using only the raw input string as a lookup key. This caused a key collision when the same string was tokenized with different values for the add_special_tokens flag (e.g., once for an embedding request where special tokens are required, and once for a chat length check where they are omitted). The second call would incorrectly return the cached result of the first, leading to incorrect token IDs and counts.

Modifications

  • src/tokenizer/cache/l0.rs: Updated the internal DashMap to use a composite key of (String, bool) representing the input text and the add_special_tokens flag. Updated get, insert, and insert_arc methods to handle this new key format.
  • src/tokenizer/cache/mod.rs: Modified the encode method to pass the add_special_tokens flag into the L0 cache lookup and insertion logic. Updated documentation comments to reflect that the cache is now parameter-aware.
  • Tests: Updated unit tests in l0.rs to reflect the new method signatures.

Accuracy Tests

Created a new integration test tests/tokenizer/cache_collision_test.rs to verify the fix. The test confirms:

  1. Isolation: Tokenizing the same string with add_special_tokens=true followed by add_special_tokens=false results in two distinct cache misses, proving they no longer collide.
  2. Correctness: A third call using a previously cached combination (e.g., true) results in a cache hit.

Test Result:

test tokenizer::cache_collision_test::test_l0_cache_key_collision ... ok

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@ppraneth
Contributor Author

/gemini review

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request correctly fixes a cache key collision bug in the L0 tokenizer cache by including the add_special_tokens flag in the cache key. The changes are logical and well-tested, with a new integration test that effectively validates the fix. My main feedback is a performance concern regarding the implementation of the new cache key. The current approach introduces a string allocation on every cache lookup, which could be a bottleneck. I've left a detailed comment with suggestions on how to address this to maintain the cache's high performance.

Comment thread sgl-model-gateway/src/tokenizer/cache/l0.rs
@ppraneth
Contributor Author

@slin1237 Can you take a look at this PR?
