
[model-gateway] Fix tokenizer L0 cache key collision on add_special_tokens#17189

Closed
ppraneth wants to merge 9 commits into sgl-project:main from ppraneth:collision

Conversation

@ppraneth
Contributor

Motivation

The CachedTokenizer's L0 cache layer was previously using only the raw input string as a lookup key. This caused a key collision when the same string was tokenized with different values for the add_special_tokens flag (e.g., once for an embedding request where special tokens are required, and once for a chat length check where they are omitted). The second call would incorrectly return the cached result of the first, leading to incorrect token IDs and counts.

Modifications

  • src/tokenizer/cache/l0.rs: Updated the internal DashMap to use a composite key of (String, bool) representing the input text and the add_special_tokens flag. Updated get, insert, and insert_arc methods to handle this new key format.
  • src/tokenizer/cache/mod.rs: Modified the encode method to pass the add_special_tokens flag into the L0 cache lookup and insertion logic. Updated documentation comments to reflect that the cache is now parameter-aware.
  • Tests: Updated unit tests in l0.rs to reflect the new method signatures.

Accuracy Tests

Created a new integration test tests/tokenizer/cache_collision_test.rs to verify the fix. The test confirms:

  1. Isolation: Tokenizing the same string with add_special_tokens=true followed by add_special_tokens=false results in two distinct cache misses, proving they no longer collide.
  2. Correctness: A third call using a previously cached combination (e.g., true) results in a cache hit.

Test Result:

test tokenizer::cache_collision_test::test_l0_cache_key_collision ... ok

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@ppraneth
Contributor Author

/gemini review

Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request correctly fixes a cache key collision bug in the L0 tokenizer cache by including the add_special_tokens flag in the cache key. The changes are logical and well-tested, with a new integration test that effectively validates the fix. My main feedback is a performance concern regarding the implementation of the new cache key. The current approach introduces a string allocation on every cache lookup, which could be a bottleneck. I've left a detailed comment with suggestions on how to address this to maintain the cache's high performance.

Comment thread sgl-model-gateway/src/tokenizer/cache/l0.rs
@ppraneth
Contributor Author

@slin1237 Can you take a look at this PR?
