[model-gateway] Replace tokenizer with tokenizer registry for dynamic tokenizer loading in gRPC router#12968
Conversation
026cb56 to
71e7f38
Compare
71e7f38 to
86d82a4
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
5972eab to
d3bc3b2
Compare
|
/gemini review |
d3bc3b2 to
7ae24c6
Compare
There was a problem hiding this comment.
Code Review
This pull request introduces a TokenizerRegistry to enable dynamic tokenizer loading, a crucial feature for multi-model serving. The implementation is solid, using DashMap for concurrency and integrating the new registry across the application. I've identified a few areas for improvement: a race condition in the tokenizer loading logic, a minor memory leak in the new registry, and some code duplication. My suggestions aim to address these points to enhance the robustness and maintainability of this new feature.
245113f to
69a239b
Compare
69a239b to
e86e56d
Compare
|
/tag-and-rerun-ci |
8ac1271 to
2b62135
Compare
8e3e55a to
859357d
Compare
495cbda to
3e5bc3d
Compare
3e5bc3d to
268bbf5
Compare
|
Hi, is there any updates about the compatibility of "router + gRPC + service discovery (ome)"? I am happy to quickly work on it, but since there is already ongoing PR, I feel not good to do so and thus need to wait for the existing PR, thus wondering whether is an ETA about it. |
@fzyzcjy Hi, thanks very much for reaching out. It is in the scope of model gateway 3.0 |
c335a0a to
aad0e0a
Compare
use tokenizer registry to shared component use unknown to replace empty model id use fall back strategy to resolve model id
aad0e0a to
965479b
Compare
… tokenizer loading in gRPC router (sgl-project#12968)
… tokenizer loading in gRPC router (sgl-project#12968)
Add design document for implementing batch_size > 1 support in Qwen-Image Edit pipelines, following diffusers PR sgl-project#12968 approach. Key changes: - Support nested list format for batch > 1 input - Modify preprocess_vae_image() and _prepare_edit_cond_kwargs() - Remove assert batch_size == 1 limitation
Motivation
This PR addresses a key limitation in the SGLang router: the inability to support gRPC workers without predefined tokenizer-path or model-path. Currently, gRPC routers require tokenizer paths at startup, making multi-model serving with dynamic worker registration impossible.
This work implements a TokenizerRegistry system to enable dynamic tokenizer loading when workers join the router, and integrates it into the request pipeline to support tokenizer selection based on the requested model.
Related Router roadmap #13098
Modifications
Core Infrastructure
TokenizerRegistry (
sgl-router/src/tokenizer/registry.rs): Thread-safe registry using DashMap for concurrent tokenizer storage and retrievalload()method with per-key locking to prevent duplicate loadsget(),register(),remove(),contains()AppContext Integration (
sgl-router/src/app_context.rs): Addedtokenizer_registry: Arcfield alongsideRequest Pipeline Integration
Worker Registration (
sgl-router/src/core/workflow/steps/worker_registration.rs): Enhanced worker registration workflow to automatically load tokenizersRequest Processing Updated all request processing stages to use tokenizer registry:
sgl-router/src/routers/grpc/regular/processor.rs): Look up tokenizer from registry based on requested modelsgl-router/src/routers/grpc/regular/stages/chat/preparation.rs): Use model-specific tokenizer for chat requestssgl-router/src/routers/grpc/regular/stages/generate/preparation.rs): Use model-specific tokenizer for generate requestssgl-router/src/routers/grpc/regular/streaming.rs): Use tokenizer registry for streaming response processingRouter Updates:
sgl-router/src/routers/grpc/router.rs) and PdRouter (sgl-router/src/routers/grpc/pd_router.rs): Pass tokenizer_registry to request processorssgl-router/src/routers/grpc/pipeline.rs): Updated to work with tokenizer registryModel ID Resolution for GenerateRequest
sgl-router/src/routers/router_manager.rs): Added intelligent model ID resolution for requests without explicit model specificationresolve_model_id()helper method: Intelligently resolves model_id when not provided in requestsroute_generate()method: Replaced hardcoded "unknown" fallback with intelligent model resolutionConfiguration & Validation
sgl-router/src/config/validation.rs): Relaxed tokenizer-path requirement for gRPC mode to allow tokenizer-less startup with dynamic loadingTesting Updates
tests/common/) to work with tokenizer registryWhat's Not Included (Coming in Follow-up PR)
Accuracy Tests
Not applicable - this PR does not affect model outputs or inference results. It only modifies tokenizer loading infrastructure and routing logic to support dynamic tokenizer management.
Benchmarking and Profiling
Performance impact is minimal as the tokenizer registry uses DashMap for lock-free concurrent access. The registry lookup adds negligible overhead to the request path (~microseconds for a concurrent hash map lookup).
Key performance characteristics:
Future benchmarking will measure:
Checklist
Screenshot