[model-gateway] Replace tokenizer with tokenizer registry for dynamic tokenizer loading in gRPC router by YouNeedCryDear · Pull Request #12968 · sgl-project/sglang

YouNeedCryDear · 2025-11-10T07:44:51Z

Motivation

This PR addresses a key limitation in the SGLang router: the inability to support gRPC workers without predefined tokenizer-path or model-path. Currently, gRPC routers require tokenizer paths at startup, making multi-model serving with dynamic worker registration impossible.

This work implements a TokenizerRegistry system to enable dynamic tokenizer loading when workers join the router, and integrates it into the request pipeline to support tokenizer selection based on the requested model.

Related Router roadmap #13098

Modifications

Core Infrastructure

TokenizerRegistry (sgl-router/src/tokenizer/registry.rs): Thread-safe registry using DashMap for concurrent tokenizer storage and retrieval
- load() method with per-key locking to prevent duplicate loads
- Concurrent-safe: multiple workers with the same model load tokenizer only once
- Simple API: get(), register(), remove(), contains()
- Comprehensive unit tests covering basic operations, concurrent loading prevention, multiple models, load failures, and concurrent access patterns
AppContext Integration (sgl-router/src/app_context.rs): Added a tokenizer_registry field, an Arc handle to the shared TokenizerRegistry
- Initialize tokenizer_registry at startup and register the default tokenizer if provided
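The load-once behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code from registry.rs: the `Tokenizer` struct, the `Slot` alias, the `String` error type, and all method signatures are assumptions, and a `Mutex<HashMap>` stands in for DashMap so the example needs only the standard library.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical stand-in for a loaded tokenizer.
#[derive(Debug)]
pub struct Tokenizer {
    pub source: String,
}

/// One slot per model id; the slot's own mutex serializes loads for that key only.
type Slot = Arc<Mutex<Option<Arc<Tokenizer>>>>;

#[derive(Default)]
pub struct TokenizerRegistry {
    // Assumption: the PR stores entries in a DashMap; Mutex<HashMap> keeps this std-only.
    slots: Mutex<HashMap<String, Slot>>,
}

impl TokenizerRegistry {
    /// Find or create the slot for a key; the outer lock is held only briefly.
    fn slot(&self, model_id: &str) -> Slot {
        self.slots.lock().unwrap().entry(model_id.to_string()).or_default().clone()
    }

    pub fn get(&self, model_id: &str) -> Option<Arc<Tokenizer>> {
        let slot = self.slots.lock().unwrap().get(model_id).cloned()?;
        let found = slot.lock().unwrap().clone();
        found
    }

    pub fn contains(&self, model_id: &str) -> bool {
        self.get(model_id).is_some()
    }

    pub fn register(&self, model_id: &str, tokenizer: Tokenizer) {
        let slot = self.slot(model_id);
        *slot.lock().unwrap() = Some(Arc::new(tokenizer));
    }

    pub fn remove(&self, model_id: &str) -> bool {
        self.slots.lock().unwrap().remove(model_id).is_some()
    }

    /// Run `loader` at most once per key, even under concurrent callers.
    /// A failed load leaves the slot empty so a later call can retry.
    pub fn load<F>(&self, model_id: &str, loader: F) -> Result<Arc<Tokenizer>, String>
    where
        F: FnOnce() -> Result<Tokenizer, String>,
    {
        let slot = self.slot(model_id);
        // Per-key lock: callers for other model ids are not blocked here.
        let mut cached = slot.lock().unwrap();
        if let Some(tok) = cached.as_ref() {
            return Ok(Arc::clone(tok));
        }
        let tok = Arc::new(loader()?);
        *cached = Some(Arc::clone(&tok));
        Ok(tok)
    }
}

fn main() {
    let registry = Arc::new(TokenizerRegistry::default());
    let loads = Arc::new(AtomicUsize::new(0));
    // Eight "workers" serving the same model register at the same time.
    let handles: Vec<_> = (0..8)
        .map(|_| {
            let registry = Arc::clone(&registry);
            let loads = Arc::clone(&loads);
            thread::spawn(move || {
                registry
                    .load("model-a", || {
                        loads.fetch_add(1, Ordering::SeqCst);
                        Ok(Tokenizer { source: "path/to/model-a".to_string() })
                    })
                    .unwrap();
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    println!("loader ran {} time(s)", loads.load(Ordering::SeqCst));
}
```

In this sketch a failed load leaves an empty slot in the map until `remove()` is called, which is the kind of bookkeeping a real implementation would want to clean up.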

Request Pipeline Integration

Worker Registration (sgl-router/src/core/workflow/steps/worker_registration.rs): Enhanced worker registration workflow to automatically load tokenizers
- Fetch model info from workers using GetModelInfo gRPC call
- Use extracted model_id as register key
- Automatically load and register tokenizer in the registry when new workers join
- Graceful error handling for workers that don't support model info queries
Request Processing: Updated all request processing stages to use the tokenizer registry:
- Regular Router Processor (sgl-router/src/routers/grpc/regular/processor.rs): Look up tokenizer from registry based on requested model
- Chat Preparation (sgl-router/src/routers/grpc/regular/stages/chat/preparation.rs): Use model-specific tokenizer for chat requests
- Generate Preparation (sgl-router/src/routers/grpc/regular/stages/generate/preparation.rs): Use model-specific tokenizer for generate requests
- Streaming Handler (sgl-router/src/routers/grpc/regular/streaming.rs): Use tokenizer registry for streaming response processing
Router Updates:
- GrpcRouter (sgl-router/src/routers/grpc/router.rs) and PdRouter (sgl-router/src/routers/grpc/pd_router.rs): Pass tokenizer_registry to request processors
- Pipeline (sgl-router/src/routers/grpc/pipeline.rs): Updated to work with tokenizer registry

Model ID Resolution for GenerateRequest

RouterManager (sgl-router/src/routers/router_manager.rs): Added intelligent model ID resolution for requests without explicit model specification
- New resolve_model_id() helper method: Intelligently resolves model_id when not provided in requests
  - If model_id is provided: use it directly
  - If not provided and only 1 model exists: use it as implicit default (backward compatible for single-model deployments)
  - If not provided and multiple models exist: return clear error listing available models
  - If no models exist: return service unavailable error
- Updated route_generate() method: Replaced hardcoded "unknown" fallback with intelligent model resolution
- Ensures GenerateRequest (with optional model field) correctly routes to registered tokenizers
- Aligns model_id resolution with tokenizer registry keys and worker model_ids
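The four resolution rules listed above can be sketched as a small pure function. This is an illustration under assumptions, not the method from router_manager.rs: the free-function form, the `ResolveError` enum, and the error messages are hypothetical, and the real helper presumably reads the available models from the worker registry rather than taking a slice.

```rust
use std::fmt;

#[derive(Debug, PartialEq)]
pub enum ResolveError {
    /// No model was requested and several are registered; carries the sorted candidates.
    Ambiguous(Vec<String>),
    /// No models are registered at all (would map to a service-unavailable response).
    NoModels,
}

impl fmt::Display for ResolveError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ResolveError::Ambiguous(names) => write!(
                f,
                "model must be specified; available models: {}",
                names.join(", ")
            ),
            ResolveError::NoModels => write!(f, "no models are currently registered"),
        }
    }
}

/// Resolve the model id for a request whose `model` field is optional.
pub fn resolve_model_id(
    requested: Option<&str>,
    available: &[String],
) -> Result<String, ResolveError> {
    // Rule 1: an explicit model id is used directly.
    if let Some(id) = requested {
        return Ok(id.to_string());
    }
    match available {
        // Rule 4: nothing registered.
        [] => Err(ResolveError::NoModels),
        // Rule 2: a single model is the implicit default.
        [only] => Ok(only.clone()),
        // Rule 3: several models, so the caller must choose.
        many => {
            let mut names = many.to_vec();
            names.sort();
            Err(ResolveError::Ambiguous(names))
        }
    }
}

fn main() {
    let models = vec!["model-b".to_string(), "model-a".to_string()];
    match resolve_model_id(None, &models) {
        Ok(id) => println!("resolved to {id}"),
        Err(e) => println!("error: {e}"),
    }
}
```

Returning the resolved id as the same string used for tokenizer registry keys is what keeps request routing and tokenizer lookup aligned.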

Configuration & Validation

Config Validation (sgl-router/src/config/validation.rs): Relaxed tokenizer-path requirement for gRPC mode to allow tokenizer-less startup with dynamic loading

Testing Updates

Updated test infrastructure (tests/common/) to work with tokenizer registry
Added test coverage for tokenizer registry operations

What's Not Included (Coming in Follow-up PR)

IGW (Intelligent Gateway) Support: Multi-model routing through IGW will be enabled in a subsequent PR. The current implementation focuses on single-router mode with the tokenizer registry infrastructure in place.

Accuracy Tests

Not applicable - this PR does not affect model outputs or inference results. It only modifies tokenizer loading infrastructure and routing logic to support dynamic tokenizer management.

Benchmarking and Profiling

Performance impact is expected to be minimal: the tokenizer registry uses DashMap, a sharded concurrent hash map, so lookups from different requests rarely contend with each other. The registry lookup adds negligible overhead to the request path (on the order of microseconds for a concurrent hash map lookup).

Key performance characteristics:

- Tokenizer loading is done once per model during worker registration
- Request routing uses fast concurrent hash map lookups
- No impact on existing single-tokenizer configurations
- Memory usage scales linearly with the number of unique models

Future benchmarking will measure:

- Tokenizer loading time during worker registration
- Memory usage with multiple models
- Request routing latency comparison (should remain unchanged)

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.

Screenshot

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

YouNeedCryDear · 2025-11-14T00:02:13Z

/gemini review

gemini-code-assist

Code Review

This pull request introduces a TokenizerRegistry to enable dynamic tokenizer loading, a crucial feature for multi-model serving. The implementation is solid, using DashMap for concurrency and integrating the new registry across the application. I've identified a few areas for improvement: a race condition in the tokenizer loading logic, a minor memory leak in the new registry, and some code duplication. My suggestions aim to address these points to enhance the robustness and maintainability of this new feature.

slin1237 · 2025-11-24T21:50:04Z

/tag-and-rerun-ci

fzyzcjy · 2025-12-18T02:08:00Z

Hi, are there any updates on the compatibility of "router + gRPC + service discovery (ome)"? I would be happy to work on it quickly, but since there is already an ongoing PR, I would rather wait for it, so I am wondering whether there is an ETA.

YouNeedCryDear · 2025-12-18T05:09:38Z

@fzyzcjy Hi, thanks very much for reaching out. It is in the scope of model gateway 3.0.
This PR is close to merging, and I will work on the get tokenizer endpoint PR #12407 afterwards. Hopefully it will land in the next release or the one after.
A major workflow refactor is currently under way, which is why this has been pushed back a little.

github-actions Bot added the router label Nov 10, 2025

YouNeedCryDear force-pushed the grpc-tokenizer-registry branch 6 times, most recently from 026cb56 to 71e7f38 Compare November 13, 2025 07:31

YouNeedCryDear changed the title from "[WIP][Router] Add tokenizer registry for dynamic tokenizer loading in IGW mode" to "[Router] Add tokenizer registry for dynamic tokenizer loading in IGW mode" Nov 13, 2025

YouNeedCryDear force-pushed the grpc-tokenizer-registry branch from 71e7f38 to 86d82a4 Compare November 13, 2025 08:17

YouNeedCryDear marked this pull request as ready for review November 13, 2025 08:25

YouNeedCryDear requested review from ByronHsu, CatherineSue, key4ng and slin1237 as code owners November 13, 2025 08:25

YouNeedCryDear changed the title from "[Router] Add tokenizer registry for dynamic tokenizer loading in IGW mode" to "[Router] Replace tokenizer with tokenizer registry for dynamic tokenizer loading in gRPC router" Nov 13, 2025

chatgpt-codex-connector Bot reviewed Nov 13, 2025

Comment thread sgl-router/src/core/workflow/steps/worker_registration.rs Outdated

Comment thread sgl-router/src/app_context.rs

YouNeedCryDear force-pushed the grpc-tokenizer-registry branch 3 times, most recently from 5972eab to d3bc3b2 Compare November 13, 2025 23:24

YouNeedCryDear force-pushed the grpc-tokenizer-registry branch from d3bc3b2 to 7ae24c6 Compare November 14, 2025 00:03

gemini-code-assist Bot reviewed Nov 14, 2025

Comment thread sgl-router/src/core/workflow/steps/worker_registration.rs Outdated

Comment thread sgl-router/src/tokenizer/registry.rs

YouNeedCryDear force-pushed the grpc-tokenizer-registry branch 2 times, most recently from 245113f to 69a239b Compare November 20, 2025 21:26

github-actions Bot added the model-gateway label Nov 20, 2025

YouNeedCryDear force-pushed the grpc-tokenizer-registry branch from 69a239b to e86e56d Compare November 24, 2025 19:32

slin1237 reviewed Nov 24, 2025

slin1237 changed the title from "[Router] Replace tokenizer with tokenizer registry for dynamic tokenizer loading in gRPC router" to "[model-gateway] Replace tokenizer with tokenizer registry for dynamic tokenizer loading in gRPC router" Nov 24, 2025

github-actions Bot added the run-ci label Nov 24, 2025

YouNeedCryDear force-pushed the grpc-tokenizer-registry branch 7 times, most recently from 8ac1271 to 2b62135 Compare December 2, 2025 15:47

slin1237 requested changes Dec 2, 2025

Comment thread sgl-router/src/routers/grpc/context.rs Outdated

YouNeedCryDear force-pushed the grpc-tokenizer-registry branch 4 times, most recently from 8e3e55a to 859357d Compare December 3, 2025 21:40

YouNeedCryDear requested a review from slin1237 December 3, 2025 21:52

YouNeedCryDear force-pushed the grpc-tokenizer-registry branch 2 times, most recently from 495cbda to 3e5bc3d Compare December 4, 2025 21:26

YouNeedCryDear force-pushed the grpc-tokenizer-registry branch from 3e5bc3d to 268bbf5 Compare December 17, 2025 21:56

YouNeedCryDear force-pushed the grpc-tokenizer-registry branch 3 times, most recently from c335a0a to aad0e0a Compare December 19, 2025 07:18

slin1237 approved these changes Dec 21, 2025

Commit 965479b: add tokenizer registry to replace tokenizer
- use tokenizer registry to shared component
- use unknown to replace empty model id
- use fall back strategy to resolve model id

YouNeedCryDear force-pushed the grpc-tokenizer-registry branch from aad0e0a to 965479b Compare December 23, 2025 08:03

slin1237 approved these changes Dec 23, 2025

slin1237 merged commit dd62098 into sgl-project:main Dec 23, 2025
62 checks passed

YouNeedCryDear deleted the grpc-tokenizer-registry branch December 24, 2025 05:17

jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025

[model-gateway] Replace tokenizer with tokenizer registry for dynamic…

64c911c

… tokenizer loading in gRPC router (sgl-project#12968)

YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026

[model-gateway] Replace tokenizer with tokenizer registry for dynamic…

b21dc10

… tokenizer loading in gRPC router (sgl-project#12968)
