[model-gateway] Replace tokenizer with tokenizer registry for dynamic tokenizer loading in gRPC router #12968

Merged
slin1237 merged 1 commit into sgl-project:main from YouNeedCryDear:grpc-tokenizer-registry
Dec 23, 2025

@YouNeedCryDear YouNeedCryDear commented Nov 10, 2025

Motivation

This PR addresses a key limitation in the SGLang router: it cannot support gRPC workers without a predefined tokenizer-path or model-path. gRPC routers currently require tokenizer paths at startup, which makes multi-model serving with dynamic worker registration impossible.

This work implements a TokenizerRegistry system to enable dynamic tokenizer loading when workers join the router, and integrates it into the request pipeline to support tokenizer selection based on the requested model.

Related: Router roadmap #13098

Modifications

Core Infrastructure

  • TokenizerRegistry (sgl-router/src/tokenizer/registry.rs): Thread-safe registry using DashMap for concurrent tokenizer storage and retrieval

    • load() method with per-key locking to prevent duplicate loads
    • Concurrent-safe: multiple workers serving the same model load its tokenizer only once
    • Simple API: get(), register(), remove(), contains()
    • Comprehensive unit tests covering basic operations, concurrent loading prevention, multiple models, load failures, and concurrent access patterns
  • AppContext Integration (sgl-router/src/app_context.rs): Added tokenizer_registry: Arc field alongside

    • Initialize tokenizer_registry at startup and register the default tokenizer if provided
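
The load-once behavior described above can be sketched as follows. This is a minimal illustration, not the code in registry.rs: it uses a std Mutex<HashMap> in place of DashMap so that it compiles without external crates, and the Tokenizer struct, the String error type, and the loader closure are placeholders assumed for the example.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Placeholder for a real tokenizer (hypothetical, for illustration only).
#[derive(Debug)]
pub struct Tokenizer {
    pub source: String,
}

/// One slot per key. The slot's own mutex acts as the per-key lock.
type Slot = Arc<Mutex<Option<Arc<Tokenizer>>>>;

#[derive(Default)]
pub struct TokenizerRegistry {
    // Assumption: a std Mutex<HashMap> stands in for the DashMap the PR uses.
    slots: Mutex<HashMap<String, Slot>>,
}

impl TokenizerRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    /// Returns the tokenizer for `key`, running `loader` only if no tokenizer
    /// is stored yet. Concurrent callers for the same key wait on the slot
    /// lock, so a successful load happens once per key.
    pub fn load<F>(&self, key: &str, loader: F) -> Result<Arc<Tokenizer>, String>
    where
        F: FnOnce() -> Result<Tokenizer, String>,
    {
        let mut slots = self.slots.lock().unwrap();
        let slot: Slot = slots.entry(key.to_string()).or_default().clone();
        drop(slots); // release the map lock before the possibly slow load

        let mut guard = slot.lock().unwrap();
        if let Some(tokenizer) = guard.as_ref() {
            return Ok(Arc::clone(tokenizer));
        }
        // On failure the slot stays empty, so a later call can retry.
        let tokenizer = Arc::new(loader()?);
        *guard = Some(Arc::clone(&tokenizer));
        Ok(tokenizer)
    }

    /// Stores a tokenizer that is already loaded, replacing any existing one.
    pub fn register(&self, key: &str, tokenizer: Tokenizer) {
        let slot: Slot = Arc::new(Mutex::new(Some(Arc::new(tokenizer))));
        self.slots.lock().unwrap().insert(key.to_string(), slot);
    }

    /// Looks up a tokenizer; waits if a load for the same key is in progress.
    pub fn get(&self, key: &str) -> Option<Arc<Tokenizer>> {
        let slot = self.slots.lock().unwrap().get(key).cloned()?;
        let guard = slot.lock().unwrap();
        guard.clone()
    }

    pub fn contains(&self, key: &str) -> bool {
        self.get(key).is_some()
    }

    /// Removes the entry for `key`; returns whether an entry existed.
    pub fn remove(&self, key: &str) -> bool {
        self.slots.lock().unwrap().remove(key).is_some()
    }
}
```

In this sketch a failed load leaves an empty slot behind in the map; a production registry would probably want to clean that up.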

Request Pipeline Integration

  • Worker Registration (sgl-router/src/core/workflow/steps/worker_registration.rs): Enhanced worker registration workflow to automatically load tokenizers

    • Fetch model info from workers using GetModelInfo gRPC call
    • Use the extracted model_id as the registry key
    • Automatically load and register tokenizer in the registry when new workers join
    • Graceful error handling for workers that don't support model info queries
  • Request Processing: Updated all request processing stages to use the tokenizer registry:

    • Regular Router Processor (sgl-router/src/routers/grpc/regular/processor.rs): Look up tokenizer from registry based on requested model
    • Chat Preparation (sgl-router/src/routers/grpc/regular/stages/chat/preparation.rs): Use model-specific tokenizer for chat requests
    • Generate Preparation (sgl-router/src/routers/grpc/regular/stages/generate/preparation.rs): Use model-specific tokenizer for generate requests
    • Streaming Handler (sgl-router/src/routers/grpc/regular/streaming.rs): Use tokenizer registry for streaming response processing
  • Router Updates:

    • GrpcRouter (sgl-router/src/routers/grpc/router.rs) and PdRouter (sgl-router/src/routers/grpc/pd_router.rs): Pass tokenizer_registry to request processors
    • Pipeline (sgl-router/src/routers/grpc/pipeline.rs): Updated to work with tokenizer registry
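
The flow above (register a tokenizer when a worker joins, then look it up by the requested model) could look roughly like this. It is a simplified sketch under stated assumptions: ModelInfo, its tokenizer_path field, Registry, and both function names are hypothetical, and the gRPC call is represented by a plain Result rather than a real GetModelInfo request.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// Hypothetical result of a worker's GetModelInfo call.
pub struct ModelInfo {
    pub model_id: String,
    pub tokenizer_path: String, // assumption: the worker reports where its tokenizer lives
}

/// Placeholder tokenizer (hypothetical).
pub struct Tokenizer {
    pub path: String,
}

#[derive(Default)]
pub struct Registry {
    by_model: HashMap<String, Arc<Tokenizer>>,
}

/// Registration step: stores a tokenizer under the worker's model_id.
/// Returns the registry key, or None when the worker cannot report model
/// info, in which case registration continues without a tokenizer.
pub fn on_worker_registered(
    registry: &mut Registry,
    model_info: Result<ModelInfo, String>,
) -> Option<String> {
    let info = match model_info {
        Ok(info) => info,
        Err(_) => return None, // graceful handling: skip tokenizer loading
    };
    // A second worker for the same model reuses the existing entry.
    registry
        .by_model
        .entry(info.model_id.clone())
        .or_insert_with(|| Arc::new(Tokenizer { path: info.tokenizer_path.clone() }));
    Some(info.model_id)
}

/// Request path: pick the tokenizer that matches the requested model.
pub fn tokenizer_for_request(registry: &Registry, model: &str) -> Result<Arc<Tokenizer>, String> {
    registry
        .by_model
        .get(model)
        .cloned()
        .ok_or_else(|| format!("no tokenizer registered for model '{model}'"))
}
```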

Model ID Resolution for GenerateRequest

  • RouterManager (sgl-router/src/routers/router_manager.rs): Added model ID resolution for requests that do not specify a model
    • New resolve_model_id() helper method: Resolves model_id when the request does not provide one
      • If model_id is provided: use it directly
      • If not provided and only 1 model exists: use it as implicit default (backward compatible for single-model deployments)
      • If not provided and multiple models exist: return clear error listing available models
      • If no models exist: return service unavailable error
    • Updated route_generate() method: Replaced the hardcoded "unknown" fallback with resolve_model_id()
    • Ensures GenerateRequest (with optional model field) correctly routes to registered tokenizers
    • Aligns model_id resolution with tokenizer registry keys and worker model_ids
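
The four resolution rules listed above can be written as a small pure function. This sketch follows the bullets as described and is not taken from router_manager.rs; the ResolveError type and the function signature are assumptions made for illustration.

```rust
/// Hypothetical error type for the two failure cases in the rules above.
#[derive(Debug, PartialEq)]
pub enum ResolveError {
    /// No model was requested and several exist; carries the available model IDs.
    Ambiguous(Vec<String>),
    /// No models are registered (would map to a "service unavailable" response).
    NoModels,
}

/// Resolves the model ID for a request whose `model` field is optional.
pub fn resolve_model_id(
    requested: Option<&str>,
    available: &[String],
) -> Result<String, ResolveError> {
    // Rule 1: an explicit model ID is used directly.
    if let Some(id) = requested {
        return Ok(id.to_string());
    }
    match available {
        // Rule 4: nothing registered.
        [] => Err(ResolveError::NoModels),
        // Rule 2: a single model acts as the implicit default.
        [only] => Ok(only.clone()),
        // Rule 3: several models, so the caller must choose one.
        many => {
            let mut names = many.to_vec();
            names.sort();
            Err(ResolveError::Ambiguous(names))
        }
    }
}
```

Sorting the names in the ambiguous case keeps the error message stable regardless of the order in which workers registered.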

Configuration & Validation

  • Config Validation (sgl-router/src/config/validation.rs): Relaxed tokenizer-path requirement for gRPC mode to allow tokenizer-less startup with dynamic loading

Testing Updates

  • Updated test infrastructure (tests/common/) to work with tokenizer registry
  • Added test coverage for tokenizer registry operations

What's Not Included (Coming in Follow-up PR)

  • IGW (Intelligent Gateway) Support: Multi-model routing through IGW will be enabled in a subsequent PR. The current implementation focuses on single-router mode with the tokenizer registry infrastructure in place.

Accuracy Tests

Not applicable - this PR does not affect model outputs or inference results. It only modifies tokenizer loading infrastructure and routing logic to support dynamic tokenizer management.

Benchmarking and Profiling

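Performance impact should be minimal: the tokenizer registry uses DashMap, a sharded concurrent hash map, so lookups for different models rarely contend on the same lock. The registry adds one concurrent hash map lookup to the request path, which is expected to cost on the order of microseconds or less; this has not been benchmarked yet.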

Key performance characteristics:

  • Tokenizer loading is done once per model during worker registration
  • Request routing uses fast concurrent hash map lookups
  • No impact on existing single-tokenizer configurations
  • Memory usage scales linearly with the number of unique models

Future benchmarking will measure:

  • Tokenizer loading time during worker registration
  • Memory usage with multiple models
  • Request routing latency comparison (should remain unchanged)

Screenshots

[Two screenshots, images not captured: "Screenshot 2025-11-13 at 12 05 59 AM" and "Screenshot 2025-11-13 at 12 07 33 AM"]

@YouNeedCryDear YouNeedCryDear force-pushed the grpc-tokenizer-registry branch 6 times, most recently from 026cb56 to 71e7f38 Compare November 13, 2025 07:31
@YouNeedCryDear YouNeedCryDear changed the title from "[WIP][Router] Add tokenizer registry for dynamic tokenizer loading in IGW mode" to "[Router] Add tokenizer registry for dynamic tokenizer loading in IGW mode" Nov 13, 2025
@YouNeedCryDear YouNeedCryDear force-pushed the grpc-tokenizer-registry branch from 71e7f38 to 86d82a4 Compare November 13, 2025 08:17
@YouNeedCryDear YouNeedCryDear marked this pull request as ready for review November 13, 2025 08:25
@YouNeedCryDear YouNeedCryDear changed the title from "[Router] Add tokenizer registry for dynamic tokenizer loading in IGW mode" to "[Router] Replace tokenizer with tokenizer registry for dynamic tokenizer loading in gRPC router" Nov 13, 2025

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread sgl-router/src/core/workflow/steps/worker_registration.rs Outdated
Comment thread sgl-router/src/app_context.rs
@YouNeedCryDear YouNeedCryDear force-pushed the grpc-tokenizer-registry branch 3 times, most recently from 5972eab to d3bc3b2 Compare November 13, 2025 23:24
@YouNeedCryDear
Contributor Author

/gemini review

@YouNeedCryDear YouNeedCryDear force-pushed the grpc-tokenizer-registry branch from d3bc3b2 to 7ae24c6 Compare November 14, 2025 00:03
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces a TokenizerRegistry to enable dynamic tokenizer loading, a crucial feature for multi-model serving. The implementation is solid, using DashMap for concurrency and integrating the new registry across the application. I've identified a few areas for improvement: a race condition in the tokenizer loading logic, a minor memory leak in the new registry, and some code duplication. My suggestions aim to address these points to enhance the robustness and maintainability of this new feature.

Comment thread sgl-router/src/core/workflow/steps/worker_registration.rs Outdated
Comment thread sgl-router/src/tokenizer/registry.rs
@YouNeedCryDear YouNeedCryDear force-pushed the grpc-tokenizer-registry branch 2 times, most recently from 245113f to 69a239b Compare November 20, 2025 21:26
@YouNeedCryDear YouNeedCryDear force-pushed the grpc-tokenizer-registry branch from 69a239b to e86e56d Compare November 24, 2025 19:32
Comment thread sgl-router/src/core/workflow/steps/worker_registration.rs Outdated
Comment thread sgl-router/src/routers/grpc/regular/stages/generate/preparation.rs Outdated
Comment thread sgl-router/src/routers/grpc/regular/streaming.rs Outdated
Comment thread sgl-router/src/routers/grpc/regular/streaming.rs Outdated
Comment thread sgl-model-gateway/src/routers/grpc/regular/streaming.rs
Comment thread sgl-router/src/tokenizer/registry.rs Outdated
@slin1237
Collaborator

/tag-and-rerun-ci

@slin1237 slin1237 changed the title from "[Router] Replace tokenizer with tokenizer registry for dynamic tokenizer loading in gRPC router" to "[model-gateway] Replace tokenizer with tokenizer registry for dynamic tokenizer loading in gRPC router" Nov 24, 2025
@YouNeedCryDear YouNeedCryDear force-pushed the grpc-tokenizer-registry branch 7 times, most recently from 8ac1271 to 2b62135 Compare December 2, 2025 15:47
Comment thread sgl-router/src/routers/grpc/context.rs Outdated
@YouNeedCryDear YouNeedCryDear force-pushed the grpc-tokenizer-registry branch 4 times, most recently from 8e3e55a to 859357d Compare December 3, 2025 21:40
@YouNeedCryDear YouNeedCryDear force-pushed the grpc-tokenizer-registry branch 2 times, most recently from 495cbda to 3e5bc3d Compare December 4, 2025 21:26
@YouNeedCryDear YouNeedCryDear force-pushed the grpc-tokenizer-registry branch from 3e5bc3d to 268bbf5 Compare December 17, 2025 21:56
@fzyzcjy
Collaborator

fzyzcjy commented Dec 18, 2025

Hi, are there any updates on the compatibility of "router + gRPC + service discovery (ome)"? I am happy to work on it quickly, but since there is already an ongoing PR, I would rather not duplicate it and will wait for it instead. Is there an ETA?

@YouNeedCryDear
Contributor Author

Hi, are there any updates on the compatibility of "router + gRPC + service discovery (ome)"? I am happy to work on it quickly, but since there is already an ongoing PR, I would rather not duplicate it and will wait for it instead. Is there an ETA?

@fzyzcjy Hi, thanks very much for reaching out. It is in the scope of model gateway 3.0.
This PR is close to merging, and I will work on the get tokenizer endpoint PR #12407 afterwards. Hopefully it will land in the next release or the one after.
A major workflow refactor is currently under way, which is why this has been pushed back a little.

@YouNeedCryDear YouNeedCryDear force-pushed the grpc-tokenizer-registry branch 3 times, most recently from c335a0a to aad0e0a Compare December 19, 2025 07:18
use tokenizer registry to shared component
use unknown to replace empty model id
use fall back strategy to resolve model id
@YouNeedCryDear YouNeedCryDear force-pushed the grpc-tokenizer-registry branch from aad0e0a to 965479b Compare December 23, 2025 08:03
@slin1237 slin1237 merged commit dd62098 into sgl-project:main Dec 23, 2025
62 checks passed
@YouNeedCryDear YouNeedCryDear deleted the grpc-tokenizer-registry branch December 24, 2025 05:17
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026
RuixiangMa added a commit to RuixiangMa/sglang that referenced this pull request Mar 5, 2026
Add design document for implementing batch_size > 1 support
in Qwen-Image Edit pipelines, following diffusers PR sgl-project#12968 approach.

Key changes:
- Support nested list format for batch > 1 input
- Modify preprocess_vae_image() and _prepare_edit_cond_kwargs()
- Remove assert batch_size == 1 limitation

3 participants