[model-gateway] Enable IGW mode with gRPC router and auto enable IGW when service discovery is turned on #15459
💡 Codex Review
Here are some automated review suggestions for this pull request.
    // Load tokenizer with thread safe lock
    if let Err(e) = app_context
        .tokenizer_registry
        .load(&model_id, || async move {
            factory::create_tokenizer_async(&tokenizer_path.to_string())
                .await
                .map_err(|e| e.to_string())
Preserve `chat_template` when loading worker tokenizers
Dynamic tokenizer loading ignores the configured chat template: `RegisterTokenizerStep` loads tokenizers with `create_tokenizer_async` (lines 50-56) without passing `RouterConfig.chat_template` or the worker's model card template. In service-discovery/IGW mode this path supplies the only tokenizer for gRPC routers, so any `--chat-template` override is silently dropped and prompts are formatted with the default template, which can break models that rely on the custom template. Consider passing the configured chat template when registering tokenizers so gRPC routing remains consistent with the router's settings.
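To make the suggestion concrete, here is a minimal sketch of how a registration step could carry a chat template alongside the tokenizer path. Everything in it is hypothetical: `TokenizerRequest` and `build_tokenizer_request` are illustrative names, not part of the gateway, and the precedence (router override first, then the worker's model card template) is an assumption rather than something the PR specifies.

```rust
/// Hypothetical description of what a tokenizer registration step would load.
/// Not the gateway's real type; it exists only for this sketch.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TokenizerRequest {
    pub tokenizer_path: String,
    /// Chat template to apply, if any. `None` means "use the tokenizer's default".
    pub chat_template: Option<String>,
}

/// Build the request, preferring the router-level override and falling back
/// to the template advertised by the worker's model card.
/// The precedence order is an assumption made for illustration.
pub fn build_tokenizer_request(
    tokenizer_path: &str,
    router_override: Option<&str>,
    model_card_template: Option<&str>,
) -> TokenizerRequest {
    let chat_template = router_override
        .or(model_card_template)
        .map(str::to_owned);

    TokenizerRequest {
        tokenizer_path: tokenizer_path.to_owned(),
        chat_template,
    }
}
```

With a request like this, the override survives dynamic registration instead of being dropped when only the tokenizer path is forwarded.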
…hen service discovery is turned on (sgl-project#15459)
Summary
Motivation
Modifications
- `sgl-model-gateway/src/app_context.rs`: Initialize reasoning/tool parser factories when either gRPC or IGW is enabled, reflecting IGW's gRPC router usage.
- `sgl-model-gateway/src/main.rs`: Decouple routing-mode selection from IGW, streamline PD routing config, and automatically flip `enable_igw` on when service discovery is requested (with an info log).
- `sgl-model-gateway/src/routers/router_manager.rs`: Always create a gRPC regular router in IGW mode; create HTTP and gRPC PD routers only when PD disaggregation is enabled; choose routers based on worker connection mode/role priority (grpc-pd > http-pd > grpc-regular > http-regular); update health endpoint to report ready if any worker is healthy.
Accuracy Tests
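As a rough illustration of the behavior listed under Modifications, the sketch below models the router priority (grpc-pd > http-pd > grpc-regular > http-regular), the "ready if any worker is healthy" rule, and the automatic IGW switch. All types and function names here (`RouterKind`, `Worker`, `select_router`, `is_ready`, `effective_enable_igw`) are hypothetical and simplified; the real `router_manager.rs` and `main.rs` may be structured quite differently.

```rust
/// Router kinds, declared in ascending priority so the derived `Ord`
/// yields grpc-pd > http-pd > grpc-regular > http-regular.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum RouterKind {
    HttpRegular,
    GrpcRegular,
    HttpPd,
    GrpcPd,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnectionMode {
    Http,
    Grpc,
}

/// Simplified, hypothetical worker record.
#[derive(Debug, Clone, Copy)]
pub struct Worker {
    pub mode: ConnectionMode,
    /// True for prefill/decode workers in a PD-disaggregated deployment.
    pub is_pd: bool,
    pub healthy: bool,
}

/// Requesting service discovery implies IGW mode.
pub fn effective_enable_igw(enable_igw: bool, service_discovery: bool) -> bool {
    enable_igw || service_discovery
}

/// Map one worker to the router kind that would serve it.
fn router_for(worker: &Worker) -> RouterKind {
    match (worker.mode, worker.is_pd) {
        (ConnectionMode::Grpc, true) => RouterKind::GrpcPd,
        (ConnectionMode::Http, true) => RouterKind::HttpPd,
        (ConnectionMode::Grpc, false) => RouterKind::GrpcRegular,
        (ConnectionMode::Http, false) => RouterKind::HttpRegular,
    }
}

/// Pick the highest-priority router among those the workers call for.
/// PD routers are considered only when PD disaggregation is enabled.
pub fn select_router(workers: &[Worker], pd_enabled: bool) -> Option<RouterKind> {
    workers
        .iter()
        .map(router_for)
        .filter(|kind| pd_enabled || !matches!(kind, RouterKind::HttpPd | RouterKind::GrpcPd))
        .max()
}

/// Health check: ready as soon as any worker is healthy.
pub fn is_ready(workers: &[Worker]) -> bool {
    workers.iter().any(|w| w.healthy)
}
```

Encoding the priority in the enum's declaration order keeps the selection to a single `max()` call; a real implementation would more likely look up already-constructed router instances than compute a kind on the fly.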
Benchmarking and Profiling
Screenshot
Checklist