
[model-gateway] Enable IGW mode with gRPC router and auto enable IGW when service discovery is turned on #15459

Merged
slin1237 merged 1 commit into sgl-project:main from YouNeedCryDear:enable-igw-grpc
Dec 24, 2025


YouNeedCryDear (Contributor) commented Dec 19, 2025

Summary

  • Enable IGW mode to spin up gRPC routers (regular and PD) and select them preferentially when matching workers are present.
  • Auto-enable IGW when service discovery is requested, aligning router initialization with discovery-driven worker registration.
  • Improve readiness reporting by returning 200 when any registered worker is healthy.
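The auto-enable behavior in the second bullet can be sketched as a small flag-normalization step. This is a minimal illustration, not the code from this PR: `GatewayFlags` and `normalize_flags` are hypothetical names, and `eprintln!` stands in for whatever info-level logging the gateway actually uses.

```rust
/// Hypothetical subset of the gateway's startup flags (names are assumptions).
#[derive(Debug, Default, Clone, PartialEq, Eq)]
pub struct GatewayFlags {
    pub service_discovery: bool,
    pub enable_igw: bool,
}

/// Turn IGW on whenever service discovery is requested, so the two
/// settings cannot drift apart.
pub fn normalize_flags(mut flags: GatewayFlags) -> GatewayFlags {
    if flags.service_discovery && !flags.enable_igw {
        // Stand-in for an info-level log line.
        eprintln!("service discovery requested; enabling IGW mode");
        flags.enable_igw = true;
    }
    flags
}
```

Normalizing once at startup means later code only needs to check `enable_igw`, rather than re-deriving it from the discovery setting.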

Motivation

  • IGW mode internally uses all types of routers; previously, the parser factories and router creation logic were tied to an explicit single gRPC instance, leaving IGW without the necessary gRPC router coverage.
  • Service discovery implies IGW behavior; requiring a separate flag led to misconfiguration risk and silent misalignment between discovery and router mode.
  • Operators need the router manager to pick the best transport (gRPC vs HTTP, PD vs regular) based on available workers and to surface health when capacity exists.

Modifications

  • sgl-model-gateway/src/app_context.rs: Initialize reasoning/tool parser factories when either gRPC or IGW is enabled, reflecting IGW’s gRPC router usage.
  • sgl-model-gateway/src/main.rs: Decouple routing-mode selection from IGW, streamline PD routing config, and automatically flip enable_igw on when service discovery is requested (with an info log).
  • sgl-model-gateway/src/routers/router_manager.rs: Always create a gRPC regular router in IGW mode; create HTTP and gRPC PD routers only when PD disaggregation is enabled; choose routers based on worker connection mode/role priority (grpc-pd > http-pd > grpc-regular > http-regular); update health endpoint to report ready if any worker is healthy.
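The router priority and readiness rule described for router_manager.rs can be sketched roughly as follows. This is an illustrative sketch under assumptions, not the PR's implementation: every type and function name here is hypothetical, it assumes a worker counts as "PD" when its role is prefill or decode, and it ignores whether PD disaggregation is enabled.

```rust
/// Transport used to reach a worker (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ConnectionMode {
    Http,
    Grpc,
}

/// Worker role; Prefill and Decode are treated as PD workers (assumption).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WorkerRole {
    Regular,
    Prefill,
    Decode,
}

/// Router kinds declared from lowest to highest priority, so the derived
/// `Ord` encodes grpc-pd > http-pd > grpc-regular > http-regular.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum RouterKind {
    HttpRegular,
    GrpcRegular,
    HttpPd,
    GrpcPd,
}

#[derive(Debug, Clone, Copy)]
pub struct Worker {
    pub mode: ConnectionMode,
    pub role: WorkerRole,
    pub healthy: bool,
}

/// Map one worker to the router kind that would serve it.
pub fn router_kind_for(worker: &Worker) -> RouterKind {
    let is_pd = matches!(worker.role, WorkerRole::Prefill | WorkerRole::Decode);
    match (worker.mode, is_pd) {
        (ConnectionMode::Grpc, true) => RouterKind::GrpcPd,
        (ConnectionMode::Http, true) => RouterKind::HttpPd,
        (ConnectionMode::Grpc, false) => RouterKind::GrpcRegular,
        (ConnectionMode::Http, false) => RouterKind::HttpRegular,
    }
}

/// Pick the highest-priority router kind that has at least one matching worker.
pub fn select_router(workers: &[Worker]) -> Option<RouterKind> {
    workers.iter().map(router_kind_for).max()
}

/// Readiness rule: ready as soon as any registered worker is healthy.
pub fn is_ready(workers: &[Worker]) -> bool {
    workers.iter().any(|w| w.healthy)
}
```

Encoding the priority in the enum's declaration order keeps the selection to a single `max()` call; a health handler would then map `is_ready` to a 200 response, with the non-ready status code left to the real implementation.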



chatgpt-codex-connector (bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.


Comment on lines +50 to +56
    // Load tokenizer with thread safe lock
    if let Err(e) = app_context
        .tokenizer_registry
        .load(&model_id, || async move {
            factory::create_tokenizer_async(&tokenizer_path.to_string())
                .await
                .map_err(|e| e.to_string())

P1: Preserve chat_template when loading worker tokenizers

Dynamic tokenizer loading ignores the configured chat template: RegisterTokenizerStep loads tokenizers with create_tokenizer_async (lines 50-56) without passing RouterConfig.chat_template or the worker’s model card template. In service-discovery/IGW mode this path supplies the only tokenizer for gRPC routers, so any --chat-template override is silently dropped and prompts are formatted with the default template, which can break models that rely on the custom template. Consider passing the configured chat template when registering tokenizers so gRPC routing remains consistent with the router’s settings.
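One way to picture the suggestion is a small helper that decides which template to hand to the tokenizer loader. This is a hypothetical sketch only: `resolve_chat_template` is not a function in the repository, and the precedence shown (configured override first, then the worker's model card template) is an assumption rather than something this review specifies.

```rust
/// Hypothetical helper: choose the chat template to pass along when
/// registering a worker's tokenizer. Returns `None` when neither source
/// provides one, meaning the tokenizer's default template would apply.
pub fn resolve_chat_template<'a>(
    configured: Option<&'a str>,
    model_card: Option<&'a str>,
) -> Option<&'a str> {
    // Assumed precedence: an explicit --chat-template override wins over
    // the template carried by the worker's model card.
    configured.or(model_card)
}
```

The resolved value would then be threaded into the tokenizer registration step so the gRPC routers format prompts with the same template as the rest of the router.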


YouNeedCryDear (Contributor, Author) commented:

Thanks @slin1237 for fixing this bug

slin1237 merged commit f65fa04 into sgl-project:main on Dec 24, 2025
62 checks passed
YouNeedCryDear deleted the enable-igw-grpc branch on December 25, 2025 08:57
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026