
[model-gateway] Fix IGW routing and optimize RouterManager#15741

Merged: slin1237 merged 1 commit into main from fixup-n/1 on Dec 24, 2025


@slin1237
Collaborator

  • Add static RouterId constants to avoid heap allocations in hot paths
  • Replace RwLock .unwrap() with .unwrap_or_else() for poison safety
  • Add model ID validation in IGW mode for chat/completion/generate
  • Add OpenAI router creation in IGW mode
  • Simplify non-IGW mode routing (router handles validation)
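The first bullet can be illustrated with a minimal sketch, assuming RouterId is a Copy newtype over a &'static str. The type shape, the constant names HTTP_REGULAR and GRPC_REGULAR, their string values, and the pick/count_hits helpers are hypothetical and not taken from the gateway's code. The point is that selecting a router ID becomes a copy of a pointer and a length, whereas a String-backed ID would allocate every time one is built.

```rust
use std::collections::HashMap;

/// Hypothetical RouterId: a Copy newtype over a `&'static str`.
/// Constructing or copying one never allocates.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub struct RouterId(&'static str);

impl RouterId {
    pub const fn new(id: &'static str) -> Self {
        RouterId(id)
    }

    pub const fn as_str(self) -> &'static str {
        self.0
    }
}

// Illustrative constants; names and string values are assumptions.
pub const HTTP_REGULAR: RouterId = RouterId::new("http-regular");
pub const GRPC_REGULAR: RouterId = RouterId::new("grpc-regular");

/// Hot-path selection: returns a copy of a constant, with no heap allocation.
pub fn pick(is_grpc: bool) -> RouterId {
    if is_grpc {
        GRPC_REGULAR
    } else {
        HTTP_REGULAR
    }
}

/// Tallies requests per router; RouterId works directly as a HashMap key.
pub fn count_hits(requests: &[bool]) -> HashMap<RouterId, usize> {
    let mut hits = HashMap::new();
    for &is_grpc in requests {
        *hits.entry(pick(is_grpc)).or_insert(0) += 1;
    }
    hits
}
```

Because the constants are `const`, they can also be used as match-arm results, which is how router_ids::HTTP_OPENAI and friends appear in the review diff.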

@gemini-code-assist
Contributor

Summary of Changes

Hello @slin1237, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refines the model-gateway's routing mechanism, focusing on performance, concurrency safety, and specialized handling for Intelligent Gateway (IGW) operations. It introduces static router identifiers to reduce memory allocations, improves RwLock error handling, and implements distinct model ID validation and router creation logic tailored for IGW and non-IGW environments.

Highlights

  • Performance Optimization: Introduced static RouterId constants using &'static str to eliminate heap allocations in hot paths within the RouterManager, improving overall performance.
  • Concurrency Safety: Enhanced robustness by replacing .unwrap() on RwLock read()/write() results with .unwrap_or_else(|e| e.into_inner()), so a poisoned lock yields its guard instead of causing a panic.
  • Intelligent Gateway (IGW) Routing Enhancements: Implemented explicit model ID validation and early failure for chat, completion, and generate requests when operating in IGW mode. Also, integrated the creation and registration of an OpenAI-compatible router within the RouterManager for IGW mode.
  • Simplified Non-IGW Routing: Streamlined model ID handling for non-IGW modes, delegating validation responsibilities to individual routers, thus simplifying the RouterManager's logic in these cases.
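The poison-recovery pattern from the Concurrency Safety highlight can be shown with a minimal, self-contained sketch using std::sync::RwLock. The Vec<String> payload and the read_len, push, and poison helpers are hypothetical stand-ins for the RouterManager's internal state. When a thread panics while holding the write guard, later read() and write() calls return Err(PoisonError); calling into_inner() on that error yields the guard anyway, so readers keep seeing the last-written data instead of panicking in turn.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Reads through a possibly poisoned lock: `into_inner()` recovers the guard
/// from the PoisonError instead of panicking.
pub fn read_len(lock: &RwLock<Vec<String>>) -> usize {
    let guard = lock.read().unwrap_or_else(|e| e.into_inner());
    guard.len()
}

/// Writes through a possibly poisoned lock using the same recovery.
pub fn push(lock: &RwLock<Vec<String>>, item: &str) {
    let mut guard = lock.write().unwrap_or_else(|e| e.into_inner());
    guard.push(item.to_string());
}

/// Poisons the lock: a worker thread panics while holding the write guard.
pub fn poison(lock: &Arc<RwLock<Vec<String>>>) {
    let lock = Arc::clone(lock);
    let handle = thread::spawn(move || {
        let mut guard = lock.write().unwrap();
        guard.push("written-before-panic".to_string());
        panic!("simulated panic while holding the write lock");
    });
    // The join result is an Err because the thread panicked; ignore it here.
    let _ = handle.join();
}
```

Recovering this way assumes the protected data is still usable after an interrupted update; if a write could leave an invariant half-applied, repairing or rejecting that state explicitly may be the safer choice.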

🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@gemini-code-assist
Contributor

Summary of Changes


This pull request refines the model gateway's routing mechanism, focusing on performance, stability, and intelligent gateway (IGW) functionality. It introduces static router identifiers to minimize memory overhead, improves concurrency safety by handling RwLock poisoning, and enhances IGW mode with explicit model ID validation and OpenAI router support. These changes collectively aim to make the gateway more efficient and robust.

Highlights

  • Performance Optimization: Introduced static RouterId constants to reduce heap allocations in critical routing paths, improving overall performance.
  • Concurrency Safety: Replaced .unwrap() on RwLock read/write results with unwrap_or_else() for poison safety, making the RouterManager more robust against panics.
  • Intelligent Gateway (IGW) Routing Logic: Implemented model ID validation for chat, completion, and generate requests when operating in IGW mode, ensuring early failure for unresolvable models.
  • OpenAI Router Integration: Added the capability to create and register an OpenAI-compatible router within the RouterManager, specifically for IGW mode.
  • Simplified Non-IGW Routing: Streamlined the routing process for non-IGW modes by delegating model validation responsibilities to individual routers, reducing complexity in the RouterManager.
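The OpenAI router registration in IGW mode might look roughly like the following sketch. Everything here is hypothetical: the Router trait, the RegularRouter and OpenAiRouter types, and the string keys are placeholders rather than the gateway's real API. It only shows the shape of the idea: every mode registers its usual router, and IGW mode additionally creates and registers an OpenAI-compatible one in the same shared registry, which is read and written with the poison-tolerant locking described above.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

/// Hypothetical router trait; a real router would expose routing methods too.
pub trait Router: Send + Sync {
    fn id(&self) -> &'static str;
}

struct RegularRouter;
impl Router for RegularRouter {
    fn id(&self) -> &'static str {
        "http-regular"
    }
}

struct OpenAiRouter;
impl Router for OpenAiRouter {
    fn id(&self) -> &'static str {
        "http-openai"
    }
}

pub struct RouterManager {
    routers: RwLock<HashMap<&'static str, Arc<dyn Router>>>,
}

impl RouterManager {
    pub fn new(enable_igw: bool) -> Self {
        let manager = RouterManager {
            routers: RwLock::new(HashMap::new()),
        };
        manager.register(Arc::new(RegularRouter));
        if enable_igw {
            // IGW mode: also create and register an OpenAI-compatible router.
            manager.register(Arc::new(OpenAiRouter));
        }
        manager
    }

    fn register(&self, router: Arc<dyn Router>) {
        let mut routers = self.routers.write().unwrap_or_else(|e| e.into_inner());
        routers.insert(router.id(), router);
    }

    pub fn get(&self, id: &str) -> Option<Arc<dyn Router>> {
        let routers = self.routers.read().unwrap_or_else(|e| e.into_inner());
        routers.get(id).cloned()
    }

    pub fn router_count(&self) -> usize {
        self.routers.read().unwrap_or_else(|e| e.into_inner()).len()
    }
}
```

Storing routers as Arc<dyn Router> lets a request handler clone a cheap handle and release the read lock before doing any slow routing work.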


@gemini-code-assist
Contributor

Summary of Changes


This pull request significantly enhances the model-gateway's routing capabilities and robustness. It introduces performance optimizations by leveraging static router identifiers to minimize memory allocations, improves concurrency safety by gracefully handling RwLock poisoning, and refines model ID validation, particularly for Intelligent Gateway (IGW) mode. Additionally, it integrates OpenAI router creation and simplifies non-IGW routing by shifting validation responsibilities to the specific routers.

Highlights

  • Performance Optimization: Introduced static RouterId constants using &'static str to eliminate heap allocations in hot paths, improving performance and reducing runtime overhead.
  • Concurrency Safety: Replaced .unwrap() on RwLock read/write results with unwrap_or_else(|e| e.into_inner()) for poison safety, so concurrent operations recover from a poisoned lock instead of panicking.
  • IGW Model ID Validation: Implemented explicit model ID validation for chat, completion, and generate requests when operating in IGW (Intelligent Gateway) mode, so requests are routed to valid models and fail fast otherwise.
  • OpenAI Router Integration: Enabled the creation and registration of an OpenAI-compatible router within the RouterManager for IGW mode, expanding routing capabilities.
  • Simplified Non-IGW Routing: Streamlined the routing logic for non-IGW mode by delegating model validation to the individual routers, reducing complexity in the RouterManager itself.



@gemini-code-assist (bot) left a comment


Code Review

This pull request introduces several valuable optimizations and fixes to the RouterManager. The move to &'static str for RouterId and using constants is a great performance improvement, reducing heap allocations in hot paths. The adoption of unwrap_or_else for RwLocks enhances robustness by handling potential lock poisoning. The routing logic is also improved, with clearer separation between IGW and non-IGW modes, and more efficient router selection.

I've added a few suggestions to further improve code clarity and maintainability by addressing some minor code duplication and an unreachable code path. Overall, this is a solid set of changes.

(ConnectionMode::Http, RoutingMode::OpenAI { .. }) => router_ids::HTTP_OPENAI,
(ConnectionMode::Grpc { .. }, RoutingMode::Regular { .. }) => router_ids::GRPC_REGULAR,
(ConnectionMode::Grpc { .. }, RoutingMode::PrefillDecode { .. }) => router_ids::GRPC_PD,
(ConnectionMode::Grpc { .. }, RoutingMode::OpenAI { .. }) => router_ids::GRPC_REGULAR,

medium

The combination of ConnectionMode::Grpc and RoutingMode::OpenAI represents an invalid configuration that should be caught during startup validation in RouterFactory. Therefore, this match arm is unreachable. Returning GRPC_REGULAR is misleading. Using unreachable! makes this assumption explicit and will cause a panic if this logic is ever reached, which is desirable for what should be an impossible state.

Suggested change
(ConnectionMode::Grpc { .. }, RoutingMode::OpenAI { .. }) => router_ids::GRPC_REGULAR,
(ConnectionMode::Grpc { .. }, RoutingMode::OpenAI { .. }) => unreachable!("Invalid config: OpenAI mode requires HTTP"),
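To make the suggestion concrete, here is a self-contained sketch of the (ConnectionMode, RoutingMode) to RouterId mapping with the unreachable! arm, plus the kind of startup check that would justify it. The variant names and the HTTP_OPENAI, GRPC_REGULAR, and GRPC_PD constants mirror the diff; HTTP_REGULAR, HTTP_PD, all string values, the variant fields, and the validate function are assumptions for illustration.

```rust
/// Hypothetical RouterId: a Copy newtype over `&'static str`.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RouterId(&'static str);

pub mod router_ids {
    use super::RouterId;
    pub const HTTP_REGULAR: RouterId = RouterId("http-regular"); // assumed
    pub const HTTP_PD: RouterId = RouterId("http-pd"); // assumed
    pub const HTTP_OPENAI: RouterId = RouterId("http-openai");
    pub const GRPC_REGULAR: RouterId = RouterId("grpc-regular");
    pub const GRPC_PD: RouterId = RouterId("grpc-pd");
}

// Simplified enums; the variant fields are placeholders.
pub enum ConnectionMode {
    Http,
    Grpc { port: Option<u16> },
}

pub enum RoutingMode {
    Regular { workers: usize },
    PrefillDecode { prefill: usize, decode: usize },
    OpenAI { base_url: String },
}

/// Startup validation: reject gRPC + OpenAI before any router is built.
pub fn validate(conn: &ConnectionMode, routing: &RoutingMode) -> Result<(), String> {
    match (conn, routing) {
        (ConnectionMode::Grpc { .. }, RoutingMode::OpenAI { .. }) => {
            Err("OpenAI routing mode requires an HTTP connection".to_string())
        }
        _ => Ok(()),
    }
}

/// Maps an already validated configuration to its router ID.
pub fn router_id_for(conn: &ConnectionMode, routing: &RoutingMode) -> RouterId {
    match (conn, routing) {
        (ConnectionMode::Http, RoutingMode::Regular { .. }) => router_ids::HTTP_REGULAR,
        (ConnectionMode::Http, RoutingMode::PrefillDecode { .. }) => router_ids::HTTP_PD,
        (ConnectionMode::Http, RoutingMode::OpenAI { .. }) => router_ids::HTTP_OPENAI,
        (ConnectionMode::Grpc { .. }, RoutingMode::Regular { .. }) => router_ids::GRPC_REGULAR,
        (ConnectionMode::Grpc { .. }, RoutingMode::PrefillDecode { .. }) => router_ids::GRPC_PD,
        // validate() rejects this pairing at startup, so reaching it is a bug.
        (ConnectionMode::Grpc { .. }, RoutingMode::OpenAI { .. }) => {
            unreachable!("invalid config: OpenAI mode requires HTTP")
        }
    }
}
```

The exhaustive match still forces every pairing to be handled explicitly, so adding a new mode produces a compile error rather than a silent fallback to GRPC_REGULAR.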

Comment on lines +505 to 511
let router =
self.select_router_for_request(headers, effective_model_id.as_deref().or(model_id));

if let Some(router) = router {
router
.route_generate(headers, body, Some(&resolved_model_id))
.route_generate(headers, body, effective_model_id.as_deref().or(model_id))
.await

medium

The expression effective_model_id.as_deref().or(model_id) is used twice here. To improve clarity and avoid re-computation, it's better to evaluate it once and store the result in a variable. This same pattern is repeated in route_chat and route_completion and could be improved there as well.

Suggested change
let router =
self.select_router_for_request(headers, effective_model_id.as_deref().or(model_id));
if let Some(router) = router {
router
.route_generate(headers, body, Some(&resolved_model_id))
.route_generate(headers, body, effective_model_id.as_deref().or(model_id))
.await
let final_model_id = effective_model_id.as_deref().or(model_id);
let router = self.select_router_for_request(headers, final_model_id);
if let Some(router) = router {
router
.route_generate(headers, body, final_model_id)
.await

Comment on lines +529 to +538
let effective_model_id = if self.enable_igw {
// Use provided model_id or fall back to body.model
let model = model_id.or(Some(&body.model));
match self.resolve_model_id(model) {
Ok(id) => Some(id),
Err(err_response) => return *err_response,
}
} else {
None
};

medium

This block of logic for resolving effective_model_id is duplicated in route_completion (lines 564-573). Consider extracting this into a private helper function to reduce code duplication and improve maintainability. The helper could take enable_igw, model_id, and body.model as arguments.
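A sketch of the suggested refactor that combines both review comments: the IGW resolution block moves into one helper shared by route_chat and route_completion, and the final model ID is computed once. The helper name effective_model_id, the simplified Response type, the 400/404 status codes, the known-model list, and the dispatch function are hypothetical; the resolve-or-return-early shape and the Box<Response> error type follow the snippet quoted above.

```rust
/// Simplified stand-in for the web framework's HTTP response type.
#[derive(Debug, PartialEq)]
pub struct Response {
    pub status: u16,
    pub body: String,
}

pub struct RouterManager {
    pub enable_igw: bool,
    pub known_models: Vec<&'static str>,
}

impl RouterManager {
    /// Hypothetical resolver: 400 if no model is given, 404 if it is unknown.
    fn resolve_model_id(&self, model: Option<&str>) -> Result<String, Box<Response>> {
        let model = model.ok_or_else(|| {
            Box::new(Response { status: 400, body: "model is required".to_string() })
        })?;
        if self.known_models.iter().any(|known| *known == model) {
            Ok(model.to_string())
        } else {
            Err(Box::new(Response { status: 404, body: format!("model '{model}' not found") }))
        }
    }

    /// Shared helper: Ok(Some(id)) in IGW mode, Ok(None) outside IGW mode,
    /// Err(response) when IGW resolution fails.
    fn effective_model_id(
        &self,
        model_id: Option<&str>,
        body_model: &str,
    ) -> Result<Option<String>, Box<Response>> {
        if !self.enable_igw {
            // Non-IGW: the selected router validates the model itself.
            return Ok(None);
        }
        // Use the provided model_id or fall back to the request body's model.
        self.resolve_model_id(model_id.or(Some(body_model))).map(Some)
    }

    pub fn route_chat(&self, model_id: Option<&str>, body_model: &str) -> Response {
        let effective = match self.effective_model_id(model_id, body_model) {
            Ok(id) => id,
            Err(response) => return *response,
        };
        // Compute the final model ID once and reuse it.
        let final_model_id = effective.as_deref().or(model_id);
        self.dispatch("chat", final_model_id)
    }

    pub fn route_completion(&self, model_id: Option<&str>, body_model: &str) -> Response {
        let effective = match self.effective_model_id(model_id, body_model) {
            Ok(id) => id,
            Err(response) => return *response,
        };
        let final_model_id = effective.as_deref().or(model_id);
        self.dispatch("completion", final_model_id)
    }

    /// Placeholder for router selection plus the actual forwarding call.
    fn dispatch(&self, endpoint: &str, model: Option<&str>) -> Response {
        let target = model.unwrap_or("<router default>");
        Response { status: 200, body: format!("{endpoint} -> {target}") }
    }
}
```

Returning Result<Option<String>, Box<Response>> keeps the three outcomes (resolved in IGW mode, not in IGW mode, IGW failure) explicit at each call site, while each route method keeps its own early return.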

slin1237 merged commit 2f7c629 into main on Dec 24, 2025 (72 checks passed)
slin1237 deleted the fixup-n/1 branch on December 24, 2025 16:28
jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025
YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026

1 participant