[model-gateway]: add gRPC router embeddings endpoint implementation by Ratish1 · Pull Request #15273 · sgl-project/sglang

Ratish1 · 2025-12-16T17:42:56Z

Motivation

This PR introduces initial support for the Embedding API within the gRPC model gateway. This feature allows clients to request vector embeddings for text inputs, leveraging existing gRPC backend workers (SGLang).

Modifications

Dedicated Pipeline Stages: Implements new pipeline stages for embedding requests: EmbeddingPreparationStage, EmbeddingRequestBuildingStage, and EmbeddingResponseProcessingStage. This ensures a clear separation of concerns and a robust processing flow specific to embeddings.
GrpcRouter Integration: Adds an embedding_pipeline to the GrpcRouter and a new route_embeddings_impl method to handle incoming embedding requests, dispatching them through the dedicated pipeline.
Unified Protocol Handling: Extends ProtoRequest and ProtoEmbedResponse to wrap embedding-specific protobuf messages, enabling transparent interaction with different backend types.
RequestContext Extension: Updates the RequestContext with RequestType::Embedding, FinalResponse::Embedding, and ExecutionResult::Embedding to properly track embedding request states throughout the pipeline.
The _convert_embed_request function in python/sglang/srt/entrypoints/grpc_server.py was updated to correctly extract image_inputs, token_type_ids, and sampling_params from the gRPC EmbedRequest.
sampling_params.max_new_tokens is explicitly set to 0 to prevent a TypeError in the downstream scheduler, a workaround for current backend logic.
In python/sglang/srt/managers/scheduler.py, a None check for self.pad_input_ids_func was added within handle_embedding_request, ensuring safe handling of multimodal input in embedding requests with appropriate documentation.
Configurable log_metrics: Adds log_metrics: Option<bool> to EmbeddingRequest for control over metrics logging, defaulting to false.
Implements build_embed_request in sgl-model-gateway/src/grpc_client/sglang_scheduler.rs to abstract construction of the gRPC EmbedRequest.
Adds ENDPOINT_EMBEDDINGS to smg_labels and uses it for accurate metrics tracking of embedding requests, improving observability.
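
The staged flow described in the list above can be illustrated with a minimal Python sketch. It is not the gateway's Rust implementation: `EmbeddingContext`, the stage factory functions, and the dictionary-shaped request are hypothetical stand-ins that only echo the stage names in this PR, and the tokenizer and backend are supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class EmbeddingContext:
    """Hypothetical per-request state, loosely analogous to RequestContext."""
    text: str
    token_ids: List[int] = field(default_factory=list)
    backend_request: Optional[Dict] = None
    embedding: Optional[List[float]] = None
    response: Optional[Dict] = None


Stage = Callable[[EmbeddingContext], EmbeddingContext]


def preparation_stage(tokenize: Callable[[str], List[int]]) -> Stage:
    def run(ctx: EmbeddingContext) -> EmbeddingContext:
        # Preparation: turn the raw text into token IDs.
        ctx.token_ids = tokenize(ctx.text)
        return ctx
    return run


def request_building_stage(log_metrics: bool = False) -> Stage:
    def run(ctx: EmbeddingContext) -> EmbeddingContext:
        # Request building: assemble the payload for the backend worker.
        ctx.backend_request = {"input_ids": ctx.token_ids, "log_metrics": log_metrics}
        return ctx
    return run


def execution_stage(backend: Callable[[Dict], List[float]]) -> Stage:
    def run(ctx: EmbeddingContext) -> EmbeddingContext:
        # Execution: call the (assumed) backend with the built request.
        ctx.embedding = backend(ctx.backend_request)
        return ctx
    return run


def response_processing_stage() -> Stage:
    def run(ctx: EmbeddingContext) -> EmbeddingContext:
        # Response processing: wrap the vector in an API-style response.
        n = len(ctx.token_ids)
        ctx.response = {
            "object": "list",
            "data": [{"object": "embedding", "index": 0, "embedding": ctx.embedding}],
            "usage": {"prompt_tokens": n, "total_tokens": n},
        }
        return ctx
    return run


def run_pipeline(text: str, stages: List[Stage]) -> Optional[Dict]:
    # Each stage receives the context produced by the previous one.
    ctx = EmbeddingContext(text=text)
    for stage in stages:
        ctx = stage(ctx)
    return ctx.response
```

Keeping each stage as a separate callable means a stage can be swapped or tested in isolation, which is the separation of concerns the description refers to.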

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.
Work with maintainers to merge your PR. See the PR Merge Process.

gemini-code-assist · 2025-12-16T17:43:34Z

Summary of Changes

Hello @Ratish1, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands the model gateway's capabilities by introducing a comprehensive gRPC endpoint for embedding requests. It establishes a dedicated processing pipeline, from request preparation and building to execution and response processing, ensuring that embedding tasks are handled efficiently and distinctly from generation tasks. The changes also include necessary adaptations for both SGLang and vLLM backends, alongside robust error handling and improved metrics for observability.

Highlights

gRPC Embedding API Support: Initial support for the Embedding API has been added to the gRPC model gateway, enabling clients to request vector embeddings for text inputs from SGLang and vLLM backends.
Dedicated Pipeline Stages: New pipeline stages (EmbeddingPreparationStage, EmbeddingRequestBuildingStage, EmbeddingResponseProcessingStage) have been implemented to ensure a robust and separated processing flow specifically for embedding requests.
Unified Protocol Handling: The system now uses unified ProtoRequest and ProtoEmbedResponse enums to wrap backend-specific protobuf messages, facilitating transparent interaction with different backend types.
RequestContext Extension: The RequestContext has been updated to properly track embedding request states, introducing RequestType::Embedding, FinalResponse::Embedding, and ExecutionResult::Embedding.
Backend Integration & Fixes: Updates include correctly extracting image inputs and sampling parameters for embedding requests in grpc_server.py (setting max_new_tokens to 0 as a workaround), and adding a None check for pad_input_ids_func in scheduler.py for multimodal input handling.
Metrics and Configuration: The EmbeddingRequest now supports a configurable log_metrics option, and ENDPOINT_EMBEDDINGS has been added for accurate metrics tracking of embedding requests.
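
The two backend fixes mentioned under "Backend Integration & Fixes" can be sketched as follows. This is an illustration under stated assumptions, not the code in grpc_server.py or scheduler.py: plain dictionaries stand in for the protobuf EmbedRequest, and `convert_embed_request` and `pad_if_possible` are hypothetical helper names.

```python
from typing import Callable, Dict, List, Optional


def convert_embed_request(grpc_request: Dict) -> Dict:
    """Hypothetical stand-in for _convert_embed_request, using plain dicts."""
    sampling_params = dict(grpc_request.get("sampling_params") or {})
    # Embedding requests generate no tokens, so max_new_tokens is forced to 0
    # (the PR describes this as a workaround for a downstream TypeError).
    sampling_params["max_new_tokens"] = 0
    return {
        "input_ids": list(grpc_request.get("input_ids") or []),
        "image_inputs": grpc_request.get("image_inputs"),
        "token_type_ids": grpc_request.get("token_type_ids"),
        "sampling_params": sampling_params,
    }


def pad_if_possible(
    input_ids: List[int],
    image_inputs: Optional[Dict],
    pad_input_ids_func: Optional[Callable[[List[int], Dict], List[int]]],
) -> List[int]:
    # Guard: skip padding when no pad function is configured or when the
    # request carries no image inputs.
    if pad_input_ids_func is None or image_inputs is None:
        return input_ids
    return pad_input_ids_func(input_ids, image_inputs)
```

The guard returns the input unchanged instead of raising, so text-only embedding models without a padding function follow the same code path as multimodal ones.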

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist

Code Review

This pull request introduces support for the Embedding API in the gRPC model gateway, which is a significant feature addition. The implementation is well-structured, with dedicated pipeline stages for embedding requests, integration into the GrpcRouter, and updates to the RequestContext and protocol wrappers to handle embeddings. The changes are comprehensive and touch both the Python backend and the Rust model gateway.

My review focuses on a few areas for improvement:

Consistency with the PR description regarding default values.
Removal of unused code.
Ensuring complete implementation of features mentioned in the description.

Overall, this is a solid contribution that extends the gateway's capabilities. Addressing the minor issues identified will further improve the code quality.

slin1237 · 2025-12-16T19:46:05Z

/tag-and-rerun-ci

Ratish1 · 2025-12-17T08:17:16Z

/gemini-review

gemini-code-assist

Code Review

This pull request introduces support for the Embedding API in the gRPC model gateway, which is a significant and well-structured addition. The changes include dedicated pipeline stages, integration with the GrpcRouter, and extensions to the protocol and request context to handle embedding requests. The new e2e tests are also a great addition.

My review has identified a few key areas for improvement:

The handling of batch embedding requests is not aligned with the OpenAI API specification. Currently, an array of strings is concatenated and processed as a single input, whereas it should be processed as a batch, returning an embedding for each string.
There's a discrepancy in the Python gRPC server where image_inputs are hardcoded, which contradicts the PR description.

Addressing these points will ensure the new embedding endpoint is robust and compliant with expected behavior.
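
The batch behavior the review asks for can be sketched in a few lines of Python. This is a minimal illustration, assuming a caller-supplied `embed_one` function that embeds a single string; `normalize_inputs` and `embed_batch` are hypothetical names, and the response shape follows the OpenAI embeddings format (one `data` entry per input, each carrying its `index`).

```python
from typing import Callable, Dict, List, Sequence, Union


def normalize_inputs(raw: Union[str, Sequence[str]]) -> List[str]:
    # A single string is a batch of one; a list is kept item by item
    # rather than concatenated into one input.
    if isinstance(raw, str):
        return [raw]
    return list(raw)


def embed_batch(
    raw: Union[str, Sequence[str]],
    embed_one: Callable[[str], List[float]],
) -> Dict:
    texts = normalize_inputs(raw)
    data = [
        {"object": "embedding", "index": i, "embedding": embed_one(text)}
        for i, text in enumerate(texts)
    ]
    return {"object": "list", "data": data}
```

With this shape, a request containing two strings yields two embeddings whose `index` values match the input order, so clients can pair results with inputs.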

slin1237 · 2025-12-22T01:48:48Z

PR looks good to me
please resolve conflict and i can merge this one
nice work

Ratish1 · 2025-12-22T06:56:24Z

PR looks good to me please resolve conflict and i can merge this one nice work

I fixed the conflict. Thanks for the review.

Ratish1 · 2025-12-23T09:10:04Z

I have updated the branch and the tests pass now, @slin1237. Let me know if you need anything else. Thanks for the review.

[model-gateway]: add gRPC router embeddings endpoint implementation

a99218b

Ratish1 requested review from CatherineSue, Ying1123, hnyls2002, key4ng, merrymercy, slin1237 and xiezhq-hermann as code owners December 16, 2025 17:42

github-actions Bot added the model-gateway label Dec 16, 2025

gemini-code-assist Bot reviewed Dec 16, 2025

Comment thread python/sglang/srt/entrypoints/grpc_server.py

Comment thread sgl-model-gateway/src/grpc_client/sglang_scheduler.rs

Comment thread sgl-model-gateway/src/routers/grpc/pipeline.rs Outdated

Ratish1 added 2 commits December 16, 2025 23:23

add integ test and fixes

df8b29a

more fixes

4857a2b

github-actions Bot added the run-ci label Dec 16, 2025

Merge branch 'main' into grpc-router-embeddings

a4d24b3

slin1237 requested changes Dec 17, 2025

Ratish1 added 4 commits December 17, 2025 10:30

address comments

55938de

Merge branch 'grpc-router-embeddings' of https://github.com/Ratish1/sglang into grpc-router-embeddings

8e47889

fix metrics

4b259c3

more fixes

b696dfc

gemini-code-assist Bot reviewed Dec 17, 2025

Comment thread python/sglang/srt/entrypoints/grpc_server.py

Comment thread sgl-model-gateway/py_test/e2e_grpc/basic/test_embedding_server.py

Comment thread sgl-model-gateway/src/routers/grpc/regular/stages/embedding/response_processing.rs

Ratish1 added 2 commits December 17, 2025 14:29

fix test

1d653ec

change model

16704a0

Ratish1 requested a review from slin1237 December 17, 2025 13:50

Ratish1 added 2 commits December 20, 2025 15:33

Merge remote-tracking branch 'upstream/main' into grpc-router-embeddings

47c2bdb

fix

1a65148

slin1237 approved these changes Dec 22, 2025

fix conflict

6b95a76

Ratish1 requested a review from slin1237 December 22, 2025 06:52

Merge remote-tracking branch 'upstream/main' into grpc-router-embeddings

2858b44

slin1237 approved these changes Dec 23, 2025

slin1237 merged commit 5f3a47d into sgl-project:main Dec 23, 2025
101 of 111 checks passed

Ratish1 deleted the grpc-router-embeddings branch December 23, 2025 17:14

jiaming1130 pushed a commit to zhuyijie88/sglang that referenced this pull request Dec 25, 2025

[model-gateway]: add gRPC router embeddings endpoint implementation (sgl-project#15273)

db6dca6

YChange01 pushed a commit to YChange01/sglang that referenced this pull request Jan 13, 2026

[model-gateway]: add gRPC router embeddings endpoint implementation (sgl-project#15273)

2fd1dba
