[model-gateway] Tighten visibility across data_connector and grpc module #16516
Merged
1. StreamingState struct - Removed entirely. The streaming implementation uses local variables within functions instead of this struct.
2. Removed fields from ResponseState:
- streaming: StreamingState
- collected: Option<Vec<ProtoGenerateComplete>>
- collected_embeddings: Option<Vec<ProtoEmbedComplete>>
- harmony_parser: Option<HarmonyParserAdapter>
- harmony_parser_per_index: Option<HashMap<...>>
3. Removed unused accessor methods from impl RequestContext:
- request()
- responses_request()
- embedding_request()
- embedding_request_arc()
- classify_request()
- classify_request_arc()
4. Removed unused imports: HashMap, Value, ProtoGenerateComplete
The code is now cleaner with only the actually-used items remaining.
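Dead code of this kind tends to surface only after visibility is narrowed: for a library crate, rustc's dead_code lint does not report items that are exported as public API, since external callers may exist. The following is a minimal sketch of that effect, not the gateway's code. The `tokens` field and the `new`, `push`, `clear`, and `collect` names are hypothetical and chosen only for illustration.

```rust
mod context {
    // pub(crate): visible inside this crate only, so the compiler can see
    // every possible caller and can flag the item if there is none.
    pub(crate) struct ResponseState {
        // Hypothetical field, not taken from the PR.
        pub(crate) tokens: Vec<String>,
    }

    impl ResponseState {
        pub(crate) fn new() -> Self {
            Self { tokens: Vec::new() }
        }

        pub(crate) fn push(&mut self, token: &str) {
            self.tokens.push(token.to_string());
        }

        // Unused in this sketch but kept on purpose. With crate-level
        // visibility and no callers, the dead_code lint would warn here
        // unless it is explicitly allowed.
        #[allow(dead_code)]
        pub(crate) fn clear(&mut self) {
            self.tokens.clear();
        }
    }
}

// The only caller of ResponseState in this sketch.
fn collect(parts: &[&str]) -> Vec<String> {
    let mut state = context::ResponseState::new();
    for part in parts {
        state.push(part);
    }
    state.tokens
}
```

Deleting the `#[allow(dead_code)]` attribute should make the compiler report `clear` as never used, which is the signal this PR acts on by either removing the item or annotating it.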
- Kept the request_id() method in impl ProtoRequest; it is used in dispatch_metadata.rs:32.
- Removed the as_generate() and as_embed() methods, since they were unused (the code uses pattern matching instead).
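A rough sketch of why per-variant accessors become redundant once call sites match on the enum directly. The struct definitions, their fields, and the `describe` function are hypothetical stand-ins; the real ProtoRequest wraps generated protobuf types whose shape is not shown in this PR.

```rust
// Hypothetical request payloads; field names are assumptions.
struct GenerateRequest {
    request_id: String,
    prompt: String,
}

struct EmbedRequest {
    request_id: String,
    text: String,
}

enum ProtoRequest {
    Generate(GenerateRequest),
    Embed(EmbedRequest),
}

impl ProtoRequest {
    // The accessor that was kept: every variant carries an id.
    fn request_id(&self) -> &str {
        match self {
            ProtoRequest::Generate(req) => &req.request_id,
            ProtoRequest::Embed(req) => &req.request_id,
        }
    }
}

// Call sites match on the variant instead of calling as_generate() or
// as_embed(). The match is exhaustive, so a new variant fails to compile
// until it is handled here.
fn describe(req: &ProtoRequest) -> String {
    match req {
        ProtoRequest::Generate(g) => format!("generate[{}]: {}", g.request_id, g.prompt),
        ProtoRequest::Embed(e) => format!("embed[{}]: {}", e.request_id, e.text),
    }
}
```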
1. embedding/response_processing.rs:
- Removed .clone() and Ok(Some(Json(...)))
- Now returns Ok(None) like Chat/Generate
2. classify/response_processing.rs:
- Same change - returns Ok(None)
3. pipeline.rs execute_embeddings:
- Changed from error case to extract response from FinalResponse::Embedding(response)
- Returns axum::Json(response).into_response()
4. pipeline.rs execute_classify:
- Same pattern - extracts and returns the response
5. context.rs FinalResponse:
- Removed #[allow(dead_code)] from Embedding and Classify variants
All four endpoint types (Chat, Generate, Embedding, Classify) now follow the same pipeline pattern:
- Stage stores response in FinalResponse::Variant(response) and returns Ok(None)
- Pipeline extracts response from FinalResponse and converts to HTTP response
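The two-step pattern above can be sketched roughly as follows. This is a simplified illustration using only the standard library: the types are hypothetical stand-ins for those in context.rs and pipeline.rs, `embedding_response_stage` is an invented name, and a String stands in for the HTTP response that the real code builds with axum::Json(response).into_response().

```rust
// Only one variant is constructed in this sketch, hence the allow.
#[allow(dead_code)]
#[derive(Debug, PartialEq)]
enum FinalResponse {
    Chat(String),
    Generate(String),
    Embedding(Vec<f32>),
    Classify(String),
}

#[derive(Default)]
struct RequestContext {
    final_response: Option<FinalResponse>,
}

// A stage records its result on the context and returns Ok(None), meaning
// "no early response; let the pipeline finish the request".
fn embedding_response_stage(ctx: &mut RequestContext) -> Result<Option<String>, String> {
    ctx.final_response = Some(FinalResponse::Embedding(vec![0.1, 0.2]));
    Ok(None)
}

// The pipeline owns the conversion from FinalResponse to the outgoing body.
fn execute_embeddings(mut ctx: RequestContext) -> Result<String, String> {
    if let Some(early) = embedding_response_stage(&mut ctx)? {
        return Ok(early);
    }
    match ctx.final_response.take() {
        Some(FinalResponse::Embedding(values)) => Ok(format!("{:?}", values)),
        Some(other) => Err(format!("unexpected final response: {:?}", other)),
        None => Err("no response produced".to_string()),
    }
}
```

Keeping the conversion in one place means a stage never needs to clone its result in order to both store it and return it, which matches the removal of .clone() in the response-processing stages.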
1. Removed model_for_metrics clone
2. Use model_id.clone() when passing to context (keeps original for metrics)
3. Use UNKNOWN_MODEL_ID constant instead of hardcoded "unknown"
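A minimal sketch of these three changes taken together. Several things here are assumptions: that the model id is an Option<String>, that the constant's value is "unknown", and the RequestContext shape and record_metric helper, which are hypothetical and exist only to make the example self-contained.

```rust
// Assumed value; the PR only names the constant.
const UNKNOWN_MODEL_ID: &str = "unknown";

// Hypothetical, reduced context type.
struct RequestContext {
    model_id: Option<String>,
}

// Hypothetical metrics helper that only needs a borrowed label.
fn record_metric(model: &str) -> String {
    format!("requests_total{{model=\"{}\"}}", model)
}

fn execute_generate(model_id: Option<String>) -> (RequestContext, String) {
    // Clone once at the point where ownership is handed to the context;
    // the original stays available for metrics, so no separate
    // model_for_metrics copy is needed.
    let ctx = RequestContext {
        model_id: model_id.clone(),
    };

    // Borrow the original for the label and fall back to the shared
    // constant instead of a string literal repeated at each call site.
    let metric = record_metric(model_id.as_deref().unwrap_or(UNKNOWN_MODEL_ID));
    (ctx, metric)
}
```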
slin1237 approved these changes on Jan 5, 2026
jamesjxliu pushed a commit to jamesjxliu/sglang that referenced this pull request on Jan 6, 2026
Motivation
This PR tightens the visibility of structs and functions across the source files in sgl-model-gateway, mainly in the data_connector and grpc modules. As a result, some dead code now surfaces (clippy masks dead code when it is pub), and the unused methods are removed.
Modifications
- Tighten visibility across the repo.
- Add allow(dead_code) to code that is unused.
Removed dead code in grpc/context.rs
1. StreamingState struct - Removed entirely. The streaming implementation uses local variables within functions instead of this struct.
2. Removed fields from ResponseState:
- streaming: StreamingState
- collected: Option<Vec<ProtoGenerateComplete>>
- collected_embeddings: Option<Vec<ProtoEmbedComplete>>
- harmony_parser: Option<HarmonyParserAdapter>
- harmony_parser_per_index: Option<HashMap<...>>
3. Removed unused accessor methods from impl RequestContext:
- request()
- responses_request()
- embedding_request()
- embedding_request_arc()
- classify_request()
- classify_request_arc()
4. Removed unused imports: HashMap, Value, ProtoGenerateComplete
Remove unused methods in impl ProtoRequest
- Removed as_generate() and as_embed() methods since they were truly unused (code uses pattern matching instead)
Remove unused methods in impl HarmonyResponsesContext
Fix inconsistent patterns in Embedding and Classify
1. embedding/response_processing.rs:
- Removed .clone() and Ok(Some(Json(...)))
- Now returns Ok(None) like Chat/Generate
2. classify/response_processing.rs:
- Same change - returns Ok(None)
3. pipeline.rs execute_embeddings:
- Changed from error case to extract response from FinalResponse::Embedding(response)
- Returns axum::Json(response).into_response()
4. pipeline.rs execute_classify:
- Same pattern - extracts and returns the response
5. context.rs FinalResponse:
- Removed #[allow(dead_code)] from Embedding and Classify variants
All four endpoint types (Chat, Generate, Embedding, Classify) now follow the same pipeline pattern:
- Stage stores response in FinalResponse::Variant(response) and returns Ok(None)
- Pipeline extracts response from FinalResponse and converts to HTTP response
pipeline.rs execute_generate
- Use the UNKNOWN_MODEL_ID constant instead of a hardcoded value
- Remove the model_for_metrics clone