[Observability] Add HTTP sidecar endpoints and FlushCache gRPC RPC for gRPC mode #22500
Merged
Kangyan-Zhou merged 3 commits into sgl-project:main on Apr 23, 2026
a751b12 to 481a3ab
Kangyan-Zhou
commented
Apr 10, 2026
app.router.add_post("/start_profile", start_profile_handler)
app.router.add_post("/stop_profile", stop_profile_handler)
app.router.add_get("/flush_cache", flush_cache_handler)
Collaborator
Author
Some test tools use GET, so the route is left here for backward compatibility.
Add /start_profile and /stop_profile endpoints to the HTTP sidecar that runs alongside the gRPC server. This enables torch profiler trace collection in gRPC mode, matching the existing HTTP-mode profiling API.

The sidecar (previously metrics-only) is refactored into a unified HTTP server that always starts on --metrics-http-port (default: --port + 1), serving both Prometheus /metrics (when --enable-metrics) and profiling endpoints.

Profiling is triggered via an async callback from smg-grpc-servicer's serve_grpc(), which provides access to the GrpcRequestManager after scheduler initialization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
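The callback handoff described in this commit message can be sketched with plain asyncio. This is a minimal illustration, not the code from the PR: FakeRequestManager, Sidecar and fake_serve_grpc are hypothetical stand-ins for GrpcRequestManager, the HTTP sidecar and serve_grpc(), and the 503 returned before the manager is ready is an assumption rather than documented behaviour. Only the on_request_manager_ready name comes from the discussion in this thread.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, Optional, Tuple


class FakeRequestManager:
    """Hypothetical stand-in for GrpcRequestManager."""

    def __init__(self) -> None:
        self.profiling = False

    async def start_profile(self) -> None:
        self.profiling = True


class Sidecar:
    """Hypothetical sidecar state: it only learns about the manager via a callback."""

    def __init__(self) -> None:
        self.manager: Optional[FakeRequestManager] = None

    async def on_request_manager_ready(self, manager: FakeRequestManager) -> None:
        # Called by the gRPC side once the scheduler has been initialized.
        self.manager = manager

    async def handle_start_profile(self) -> Tuple[int, Dict[str, Any]]:
        # Assumption: refuse with 503 until the manager has been handed over.
        if self.manager is None:
            return 503, {"success": False, "message": "request manager not ready"}
        await self.manager.start_profile()
        return 200, {"success": True}


async def fake_serve_grpc(
    on_ready: Callable[[FakeRequestManager], Awaitable[None]],
) -> None:
    # Stand-in for serve_grpc(): create the manager, then notify the sidecar.
    await asyncio.sleep(0)  # placeholder for scheduler initialization
    await on_ready(FakeRequestManager())


async def demo() -> Tuple[int, int, bool]:
    sidecar = Sidecar()
    before, _ = await sidecar.handle_start_profile()
    await fake_serve_grpc(sidecar.on_request_manager_ready)
    after, _ = await sidecar.handle_start_profile()
    return before, after, sidecar.manager.profiling
```

Passing the callback into the server entry point keeps the sidecar free of any import-time dependency on the manager, which only exists after the scheduler has started.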
481a3ab to df9cccf
slin1237
requested changes
Apr 11, 2026
A lightweight HTTP sidecar is started alongside the gRPC server to expose:
- /metrics (Prometheus, when --enable-metrics is set)
- /start_profile, /stop_profile (profiling control, for direct engine access)
- /server_info (server configuration and internal state)
Collaborator
This is already enabled in the RPC. Can we remove it?
Reviewer pointed out that server_info is already enabled natively in smg-grpc-servicer, so the duplicate HTTP sidecar endpoint is unnecessary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix redundant except (JSONDecodeError, Exception) → JSONDecodeError only
- Fix module docstring: remove misleading "always starts" and "direct engine access"
- Stop leaking raw str(e) to HTTP callers; return type name only
- Replace getattr+or with is-not-None check for sidecar port
- Extract _check_communicator_results helper to deduplicate result checking
- Unify with_stack/record_shapes boolean idioms
- Move lightweight imports (json, time, aiohttp, io_struct) to top level
- Fix docstring: "ZMQ transport layer" → "transport to the scheduler"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
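Three of the fixes listed above are general Python idioms and can be shown in isolation. The helpers below are hypothetical names written for illustration and are not taken from the PR diff. They show an is-not-None default for the port, a handler that catches only json.JSONDecodeError, and an error payload that exposes the exception type but not its message.

```python
import json
from typing import Any, Dict, Optional, Tuple


def resolve_sidecar_port(port: int, sidecar_port: Optional[int]) -> int:
    # `sidecar_port or port + 1` would silently replace an explicit falsy
    # value such as 0, so compare against None instead.
    return sidecar_port if sidecar_port is not None else port + 1


def parse_json_body(raw: str) -> Tuple[int, Dict[str, Any]]:
    """Return (status, payload). An empty body is treated as 'no options'."""
    if not raw:
        return 200, {}
    try:
        body = json.loads(raw)
    except json.JSONDecodeError:
        # Catch only the parse error; listing Exception alongside it would
        # make the narrower class redundant and hide unrelated bugs.
        return 400, {"success": False, "message": "invalid JSON body"}
    if not isinstance(body, dict):
        return 400, {"success": False, "message": "JSON body must be an object"}
    return 200, body


def error_payload(exc: Exception) -> Dict[str, Any]:
    # Report the exception type only; str(exc) may contain paths or other
    # internal details that should not reach HTTP callers.
    return {"success": False, "message": type(exc).__name__}
```

For example, resolve_sidecar_port(30000, None) yields 30001, while an explicit 0 is preserved instead of being overwritten by the default.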
slin1237
approved these changes
Apr 20, 2026
zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request Apr 23, 2026
…r gRPC mode (sgl-project#22500) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor
@Kangyan-Zhou the latest smg-grpc-servicer doesn't support the new on_request_manager_ready arg yet. How should this be fixed?
Contributor
Should lightseekorg/smg#1088 be merged first, before this PR? @slin1237
Summary
Adds admin/observability endpoints for SGLang's gRPC mode using a hybrid approach:

HTTP sidecar (on --grpc-http-sidecar-port, default --port + 1):
- POST /start_profile: start torch profiler (direct engine access for per-worker profiling)
- POST /stop_profile: stop profiler and export traces
- GET /metrics: Prometheus metrics (existing, refactored)

gRPC native (via SMG router):
- FlushCache RPC: handled natively in gRPC, router fans out to all workers
- GetServerInfo RPC: server config, scheduler info, and internal states (already in smg-grpc-servicer)

Motivation
gRPC mode previously lacked profiling, cache management, and server introspection endpoints. This blocked:
- bench_serving.py and bench_one_batch_server against gRPC deployments

Design decisions
- bench_serving.py targets prefill/decode workers individually via --profile-prefill-url/--profile-decode-url. Fan-out through the router would be wrong.
- /flush_cache route: now it also calls the gRPC FlushCache RPC for gRPC workers.
- --grpc-http-sidecar-port: Renamed from --metrics-http-port to reflect broader purpose.

Companion PR
Requires companion change in smg-grpc-servicer: lightseekorg/smg#1088

Test plan
- /metrics 200, /start_profile 200, bad JSON 400
- success=True, message="Cache flushed successfully"
- /flush_cache through router 200 (confirmed gRPC path: total_http_workers: 0, workers_flushed: 1)
- cargo check for smg-grpc-client and smg

🤖 Generated with Claude Code
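The sidecar checks in the test plan could be scripted roughly as follows, using only the standard library. This is a sketch under assumptions: the base URL presumes --port 30000 with the sidecar on the default --port + 1, and the with_stack/record_shapes fields are taken from the option names mentioned in the review commits, not from a verified request schema. build_request is a hypothetical helper.

```python
import json
import urllib.request
from typing import Any, Dict, Optional


def build_request(
    base_url: str,
    path: str,
    method: str,
    payload: Optional[Dict[str, Any]] = None,
) -> urllib.request.Request:
    """Build (but do not send) a request for one sidecar endpoint."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        base_url.rstrip("/") + path, data=data, headers=headers, method=method
    )


# Assumed address: server on --port 30000, sidecar on the default --port + 1.
SIDECAR = "http://127.0.0.1:30001"

checks = [
    build_request(SIDECAR, "/metrics", "GET"),
    # Payload fields are assumptions based on the option names in this PR.
    build_request(SIDECAR, "/start_profile", "POST",
                  {"with_stack": True, "record_shapes": True}),
    build_request(SIDECAR, "/stop_profile", "POST"),
]

# To run against a live deployment, send each request and inspect the status:
# for req in checks:
#     with urllib.request.urlopen(req, timeout=10) as resp:
#         print(req.get_method(), req.full_url, resp.status)
```

Because the profiling endpoints act on a single engine, the requests go straight to each worker's sidecar and not through the router.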