[Observability] Add HTTP sidecar endpoints and FlushCache gRPC RPC for gRPC mode by Kangyan-Zhou · Pull Request #22500 · sgl-project/sglang

Kangyan-Zhou · 2026-04-10T06:29:42Z

Summary

Adds admin/observability endpoints for SGLang's gRPC mode using a hybrid approach:

HTTP sidecar (on --grpc-http-sidecar-port, default --port + 1):
- POST /start_profile: start torch profiler (direct engine access for per-worker profiling)
- POST /stop_profile: stop profiler and export traces
- GET /metrics: Prometheus metrics (existing, refactored)

gRPC native (via SMG router):
- FlushCache RPC: handled natively in gRPC, router fans out to all workers
- GetServerInfo RPC: server config, scheduler info, and internal states (already in smg-grpc-servicer)
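
The sidecar endpoints listed above can be driven with plain HTTP requests. The following is a minimal sketch, assuming a gRPC server on port 30000 with the sidecar on the default --port + 1. The helper names `sidecar_port` and `profile_request` are hypothetical, and the `with_stack` body field is an assumption about the request payload rather than a documented schema.

```python
import json
import urllib.request
from typing import Optional


def sidecar_port(grpc_port: int, override: Optional[int] = None) -> int:
    """Return the sidecar port: the explicit override if given, else gRPC port + 1."""
    return override if override is not None else grpc_port + 1


def profile_request(host: str, grpc_port: int, action: str,
                    body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a POST to /start_profile or /stop_profile."""
    if action not in ("start_profile", "stop_profile"):
        raise ValueError(f"unknown action: {action}")
    url = f"http://{host}:{sidecar_port(grpc_port)}/{action}"
    data = json.dumps(body or {}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host/port; the body field is an assumption, not a confirmed schema.
req = profile_request("127.0.0.1", 30000, "start_profile", {"with_stack": True})
# Send with urllib.request.urlopen(req) once the engine is running.
```

Building the request separately from sending it keeps the sketch runnable without a live engine.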

Motivation

gRPC mode previously lacked profiling, cache management, and server introspection endpoints. This blocked:

- Torch profiler workflows (e.g. the sglang-torch-profiler-analysis skill)
- bench_serving.py and bench_one_batch_server against gRPC deployments

Design decisions

- Profile endpoints on sidecar (not gRPC): In PD mode, bench_serving.py targets prefill/decode workers individually via --profile-prefill-url/--profile-decode-url, so fanning the request out through the router would hit workers the caller did not target.
- FlushCache as gRPC RPC: Should fan out to all workers through the router. The router already has the HTTP /flush_cache route; it now also calls the gRPC FlushCache RPC for gRPC workers.
- ServerInfo as gRPC RPC: Already enabled natively in smg-grpc-servicer, so a duplicate HTTP sidecar endpoint is not needed.
- --grpc-http-sidecar-port: Renamed from --metrics-http-port to reflect its broader purpose.
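
The fan-out behavior described for FlushCache can be sketched as follows. This is an illustrative Python model only, not the router's actual code: `FakeWorker`, `FlushResult`, `flush_all`, and the `workers_failed` field are hypothetical, while `workers_flushed` mirrors the field reported in the test plan.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FlushResult:
    success: bool
    message: str


class FakeWorker:
    """Stand-in for a gRPC worker client; a real one would call the FlushCache RPC."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy

    async def flush_cache(self) -> FlushResult:
        if not self.healthy:
            raise ConnectionError(f"{self.name} unreachable")
        return FlushResult(True, "Cache flushed successfully")


async def flush_all(workers):
    """Fan out to every worker concurrently and tally the outcomes."""
    results = await asyncio.gather(
        *(w.flush_cache() for w in workers), return_exceptions=True
    )
    flushed = sum(1 for r in results if isinstance(r, FlushResult) and r.success)
    return {"workers_flushed": flushed, "workers_failed": len(results) - flushed}


summary = asyncio.run(flush_all([FakeWorker("w0"), FakeWorker("w1", healthy=False)]))
```

Collecting exceptions instead of raising on the first failure lets one unreachable worker be reported without hiding the workers that did flush.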

Companion PR

Requires companion change in smg-grpc-servicer: lightseekorg/smg#1088

Test plan

- E2E verified on H200 with Qwen2.5-0.5B-Instruct
- Single engine: /metrics 200, /start_profile 200, bad JSON 400
- gRPC FlushCache direct: success=True, message="Cache flushed successfully"
- Engine + Router: Generation through router 200, /flush_cache through router 200 (confirmed gRPC path: total_http_workers: 0, workers_flushed: 1)
- Sidecar endpoints still work when router is running
- Rust builds clean (cargo check for smg-grpc-client and smg)
- CI tests

🤖 Generated with Claude Code

Kangyan-Zhou · 2026-04-10T18:34:47Z

+
+    app.router.add_post("/start_profile", start_profile_handler)
+    app.router.add_post("/stop_profile", stop_profile_handler)
+    app.router.add_get("/flush_cache", flush_cache_handler)


Some test tools use GET, so leave it here for backward compatibility.

Add /start_profile and /stop_profile endpoints to the HTTP sidecar that runs alongside the gRPC server. This enables torch profiler trace collection in gRPC mode, matching the existing HTTP-mode profiling API.

The sidecar (previously metrics-only) is refactored into a unified HTTP server that always starts on --metrics-http-port (default: --port + 1), serving both Prometheus /metrics (when --enable-metrics) and profiling endpoints. Profiling is triggered via an async callback from smg-grpc-servicer's serve_grpc(), which provides access to the GrpcRequestManager after scheduler initialization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
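
The callback hand-off described in this commit message can be sketched roughly as below. This is a toy model under stated assumptions: `FakeRequestManager`, `serve`, and `Sidecar` are hypothetical stand-ins (not the smg-grpc-servicer API), and returning 503 before the manager is attached is an assumed behavior, not something the PR states.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class FakeRequestManager:
    """Hypothetical stand-in for GrpcRequestManager."""

    def __init__(self):
        self.profiling = False

    async def start_profile(self):
        self.profiling = True


async def serve(
    on_request_manager_ready: Optional[
        Callable[[FakeRequestManager], Awaitable[None]]
    ] = None,
):
    """Toy server startup: build the manager, then hand it to the callback."""
    manager = FakeRequestManager()  # exists only after "scheduler initialization"
    if on_request_manager_ready is not None:
        await on_request_manager_ready(manager)
    return manager


class Sidecar:
    """Holds the manager once it exists so HTTP handlers can reach it."""

    def __init__(self):
        self.manager = None

    async def attach(self, manager):
        self.manager = manager

    async def handle_start_profile(self) -> int:
        if self.manager is None:
            return 503  # assumed: engine not ready yet
        await self.manager.start_profile()
        return 200


async def main():
    sidecar = Sidecar()
    before = await sidecar.handle_start_profile()
    manager = await serve(on_request_manager_ready=sidecar.attach)
    after = await sidecar.handle_start_profile()
    return before, after, manager.profiling


result = asyncio.run(main())
```

The point of the callback is ordering: the sidecar can be constructed early, but it only gains access to the request manager after the server has created it.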

slin1237 · 2026-04-11T15:28:54Z

+A lightweight HTTP sidecar is started alongside the gRPC server to expose:
+- /metrics (Prometheus, when --enable-metrics is set)
+- /start_profile, /stop_profile (profiling control, for direct engine access)
+- /server_info (server configuration and internal state)


This is already enabled in the gRPC RPC. Can we remove it?

Reviewer pointed out that server_info is already enabled natively in smg-grpc-servicer, so the duplicate HTTP sidecar endpoint is unnecessary. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Fix redundant except (JSONDecodeError, Exception) → JSONDecodeError only
- Fix module docstring: remove misleading "always starts" and "direct engine access"
- Stop leaking raw str(e) to HTTP callers; return type name only
- Replace getattr+or with is-not-None check for sidecar port
- Extract _check_communicator_results helper to deduplicate result checking
- Unify with_stack/record_shapes boolean idioms
- Move lightweight imports (json, time, aiohttp, io_struct) to top level
- Fix docstring: "ZMQ transport layer" → "transport to the scheduler"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
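
Two of the fixes in this commit (catching JSONDecodeError only, and returning the exception type name instead of str(e)) might look like the sketch below. The function names `parse_json_body` and `error_payload` and the response shapes are hypothetical; treating an empty body as "use defaults" is also an assumption.

```python
import json


def parse_json_body(text: str):
    """Return (status, payload). Only malformed JSON maps to 400."""
    if not text:
        return 200, {}  # assumed: an empty body means "use defaults"
    try:
        body = json.loads(text)
    except json.JSONDecodeError:
        # Catch the specific error only; other exceptions are real bugs
        # and should not be disguised as a client error.
        return 400, {"error": "invalid JSON body"}
    if not isinstance(body, dict):
        return 400, {"error": "expected a JSON object"}
    return 200, body


def error_payload(exc: Exception):
    """Expose only the exception type, never str(exc), to the HTTP caller."""
    return {"success": False, "error": type(exc).__name__}
```

Returning only the type name keeps internal details such as file paths or addresses out of HTTP responses while still giving callers something to report.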

Huixxi · 2026-04-28T11:17:51Z

@Kangyan-Zhou The latest smg-grpc-servicer doesn't support the new on_request_manager_ready argument yet. How can this be fixed? Its serve_grpc signature is still:

async def serve_grpc(
    server_args: ServerArgs,
    model_info: dict | None = None,
):
    """Start the standalone gRPC server with integrated scheduler."""
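
One way a caller could tolerate both servicer versions while the companion PR is unreleased is to inspect the signature before passing the new argument. This is a workaround sketch only, not the fix adopted in this PR: `accepts_kwarg`, `build_kwargs`, and both stub functions are hypothetical, and the default value shown for on_request_manager_ready is an assumption.

```python
import inspect


def accepts_kwarg(func, name: str) -> bool:
    """True if func can be called with keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params:
        return params[name].kind in (
            inspect.Parameter.KEYWORD_ONLY,
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
        )
    # A **kwargs parameter would also accept it.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Stand-ins for the two servicer versions discussed above (bodies omitted).
async def serve_grpc_old(server_args, model_info=None): ...


async def serve_grpc_new(server_args, model_info=None, on_request_manager_ready=None): ...


def build_kwargs(serve_fn, callback):
    """Pass the callback only when the installed servicer supports it."""
    if accepts_kwarg(serve_fn, "on_request_manager_ready"):
        return {"on_request_manager_ready": callback}
    return {}
```

With the older signature the callback is simply not passed, so the server still starts but the profiling endpoints would have no request manager to act on.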

Huixxi · 2026-04-28T11:19:50Z

Should lightseekorg/smg#1088 be merged first, before this PR? @slin1237

Kangyan-Zhou mentioned this pull request Apr 10, 2026

feat(grpc): add FlushCache RPC and profiling support for gRPC mode lightseekorg/smg#1088

Open

4 tasks

Kangyan-Zhou force-pushed the profile_endpoint_grpc branch 3 times, most recently from a751b12 to 481a3ab Compare April 10, 2026 18:32

Kangyan-Zhou commented Apr 10, 2026

Kangyan-Zhou force-pushed the profile_endpoint_grpc branch from 481a3ab to df9cccf Compare April 10, 2026 19:37

Kangyan-Zhou changed the title ~~[Observability] Add profiling HTTP endpoints for gRPC mode~~ [Observability] Add HTTP sidecar endpoints and FlushCache gRPC RPC for gRPC mode Apr 10, 2026

Kangyan-Zhou marked this pull request as ready for review April 10, 2026 19:42

Kangyan-Zhou requested review from CatherineSue and slin1237 as code owners April 10, 2026 19:42

slin1237 requested changes Apr 11, 2026

Kangyan-Zhou requested a review from slin1237 April 14, 2026 02:02

Kangyan-Zhou and others added 2 commits April 13, 2026 19:06

Remove /server_info from HTTP sidecar — already available as gRPC RPC

f2a4198

slin1237 approved these changes Apr 20, 2026

Kangyan-Zhou merged commit f1a70b4 into sgl-project:main Apr 23, 2026
56 of 64 checks passed

zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request Apr 23, 2026

[Observability] Add HTTP sidecar endpoints and FlushCache gRPC RPC fo…

031c43b

…r gRPC mode (sgl-project#22500) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Kangyan-Zhou mentioned this pull request May 1, 2026

[CI] Restore SMG e2e on 2-gpu-h100 / 4-gpu-h100 runners #24222

Merged

6 tasks

Kangyan-Zhou merged 3 commits into sgl-project:main from Kangyan-Zhou:profile_endpoint_grpc
