[Observability] Add HTTP sidecar endpoints and FlushCache gRPC RPC for gRPC mode#22500

Merged
Kangyan-Zhou merged 3 commits into sgl-project:main from Kangyan-Zhou:profile_endpoint_grpc
Apr 23, 2026

Conversation

Collaborator

Kangyan-Zhou commented Apr 10, 2026

Summary

Adds admin/observability endpoints for SGLang's gRPC mode using a hybrid approach:

  • HTTP sidecar (on --grpc-http-sidecar-port, default --port + 1):

    • POST /start_profile — start torch profiler (direct engine access for per-worker profiling)
    • POST /stop_profile — stop profiler and export traces
    • GET /metrics — Prometheus metrics (existing, refactored)
  • gRPC native (via SMG router):

    • FlushCache RPC — handled natively in gRPC, router fans out to all workers
    • GetServerInfo RPC — server config, scheduler info, and internal states (already in smg-grpc-servicer)
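
As a rough illustration of how a client might address the sidecar, the sketch below builds (but does not send) a `POST /start_profile` request. It assumes the sidecar port was left at its default of `--port + 1`. The helper names, the host, and the example port 30000 are hypothetical, and the `with_stack`/`record_shapes` body fields are taken from the cleanup commit later in this PR rather than from a documented schema.

```python
import json
import urllib.request


def default_sidecar_port(grpc_port: int) -> int:
    """Default sidecar port as described in this PR: --port + 1."""
    return grpc_port + 1


def build_start_profile_request(host: str, grpc_port: int, **options) -> urllib.request.Request:
    """Build a POST /start_profile request for one worker's sidecar (not sent here)."""
    url = f"http://{host}:{default_sidecar_port(grpc_port)}/start_profile"
    body = json.dumps(options).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical worker listening for gRPC on port 30000.
req = build_start_profile_request("127.0.0.1", 30000, with_stack=True, record_shapes=True)
```

Passing `req` to `urllib.request.urlopen` would send it. The sketch stops short of that so it can be read without a running engine.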

Motivation

gRPC mode previously lacked profiling, cache management, and server introspection endpoints. This blocked:

Design decisions

  • Profile endpoints on sidecar (not gRPC): In PD mode, bench_serving.py targets prefill/decode workers individually via --profile-prefill-url/--profile-decode-url. Fan-out through the router would be wrong.
  • FlushCache as gRPC RPC: Should fan out to all workers through the router. The router already has the HTTP /flush_cache route — now it also calls the gRPC FlushCache RPC for gRPC workers.
  • ServerInfo as gRPC RPC: Already enabled natively in smg-grpc-servicer, no need for a duplicate HTTP sidecar endpoint.
  • --grpc-http-sidecar-port: Renamed from --metrics-http-port to reflect broader purpose.
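
The fan-out behaviour described for FlushCache can be sketched in a few lines. This is only an illustration in Python: the actual router is the Rust SMG router, and `FakeWorker`, `FlushResult`, `fan_out_flush`, and the `workers_failed` field are hypothetical names. Only `workers_flushed` and the success message mirror values quoted in the test plan.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class FlushResult:
    success: bool
    message: str


class FakeWorker:
    """Hypothetical stand-in for a gRPC worker client."""

    def __init__(self, name: str, healthy: bool = True):
        self.name = name
        self.healthy = healthy

    async def flush_cache(self) -> FlushResult:
        if not self.healthy:
            return FlushResult(False, f"{self.name}: flush failed")
        return FlushResult(True, "Cache flushed successfully")


async def fan_out_flush(workers):
    """Send the flush to every worker concurrently and summarize the outcome."""
    results = await asyncio.gather(*(w.flush_cache() for w in workers))
    flushed = sum(1 for r in results if r.success)
    return {"workers_flushed": flushed, "workers_failed": len(results) - flushed}


summary = asyncio.run(fan_out_flush([FakeWorker("prefill-0"), FakeWorker("decode-0")]))
```

Profiling is the opposite case: a caller wants exactly one worker, which is why those endpoints stay on the per-worker sidecar instead of going through a loop like this.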

Companion PR

Requires companion change in smg-grpc-servicer: lightseekorg/smg#1088

Test plan

  • E2E verified on H200 with Qwen2.5-0.5B-Instruct
  • Single engine: /metrics 200, /start_profile 200, bad JSON 400
  • gRPC FlushCache direct: success=True, message="Cache flushed successfully"
  • Engine + Router: Generation through router 200, /flush_cache through router 200 (confirmed gRPC path: total_http_workers: 0, workers_flushed: 1)
  • Sidecar endpoints still work when router is running
  • Rust builds clean (cargo check for smg-grpc-client and smg)
  • CI tests

🤖 Generated with Claude Code


app.router.add_post("/start_profile", start_profile_handler)
app.router.add_post("/stop_profile", stop_profile_handler)
app.router.add_get("/flush_cache", flush_cache_handler)

Collaborator Author

Some test tools use GET, so this is left in place for backward compatibility.

Add /start_profile and /stop_profile endpoints to the HTTP sidecar
that runs alongside the gRPC server. This enables torch profiler
trace collection in gRPC mode, matching the existing HTTP-mode
profiling API.

The sidecar (previously metrics-only) is refactored into a unified
HTTP server that always starts on --metrics-http-port (default:
--port + 1), serving both Prometheus /metrics (when --enable-metrics)
and profiling endpoints.

Profiling is triggered via an async callback from smg-grpc-servicer's
serve_grpc(), which provides access to the GrpcRequestManager after
scheduler initialization.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
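
The callback handoff described in this commit message could look roughly like the sketch below. Everything here is a simplified stand-in: `FakeRequestManager`, `fake_serve_grpc`, and `Sidecar` are hypothetical, the real `serve_grpc()` takes other arguments, and returning 503 before the manager is ready is an assumption, not something the PR states.

```python
import asyncio
from typing import Awaitable, Callable, Optional


class FakeRequestManager:
    """Hypothetical stand-in for GrpcRequestManager."""

    def __init__(self):
        self.profiling = False

    async def start_profile(self) -> None:
        self.profiling = True


async def fake_serve_grpc(
    on_request_manager_ready: Optional[Callable[[FakeRequestManager], Awaitable[None]]] = None,
) -> None:
    """Simplified startup: build the manager, then notify the caller."""
    manager = FakeRequestManager()  # stands in for post-scheduler initialization
    if on_request_manager_ready is not None:
        await on_request_manager_ready(manager)
    # A real server would now block serving requests; the sketch just returns.


class Sidecar:
    """Holds the manager once the server hands it over."""

    def __init__(self):
        self.manager: Optional[FakeRequestManager] = None

    async def on_ready(self, manager: FakeRequestManager) -> None:
        self.manager = manager

    async def handle_start_profile(self) -> int:
        if self.manager is None:
            return 503  # assumed behaviour while the server is not ready
        await self.manager.start_profile()
        return 200


async def main():
    sidecar = Sidecar()
    before = await sidecar.handle_start_profile()
    await fake_serve_grpc(on_request_manager_ready=sidecar.on_ready)
    after = await sidecar.handle_start_profile()
    return before, after, sidecar.manager.profiling


before, after, profiling = asyncio.run(main())
```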
@Kangyan-Zhou Kangyan-Zhou force-pushed the profile_endpoint_grpc branch from 481a3ab to df9cccf Compare April 10, 2026 19:37
@Kangyan-Zhou Kangyan-Zhou changed the title from "[Observability] Add profiling HTTP endpoints for gRPC mode" to "[Observability] Add HTTP sidecar endpoints and FlushCache gRPC RPC for gRPC mode" Apr 10, 2026
@Kangyan-Zhou Kangyan-Zhou marked this pull request as ready for review April 10, 2026 19:42
A lightweight HTTP sidecar is started alongside the gRPC server to expose:
- /metrics (Prometheus, when --enable-metrics is set)
- /start_profile, /stop_profile (profiling control, for direct engine access)
- /server_info (server configuration and internal state)

Collaborator

This is already enabled in the RPC. Can we remove it?

@Kangyan-Zhou Kangyan-Zhou requested a review from slin1237 April 14, 2026 02:02
Kangyan-Zhou and others added 2 commits April 13, 2026 19:06
Reviewer pointed out that server_info is already enabled natively in
smg-grpc-servicer, so the duplicate HTTP sidecar endpoint is unnecessary.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Fix redundant except (JSONDecodeError, Exception) → JSONDecodeError only
- Fix module docstring: remove misleading "always starts" and "direct engine access"
- Stop leaking raw str(e) to HTTP callers; return type name only
- Replace getattr+or with is-not-None check for sidecar port
- Extract _check_communicator_results helper to deduplicate result checking
- Unify with_stack/record_shapes boolean idioms
- Move lightweight imports (json, time, aiohttp, io_struct) to top level
- Fix docstring: "ZMQ transport layer" → "transport to the scheduler"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
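
Two of the fixes listed in this commit can be illustrated with a small sketch. The function names are hypothetical, and the attribute name `grpc_http_sidecar_port` is an assumption derived from the `--grpc-http-sidecar-port` flag, so this is not the PR's actual code. The `is not None` check matters because an `or` fallback would treat a falsy value such as 0 as unset.

```python
import json
from types import SimpleNamespace


def resolve_sidecar_port(args) -> int:
    """Use the explicit sidecar port when set, otherwise fall back to port + 1."""
    if args.grpc_http_sidecar_port is not None:
        return args.grpc_http_sidecar_port
    return args.port + 1


def parse_json_body(raw: bytes):
    """Return (status, payload); malformed JSON maps to 400 without echoing details."""
    try:
        return 200, json.loads(raw)
    except json.JSONDecodeError as e:
        # Expose only the exception type name, never str(e).
        return 400, {"error": type(e).__name__}


# Hypothetical parsed arguments with the sidecar port left unset.
args = SimpleNamespace(port=30000, grpc_http_sidecar_port=None)
port = resolve_sidecar_port(args)
status, payload = parse_json_body(b"{bad json")
```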
@Kangyan-Zhou Kangyan-Zhou merged commit f1a70b4 into sgl-project:main Apr 23, 2026
56 of 64 checks passed
zhangying098 pushed a commit to zhangying098/sglang that referenced this pull request Apr 23, 2026
…r gRPC mode (sgl-project#22500)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Contributor

Huixxi commented Apr 28, 2026

@Kangyan-Zhou the latest smg-grpc-servicer doesn't support the new on_request_manager_ready arg yet. How can this be fixed?

async def serve_grpc(
    server_args: ServerArgs,
    model_info: dict | None = None,
):
    """Start the standalone gRPC server with integrated scheduler."""

Should lightseekorg/smg#1088 be merged first? @slin1237
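
Until the companion change is available in the installed smg-grpc-servicer, one possible stopgap on the caller side is to pass the new argument only when the installed `serve_grpc` accepts it. The sketch below is an assumption-laden illustration, not something proposed in this PR: `accepts_kwarg`, `build_kwargs`, and the two stub functions are hypothetical, and without the callback the sidecar would presumably have no request manager to profile with.

```python
import inspect


def accepts_kwarg(func, name: str) -> bool:
    """True if func can be called with the keyword argument `name`."""
    params = inspect.signature(func).parameters
    if name in params and params[name].kind is not inspect.Parameter.POSITIONAL_ONLY:
        return True
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for an older and a newer serve_grpc signature.
async def serve_grpc_old(server_args, model_info=None):
    """Older signature, as quoted in the comment above."""


async def serve_grpc_new(server_args, model_info=None, on_request_manager_ready=None):
    """Newer signature with the callback argument."""


def build_kwargs(serve_fn, callback):
    """Pass the callback only when the installed servicer understands it."""
    kwargs = {}
    if accepts_kwarg(serve_fn, "on_request_manager_ready"):
        kwargs["on_request_manager_ready"] = callback
    return kwargs
```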
