[Observability] Add Prometheus metrics endpoint for gRPC mode #20801

Merged
Kangyan-Zhou merged 3 commits into sgl-project:main from Kangyan-Zhou:http_engine_metrics on Apr 10, 2026
@Kangyan-Zhou (Collaborator) commented Mar 18, 2026

Summary

  • When --enable-metrics is set in gRPC mode, starts a lightweight aiohttp HTTP server that exposes a Prometheus /metrics endpoint
  • Adds a --metrics-http-port argument to configure the metrics port (defaults to --port + 1)
  • Uses OpenMetrics format with multiprocess-safe Prometheus collection, matching HTTP-mode output when scraped by Prometheus
  • Wraps the entire metrics initialization in try/except, so a failure there never crashes the gRPC server
  • Cleans up the metrics server on shutdown, and fixes an AppRunner leak on bind failure
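
The failure-isolation and cleanup behavior described above can be sketched roughly as follows. This is not the code from grpc_server.py: StubRunner is a hypothetical stand-in for aiohttp's AppRunner and TCPSite pair, so the sketch runs with the standard library only, and the names start_metrics_server and serve are illustrative.

```python
import asyncio
import logging
from typing import Optional

logger = logging.getLogger(__name__)


class StubRunner:
    """Hypothetical stand-in for aiohttp's AppRunner plus TCPSite."""

    def __init__(self, fail_on_start: bool = False) -> None:
        self.fail_on_start = fail_on_start
        self.cleaned_up = False

    async def setup(self) -> None:
        pass  # a real runner would prepare the application here

    async def start(self, host: str, port: int) -> None:
        # Simulates a bind failure such as a port conflict.
        if self.fail_on_start:
            raise OSError(f"address already in use: {host}:{port}")

    async def cleanup(self) -> None:
        self.cleaned_up = True


async def start_metrics_server(
    runner: StubRunner, host: str, port: int
) -> Optional[StubRunner]:
    """Return the running runner, or None if metrics could not start."""
    try:
        await runner.setup()
        await runner.start(host, port)
    except Exception as exc:
        # Release whatever setup() acquired so a failed bind does not leak.
        await runner.cleanup()
        logger.warning("Metrics server disabled: %s", exc)
        return None
    return runner


async def serve(runner: StubRunner, port: int) -> bool:
    """Run the (elided) main server; report whether metrics were active."""
    metrics = await start_metrics_server(runner, "127.0.0.1", port)
    try:
        # ... the gRPC server would run here whether or not metrics started ...
        return metrics is not None
    finally:
        if metrics is not None:
            await metrics.cleanup()  # normal shutdown path
```

The point of returning None instead of raising is that the caller can continue serving without metrics, which is the behavior the port-conflict item in the test plan expects.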

Changes

  • python/sglang/srt/entrypoints/grpc_server.py — new _start_metrics_server() and metrics lifecycle in serve_grpc()
  • python/sglang/srt/server_args.py — new metrics_http_port: Optional[int] field and --metrics-http-port CLI arg
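
A minimal sketch of how the new argument and its default could fit together, assuming a plain argparse parser. The helper name resolve_metrics_port and the default of 30000 for --port are illustrative assumptions, not taken from server_args.py.

```python
import argparse
from typing import List, Optional


def resolve_metrics_port(port: int, metrics_http_port: Optional[int]) -> int:
    """Use the explicit metrics port if given, else the serving port + 1."""
    # Compare against None rather than relying on truthiness: 0 is a valid
    # value (it asks the OS for an ephemeral port) and `or` would discard it.
    if metrics_http_port is not None:
        return metrics_http_port
    return port + 1


def parse_args(argv: List[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    # The default below is an assumption made for this example.
    parser.add_argument("--port", type=int, default=30000)
    parser.add_argument("--enable-metrics", action="store_true")
    parser.add_argument("--metrics-http-port", type=int, default=None)
    return parser.parse_args(argv)
```

For example, parse_args(["--port", "30000", "--enable-metrics"]) leaves metrics_http_port as None, so the resolved port is 30001, while passing --metrics-http-port 9090 resolves to 9090.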

Test plan

  • Start gRPC server with --enable-metrics and verify /metrics is accessible on port+1
  • Verify metrics output is valid OpenMetrics format with # EOF terminator
  • Verified on remote H200 machine (ac-h200-gpu04) with Qwen2.5-0.5B-Instruct — all scheduler metrics present
  • Test --metrics-http-port 9090 to verify custom port works
  • Test port conflict scenario — server should start without metrics and log a warning
  • Test without --enable-metrics — no metrics server should start
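
The format check in the test plan could be automated with something like the helper below. It is a sketch that only inspects a payload already fetched from /metrics; the function names are illustrative, and the sample metric label in the usage note is an assumption.

```python
from typing import Set


def ends_with_eof(payload: str) -> bool:
    """True if the last line of the payload is the OpenMetrics '# EOF' marker."""
    lines = payload.splitlines()
    return bool(lines) and lines[-1] == "# EOF"


def metric_names(payload: str) -> Set[str]:
    """Collect sample names from non-comment lines.

    The name is whatever precedes the first '{' (labels) or ' ' (value).
    """
    names = set()
    for line in payload.splitlines():
        if not line or line.startswith("#"):
            continue
        names.add(line.split("{", 1)[0].split(" ", 1)[0])
    return names
```

Given a payload such as `sglang_prompt_tokens_total{model_name="m"} 12.0` followed by a `# EOF` line, ends_with_eof returns True and metric_names returns a set containing sglang_prompt_tokens_total.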

🤖 Generated with Claude Code

@gemini-code-assist (Contributor)

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@Kangyan-Zhou force-pushed the http_engine_metrics branch from 836e3c5 to 518d053 on April 9, 2026 19:25
Kangyan-Zhou and others added 3 commits April 9, 2026 15:23
When --enable-metrics is set, start a lightweight aiohttp HTTP server
on gRPC port + 1 to expose /metrics in standard Prometheus format.
This is the standard pattern for gRPC services that need to serve
Prometheus metrics on a separate HTTP port.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Switch from prometheus_client.generate_latest() (Prometheus exposition
format) to prometheus_client.openmetrics.exposition.generate_latest()
(OpenMetrics format).

OpenMetrics converts colons to underscores in metric names (e.g.
sglang:prompt_tokens_total → sglang_prompt_tokens_total), matching the
output of make_asgi_app() used in HTTP mode. This ensures Grafana
dashboards work consistently across HTTP and gRPC serving modes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ling

- Add configurable --metrics-http-port (default: --port + 1) instead of
  hardcoded port offset
- Wrap entire metrics init in try/except so failures never crash gRPC server
- Fix AppRunner resource leak if site.start() fails
- Fix port defaulting to use `is not None` check (port 0 is valid)
- Fix inaccurate comments about make_asgi_app and OpenMetrics format

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@Kangyan-Zhou force-pushed the http_engine_metrics branch from 518d053 to eaf2187 on April 9, 2026 22:25
@Kangyan-Zhou marked this pull request as ready for review on April 9, 2026 23:22

@Kangyan-Zhou (Collaborator, Author)

/tag-and-rerun-ci

@Kangyan-Zhou merged commit 89553ff into sgl-project:main on Apr 10, 2026
138 of 173 checks passed
Fridge003 pushed a commit that referenced this pull request Apr 11, 2026
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
pyc96 pushed a commit to pyc96/sglang that referenced this pull request Apr 14, 2026
…oject#20801)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026