feat(metrics): expose raw KV cache pool token counts as prometheus gauges by ishandhanani · Pull Request #22726 · sgl-project/sglang

ishandhanani · 2026-04-13T21:32:04Z

Summary

Expose three Prometheus gauges for the raw KV cache pool token counts:

sglang:kv_available_tokens -- free pool slots
sglang:kv_evictable_tokens -- radix-cached, reclaimable slots
sglang:kv_used_tokens -- actively pinned slots

Motivation

The existing sglang:token_usage metric reports only non-evictable tokens (active requests + pinned sessions). Evictable radix cache nodes are excluded, making the pool appear emptier than it is. This matters for agentic workloads where subagent KV lingers in the radix tree after completion -- token_usage shows ~2% while physical GPU memory is 72% consumed.

Exposing the raw counts at the most natural granularity lets operators derive any ratio they need in PromQL/Grafana, e.g.:

Physical usage: 1 - (kv_available_tokens / (kv_available_tokens + kv_evictable_tokens + kv_used_tokens))
Evictable fraction: kv_evictable_tokens / (kv_available_tokens + kv_evictable_tokens + kv_used_tokens)
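
The same two ratios can be checked offline once the gauge values are in hand. This is a minimal sketch, assuming the three counts have already been scraped into plain numbers. The helper name `kv_pool_ratios` and the sample counts are illustrative and not part of SGLang.

```python
def kv_pool_ratios(available: float, evictable: float, used: float) -> dict:
    """Derive pool ratios from the three raw token counts.

    Hypothetical helper for illustration only; not an SGLang API.
    """
    total = available + evictable + used
    if total <= 0:
        # Empty or uninitialized pool: avoid dividing by zero.
        return {"physical_usage": 0.0, "evictable_fraction": 0.0}
    return {
        # Share of the pool that is not free (pinned + evictable).
        "physical_usage": 1.0 - available / total,
        # Share of the pool held by reclaimable radix-cache entries.
        "evictable_fraction": evictable / total,
    }


# Made-up counts resembling the case in the motivation: few pinned
# tokens, many evictable radix-cached tokens.
ratios = kv_pool_ratios(available=28_000, evictable=70_000, used=2_000)
print(ratios)
```

With these made-up counts only 2% of the pool is pinned, yet roughly 72% of it is physically occupied, which is the gap the raw gauges are meant to expose.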

Changes

metrics_collector.py: Add kv_available_tokens, kv_evictable_tokens, kv_used_tokens fields to SchedulerStats, add Gauges to SchedulerMetricsCollector, log in log_stats()
scheduler_runtime_checker_mixin.py: Plumb raw counts from PoolStats.update_scheduler_stats()

gemini-code-assist · 2026-04-13T21:32:08Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

gemini-code-assist · 2026-04-13T21:35:07Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

ishandhanani · 2026-04-13T21:44:43Z

/rerun-stage stage-a-test-1-gpu-small

ishandhanani · 2026-04-13T21:44:43Z

/rerun-stage stage-b-test-1-gpu-small

github-actions · 2026-04-13T21:45:11Z

✅ Triggered stage-b-test-1-gpu-small to run independently (skipping dependencies). View workflow run

github-actions · 2026-04-13T21:45:14Z

✅ Triggered stage-a-test-1-gpu-small to run independently (skipping dependencies). View workflow run

ishandhanani · 2026-04-13T22:41:06Z

/rerun-stage stage-a-test-1-gpu-small

ishandhanani · 2026-04-13T22:41:16Z

/rerun-stage stage-b-test-1-gpu-small

github-actions · 2026-04-13T22:41:30Z

✅ Triggered stage-a-test-1-gpu-small to run independently (skipping dependencies). View workflow run

github-actions · 2026-04-13T22:41:41Z

✅ Triggered stage-b-test-1-gpu-small to run independently (skipping dependencies). View workflow run

ishandhanani · 2026-04-14T01:30:33Z

The correct CI passed

…uges (sgl-project#22726)

Closes #8151 (now unblocked: sgl-project/sglang#22726 landed in v0.5.11, which is the new floor after this PR).

Adds a "KV Pool Detail" row to deploy/observability/grafana_dashboards/sglang.json with two new panels driven by the gauges added in 0.5.11:

* `KV Pool Breakdown (tokens)` -- stacked timeseries of `sglang:kv_used_tokens` (locked by running requests), `sglang:kv_evictable_tokens` (radix-cached, reclaimable), and `sglang:kv_available_tokens` (free). The three series sum to <= `sglang:max_total_num_tokens` per the invariant documented in SGLang's metrics_collector.py.
* `KV Pool Physical Usage %` -- `(1 - kv_available / (kv_available + kv_evictable + kv_used)) * 100`. Captures true pool occupancy including evictable slots, vs. `sglang:token_usage` which excludes them. 90% threshold drawn in red for the "no headroom even after evict" case.

The existing `GPU KV Cache Usage %` panel (driven by `sglang:token_usage`) is unchanged; it's still useful as the "bottleneck across full / SWA / mamba pools" view that the new gauges don't replicate.

Verified live on a Qwen/Qwen3-0.6B agg worker: all three gauges export at `<system_port>/metrics`, and `kv_available + kv_evictable + kv_used` = `max_total_num_tokens` after a real request.
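
The live check described above can be approximated with a small script over the text returned by the `/metrics` endpoint. This is a rough sketch using only the standard library. The sample payload, its `model_name` label, and all numeric values are made up for illustration, and the helper names are hypothetical. The parser handles only simple `name{labels} value` lines, not the full Prometheus exposition format.

```python
import re
from collections import defaultdict

# Made-up scrape output; label names and values are assumptions.
SAMPLE = """\
# TYPE sglang:kv_used_tokens gauge
sglang:kv_used_tokens{model_name="Qwen/Qwen3-0.6B"} 2000.0
# TYPE sglang:kv_evictable_tokens gauge
sglang:kv_evictable_tokens{model_name="Qwen/Qwen3-0.6B"} 70000.0
# TYPE sglang:kv_available_tokens gauge
sglang:kv_available_tokens{model_name="Qwen/Qwen3-0.6B"} 28000.0
# TYPE sglang:max_total_num_tokens gauge
sglang:max_total_num_tokens{model_name="Qwen/Qwen3-0.6B"} 100000.0
"""

# Metric name, optional {label} block, then the sample value.
LINE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{[^}]*\})?\s+(\S+)")

POOL_GAUGES = (
    "sglang:kv_available_tokens",
    "sglang:kv_evictable_tokens",
    "sglang:kv_used_tokens",
)


def parse_gauges(text: str) -> dict:
    """Sum sample values per metric name, ignoring comments and labels."""
    totals = defaultdict(float)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE_RE.match(line)
        if match:
            totals[match.group(1)] += float(match.group(2))
    return dict(totals)


def check_pool_invariant(gauges: dict) -> bool:
    """True when the three pool counts fit within max_total_num_tokens."""
    parts = sum(gauges[name] for name in POOL_GAUGES)
    return parts <= gauges["sglang:max_total_num_tokens"]


def physical_usage_percent(gauges: dict) -> float:
    """Same expression as the 'KV Pool Physical Usage %' panel."""
    total = sum(gauges[name] for name in POOL_GAUGES)
    if total <= 0:
        return 0.0
    return (1.0 - gauges["sglang:kv_available_tokens"] / total) * 100.0


gauges = parse_gauges(SAMPLE)
print(check_pool_invariant(gauges), physical_usage_percent(gauges))
```

In practice the `SAMPLE` string would be replaced by the body fetched from the worker's metrics endpoint. Summing across label sets is a simplification that only makes sense when a single worker is scraped.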

ishandhanani changed the title ~~total kv stat~~ feat(metrics): add kv_physical_usage gauge for physical KV cache occupancy Apr 13, 2026

ishandhanani marked this pull request as ready for review April 13, 2026 21:34

ishandhanani requested review from Ying1123, fzyzcjy, hnyls2002, merrymercy, sufeng-buaa and xiezhq-hermann as code owners April 13, 2026 21:34

expose raw kv pool token counts as prometheus gauges

3edcc0f

ishandhanani force-pushed the ishan/morestat branch from 2eee7f6 to 3edcc0f Compare April 13, 2026 22:34

ishandhanani changed the title ~~feat(metrics): add kv_physical_usage gauge for physical KV cache occupancy~~ feat(metrics): expose raw KV cache pool token counts as prometheus gauges Apr 13, 2026

ishandhanani merged commit cc449ac into main Apr 14, 2026
106 of 114 checks passed

ishandhanani deleted the ishan/morestat branch April 14, 2026 01:30

pyc96 pushed a commit to pyc96/sglang that referenced this pull request Apr 14, 2026

feat(metrics): expose raw KV cache pool token counts as prometheus ga…

9444717

…uges (sgl-project#22726)

ishandhanani mentioned this pull request Apr 14, 2026

Add raw KV cache pool gauges to SGLang Grafana dashboard ai-dynamo/dynamo#8151

Open

yhyang201 pushed a commit to yhyang201/sglang that referenced this pull request Apr 22, 2026

feat(metrics): expose raw KV cache pool token counts as prometheus ga…

f850367

…uges (sgl-project#22726)

ishandhanani merged 1 commit into main from ishan/morestat
