Context
sgl-project/sglang#22726 adds three new Prometheus gauges that expose the raw KV cache pool token counts:
sglang:kv_available_tokens -- free pool slots
sglang:kv_evictable_tokens -- radix-cached, reclaimable slots
sglang:kv_used_tokens -- actively pinned slots
These replace the existing single sglang:token_usage metric (which excludes evictable tokens) with the full breakdown, letting operators derive any ratio in PromQL. For example, physical usage = 1 - (kv_available_tokens / (kv_available_tokens + kv_evictable_tokens + kv_used_tokens)), where the denominator is the sum of the three gauges, i.e. the total pool size.
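To make the arithmetic concrete, here is a minimal Python sketch of the physical usage formula above. The function name `kv_pool_ratios` and the extra `evictable_share` key are illustrative only and are not part of SGLang; the sketch assumes the three gauges sum to the total pool size.

```python
def kv_pool_ratios(available: float, evictable: float, used: float) -> dict:
    """Derive usage ratios from the three raw KV pool token counts.

    Hypothetical helper: it mirrors the formula in the issue text and
    assumes available + evictable + used equals the total pool size.
    """
    total = available + evictable + used
    if total <= 0:
        raise ValueError("KV pool token counts must sum to a positive total")
    return {
        # Fraction of slots that are not free (pinned or radix-cached).
        "physical_usage": 1 - available / total,
        # Fraction of slots that are cached but reclaimable.
        "evictable_share": evictable / total,
    }


if __name__ == "__main__":
    # Example values are made up for illustration.
    print(kv_pool_ratios(available=600, evictable=300, used=100))
```

Physical usage counts evictable tokens as occupied, which is the part the issue says sglang:token_usage leaves out.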
What needs to happen
Once SGLang cuts a release containing sgl-project/sglang#22726, update the SGLang Grafana dashboard (deploy/observability/grafana_dashboards/sglang.json) to add panels for:
- KV Pool Breakdown (stacked area) -- kv_used_tokens, kv_evictable_tokens, kv_available_tokens
- KV Physical Usage % (timeseries) -- derived via PromQL from the three raw gauges
- Consider updating the existing "GPU KV Cache Usage" panel to use the new metrics for a more complete picture
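As a rough starting point for the first two panels, the sketch below builds panel definitions as Python dicts and prints them as JSON. It assumes Grafana's timeseries panel schema (`fieldConfig.defaults.custom.stacking`, the `percentunit` unit) and uses the metric names from the issue. Datasource, `gridPos`, and panel ids are omitted, and the helper names are hypothetical, so the output would need adapting before it is merged into sglang.json.

```python
import json

# Metric names as given in the issue; everything else here is illustrative.
AVAILABLE = "sglang:kv_available_tokens"
EVICTABLE = "sglang:kv_evictable_tokens"
USED = "sglang:kv_used_tokens"

# PromQL for physical usage, following the formula in the Context section.
PHYSICAL_USAGE_EXPR = f"1 - ({AVAILABLE} / ({AVAILABLE} + {EVICTABLE} + {USED}))"


def breakdown_panel() -> dict:
    """Stacked-area timeseries panel with one query per raw gauge."""
    series = [(USED, "used"), (EVICTABLE, "evictable"), (AVAILABLE, "available")]
    return {
        "title": "KV Pool Breakdown",
        "type": "timeseries",
        "fieldConfig": {
            "defaults": {
                # Normal stacking plus fill renders the series as stacked areas.
                "custom": {"stacking": {"mode": "normal"}, "fillOpacity": 30},
            }
        },
        "targets": [{"expr": expr, "legendFormat": label} for expr, label in series],
    }


def physical_usage_panel() -> dict:
    """Timeseries panel showing physical usage as a 0-1 ratio."""
    return {
        "title": "KV Physical Usage %",
        "type": "timeseries",
        # "percentunit" makes Grafana display a 0.0-1.0 value as a percentage.
        "fieldConfig": {"defaults": {"unit": "percentunit", "min": 0, "max": 1}},
        "targets": [{"expr": PHYSICAL_USAGE_EXPR, "legendFormat": "physical usage"}],
    }


if __name__ == "__main__":
    print(json.dumps([breakdown_panel(), physical_usage_panel()], indent=2))
```

Generating the JSON from a script keeps the PromQL expression in one place; the same expression could be reused if the existing "GPU KV Cache Usage" panel is updated.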
Blocked on