Add raw KV cache pool gauges to SGLang Grafana dashboard #8151

## Context

sgl-project/sglang#22726 adds three new Prometheus gauges that expose the raw KV cache pool token counts:

- `sglang:kv_available_tokens` -- free pool slots
- `sglang:kv_evictable_tokens` -- radix-cached, reclaimable slots
- `sglang:kv_used_tokens` -- actively pinned slots

These replace the single figure reported by the existing `sglang:token_usage` metric (which excludes evictable tokens) with the full breakdown, so operators can derive any ratio in PromQL. For example, physical usage = `1 - (sglang:kv_available_tokens / (sglang:kv_available_tokens + sglang:kv_evictable_tokens + sglang:kv_used_tokens))`.
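To make the arithmetic concrete, here is a minimal Python sketch of the physical-usage formula above. The sample token counts are made up for illustration, and `pinned_usage` is a hypothetical helper showing a ratio that ignores evictable tokens; it is not claimed to be how `sglang:token_usage` is actually computed.

```python
def physical_usage(available: float, evictable: float, used: float) -> float:
    """Fraction of the KV pool that is not free: 1 - available / total."""
    total = available + evictable + used
    if total == 0:
        # Empty pool (e.g. no samples yet): report 0 rather than divide by zero.
        return 0.0
    return 1.0 - available / total


def pinned_usage(available: float, evictable: float, used: float) -> float:
    """Hypothetical comparison ratio that counts only actively pinned slots."""
    total = available + evictable + used
    return used / total if total else 0.0


# Illustrative sample only: 512 free, 256 evictable, 256 pinned slots.
sample = dict(available=512, evictable=256, used=256)
print(physical_usage(**sample))  # 0.5
print(pinned_usage(**sample))    # 0.25
```

With these sample values the two ratios differ by the evictable share of the pool, which is the gap the three raw gauges make visible.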

## What needs to happen

Once SGLang cuts a release containing sgl-project/sglang#22726, update the SGLang Grafana dashboard (`deploy/observability/grafana_dashboards/sglang.json`) to add panels for:

1. **KV Pool Breakdown** (stacked area) -- `sglang:kv_used_tokens`, `sglang:kv_evictable_tokens`, `sglang:kv_available_tokens`
2. **KV Physical Usage %** (timeseries) -- derived via PromQL from the three raw gauges
3. Consider updating the existing "GPU KV Cache Usage" panel to use the new metrics for a more complete picture
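As a rough starting point, the two new panels could be generated with a short Python script like the one below. This is a sketch under assumptions, not the final dashboard change: the helper names (`target`, `breakdown_panel`, `physical_usage_panel`) are hypothetical, the units and fill opacity are guesses, and `gridPos`, `datasource`, and panel `id` are omitted and would need to match the conventions already used in `sglang.json`.

```python
import json

# Gauge names taken from the issue text; stacking order is bottom to top.
METRICS = [
    "sglang:kv_used_tokens",
    "sglang:kv_evictable_tokens",
    "sglang:kv_available_tokens",
]


def target(expr: str, legend: str, ref_id: str) -> dict:
    """One Prometheus query entry for a Grafana panel."""
    return {"expr": expr, "legendFormat": legend, "refId": ref_id}


def breakdown_panel() -> dict:
    """Stacked-area timeseries panel with one series per raw gauge."""
    targets = [
        target(metric, metric.split(":", 1)[1], chr(ord("A") + i))
        for i, metric in enumerate(METRICS)
    ]
    return {
        "type": "timeseries",
        "title": "KV Pool Breakdown",
        "targets": targets,
        "fieldConfig": {
            "defaults": {
                "unit": "short",  # assumed unit for raw token counts
                "custom": {"fillOpacity": 30, "stacking": {"mode": "normal"}},
            }
        },
    }


def physical_usage_panel() -> dict:
    """Timeseries panel for 1 - available / (available + evictable + used)."""
    total = " + ".join(METRICS)
    expr = f"1 - (sglang:kv_available_tokens / ({total}))"
    return {
        "type": "timeseries",
        "title": "KV Physical Usage %",
        "targets": [target(expr, "physical usage", "A")],
        "fieldConfig": {
            # percentunit renders a 0..1 ratio as a percentage.
            "defaults": {"unit": "percentunit", "min": 0, "max": 1}
        },
    }


if __name__ == "__main__":
    print(json.dumps([breakdown_panel(), physical_usage_panel()], indent=2))
```

The printed JSON objects would then be merged by hand into the `panels` array of the existing dashboard file.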

## Blocked on

- SGLang release containing sgl-project/sglang#22726
