Add raw KV cache pool gauges to SGLang Grafana dashboard #8151

@ishandhanani

Context

sgl-project/sglang#22726 adds three new Prometheus gauges that expose the raw KV cache pool token counts:

  • sglang:kv_available_tokens -- free pool slots
  • sglang:kv_evictable_tokens -- radix-cached, reclaimable slots
  • sglang:kv_used_tokens -- actively pinned slots

These replace the existing single sglang:token_usage metric (which excludes evictable tokens) with the full breakdown, so operators can derive any ratio in PromQL. For example, physical usage = 1 - (sglang:kv_available_tokens / (sglang:kv_available_tokens + sglang:kv_evictable_tokens + sglang:kv_used_tokens)).
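To make the physical-usage ratio concrete, here is a minimal Python sketch of the same arithmetic alongside the equivalent PromQL string. The function name, the constant name, and the zero-total guard are illustrative assumptions, not part of SGLang; the PromQL form also assumes the three gauges carry identical label sets so that the vector arithmetic matches series one-to-one.

```python
def kv_physical_usage(available: float, evictable: float, used: float) -> float:
    """Fraction of the KV pool that is not free (used + evictable) / total.

    Hypothetical helper for illustration only; mirrors the formula in the issue.
    """
    total = available + evictable + used
    if total <= 0:
        # Assumption: report 0.0 for an empty or uninitialized pool
        # rather than dividing by zero.
        return 0.0
    return 1.0 - available / total


# The same ratio written as a PromQL expression over the three raw gauges.
PHYSICAL_USAGE_PROMQL = (
    "1 - (sglang:kv_available_tokens / "
    "(sglang:kv_available_tokens"
    " + sglang:kv_evictable_tokens"
    " + sglang:kv_used_tokens))"
)

if __name__ == "__main__":
    # 25 free, 25 evictable, 50 pinned slots: 75% of the pool is physically occupied.
    print(kv_physical_usage(available=25, evictable=25, used=50))
    print(PHYSICAL_USAGE_PROMQL)
```

With 25 free, 25 evictable, and 50 pinned slots, the result is 0.75: evictable slots count as physically occupied even though they can be reclaimed.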

What needs to happen

Once SGLang cuts a release containing sgl-project/sglang#22726, update the SGLang Grafana dashboard (deploy/observability/grafana_dashboards/sglang.json) to add panels for:

  1. KV Pool Breakdown (stacked area) -- kv_used_tokens, kv_evictable_tokens, kv_available_tokens
  2. KV Physical Usage % (timeseries) -- derived via PromQL from the three raw gauges

Also consider updating the existing "GPU KV Cache Usage" panel to use the new metrics for a more complete picture.
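As a rough starting point for the two new panels, the sketch below generates panel definitions as Python dicts and prints them as JSON. It is a sketch under assumptions: the field names follow the Grafana timeseries panel schema as commonly exported (targets with expr and legendFormat, fieldConfig.defaults.custom.stacking), the helper name timeseries_panel is hypothetical, and datasource, gridPos, and id are left out because they depend on the existing sglang.json. Check the output against that file before merging anything.

```python
import json

# Gauge names come from the issue; legend labels are illustrative.
POOL_GAUGES = [
    ("sglang:kv_used_tokens", "used"),
    ("sglang:kv_evictable_tokens", "evictable"),
    ("sglang:kv_available_tokens", "available"),
]

PHYSICAL_USAGE_EXPR = (
    "1 - (sglang:kv_available_tokens / "
    "(sglang:kv_available_tokens"
    " + sglang:kv_evictable_tokens"
    " + sglang:kv_used_tokens))"
)


def timeseries_panel(title, targets, unit, stacked=False):
    """Build a minimal timeseries panel dict (hypothetical helper).

    `targets` is a list of (promql_expr, legend) pairs.
    """
    custom = {"fillOpacity": 30 if stacked else 0}
    if stacked:
        # Stacked area: series are drawn on top of each other.
        custom["stacking"] = {"mode": "normal", "group": "A"}
    return {
        "type": "timeseries",
        "title": title,
        "targets": [
            {"refId": chr(ord("A") + i), "expr": expr, "legendFormat": legend}
            for i, (expr, legend) in enumerate(targets)
        ],
        "fieldConfig": {"defaults": {"unit": unit, "custom": custom}},
    }


panels = [
    timeseries_panel("KV Pool Breakdown", POOL_GAUGES, unit="short", stacked=True),
    timeseries_panel(
        "KV Physical Usage %",
        [(PHYSICAL_USAGE_EXPR, "physical usage")],
        unit="percentunit",  # expects values in the 0..1 range
    ),
]

if __name__ == "__main__":
    print(json.dumps(panels, indent=2))
```

Stacking the three gauges in one panel makes the total pool size visible as the top edge of the area, while the ratio panel stays a plain line so that a 0 to 1 value renders as a percentage.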

Blocked on
