[Feature] Add KV cache metrics utilities for observability (PR #4) by sriumcp · Pull Request #24 · inference-sim/vllm

sriumcp · 2026-01-28T18:41:41Z

Summary

Add a read-only KV cache observability helper module for step-level tracing (PR #4 of 5).

Changes

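New module: vllm/v1/core/kv_cache_observability.py with the PerRequestKVMetrics and StepKVSummary dataclasses and their query functions
Tests: 18 unit tests in tests/v1/core/test_kv_cache_observability.py (17 using minimal fakes, 1 smoke test; no scheduler coupling)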
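The PR names the two dataclasses but does not list their fields. The sketch below shows one plausible shape, following the commit message's split between guaranteed GPU metrics and optional best-effort fields. Every field name, the `blocks_used_gpu` property, and the choice of `frozen=True` are hypothetical assumptions, not the module's actual definitions.

```python
from __future__ import annotations  # keeps "int | None" a string, so this parses on Python 3.9

from dataclasses import dataclass


@dataclass(frozen=True)  # hypothetical: frozen so a snapshot cannot be mutated by consumers
class PerRequestKVMetrics:
    """Hypothetical per-request snapshot; field names are illustrative only."""

    request_id: str
    blocks_allocated_gpu: int  # guaranteed: summed over all KV cache groups
    blocks_cached_gpu: int | None = None  # best-effort: None when not measurable


@dataclass(frozen=True)
class StepKVSummary:
    """Hypothetical per-step snapshot; field names are illustrative only."""

    blocks_total_gpu: int
    blocks_free_gpu: int
    usage_ratio: float  # 0.0 (not 1.0) when usage cannot be measured

    @property
    def blocks_used_gpu(self) -> int:
        return self.blocks_total_gpu - self.blocks_free_gpu


req = PerRequestKVMetrics(request_id="r1", blocks_allocated_gpu=4)
step = StepKVSummary(blocks_total_gpu=100, blocks_free_gpu=25, usage_ratio=0.75)
print(req.blocks_cached_gpu)  # None
print(step.blocks_used_gpu)   # 75
```

Marking best-effort fields as `Optional` with a `None` default lets a caller tell "not measured" apart from a genuine zero.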

Key Features

Aggregates metrics across all KV cache groups (multi-group support)
Defensive programming (never raises exceptions)
Conservative fallback: usage_ratio reports 0.0 (not 1.0) when it cannot be measured
Python 3.9+ compatible
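Multi-group aggregation and the never-raise rule can be sketched together as follows. This assumes, hypothetically, that each per-group manager in `single_type_managers` exposes a `req_to_blocks` mapping from request ID to a list of blocks. The function name `count_request_blocks` and the `FakeGroupManager` class are illustrative only, and the fake mirrors the PR's "minimal fakes, no scheduler coupling" testing approach.

```python
from __future__ import annotations  # keeps the annotations below as strings on Python 3.9

from collections import defaultdict
from typing import Any, Iterable


def count_request_blocks(single_type_managers: Iterable[Any] | None, request_id: str) -> int:
    """Sum one request's GPU blocks across every KV cache group.

    Hypothetical sketch: assumes each per-group manager has a
    ``req_to_blocks`` mapping of request ID -> list of blocks.
    Never raises; anything unreadable contributes 0 blocks.
    """
    try:
        managers = list(single_type_managers or ())
    except Exception:
        return 0

    total = 0
    for manager in managers:  # every group, not just managers[0]
        try:
            req_to_blocks = getattr(manager, "req_to_blocks", None) or {}
            # .get() rather than [...]: a defaultdict lookup with [...] would
            # insert the missing key, which is a mutation.
            total += len(req_to_blocks.get(request_id, ()))
        except Exception:
            continue  # skip a malformed group instead of failing the whole query
    return total


# Usage with a minimal fake (no scheduler needed):
class FakeGroupManager:
    def __init__(self, req_to_blocks: dict[str, list[int]]) -> None:
        self.req_to_blocks = req_to_blocks


managers = [
    FakeGroupManager({"r1": [0, 1, 2]}),
    FakeGroupManager(defaultdict(list, {"r1": [7]})),  # defaultdict: easy to mutate by accident
]
print(count_request_blocks(managers, "r1"))       # 4
print(count_request_blocks(managers, "missing"))  # 0
print(count_request_blocks(None, "r1"))           # 0
```

Catching exceptions per group rather than around the whole loop means one malformed group degrades the count instead of zeroing it.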

Testing

✅ All 18 tests passing in 2.48s
✅ No existing test regressions
✅ Deterministic, CI-safe tests

Design Compliance

✅ Read-only access (no KV cache changes)
✅ No scheduler behavior changes
✅ No Request field mutations
✅ No new APIs or expensive scans
✅ Independent of other PRs

Part of Step-Level Tracing implementation (PR #4 of 5).

🤖 Generated with Claude Code

Add a read-only KV cache observability helper module for step-level tracing. It provides utilities to extract per-request and per-step KV cache metrics using only existing exposed interfaces.

Key additions:
- vllm/v1/core/kv_cache_observability.py: PerRequestKVMetrics and StepKVSummary dataclasses with query functions
- tests/v1/core/test_kv_cache_observability.py: 18 unit tests with minimal fakes (17 fake-based + 1 smoke test)

Design principles:
- Read-only access to existing KV cache state
- Defensive programming (never raises exceptions)
- Aggregates across all KV cache groups (multi-group support)
- Guaranteed GPU metrics plus optional best-effort fields
- No changes to KV cache behavior, the scheduler, or Request fields
- No new APIs or expensive scans
- Python 3.9+ compatible (uses __future__ annotations)

Implementation details:
- Aggregates blocks across all single_type_managers (not just [0])
- Defensive clamping for blocks_total (prevents negative values)
- Conservative usage_ratio fallback (0.0 when unmeasurable)
- Tests use minimal fakes (no scheduler coupling)
- Fast, deterministic tests (2.48s, no heuristics)

All 18 tests pass. Zero impact on existing functionality.

Part of Step-Level Tracing implementation (PR #4 of 5).

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
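The clamping and the conservative usage_ratio fallback can be illustrated with a small step-level helper. The commit message does not say where blocks_total comes from, so this sketch assumes it is a total GPU block count read alongside a free-block count. The names `summarize_step` and `_as_int` are hypothetical, and the function returns a plain `(blocks_total, blocks_free, usage_ratio)` tuple rather than the PR's StepKVSummary, whose fields are not shown.

```python
from __future__ import annotations  # keeps "int | None" a string on Python 3.9


def _as_int(value: object) -> int | None:
    """Best-effort int conversion; returns None instead of raising."""
    try:
        return int(value)
    except Exception:
        return None


def summarize_step(num_gpu_blocks: object, num_free_blocks: object) -> tuple[int, int, float]:
    """Return (blocks_total, blocks_free, usage_ratio) without ever raising.

    Hypothetical sketch: the two inputs stand in for whatever totals the
    real module reads from existing KV cache state.
    """
    blocks_total = max(0, _as_int(num_gpu_blocks) or 0)  # clamp: never negative
    blocks_free = _as_int(num_free_blocks)
    if blocks_total == 0 or blocks_free is None:
        # Unmeasurable: report 0.0 rather than 1.0 so a missing reading
        # never looks like a full cache.
        return blocks_total, 0, 0.0
    blocks_free = min(max(0, blocks_free), blocks_total)  # clamp into [0, total]
    usage_ratio = (blocks_total - blocks_free) / blocks_total
    return blocks_total, blocks_free, usage_ratio


print(summarize_step(100, 25))    # (100, 25, 0.75)
print(summarize_step(-8, 3))      # (0, 0, 0.0)
print(summarize_step(100, None))  # (100, 0, 0.0)
print(summarize_step(10, 50))     # (10, 10, 0.0)
```

Falling back to 0.0 errs toward under-reporting pressure: a trace consumer that alerts on high usage stays quiet on a bad reading instead of firing a false "cache full" signal.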

sriumcp merged commit 94aab4c into main Jan 28, 2026

sriumcp merged 1 commit into main from pr4ofstepstream