[Feature] Add KV cache metrics utilities for observability (PR #4)#24

Merged
sriumcp merged 1 commit into main from pr4ofstepstream
Jan 28, 2026

sriumcp commented on Jan 28, 2026

Summary

Add read-only KV cache observability helper module for step-level tracing (PR #4 of 5).

Changes

  • New module: vllm/v1/core/kv_cache_observability.py with dataclasses and query functions
  • Tests: 18 unit tests using minimal fakes (no scheduler coupling)

Key Features

  • Aggregates metrics across all KV cache groups (multi-group support)
  • Defensive programming (never raises exceptions)
  • Conservative fallbacks (0.0 when unmeasurable, not 1.0)
  • Python 3.9+ compatible
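
The "never raises" and conservative-fallback behavior listed above can be illustrated with a minimal sketch. The helper name `safe_usage_ratio` and its parameters are hypothetical and not taken from the PR; only the behavior (return 0.0 rather than 1.0 when the ratio cannot be measured, and never propagate an exception) comes from the description. Clamping the result to the range [0.0, 1.0] is an additional assumption.

```python
from __future__ import annotations

import math


def safe_usage_ratio(blocks_used: object, blocks_total: object) -> float:
    """Return used/total, or 0.0 when the ratio cannot be measured.

    Hypothetical helper: the name and signature are illustrative assumptions.
    """
    try:
        used = float(blocks_used)    # raises on None or non-numeric input
        total = float(blocks_total)
        if not (math.isfinite(used) and math.isfinite(total)) or total <= 0.0:
            # Unmeasurable: report 0.0 rather than pretending the cache is full.
            return 0.0
        # Assumption: clamp to [0.0, 1.0] so odd inputs stay in range.
        return min(max(used / total, 0.0), 1.0)
    except Exception:
        # Observability code should not break its caller.
        return 0.0
```

Returning 0.0 on failure means a broken metric reads as "no measured usage" instead of triggering alerts that assume a saturated cache.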

Testing

  • ✅ All 18 tests passing in 2.48s
  • ✅ No existing test regressions
  • ✅ Deterministic, CI-safe tests

Design Compliance

  • ✅ Read-only access only (no KV cache changes)
  • ✅ No scheduler behavior changes
  • ✅ No Request field mutations
  • ✅ No new APIs or expensive scans
  • ✅ Independent of other PRs

Part of Step-Level Tracing implementation (PR #4 of 5).

🤖 Generated with Claude Code

Add read-only KV cache observability helper module for step-level tracing.
Provides utilities to extract per-request and per-step KV cache metrics
using only existing exposed interfaces.

Key additions:
- vllm/v1/core/kv_cache_observability.py: PerRequestKVMetrics and
  StepKVSummary dataclasses with query functions
- tests/v1/core/test_kv_cache_observability.py: 18 unit tests with
  minimal fakes (17 fake-based + 1 smoke test)

Design principles:
- Read-only access to existing KV cache state
- Defensive programming (never raises exceptions)
- Aggregates across all KV cache groups (multi-group support)
- Guaranteed GPU metrics + optional best-effort fields
- No changes to KV cache behavior, scheduler, or Request fields
- No new APIs or expensive scans
- Python 3.9+ compatible (uses __future__ annotations)

Implementation details:
- Aggregates blocks across all single_type_managers (not just [0])
- Defensive clamping for blocks_total (prevents negative values)
- Conservative usage_ratio fallback (0.0 when unmeasurable)
- Tests use minimal fakes (no scheduler coupling)
- Fast, deterministic tests (2.48s, no heuristics)
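
The multi-group aggregation and clamping described above could look roughly like the sketch below, exercised with minimal fakes in the spirit of the tests. Only `single_type_managers` and `blocks_total` are named in this commit message; the `req_to_blocks` attribute, the function names, and the fake classes are assumptions made for illustration and may not match the merged module.

```python
from __future__ import annotations

from typing import Any


def count_request_blocks(coordinator: Any, request_id: str) -> int:
    """Sum the blocks one request holds across every manager, not just [0]."""
    total = 0
    try:
        for manager in getattr(coordinator, "single_type_managers", ()):
            # Assumed attribute: a mapping from request id to its blocks.
            per_request = getattr(manager, "req_to_blocks", {})
            total += len(per_request.get(request_id, ()))
    except Exception:
        # Never raise from observability code; report nothing instead.
        return 0
    return total


def clamped_blocks_total(raw_total: Any) -> int:
    """Clamp a reported block count so it is never negative."""
    try:
        return max(int(raw_total), 0)
    except Exception:
        return 0


class FakeManager:
    """Minimal fake exposing only what the helper reads."""

    def __init__(self, req_to_blocks: dict[str, list[int]]) -> None:
        self.req_to_blocks = req_to_blocks


class FakeCoordinator:
    """Minimal fake with one manager per KV cache group."""

    def __init__(self, managers: list[FakeManager]) -> None:
        self.single_type_managers = managers


coordinator = FakeCoordinator([
    FakeManager({"r1": [1, 2]}),
    FakeManager({"r1": [3], "r2": [4]}),
])
r1_blocks = count_request_blocks(coordinator, "r1")
```

Because the fakes carry only the attributes the helper reads, tests built this way need no scheduler or real KV cache manager.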

All 18 tests passing. Zero impact on existing functionality.

Part of Step-Level Tracing implementation (PR #4 of 5).
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
sriumcp merged commit 94aab4c into main on Jan 28, 2026