[Feat] L0 Subscriber #2974
Conversation
Code Review
This pull request implements the L0MetricsSubscriber for tracking GPU KV cache block lifecycles and reporting OpenTelemetry metrics, including block lifetime, idle time, and reuse gaps. It adds a new MP_VLLM_END_SESSION event and integrates the subscriber into the observability configuration. Feedback suggests refining metric accuracy by explicitly tracking block release times, ensuring the initial access is recorded in the reuse history, and improving type safety for allocation records via a Protocol instead of generic objects.
```python
class _BlockStatus(Enum):
    ACTIVE = "active"  # Owned by at least one live request.
    RELEASED = "released"  # All owning requests have ended.


@dataclass
class _L0BlockState:
    """Per-block lifecycle state in the shadow map."""

    token_ids: list[int]
```
The _L0BlockState dataclass should include a release_time field to accurately track the idle time of a block before it is evicted. Currently, idle_time is calculated using last_access_time, which represents the start of the last usage period, not the end of it.
Suggested change:

```python
class _BlockStatus(Enum):
    ACTIVE = "active"  # Owned by at least one live request.
    RELEASED = "released"  # All owning requests have ended.


@dataclass
class _L0BlockState:
    """Per-block lifecycle state in the shadow map."""

    token_ids: list[int]
    owners: set[str]  # Set of req_ids currently using this block.
    status: _BlockStatus
    alloc_time: float
    last_access_time: float
    release_time: float | None = None
    access_history: deque[float] = field(
        default_factory=lambda: deque(maxlen=_MAX_ACCESS_HISTORY)
    )
```
| """Process a single BlockAllocationRecord.""" | ||
| req_id: str = record.req_id # type: ignore[attr-defined] |
When a block is marked as RELEASED, the current timestamp should be recorded as the release_time to allow for accurate idle time calculation upon eviction.
| """Process a single BlockAllocationRecord.""" | |
| req_id: str = record.req_id # type: ignore[attr-defined] | |
| if not state.owners: | |
| state.status = _BlockStatus.RELEASED | |
| state.release_time = now |
```python
        if not block_ids:
            return

        num_blocks = len(block_ids)
        block_size = (
```
Using object as a type hint for record and then using # type: ignore[attr-defined] is not ideal for type safety. It is better to define a Protocol that specifies the expected attributes of the record object, especially since BlockAllocationRecord is defined in another module and we want to avoid tight coupling or circular dependencies.
```python
from typing import Protocol


class _BlockAllocationRecord(Protocol):
    req_id: str
    new_block_ids: list[int]
    new_token_ids: list[int]


def _process_record(self, record: _BlockAllocationRecord, now: float) -> None:
    """Process a single BlockAllocationRecord."""
    req_id: str = record.req_id
    block_ids: list[int] = record.new_block_ids
    token_ids: list[int] = record.new_token_ids
```
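Because `Protocol` matching is structural, the concrete `BlockAllocationRecord` defined in the other module satisfies `_BlockAllocationRecord` without the subscriber ever importing it, so the `# type: ignore[attr-defined]` comments can be dropped while the modules stay decoupled.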
```python
            )
            self._req_blocks.setdefault(req_id, set()).add(block_id)
            return

        # Block exists in shadow map.
        if existing.token_ids == token_ids:
            # Same content.
```
Initializing access_history with the current timestamp now ensures that the gap between the initial allocation and the first reuse is captured in the lmcache_mp.l0_block_reuse_gap_seconds metric. Without this, the first reuse gap is missed.
Suggested change:

```python
        self._shadow[block_id] = _L0BlockState(
            token_ids=token_ids,
            owners={req_id},
            status=_BlockStatus.ACTIVE,
            alloc_time=now,
            last_access_time=now,
            access_history=deque([now], maxlen=_MAX_ACCESS_HISTORY)
        )
```
```python
        self._emit_eviction_metrics(existing, now)

        # Clear old owners from reverse index.
        for old_req in existing.owners:
```

```python
    def _emit_eviction_metrics(self, state: _L0BlockState, now: float) -> None:
        """Record histogram observations for an evicted block."""
        lifetime = now - state.alloc_time
        idle_time = now - state.last_access_time

        self._lifetime_hist.record(lifetime)
```
Similar to the initial allocation, when starting fresh after an eviction, access_history should be initialized with now to capture the first reuse gap of the new block lifecycle.
Suggested change:

```python
        self._shadow[block_id] = _L0BlockState(
            token_ids=token_ids,
            owners={req_id},
            status=_BlockStatus.ACTIVE,
            alloc_time=now,
            last_access_time=now,
            access_history=deque([now], maxlen=_MAX_ACCESS_HISTORY)
        )
```
```python
        for i in range(1, len(history)):
            gap = history[i] - history[i - 1]
            self._reuse_gap_hist.record(gap)
```
The idle_time should be calculated as the duration between the block's release and its eviction. If the block was evicted while still active (which is rare in vLLM), the idle time can be considered zero or calculated from the last access start.
Suggested change:

```python
        idle_time = now - (
            state.release_time if state.release_time is not None else state.last_access_time
        )
```
```python
                # already existed (e.g., cached request with no new tokens).
                # Not a real allocation event; skip.
                continue
            self._process_block(block_id, chunk_tokens, req_id, now)
```
Token-to-block chunking uses incorrect distribution logic
Medium Severity
_process_record splits new_token_ids across blocks using ceiling division ((len(token_ids) + num_blocks - 1) // num_blocks), but vLLM uses a fixed block size (e.g., 16 tokens per block, with only the last block potentially shorter). If the same physical block appears in a later record with a different num_blocks, the subscriber computes a different token chunk for that block, causing existing.token_ids == token_ids to fail and triggering a false eviction detection (Case 5). This would inflate eviction metrics. A safer approach would be to use the engine's actual block size for chunking rather than inferring it from the record.
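A minimal sketch of what fixed-size chunking could look like; the `BLOCK_SIZE` constant and helper name are illustrative assumptions, not the PR's actual code:

```python
# Illustrative only: assumes the engine's block size is available from config
# (e.g. vLLM's default of 16 tokens per block) rather than inferred per record.
BLOCK_SIZE = 16


def chunk_tokens_by_block(token_ids: list[int], block_ids: list[int]) -> list[list[int]]:
    """Split token_ids into fixed-size chunks, one chunk per physical block.

    Only the last block may be shorter, so a given block always maps to the
    same token chunk no matter how many blocks appear in a later record.
    """
    return [
        token_ids[i * BLOCK_SIZE : (i + 1) * BLOCK_SIZE]
        for i in range(len(block_ids))
    ]
```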
Reviewed by Cursor Bugbot for commit 0f34708.
```python
from lmcache.v1.mp_observability.subscribers.metrics.l0 import (
    L0MetricsSubscriber,
    _BlockStatus,
)
```
Tests rely on private members instead of public interface
Low Severity
The test file imports the module-private _BlockStatus and extensively accesses subscriber._shadow (a private attribute) throughout many test methods. The project rules require no cross-class private member access and that tests verify the public interface and docstring contract rather than implementation details. The OTel histogram assertions already validate behavior through the public output; the _shadow assertions couple tests to internal state.
Triggered by project rule: LMCache Code Review Style Guide
Reviewed by Cursor Bugbot for commit 0f34708.
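A rough sketch of asserting through the exported histograms instead of `subscriber._shadow`; the constructor arguments, event-handler name (`on_event`), record shape, and metric name are assumptions for illustration only:

```python
from types import SimpleNamespace

from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import InMemoryMetricReader

from lmcache.v1.mp_observability.subscribers.metrics.l0 import L0MetricsSubscriber


def test_eviction_emits_lifetime_histogram():
    reader = InMemoryMetricReader()
    provider = MeterProvider(metric_readers=[reader])
    # Constructor signature is assumed; sample everything so the block is tracked.
    subscriber = L0MetricsSubscriber(meter_provider=provider, sample_rate=1.0)

    # Drive the subscriber only through its public event handler (name assumed):
    # the same block reappears with different tokens, which should count as an eviction.
    subscriber.on_event(SimpleNamespace(req_id="r1", new_block_ids=[0], new_token_ids=[1, 2]))
    subscriber.on_event(SimpleNamespace(req_id="r2", new_block_ids=[0], new_token_ids=[3, 4]))

    # Assert on the OTel output rather than reaching into subscriber._shadow.
    data = reader.get_metrics_data()
    metric_names = {
        metric.name
        for rm in (data.resource_metrics if data else [])
        for sm in rm.scope_metrics
        for metric in sm.metrics
    }
    assert "lmcache_mp.l0_block_lifetime_seconds" in metric_names  # metric name assumed
```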
```python
        if block_id in self._skipped:
            return
```
If the sampling rate is very small in production, it would be better to record the non-skipped (sampled) block IDs instead, since that set will be much smaller.
royyhuang left a comment:
In addition to the EVENTS.md, please also update the METRICS.md to describe the new metrics added in this PR. Otherwise, LGTM!
- Add L0MetricsSubscriber with shadow-based block tracking and OTel histograms (lifetime, idle_before_evict, reuse_gap)
- Use _sampled set instead of _skipped for O(N*sample_rate) memory usage
- Add MP_VLLM_BLOCK_ALLOCATION and MP_VLLM_END_SESSION events to server.py
- Update EVENTS.md and METRICS.md with L0, L2, and MP_VLLM event/metric docs
- Add unit tests for L0MetricsSubscriber

Signed-off-by: yuwei <yuwei@dev.local>
The _sampled approach re-rolls the dice on every appearance of an untracked block, causing the effective sample rate to inflate over time (e.g. 10% becomes 20%+) and tracking to start mid-life with a wrong alloc_time. The _skipped set makes a once-and-final decision.

Signed-off-by: yuwei <yuwei@dev.local>
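A small sketch of the two decision policies being contrasted here; the function and variable names are illustrative, not the PR's code:

```python
import random

SAMPLE_RATE = 0.01  # illustrative value


def should_track_reroll(block_id: int, sampled: set[int]) -> bool:
    """_sampled-style: every appearance of an untracked block re-rolls the dice,
    so the chance a block is ever tracked keeps growing and tracking can start
    mid-life with a wrong alloc_time."""
    if block_id in sampled:
        return True
    if random.random() < SAMPLE_RATE:
        sampled.add(block_id)
        return True
    return False


def should_track_once(block_id: int, shadow: dict, skipped: set[int]) -> bool:
    """_skipped-style: the first sighting rolls once and a 'no' is remembered,
    so later appearances cannot flip the decision."""
    if block_id in shadow:
        return True
    if block_id in skipped:
        return False
    if random.random() < SAMPLE_RATE:
        return True  # caller adds the block to the shadow map
    skipped.add(block_id)
    return False
```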
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Reviewed by Cursor Bugbot for commit af7d40b.
```python
        # Shadow map: physical block_id -> lifecycle state.
        self._shadow: dict[int, _L0BlockState] = {}
        # Set of block_ids we decided NOT to sample.
        self._skipped: set[int] = set()
```
Skipped set stores 99% of block IDs unnecessarily
Low Severity
The _skipped set records every block ID that was not sampled. With the default 1% sample_rate, this set holds ~99% of all unique block IDs, while _shadow (which already tracks sampled blocks) holds ~1%. Since _shadow.keys() already identifies sampled blocks, the _skipped set is an inverted representation that uses ~99× more memory than needed. Checking block_id not in self._shadow instead would eliminate this set entirely.
Triggered by project rule: LMCache Code Review Style Guide
Reviewed by Cursor Bugbot for commit af7d40b.
conftest.py is loaded by pytest as a plugin, not as a regular module, so explicit imports create a second instance with a separate reader. Use a regular module (otel_setup.py) instead.

Signed-off-by: yuwei <yuwei@dev.local>
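A minimal sketch of what such a shared module could look like; the helper name and exact layout are assumptions:

```python
# otel_setup.py -- imported explicitly by metrics tests so they all share one
# reader/provider pair instead of getting a second instance via conftest.py.
from opentelemetry.sdk.metrics import MeterProvider
from opentelemetry.sdk.metrics.export import InMemoryMetricReader

READER = InMemoryMetricReader()
METER_PROVIDER = MeterProvider(metric_readers=[READER])


def histogram_data_points(metric_name: str):
    """Collect current metrics and return the data points for one metric."""
    data = READER.get_metrics_data()
    points = []
    for rm in (data.resource_metrics if data else []):
        for sm in rm.scope_metrics:
            for metric in sm.metrics:
                if metric.name == metric_name:
                    points.extend(metric.data.data_points)
    return points
```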
Signed-off-by: yuwei <yuwei@dev.local>
Co-authored-by: yuwei <yuwei@dev.local>
Signed-off-by: baoloongmao <baoloongmao@tencent.com>


Note

Medium Risk
Adds new event types and publishes an END_SESSION event from the MP server, plus introduces histogram-based lifecycle tracking that maintains in-memory shadow state; errors or overhead here could affect observability performance and correctness.

Overview
Adds L0 (GPU) KV-cache lifecycle observability by introducing L0LifecycleSubscriber, which listens to MP_VLLM_BLOCK_ALLOCATION plus a new MP_VLLM_END_SESSION event and emits OTel histograms for block lifetime, idle time before eviction, and reuse gaps.

Wires the new subscriber into MP observability initialization and publishes MP_VLLM_END_SESSION from MPCacheEngine.end_session(), and updates the event/metrics docs accordingly. Test infrastructure is updated to share a single OTel MeterProvider across metrics tests, and new unit tests validate eviction/reuse/end-session tracking and histogram emission.

Reviewed by Cursor Bugbot for commit d15e5e2.