[Feat] L1 Subscriber by Oasis-Git · Pull Request #2986 · LMCache/LMCache

Oasis-Git · 2026-04-08T22:34:23Z

What this PR does / why we need it:

Special notes for your reviewers:

If applicable:

this PR contains user facing changes - docs added
this PR contains unit tests

Note

Medium Risk
Adds a new event-bus subscriber and default-registered histogram tracking for L1 chunk lifecycles, which can introduce overhead and additional memory use (shadow maps) in production if sample rate is misconfigured.

Overview
Adds a new L1LifecycleSubscriber that records sampled OTel histograms for L1 chunk lifetime, idle-before-evict, reuse gaps, and evict-to-reuse gaps based on L1 read/write/evict events.

Introduces a new --metrics-sample-rate/metrics_sample_rate setting (default 0.01) and wires it into subscriber registration so both L0LifecycleSubscriber and the new L1 lifecycle tracking use the same sampling control. Updates observability docs/metric references and adds unit tests validating histogram recording, deterministic sampling, and eviction sweep behavior.

^{Reviewed by Cursor Bugbot for commit a1e30f5. Bugbot is set up for automated code reviews on this repo. Configure here.}

gemini-code-assist

Code Review

This pull request introduces GPU (L0) KV cache block lifecycle tracking through the new L0MetricsSubscriber and expands L1MetricsSubscriber to include chunk lifecycle histograms. It also implements the MP_VLLM_END_SESSION event to facilitate accurate session-based block release tracking. Feedback suggests improving the L0 reuse gap metrics by recording them at the moment of a cache hit rather than waiting for eviction, which would prevent data loss for long-lived blocks and capture the initial reuse gap more accurately.

gemini-code-assist · 2026-04-08T22:38:47Z

+        for i in range(1, len(history)):
+            gap = history[i] - history[i - 1]
+            self._reuse_gap_hist.record(gap)
+
+    # -- Sampling ----------------------------------------------------------


The current logic for recording reuse gaps in _emit_eviction_metrics misses the gap between the initial allocation (alloc_time) and the first reuse event stored in access_history.

Additionally, recording reuse gaps only at eviction time (Case 5) means that metrics for long-lived blocks that are never evicted will never be reported. Consider recording the reuse gap immediately when a hit is detected in _process_block (Case 4), similar to how it is handled in L1MetricsSubscriber. When implementing this, ensure that distinct variable names are used for each phase's results to prevent accidental overwriting and ensure accurate reporting in summaries.

References

When performance metrics are collected for different phases of an operation, ensure that distinct variable names are used for each phase's results to prevent accidental overwriting and ensure accurate reporting in summaries.

ApostaC

See below

ApostaC · 2026-04-11T01:04:20Z

+            else:
+                # First time seeing this key — sample or skip.
+                if not self._should_sample():
+                    self._skipped.add(key)


skipped -> non-skipped

royyhuang

The implementation itself is pretty straightforward. I do have a high-level concern on code structure. Currently, we are adding the L1 lifecycle histogram subscription to the original subscriber. However, when the event bus was designed, the goal was to separate and divide up the subscribers so that users can easily disable and enable part of the logging, tracing, and/or metrics to avoid unnecessary overhead.

Given the overhead from housekeeping the shadow map is non-negligible, especially memory even if with sampling, I would suggest we put lifecycle tracking into a separate subscriber so that user can enable this only when needed. This also gives clear structure on the code, avoiding all subscription functionality to be squeezed into a single file making the l1 subscriber unmaintainable.

royyhuang · 2026-04-13T18:32:26Z

Also please update EVENTS.md or METRICS.md if necessary.

- Add L0MetricsSubscriber with shadow-based block tracking and OTel histograms (lifetime, idle_before_evict, reuse_gap) - Use _sampled set instead of _skipped for O(N*sample_rate) memory usage - Add MP_VLLM_BLOCK_ALLOCATION and MP_VLLM_END_SESSION events to server.py - Update EVENTS.md and METRICS.md with L0, L2, and MP_VLLM event/metric docs - Add unit tests for L0MetricsSubscriber Signed-off-by: yuwei <yuwei@dev.local>

Signed-off-by: yuwei <yuwei@dev.local>

The _sampled approach re-rolls the dice on every appearance of an untracked block, causing the effective sample rate to inflate over time (e.g. 10% becomes 20%+) and tracking to start mid-life with wrong alloc_time. The _skipped set makes a once-and-final decision. Signed-off-by: yuwei <yuwei@dev.local>

conftest.py is loaded by pytest as a plugin, not as a regular module, so explicit imports create a second instance with a separate reader. Use a regular module (otel_setup.py) instead. Signed-off-by: yuwei <yuwei@dev.local>

- Upgrade L1MetricsSubscriber with chunk lifecycle histograms (lifetime, idle_before_evict, reuse_gap, evict_reuse_gap) - Use _sampled set instead of _skipped for O(N*sample_rate) memory usage - Add --metrics-sample-rate CLI arg to ObservabilityConfig - Pass sample_rate through to L0 and L1 subscribers - Update METRICS.md with L1 chunk lifecycle histograms section - Add comprehensive unit tests for L1MetricsSubscriber Signed-off-by: yuwei <yuwei@dev.local>

Signed-off-by: yuwei <yuwei@dev.local>

Same fix as L0: _sampled re-rolls on every appearance, inflating the effective sample rate and starting tracking mid-life with wrong alloc_time. _skipped makes a once-and-final decision. Signed-off-by: yuwei <yuwei@dev.local>

Signed-off-by: yuwei <yuwei@dev.local>

…additions Signed-off-by: yuwei <yuwei@dev.local>

…config Signed-off-by: yuwei <yuwei@dev.local>

…ubscriber Use hash(key) % prime < threshold instead of maintaining a _skipped set. O(1) memory, deterministic (same key always gets same decision), and no unbounded growth for long-running services. Signed-off-by: yuwei <yuwei@dev.local>

…ubscriber Same approach as L0: hash(key) % prime < threshold. O(1) memory, deterministic, no unbounded set growth for long-running services. Signed-off-by: yuwei <yuwei@dev.local>

…s-sample-rate Signed-off-by: yuwei <yuwei@dev.local>

cursor

Cursor Bugbot has reviewed your changes and found 1 potential issue.

^{❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.}

^{Reviewed by Cursor Bugbot for commit fb0a56f. Configure here.}

hash(block_id) == block_id for small ints in CPython, causing deterministic hash sampling to always pick the lowest-numbered blocks. _skipped set is fine for L0 since it's bounded by the finite number of physical GPU blocks. Signed-off-by: yuwei <yuwei@dev.local>

ApostaC

LGTM!

* [Feat] L0 Subscriber: GPU KV cache block lifecycle metrics Signed-off-by: yuwei <yuwei@dev.local> Co-authored-by: yuwei <yuwei@dev.local>

Oasis-Git requested review from ApostaC, deng451e, hickeyma and sammshen as code owners April 8, 2026 22:34

gemini-code-assist Bot reviewed Apr 8, 2026

View reviewed changes

cursor Bot reviewed Apr 8, 2026

View reviewed changes

Comment thread lmcache/v1/mp_observability/subscribers/metrics/l1.py Outdated

cursor Bot reviewed Apr 10, 2026

View reviewed changes

Comment thread lmcache/v1/mp_observability/subscribers/metrics/l1.py Outdated

ApostaC requested a review from royyhuang April 10, 2026 22:47

ApostaC reviewed Apr 11, 2026

View reviewed changes

royyhuang reviewed Apr 13, 2026

View reviewed changes

yuwei added 3 commits April 13, 2026 20:06

Rename L0MetricsSubscriber to L0LifecycleSubscriber in separate file

fa3780e

Signed-off-by: yuwei <yuwei@dev.local>

royyhuang approved these changes Apr 13, 2026

View reviewed changes

yuwei added 5 commits April 14, 2026 00:13

Split L1 lifecycle histograms into separate L1LifecycleSubscriber

2f17715

Signed-off-by: yuwei <yuwei@dev.local>

Revert _sampled back to _skipped in L1LifecycleSubscriber

9e2847a

Same fix as L0: _sampled re-rolls on every appearance, inflating the effective sample rate and starting tracking mid-life with wrong alloc_time. _skipped makes a once-and-final decision. Signed-off-by: yuwei <yuwei@dev.local>

Use shared OTel reader in test_l1_lifecycle.py

da8def3

Signed-off-by: yuwei <yuwei@dev.local>

Oasis-Git force-pushed the l1 branch from c68e38f to da8def3 Compare April 14, 2026 03:49

Merge dev into l1: resolve conflicts keeping L1 lifecycle subscriber …

d018544

…additions Signed-off-by: yuwei <yuwei@dev.local>

cursor Bot reviewed Apr 14, 2026

View reviewed changes

Comment thread lmcache/v1/mp_observability/subscribers/metrics/l1_lifecycle.py

cursor Bot reviewed Apr 14, 2026

View reviewed changes

Comment thread lmcache/v1/mp_observability/subscribers/metrics/l1_lifecycle.py Outdated

yuwei added 5 commits April 14, 2026 19:49

Merge dev into l1: keep both metrics_sample_rate and lookup_hash_log …

2ea7670

…config Signed-off-by: yuwei <yuwei@dev.local>

Merge branch 'l0' into l1

36cd1af

Replace _skipped set with deterministic hash sampling in L1LifecycleS…

324f281

…ubscriber Same approach as L0: hash(key) % prime < threshold. O(1) memory, deterministic, no unbounded set growth for long-running services. Signed-off-by: yuwei <yuwei@dev.local>

Update observability.rst with L1/L2/L0 lifecycle metrics and --metric…

fb0a56f

…s-sample-rate Signed-off-by: yuwei <yuwei@dev.local>

cursor Bot reviewed Apr 14, 2026

View reviewed changes

Comment thread lmcache/v1/mp_observability/subscribers/metrics/l0_lifecycle.py Outdated

Oasis-Git force-pushed the l1 branch from 65815cd to 02977d9 Compare April 14, 2026 22:02

ApostaC approved these changes Apr 14, 2026

View reviewed changes

ApostaC enabled auto-merge (squash) April 14, 2026 22:46

github-actions Bot added the full Run comprehensive tests on this PR label Apr 14, 2026

Merge branch 'dev' into l1

a1e30f5

ApostaC merged commit e9f8ec2 into LMCache:dev Apr 15, 2026
38 checks passed

Oasis-Git deleted the l1 branch April 15, 2026 04:01

ftian1 pushed a commit to ftian1/LMCache that referenced this pull request Apr 20, 2026

[Feat] L1 Subscriber (LMCache#2986)

22fd27b

* [Feat] L0 Subscriber: GPU KV cache block lifecycle metrics Signed-off-by: yuwei <yuwei@dev.local> Co-authored-by: yuwei <yuwei@dev.local>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Feat] L1 Subscriber#2986

[Feat] L1 Subscriber#2986
ApostaC merged 16 commits intoLMCache:devfrom
Oasis-Git:l1

Oasis-Git commented Apr 8, 2026 •

edited by cursor Bot

Loading

Uh oh!

gemini-code-assist Bot left a comment

Uh oh!

gemini-code-assist Bot Apr 8, 2026

Uh oh!

Uh oh!

Uh oh!

ApostaC left a comment

Uh oh!

ApostaC Apr 11, 2026

Uh oh!

royyhuang left a comment

Uh oh!

royyhuang commented Apr 13, 2026

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Uh oh!

Uh oh!

ApostaC left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Oasis-Git commented Apr 8, 2026 • edited by cursor Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

Code Review

Uh oh!

gemini-code-assist Bot Apr 8, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

ApostaC left a comment

Choose a reason for hiding this comment

Uh oh!

ApostaC Apr 11, 2026

Choose a reason for hiding this comment

Uh oh!

royyhuang left a comment

Choose a reason for hiding this comment

Uh oh!

royyhuang commented Apr 13, 2026

Uh oh!

Uh oh!

Uh oh!

cursor Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

ApostaC left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Oasis-Git commented Apr 8, 2026 •

edited by cursor Bot

Loading