Skip to content

[Feat] L1 Subscriber#2986

Merged
ApostaC merged 16 commits intoLMCache:devfrom
Oasis-Git:l1
Apr 15, 2026
Merged

[Feat] L1 Subscriber#2986
ApostaC merged 16 commits intoLMCache:devfrom
Oasis-Git:l1

Conversation

@Oasis-Git
Copy link
Copy Markdown
Member

@Oasis-Git Oasis-Git commented Apr 8, 2026

What this PR does / why we need it:

Special notes for your reviewers:

If applicable:

  • this PR contains user facing changes - docs added
  • this PR contains unit tests

Note

Medium Risk
Adds a new event-bus subscriber and default-registered histogram tracking for L1 chunk lifecycles, which can introduce overhead and additional memory use (shadow maps) in production if sample rate is misconfigured.

Overview
Adds a new L1LifecycleSubscriber that records sampled OTel histograms for L1 chunk lifetime, idle-before-evict, reuse gaps, and evict-to-reuse gaps based on L1 read/write/evict events.

Introduces a new --metrics-sample-rate/metrics_sample_rate setting (default 0.01) and wires it into subscriber registration so both L0LifecycleSubscriber and the new L1 lifecycle tracking use the same sampling control. Updates observability docs/metric references and adds unit tests validating histogram recording, deterministic sampling, and eviction sweep behavior.

Reviewed by Cursor Bugbot for commit a1e30f5. Bugbot is set up for automated code reviews on this repo. Configure here.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces GPU (L0) KV cache block lifecycle tracking through the new L0MetricsSubscriber and expands L1MetricsSubscriber to include chunk lifecycle histograms. It also implements the MP_VLLM_END_SESSION event to facilitate accurate session-based block release tracking. Feedback suggests improving the L0 reuse gap metrics by recording them at the moment of a cache hit rather than waiting for eviction, which would prevent data loss for long-lived blocks and capture the initial reuse gap more accurately.

Comment on lines +267 to +271
for i in range(1, len(history)):
gap = history[i] - history[i - 1]
self._reuse_gap_hist.record(gap)

# -- Sampling ----------------------------------------------------------
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The current logic for recording reuse gaps in _emit_eviction_metrics misses the gap between the initial allocation (alloc_time) and the first reuse event stored in access_history.

Additionally, recording reuse gaps only at eviction time (Case 5) means that metrics for long-lived blocks that are never evicted will never be reported. Consider recording the reuse gap immediately when a hit is detected in _process_block (Case 4), similar to how it is handled in L1MetricsSubscriber. When implementing this, ensure that distinct variable names are used for each phase's results to prevent accidental overwriting and ensure accurate reporting in summaries.

References
  1. When performance metrics are collected for different phases of an operation, ensure that distinct variable names are used for each phase's results to prevent accidental overwriting and ensure accurate reporting in summaries.

Comment thread lmcache/v1/mp_observability/subscribers/metrics/l1.py Outdated
Comment thread lmcache/v1/mp_observability/subscribers/metrics/l1.py Outdated
@ApostaC ApostaC requested a review from royyhuang April 10, 2026 22:47
Copy link
Copy Markdown
Contributor

@ApostaC ApostaC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See below

else:
# First time seeing this key — sample or skip.
if not self._should_sample():
self._skipped.add(key)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

skipped -> non-skipped

Copy link
Copy Markdown
Contributor

@royyhuang royyhuang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The implementation itself is pretty straightforward. I do have a high-level concern on code structure. Currently, we are adding the L1 lifecycle histogram subscription to the original subscriber. However, when the event bus was designed, the goal was to separate and divide up the subscribers so that users can easily disable and enable part of the logging, tracing, and/or metrics to avoid unnecessary overhead.

Given the overhead from housekeeping the shadow map is non-negligible, especially memory even if with sampling, I would suggest we put lifecycle tracking into a separate subscriber so that user can enable this only when needed. This also gives clear structure on the code, avoiding all subscription functionality to be squeezed into a single file making the l1 subscriber unmaintainable.

@royyhuang
Copy link
Copy Markdown
Contributor

Also please update EVENTS.md or METRICS.md if necessary.

yuwei added 3 commits April 13, 2026 20:06
- Add L0MetricsSubscriber with shadow-based block tracking and OTel histograms
  (lifetime, idle_before_evict, reuse_gap)
- Use _sampled set instead of _skipped for O(N*sample_rate) memory usage
- Add MP_VLLM_BLOCK_ALLOCATION and MP_VLLM_END_SESSION events to server.py
- Update EVENTS.md and METRICS.md with L0, L2, and MP_VLLM event/metric docs
- Add unit tests for L0MetricsSubscriber

Signed-off-by: yuwei <yuwei@dev.local>
The _sampled approach re-rolls the dice on every appearance of an
untracked block, causing the effective sample rate to inflate over
time (e.g. 10% becomes 20%+) and tracking to start mid-life with
wrong alloc_time. The _skipped set makes a once-and-final decision.

Signed-off-by: yuwei <yuwei@dev.local>
yuwei added 5 commits April 14, 2026 00:13
conftest.py is loaded by pytest as a plugin, not as a regular module,
so explicit imports create a second instance with a separate reader.
Use a regular module (otel_setup.py) instead.

Signed-off-by: yuwei <yuwei@dev.local>
- Upgrade L1MetricsSubscriber with chunk lifecycle histograms
  (lifetime, idle_before_evict, reuse_gap, evict_reuse_gap)
- Use _sampled set instead of _skipped for O(N*sample_rate) memory usage
- Add --metrics-sample-rate CLI arg to ObservabilityConfig
- Pass sample_rate through to L0 and L1 subscribers
- Update METRICS.md with L1 chunk lifecycle histograms section
- Add comprehensive unit tests for L1MetricsSubscriber

Signed-off-by: yuwei <yuwei@dev.local>
Same fix as L0: _sampled re-rolls on every appearance, inflating
the effective sample rate and starting tracking mid-life with wrong
alloc_time. _skipped makes a once-and-final decision.

Signed-off-by: yuwei <yuwei@dev.local>
Signed-off-by: yuwei <yuwei@dev.local>
…additions

Signed-off-by: yuwei <yuwei@dev.local>
Comment thread lmcache/v1/mp_observability/subscribers/metrics/l1_lifecycle.py
Comment thread lmcache/v1/mp_observability/subscribers/metrics/l1_lifecycle.py Outdated
yuwei added 5 commits April 14, 2026 19:49
…config

Signed-off-by: yuwei <yuwei@dev.local>
…ubscriber

Use hash(key) % prime < threshold instead of maintaining a _skipped set.
O(1) memory, deterministic (same key always gets same decision), and no
unbounded growth for long-running services.

Signed-off-by: yuwei <yuwei@dev.local>
…ubscriber

Same approach as L0: hash(key) % prime < threshold. O(1) memory,
deterministic, no unbounded set growth for long-running services.

Signed-off-by: yuwei <yuwei@dev.local>
…s-sample-rate

Signed-off-by: yuwei <yuwei@dev.local>
Copy link
Copy Markdown

@cursor cursor Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit fb0a56f. Configure here.

Comment thread lmcache/v1/mp_observability/subscribers/metrics/l0_lifecycle.py Outdated
hash(block_id) == block_id for small ints in CPython, causing
deterministic hash sampling to always pick the lowest-numbered
blocks. _skipped set is fine for L0 since it's bounded by the
finite number of physical GPU blocks.

Signed-off-by: yuwei <yuwei@dev.local>
Copy link
Copy Markdown
Contributor

@ApostaC ApostaC left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@ApostaC ApostaC enabled auto-merge (squash) April 14, 2026 22:46
@github-actions github-actions Bot added the full Run comprehensive tests on this PR label Apr 14, 2026
@ApostaC ApostaC merged commit e9f8ec2 into LMCache:dev Apr 15, 2026
38 checks passed
@Oasis-Git Oasis-Git deleted the l1 branch April 15, 2026 04:01
ftian1 pushed a commit to ftian1/LMCache that referenced this pull request Apr 20, 2026
* [Feat] L0 Subscriber: GPU KV cache block lifecycle metrics

Signed-off-by: yuwei <yuwei@dev.local>
Co-authored-by: yuwei <yuwei@dev.local>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

full Run comprehensive tests on this PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants