
Add scheduler instance_id and model_name to L0 KV lifecycle tracking #3043

Merged
ApostaC merged 2 commits into LMCache:dev from Oasis-Git:l0-add on Apr 16, 2026

Conversation

Member

@Oasis-Git Oasis-Git commented Apr 15, 2026

  • Add instance_id field to BlockAllocationRecord (per-record, default 0)
  • Server looks up model_name from gpu_context_meta and adds to event
  • L0LifecycleSubscriber reads instance_id from each record, model_name from event metadata
  • Key shadow map by (instance_id, block_id) for multi-instance support
  • Emit OTel histograms with instance_id and model_name attributes for per-instance, per-model Prometheus metric slicing
  • Update EVENTS.md, METRICS.md, and observability.rst docs
  • Add test verifying OTel attributes on histogram data points
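The per-record tagging and composite-key changes above can be sketched as follows. The BlockAllocationRecord field names match the PR (req_id, new_block_ids, new_token_ids, instance_id with default 0), but the ShadowMap class is a simplified stand-in for L0LifecycleSubscriber's internal state, not LMCache code:

```python
from dataclasses import dataclass

@dataclass
class BlockAllocationRecord:
    req_id: str
    new_block_ids: list
    new_token_ids: list
    instance_id: int = 0  # per-record, default 0

class ShadowMap:
    """Keys block state by (instance_id, block_id) so two scheduler
    instances that reuse the same block_id do not collide."""

    def __init__(self):
        self._blocks = {}

    def record_allocation(self, rec: BlockAllocationRecord) -> None:
        for block_id in rec.new_block_ids:
            self._blocks[(rec.instance_id, block_id)] = rec.req_id

shadow = ShadowMap()
# Same block_id (7) from two different scheduler instances:
shadow.record_allocation(BlockAllocationRecord("req-a", [7], [1, 2], instance_id=101))
shadow.record_allocation(BlockAllocationRecord("req-b", [7], [3, 4], instance_id=202))
assert len(shadow._blocks) == 2  # distinct entries, no collision
```

Without the instance_id in the key, the second allocation would silently overwrite the first, conflating blocks from different schedulers.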


If applicable:

  • this PR contains user-facing changes (docs added)
  • this PR contains unit tests

Note

Medium Risk
This PR changes the wire protocol and handler signature for REPORT_BLOCK_ALLOCATION, so mismatched client/server versions could break block-allocation reporting. The new metric attributes and shadow-map keying also change lifecycle metric cardinality and behavior.
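The version-skew risk can be sketched minimally. The new payload shape [instance_id, model_name, records] comes from this PR; the assumption that the old payload carried only the records list, the handler name, and the length-based fallback are all illustrative, not LMCache code:

```python
def handle_report_block_allocation(payload: list):
    """Hypothetical server-side handler tolerating both payload shapes."""
    if len(payload) == 3:
        # New protocol (this PR): [instance_id, model_name, records]
        instance_id, model_name, records = payload
    else:
        # Assumed old protocol: records only; fall back to defaults
        instance_id, model_name, records = 0, "", payload[0]
    return instance_id, model_name, records

# New-style client:
assert handle_report_block_allocation([101, "llama-3", ["rec"]]) == (101, "llama-3", ["rec"])
# Old-style client still parses instead of crashing:
assert handle_report_block_allocation([["rec"]]) == (0, "", ["rec"])
```

A server that assumed only the new shape would misread an old payload, which is why mismatched versions are flagged as a risk.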

Overview
Adds instance_id and model_name propagation to MP_VLLM_BLOCK_ALLOCATION from the vLLM adapter through the multiprocess protocol/server event metadata, so L0 lifecycle tracking can distinguish blocks across multiple scheduler instances.

Updates L0LifecycleSubscriber to key its shadow state by (instance_id, block_id) and to emit eviction/reuse histograms with instance_id/model_name OTel attributes for Prometheus slicing; tests and observability docs are updated accordingly, including a new assertion that histogram data points carry these attributes.
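The attribute-based slicing this enables can be sketched with a toy histogram. The metric name and attribute values here are illustrative, and the real subscriber records through OTel instruments rather than this stand-in class:

```python
from collections import defaultdict

class AttrHistogram:
    """Toy stand-in for an OTel histogram: each recorded point carries
    attributes, so backends like Prometheus can slice by label."""

    def __init__(self, name: str):
        self.name = name
        self.points = defaultdict(list)  # attribute tuple -> values

    def record(self, value: float, attributes: dict) -> None:
        self.points[tuple(sorted(attributes.items()))].append(value)

evictions = AttrHistogram("gpu_block_eviction_age_seconds")  # illustrative name
evictions.record(1.5, {"instance_id": 101, "model_name": "llama-3"})
evictions.record(2.5, {"instance_id": 101, "model_name": "llama-3"})
evictions.record(9.0, {"instance_id": 202, "model_name": "qwen-2"})

# Per-instance, per-model slice, as a Prometheus label query would do:
key = tuple(sorted({"instance_id": 101, "model_name": "llama-3"}.items()))
assert evictions.points[key] == [1.5, 2.5]
```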

Reviewed by Cursor Bugbot for commit 2343215. Bugbot is set up for automated code reviews on this repo.

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces multi-instance and multi-model support for GPU KV cache block lifecycle tracking by adding instance_id and model_name attributes to observability events and metrics. Key changes include updating the shadow map in l0_lifecycle.py to use composite keys and modifying the BlockAllocationRecord to carry instance information. Feedback suggests that server.py should group records by instance_id before publishing events to ensure correct model attribution in mixed batches, and recommends using explicit type hints for BlockAllocationRecord instead of object to improve type safety and maintainability.

Comment thread: lmcache/v1/multiprocess/server.py (outdated)

Comment on lines 876 to 886:

```diff
 # Look up model_name from the first record's instance_id.
 model_name = ""
 if records:
     meta = self.gpu_context_meta.get(records[0].instance_id)
     model_name = meta[0] if meta else ""
 self._event_bus.publish(
     Event(
         event_type=EventType.MP_VLLM_BLOCK_ALLOCATION,
-        metadata={"records": records},
+        metadata={"model_name": model_name, "records": records},
     )
 )
```

high

Looking up model_name from only the first record in the batch is potentially incorrect. In a multi-instance or multi-model scenario, a single batch of records might contain entries from different instance_ids belonging to different models. This would lead to incorrect metric attribution. Furthermore, ensure that all relevant metadata, such as MemoryFormat (fmt), is retrieved and passed correctly during batched allocation to ensure correctness in multi-processing modes. Consider grouping the records by instance_id and publishing separate events for each unique instance.

Suggested change

```diff
-# Look up model_name from the first record's instance_id.
-model_name = ""
-if records:
-    meta = self.gpu_context_meta.get(records[0].instance_id)
-    model_name = meta[0] if meta else ""
-self._event_bus.publish(
-    Event(
-        event_type=EventType.MP_VLLM_BLOCK_ALLOCATION,
-        metadata={"model_name": model_name, "records": records},
-    )
-)
+by_instance = {}
+for record in records:
+    by_instance.setdefault(record.instance_id, []).append(record)
+for instance_id, inst_records in by_instance.items():
+    meta = self.gpu_context_meta.get(instance_id)
+    model_name = meta[0] if meta else ""
+    self._event_bus.publish(
+        Event(
+            event_type=EventType.MP_VLLM_BLOCK_ALLOCATION,
+            metadata={"model_name": model_name, "records": inst_records},
+        )
+    )
```
References
  1. When performing batched memory allocation, ensure the MemoryFormat (fmt) is retrieved from the cache and passed to the allocator for correctness in multi-processing modes.

Comment on lines +162 to +167:

```python
def _process_record(self, model_name: str, record: object, now: float) -> None:
    """Process a single BlockAllocationRecord."""
    req_id: str = record.req_id  # type: ignore[attr-defined]
    block_ids: list[int] = record.new_block_ids  # type: ignore[attr-defined]
    token_ids: list[int] = record.new_token_ids  # type: ignore[attr-defined]
    instance_id: int = record.instance_id  # type: ignore[attr-defined]
```
Contributor


medium

The record parameter is typed as object, which forces the use of type: ignore[attr-defined] when accessing its fields. Per the Repository Style Guide (line 24), all new/modified functions should have proper type hints. Please import BlockAllocationRecord from lmcache.v1.multiprocess.custom_types and use it as the type hint for the record parameter to improve maintainability and type safety.

References
  1. All new functions have type hints (arguments + return values)


@cursor cursor Bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit c0f9e17.

Comment thread: lmcache/v1/multiprocess/protocols/observability.py (outdated)

```python
    is unhealthy the report is silently dropped.

    Args:
        instance_id: The GPU instance ID (scheduler/worker identity).
```
Contributor


Suggested wording: "GPU instance ID" -> "scheduler instance id"

- Adapter sends os.getpid() and self.model_name in
  report_block_allocations — no vLLM change needed
- Protocol: [int, str, list[BlockAllocationRecord]]
- Server passes instance_id and model_name to EventBus
- L0LifecycleSubscriber keys shadow map by (instance_id, block_id)
- Emit OTel histograms with instance_id and model_name attributes
  for per-instance, per-model Prometheus metric slicing
- Update EVENTS.md, METRICS.md, and observability.rst docs
- Add test verifying OTel attributes on histogram data points

Signed-off-by: yuwei <yuwei@dev.local>
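The adapter side described in the commit message can be sketched as below. The Adapter class and send callback are hypothetical stand-ins; only the payload shape [os.getpid(), model_name, records] matching the protocol [int, str, list[BlockAllocationRecord]] is taken from the PR:

```python
import os

class Adapter:
    """Hypothetical vLLM-side adapter: uses its own PID as the scheduler
    instance_id, so no vLLM change is needed."""

    def __init__(self, model_name: str, send):
        self.model_name = model_name
        self._send = send  # stand-in for the multiprocess transport

    def report_block_allocations(self, records: list) -> None:
        # Protocol payload: [int, str, list[BlockAllocationRecord]]
        self._send([os.getpid(), self.model_name, records])

sent = []
Adapter("llama-3", sent.append).report_block_allocations(["rec1"])
assert sent[0][1:] == ["llama-3", ["rec1"]]  # instance_id is this process's PID
```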
@Oasis-Git Oasis-Git added, then removed, the label "full (Run comprehensive tests on this PR)" on Apr 16, 2026
Contributor

@ApostaC ApostaC left a comment


LGTM!

Contributor

@sammshen sammshen left a comment


LGTM

@ApostaC ApostaC merged commit c92323f into LMCache:dev Apr 16, 2026
32 of 33 checks passed
@Oasis-Git Oasis-Git deleted the l0-add branch April 16, 2026 23:52