vLLM block event #2930
Conversation
Code Review
This pull request implements a new observability protocol for reporting vLLM GPU block allocation events to the LMCache server. It includes the necessary protocol definitions, event bus integration, and a logging subscriber to handle these reports. Feedback suggests changing the logging level from info to debug for consistency with the subscriber's design and refining the type hints for the allocation records to improve code clarity.
```python
logger.info(
    "vLLM block allocation: req_id=%s "
    "new_blocks=%d new_tokens=%d "
    "block_ids=%s",
    rec.get("req_id"),
    len(block_ids),
    len(token_ids),
    block_ids[:10],
)
```
The use of logger.info is inconsistent with the rest of the MPServerLoggingSubscriber class, which is explicitly documented to log events at the debug level (see class docstring on line 23 and module description on line 3). For high-frequency events like block allocations in vLLM, debug level is more appropriate to avoid log pollution in production environments.
Suggested change:

```diff
-logger.info(
+logger.debug(
     "vLLM block allocation: req_id=%s "
     "new_blocks=%d new_tokens=%d "
     "block_ids=%s",
     rec.get("req_id"),
     len(block_ids),
     len(token_ids),
     block_ids[:10],
 )
```
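Beyond switching the level, high-frequency paths often guard the call with `Logger.isEnabledFor` so the log arguments are never even built when debug logging is off. A minimal sketch (the helper name and logger name are hypothetical, not the PR's code):

```python
import logging

logger = logging.getLogger("lmcache.mp_server")  # hypothetical logger name


def log_block_allocation(rec: dict) -> None:
    """Debug-log one allocation record (illustrative helper only)."""
    # The guard skips building the log arguments entirely when DEBUG is
    # disabled, which matters on hot paths like per-step block allocations.
    if logger.isEnabledFor(logging.DEBUG):
        block_ids = rec.get("new_block_ids", [])
        token_ids = rec.get("new_token_ids", [])
        logger.debug(
            "vLLM block allocation: req_id=%s new_blocks=%d new_tokens=%d block_ids=%s",
            rec.get("req_id"),
            len(block_ids),
            len(token_ids),
            block_ids[:10],  # truncate so one log line stays bounded
        )
```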
References
- The style guide emphasizes code cleanliness and consistency with existing patterns. (link)
| "storage_manager": sm, | ||
| } | ||
|
|
||
| def report_block_allocations(self, records: list) -> None: |
The type hint for records is imprecise. To maintain consistency with the LMCacheMPSchedulerAdapter implementation and provide better clarity for this public method, the type hint should be more specific.
Suggested change:

```diff
-def report_block_allocations(self, records: list) -> None:
+def report_block_allocations(self, records: list[dict]) -> None:
```
References
- All new functions must have type hints for arguments and return values. Using more specific type hints (e.g., list[dict] instead of list) improves maintainability. (link)
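Going one step further than `list[dict]`, a `TypedDict` can document exactly which keys each record carries. A sketch under the assumption that the record shape matches the keys the subscriber reads (`req_id`, `new_block_ids`, `new_token_ids` — hypothetical class name):

```python
from typing import TypedDict


class AllocationRecord(TypedDict, total=False):
    # Hypothetical shape, inferred from the keys read in the handler.
    req_id: str
    new_block_ids: list[int]
    new_token_ids: list[int]


def total_new_blocks(records: list[AllocationRecord]) -> int:
    """Sum newly allocated blocks across records; the precise hint tells
    callers and type checkers what each element must look like."""
    return sum(len(r.get("new_block_ids", [])) for r in records)
```

Unlike a bare `list`, this lets a type checker flag a misspelled key at the call site instead of at runtime.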
Cursor Bugbot has reviewed your changes and found 2 potential issues.
ApostaC left a comment:
The structure looks good. Please see the implementation-related comments.
Let's also add some unit tests for the new protocol.
| "REPORT_BLOCK_ALLOCATION": ProtocolDefinition( | ||
| payload_classes=[list], | ||
| response_class=None, | ||
| handler_type=HandlerType.SYNC, | ||
| ), |
For protocol definitions, we want a more rigorous definition (e.g., a clear dataclass defined in `custom_types.py`) rather than just a `list`.
For `handler_type`, we should use `HandlerType.BLOCKING`.
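One way to read this suggestion: replace the untyped `list` payload with a dataclass that names its fields. A sketch, assuming the payload carries the same keys the handler reads (`BlockAllocationPayload` is a hypothetical name, and the commented-out protocol entry reuses the PR's `ProtocolDefinition`/`HandlerType` names schematically):

```python
from dataclasses import dataclass, field


@dataclass
class BlockAllocationPayload:
    """Hypothetical typed payload for custom_types.py; field names mirror
    the keys the handler currently pulls out of each dict."""
    req_id: str
    new_block_ids: list[int] = field(default_factory=list)
    new_token_ids: list[int] = field(default_factory=list)


# The protocol entry could then name the dataclass instead of a bare list:
#
# "REPORT_BLOCK_ALLOCATION": ProtocolDefinition(
#     payload_classes=[BlockAllocationPayload],
#     response_class=None,
#     handler_type=HandlerType.BLOCKING,
# ),
```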
```python
for rec in records:
    block_ids = rec.get("new_block_ids", [])
    token_ids = rec.get("new_token_ids", [])
    logger.info(
```
Let's use `logger.debug` here for now.
```python
MP_LOOKUP_PREFETCH_END = "mp.lookup_prefetch.end"

# vLLM block allocation events
VLLM_BLOCK_ALLOCATION = "vllm.block_allocation"
```
Nit: name it `MP_VLLM_BLOCK_ALLOCATION` to follow the naming conventions above.
```python
@dataclass
class BlockAllocationRecord:
```
Maybe "RequestAllocationRecord"?

What this PR does / why we need it:
Special notes for your reviewers:
If applicable:
Note: Medium Risk

Adds a new `REPORT_BLOCK_ALLOCATION` request type to the multiprocess RPC protocol and wires it through the vLLM adapter and MP server, which could impact protocol compatibility and message handling if clients/servers are version-skewed.

Overview

Adds a new fire-and-forget multiprocess RPC, `RequestType.REPORT_BLOCK_ALLOCATION`, to let vLLM report per-request GPU block/token allocation deltas (`BlockAllocationRecord`) to the LMCache server. The server handles this request by publishing a new observability event (`EventType.MP_VLLM_BLOCK_ALLOCATION`) to the `EventBus`, with logging support in `MPServerLoggingSubscriber`. Tests are extended to cover msgpack serialization for `BlockAllocationRecord`, MQ handling for the new request, and EventBus delivery semantics for the new event.

Written by Cursor Bugbot for commit 8f4542f.
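The fire-and-forget flow described above — the server publishes an event to every subscriber and sends no response back — can be illustrated with a minimal synchronous pub/sub sketch. The `EventBus` class and the event-name constant here are illustrative assumptions; the PR's real `EventBus` API is likely richer:

```python
from collections import defaultdict
from typing import Any, Callable


class EventBus:
    """Minimal synchronous pub/sub sketch of the delivery semantics
    described above (not the PR's actual implementation)."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, event_type: str, handler: Callable[[Any], None]) -> None:
        self._subs[event_type].append(handler)

    def publish(self, event_type: str, payload: Any) -> None:
        # Fire-and-forget: no response travels back; every subscriber
        # registered for this event type receives the payload.
        for handler in self._subs[event_type]:
            handler(payload)


# Hypothetical event-name value, following the MP_ prefix suggested in review.
MP_VLLM_BLOCK_ALLOCATION = "mp.vllm.block_allocation"

bus = EventBus()
seen: list[Any] = []
bus.subscribe(MP_VLLM_BLOCK_ALLOCATION, seen.append)
bus.publish(MP_VLLM_BLOCK_ALLOCATION, {"req_id": "r1", "new_block_ids": [0, 1]})
```

A logging subscriber like `MPServerLoggingSubscriber` would simply be one more handler registered for the event type.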
EventType.MP_VLLM_BLOCK_ALLOCATION) to theEventBus, with logging support inMPServerLoggingSubscriber. Tests are extended to cover msgpack serialization forBlockAllocationRecord, MQ handling for the new request, and EventBus delivery semantics for the new event.Written by Cursor Bugbot for commit 8f4542f. This will update automatically on new commits. Configure here.