vllm block event #2930

Merged
sammshen merged 4 commits into LMCache:dev from Oasis-Git:vl on Apr 2, 2026
Conversation

@Oasis-Git (Member) commented on Apr 1, 2026

What this PR does / why we need it:

Special notes for your reviewers:

If applicable:

  • this PR contains user-facing changes - docs added
  • this PR contains unit tests

Note

Medium Risk
Adds a new REPORT_BLOCK_ALLOCATION request type to the multiprocess RPC protocol and wires it through the vLLM adapter and MP server, which could impact protocol compatibility and message handling if clients/servers are version-skewed.

Overview
Adds a new fire-and-forget multiprocess RPC, RequestType.REPORT_BLOCK_ALLOCATION, to let vLLM report per-request GPU block/token allocation deltas (BlockAllocationRecord) to the LMCache server.

The server handles this request by publishing a new observability event (EventType.MP_VLLM_BLOCK_ALLOCATION) to the EventBus, with logging support in MPServerLoggingSubscriber. Tests are extended to cover msgpack serialization for BlockAllocationRecord, MQ handling for the new request, and EventBus delivery semantics for the new event.
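The flow described above can be sketched roughly as follows. This is an illustrative Python sketch, not LMCache's implementation: the record fields (`req_id`, `new_block_ids`, `new_token_ids`) are inferred from the log statement reviewed later in this PR, the in-process bus stands in for the real EventBus, and `asdict()` stands in for the msgpack serialization the tests cover.

```python
from dataclasses import asdict, dataclass, field
from typing import Callable


@dataclass
class BlockAllocationRecord:
    # Hypothetical shape of the per-request allocation delta.
    req_id: str
    new_block_ids: list[int] = field(default_factory=list)
    new_token_ids: list[int] = field(default_factory=list)


class EventBus:
    """Minimal stand-in: subscribers register per event type, publish() fans out."""

    def __init__(self) -> None:
        self._subs: dict[str, list[Callable[[dict], None]]] = {}

    def subscribe(self, event_type: str, fn: Callable[[dict], None]) -> None:
        self._subs.setdefault(event_type, []).append(fn)

    def publish(self, event_type: str, payload: dict) -> None:
        for fn in self._subs.get(event_type, []):
            fn(payload)


MP_VLLM_BLOCK_ALLOCATION = "mp.vllm.block_allocation"


def report_block_allocations(bus: EventBus, records: list[dict]) -> None:
    # Fire-and-forget: the server publishes one event per batch of records
    # and sends no response back to the vLLM client.
    bus.publish(MP_VLLM_BLOCK_ALLOCATION, {"records": records})


bus = EventBus()
seen: list[dict] = []
bus.subscribe(MP_VLLM_BLOCK_ALLOCATION, seen.append)

rec = BlockAllocationRecord("req-1", new_block_ids=[3, 7], new_token_ids=[10, 11, 12])
# asdict() gives the plain-dict form a msgpack-style transport would carry.
report_block_allocations(bus, [asdict(rec)])
```

The fire-and-forget shape (no `response_class`) is what makes version skew a concern: an old server that does not recognize the new request type has no reply channel on which to signal the mismatch.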

Written by Cursor Bugbot for commit 8f4542f.

Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
Contributor

@gemini-code-assist (Bot) left a comment

Code Review

This pull request implements a new observability protocol for reporting vLLM GPU block allocation events to the LMCache server. It includes the necessary protocol definitions, event bus integration, and a logging subscriber to handle these reports. Feedback suggests changing the logging level from info to debug for consistency with the subscriber's design and refining the type hints for the allocation records to improve code clarity.

Comment on lines +84 to +92
logger.info(
"vLLM block allocation: req_id=%s "
"new_blocks=%d new_tokens=%d "
"block_ids=%s",
rec.get("req_id"),
len(block_ids),
len(token_ids),
block_ids[:10],
)
Contributor
medium

The use of logger.info is inconsistent with the rest of the MPServerLoggingSubscriber class, which is explicitly documented to log events at the debug level (see class docstring on line 23 and module description on line 3). For high-frequency events like block allocations in vLLM, debug level is more appropriate to avoid log pollution in production environments.

Suggested change
-logger.info(
+logger.debug(
     "vLLM block allocation: req_id=%s "
     "new_blocks=%d new_tokens=%d "
     "block_ids=%s",
     rec.get("req_id"),
     len(block_ids),
     len(token_ids),
     block_ids[:10],
 )
References
  1. The style guide emphasizes code cleanliness and consistency with existing patterns. (link)

Comment thread lmcache/v1/multiprocess/server.py Outdated
"storage_manager": sm,
}

def report_block_allocations(self, records: list) -> None:
Contributor
medium

The type hint for records is imprecise. To maintain consistency with the LMCacheMPSchedulerAdapter implementation and provide better clarity for this public method, the type hint should be more specific.

Suggested change
-    def report_block_allocations(self, records: list) -> None:
+    def report_block_allocations(self, records: list[dict]) -> None:
References
  1. All new functions must have type hints for arguments and return values. Using more specific type hints (e.g., list[dict] instead of list) improves maintainability. (link)

@cursor (Bot) left a comment


Cursor Bugbot has reviewed your changes and found 2 potential issues.


Comment thread lmcache/v1/mp_observability/subscribers/logging/mp_server.py Outdated
Comment thread lmcache/v1/multiprocess/protocols/observability.py
@Oasis-Git changed the title from "Vllm block event" to "vllm block event" on Apr 1, 2026
@Oasis-Git assigned and then unassigned Oasis-Git on Apr 1, 2026
@Oasis-Git added the "full" label (Run comprehensive tests on this PR) on Apr 1, 2026
Contributor

@ApostaC left a comment


The structure looks good. Please see the implementation-related comments.

Let's also add some unit tests for the new protocol.

Comment on lines +32 to +36
"REPORT_BLOCK_ALLOCATION": ProtocolDefinition(
payload_classes=[list],
response_class=None,
handler_type=HandlerType.SYNC,
),
Contributor
For protocol definitions, we want to have a more rigorous definition (e.g., a clear dataclass defined in custom_types.py) rather than just a list.
For handler_type, we should use HandlerType.BLOCKING.
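To make the reviewer's two suggestions concrete, here is a minimal sketch of what a dataclass-backed protocol definition might look like. The shapes of `ProtocolDefinition` and `HandlerType` are hypothetical reconstructions from the snippet quoted above (not copied from LMCache), and the record fields are inferred from the logging call reviewed elsewhere in this PR:

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Optional


class HandlerType(Enum):
    # Hypothetical enum mirroring the values seen in the diff above.
    SYNC = auto()
    BLOCKING = auto()


@dataclass
class ProtocolDefinition:
    # Hypothetical mirror of the ProtocolDefinition used in the snippet.
    payload_classes: list[type]
    response_class: Optional[type]
    handler_type: HandlerType


@dataclass
class BlockAllocationRecord:
    # A concrete record type (as the reviewer asks), imagined as living
    # in custom_types.py instead of passing a bare `list` payload.
    req_id: str
    new_block_ids: list[int] = field(default_factory=list)
    new_token_ids: list[int] = field(default_factory=list)


PROTOCOLS = {
    "REPORT_BLOCK_ALLOCATION": ProtocolDefinition(
        payload_classes=[BlockAllocationRecord],  # rigorous type, not `list`
        response_class=None,                      # fire-and-forget, no reply
        handler_type=HandlerType.BLOCKING,        # per the review comment
    ),
}
```

The point of the dataclass is that the payload schema becomes explicit and checkable, rather than an untyped list of dicts.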

for rec in records:
block_ids = rec.get("new_block_ids", [])
token_ids = rec.get("new_token_ids", [])
logger.info(
Contributor
Let's use logger.debug here for now.

Comment thread lmcache/v1/mp_observability/event.py Outdated
MP_LOOKUP_PREFETCH_END = "mp.lookup_prefetch.end"

# vLLM block allocation events
VLLM_BLOCK_ALLOCATION = "vllm.block_allocation"
Contributor
Nit: name it MP_VLLM_BLOCK_ALLOCATION to follow the naming conventions above.

Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
Contributor

@ApostaC left a comment


LGTM!



@dataclass
class BlockAllocationRecord:
Contributor
Maybe "RequestAllocationRecord"?

Contributor

@sammshen left a comment


LGTM!

@sammshen merged commit a060b4b into LMCache:dev on Apr 2, 2026
35 checks passed
@Oasis-Git deleted the vl branch on April 2, 2026 at 22:49

Labels

full (Run comprehensive tests on this PR), mp_mode

Projects

None yet

4 participants