[MP] Protocol with Single Key #2584
Conversation
Summary of Changes

Hello @Oasis-Git, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request refactors the IPC mechanism for KV cache operations within the LMCache system. The primary goal is to simplify the protocol by transitioning from handling lists of IPC keys to processing individual keys for store and retrieve requests. This change streamlines the internal data structures, type definitions, and request submission logic, making the system more robust and easier to maintain. The modifications ensure that both individual and batched operations now consistently use a single-key paradigm.
Activity
Code Review
This pull request refactors the multiprocess communication protocol to use a single IPC key for STORE and RETRIEVE operations, instead of a list of keys. This simplifies the protocol definition and the server-side implementation. The client-side adapter (vllm_multi_process_adapter.py) has been updated accordingly, notably by removing the client-side batching logic in batched_submit_store_requests and batched_submit_retrieve_requests. While this change simplifies the code, it may have performance implications. I've also suggested a minor improvement for robustness in the zip calls.
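To make the protocol change concrete, here is a hypothetical illustration of the request payload shapes before and after this PR. The field order follows the STORE request visible in the diff (`all_keys`, `instance_id`, `block_ids`, `event.ipc_handle()`); the concrete key names and values are made up for illustration.

```python
# Hypothetical payload shapes; field order taken from the STORE request
# in the diff, concrete values invented for illustration.

# Before: one STORE request carried a list of keys plus merged block_ids.
batched_payload = [
    ["key-0", "key-1"],    # all_keys: list[IPCCacheEngineKey]
    "instance-0",          # self.instance_id
    [0, 1, 2, 3],          # block_ids merged across all ops
    "cuda-ipc-handle",     # event.ipc_handle()
]

# After: one STORE request per key, so every payload is flat and uniform.
single_key_payloads = [
    ["key-0", "instance-0", [0, 1], "cuda-ipc-handle"],
    ["key-1", "instance-0", [2, 3], "cuda-ipc-handle"],
]

# Each per-key payload has the same fixed shape:
assert all(len(p) == 4 for p in single_key_payloads)
```

The uniform shape is what removes the server-side need to fan a single request out over a variable-length key list.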
```diff
         for request_id, op in zip(request_ids, ops, strict=False):
-            all_keys.append(
-                self._create_key(op.token_ids, op.start, op.end, request_id=request_id)
-            )
-            block_ids.extend(op.block_ids)
-        future = send_lmcache_request(
-            self.mq_client,
-            RequestType.STORE,
-            [
-                all_keys,
-                self.instance_id,
-                block_ids,
-                event.ipc_handle(),
-            ],
-        ).to_cuda_future()
-        self.store_futures[request_ids[0]] = (future, list(request_ids[1:]))
+            self.submit_store_request(request_id, op, event)
```
This change replaces a single batched request with multiple individual requests sent in a loop. This is likely to increase overhead from network communication and serialization, potentially impacting performance. While this aligns with the single-key protocol change, the performance trade-off should be considered. A similar change is present in batched_submit_retrieve_requests.
```python
        """
        all_keys: list[IPCCacheEngineKey] = []
        block_ids: list[int] = []
        for request_id, op in zip(request_ids, ops, strict=False):
```
The docstring states that ops and request_ids should have the same length. Using zip(..., strict=True) would enforce this and raise a ValueError if the lengths differ, which is safer than silently truncating to the shorter list.
```diff
-        for request_id, op in zip(request_ids, ops, strict=False):
+        for request_id, op in zip(request_ids, ops, strict=True):
```
```python
        """
        all_keys: list[IPCCacheEngineKey] = []
        block_ids: list[int] = []
        for request_id, op in zip(request_ids, ops, strict=False):
```
Similar to batched_submit_store_requests, the docstring states that ops and request_ids should have the same length. Using zip(..., strict=True) would be safer here as well to prevent silently ignoring mismatched inputs.
```diff
-        for request_id, op in zip(request_ids, ops, strict=False):
+        for request_id, op in zip(request_ids, ops, strict=True):
```
ApostaC left a comment:
LGTM in general. Please fix the tests.
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com>
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com> Signed-off-by: Ofer Kiselov Nahman <ofer.kiselovnahman@weka.io>
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com> Signed-off-by: shaoxiawjc <wjc2800@163.com>
Signed-off-by: Oasis-Git <ayw.sirius19@gmail.com> Signed-off-by: Aaron Wu <aaron.wu@dell.com>
What this PR does / why we need it:
Change the protocol to use a single IPC key per STORE/RETRIEVE request instead of a list of IPC keys.