[LMCache] Pass TP size in lookup for MLA multi-reader locking #36129
ApostaC merged 4 commits into vllm-project:main
Code Review
This pull request introduces the tp_size parameter to LMCacheMPSchedulerAdapter to enable multi-reader locking in LMCache for MLA models, incorporating a backward-compatibility check. While no security vulnerabilities were identified, a critical issue remains: the LMCacheMPWorkerAdapter has not been updated to handle tp_size. This omission is likely to cause key mismatches between scheduler lookups and worker retrievals, thereby breaking the cache functionality.
302c5d5 to cea1669
Signed-off-by: baoloongmao <baoloongmao@tencent.com>
cea1669 to 730d142
@ApostaC Would you like to take a look at this PR? Thanks
ApostaC left a comment
LGTM! Let's merge it after the LMCache-side PR is merged.
@ApostaC Thanks for the review, the LMCache side has been merged. It looks flaky for the
…roject#36129) Signed-off-by: baoloongmao <baoloongmao@tencent.com> Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>
Summary
When MLA is enabled, the world_size passed to LMCache is divided by tp_size (since all TP ranks share the same KV cache in MLA models). However, the read lock count during lookup still needs the original TP size to acquire the correct number of locks for all workers that will independently retrieve the same cached chunks.
This PR adds a tp_size parameter to LMCacheMPSchedulerAdapter and propagates it through IPCCacheEngineKey so the LMCache server can use it for multi-reader locking.
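To make the distinction concrete, here is a minimal sketch of the two numbers involved. It is not the code in this PR or in LMCache: `SketchCacheKey` is a hypothetical stand-in for `IPCCacheEngineKey` (its real fields may differ), and the helpers `lmcache_world_size` and `read_locks_needed`, including the fallback to a single lock when `tp_size` is absent, are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SketchCacheKey:
    """Hypothetical stand-in for IPCCacheEngineKey; real fields may differ."""
    model_name: str
    world_size: int
    chunk_hash: int
    # None models a caller that does not send tp_size (assumed fallback case).
    tp_size: Optional[int] = None


def lmcache_world_size(world_size: int, tp_size: int, use_mla: bool) -> int:
    """World size reported to LMCache: divided by tp_size when MLA is enabled."""
    if use_mla:
        return world_size // tp_size
    return world_size


def read_locks_needed(key: SketchCacheKey) -> int:
    """Read locks to take per chunk during lookup (assumed policy).

    With MLA, every TP rank retrieves the same chunk independently, so the
    lookup needs one read lock per rank, i.e. the original TP size.
    """
    return key.tp_size if key.tp_size is not None else 1


# Example: 8 workers, TP size 4, MLA enabled.
reported = lmcache_world_size(world_size=8, tp_size=4, use_mla=True)
key = SketchCacheKey("some-mla-model", reported, chunk_hash=123, tp_size=4)
locks = read_locks_needed(key)
```

In this example the world size seen by LMCache shrinks to 2, while the lookup still takes 4 read locks, one for each TP rank that will retrieve the shared chunk.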
Changes
vllm side (this PR)
LMCache side (companion PR: LMCache/LMCache#2697)
Backward Compatibility