
[LMCache] Pass TP size in lookup for MLA multi-reader locking #36129

Merged
ApostaC merged 4 commits into vllm-project:main from maobaolong:lmcache/pass-tp-size-in-lookup
Mar 11, 2026

Conversation

@maobaolong (Contributor)

Summary

When MLA is enabled, the world_size passed to LMCache is divided by tp_size (since all TP ranks share the same KV cache in MLA models). However, the read lock count during lookup still needs the original TP size to acquire the correct number of locks for all workers that will independently retrieve the same cached chunks.
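
As a rough illustration of that arithmetic (a standalone sketch with hypothetical values, not the actual vLLM code path):

```python
# Illustrative only: how MLA shrinks the world size reported to LMCache
# while the lookup still has to lock once per TP worker.
tp_size = 4      # tensor-parallel degree of the serving instance
pp_size = 2      # pipeline-parallel degree
use_mla = True   # MLA models share one KV cache across all TP ranks

world_size = tp_size * pp_size      # 8 workers in total
if use_mla:
    world_size //= tp_size          # LMCache is told there are only 2

# Every one of the original tp_size workers still fetches the chunk
# independently, so the lookup needs tp_size read locks, not world_size.
readers_per_chunk = tp_size
print(world_size, readers_per_chunk)   # -> 2 4
```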

This PR adds a tp_size parameter to LMCacheMPSchedulerAdapter and propagates it through IPCCacheEngineKey so the LMCache server can use it for multi-reader locking.

Changes

vllm side (this PR)

  • create_scheduler_adapter: extracts tp_size from parallel_config and passes it
  • _adapter_accepts_tp_size(): inspect-based compatibility check so newer vLLM works with older LMCache (sketched below)
  • Fallback LMCacheMPSchedulerAdapter: accepts tp_size and propagates it in key creation
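
A minimal sketch of the compatibility shim described above, under the assumption that the LMCache adapter class is importable and that parallel_config exposes tensor_parallel_size (as vLLM's ParallelConfig does); the exact signatures and wiring here are illustrative, not the merged code:

```python
import inspect


def _adapter_accepts_tp_size(adapter_cls) -> bool:
    """Return True if the installed LMCache adapter's __init__ takes tp_size."""
    try:
        return "tp_size" in inspect.signature(adapter_cls.__init__).parameters
    except (TypeError, ValueError):
        return False


def create_scheduler_adapter(adapter_cls, parallel_config, **kwargs):
    # Only pass tp_size when the installed LMCache version understands it,
    # so a newer vLLM keeps working against an older LMCache release.
    if _adapter_accepts_tp_size(adapter_cls):
        kwargs["tp_size"] = parallel_config.tensor_parallel_size
    return adapter_cls(**kwargs)
```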

LMCache side (companion PR: LMCache/LMCache#2697)

  • IPCCacheEngineKey: adds tp_size: int = 1 field (backward compatible default)
  • LMCacheMPSchedulerAdapter: accepts tp_size parameter
  • server.py lookup(): uses key.tp_size instead of key.world_size for extra_readers (sketched below)
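
A hedged sketch of the LMCache-side shape (IPCCacheEngineKey and the server lookup are real LMCache components, but the fields, the lock_manager helper, and the extra-reader arithmetic shown here are simplified assumptions for illustration):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IPCCacheEngineKey:
    # Other identifying fields are elided; only the parts relevant to this
    # change are shown. tp_size defaults to 1 so keys produced by an older
    # vLLM keep working unchanged.
    world_size: int
    chunk_hash: str
    tp_size: int = 1


def lookup(key: IPCCacheEngineKey, lock_manager) -> None:
    # Previously the server sized the multi-reader lock from key.world_size;
    # with MLA that value was already divided by tp_size, so the original TP
    # size is what actually counts the concurrent readers.
    extra_readers = key.tp_size  # assumed semantics, for illustration
    lock_manager.acquire_read(key.chunk_hash, extra_readers=extra_readers)
```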

Backward Compatibility

  • New vllm + old LMCache: _adapter_accepts_tp_size() detects the old signature and skips passing tp_size
  • Old vllm + new LMCache: tp_size defaults to 1 in both the adapter and the key
  • New vllm + new LMCache: full TP-aware multi-reader locking enabled

@gemini-code-assist (Bot) left a comment

Code Review

This pull request introduces the tp_size parameter to LMCacheMPSchedulerAdapter to enable multi-reader locking in LMCache for MLA models, incorporating a backward-compatibility check. While no security vulnerabilities were identified, a critical issue remains: the LMCacheMPWorkerAdapter has not been updated to handle tp_size. This omission is likely to cause key mismatches between scheduler lookups and worker retrievals, thereby breaking the cache functionality.

@maobaolong force-pushed the lmcache/pass-tp-size-in-lookup branch from 302c5d5 to cea1669 on March 5, 2026 11:47
@maobaolong force-pushed the lmcache/pass-tp-size-in-lookup branch from cea1669 to 730d142 on March 5, 2026 11:57
@maobaolong (Contributor, Author)

@ApostaC Would you like to take a look at this PR? Thanks

@ApostaC (Collaborator) left a comment

LGTM! Let's merge it after the LMCache-side PR is merged.

@ApostaC added the ready label (ONLY add when PR is ready to merge/full CI is needed) on Mar 6, 2026
@maobaolong (Contributor, Author)

@ApostaC Thanks for the review; the LMCache-side PR has been merged.

The CI looks flaky; could you help trigger a retry? Thanks a lot.

@ApostaC enabled auto-merge (squash) on March 11, 2026 17:49
@ApostaC merged commit 12001f2 into vllm-project:main on Mar 11, 2026
48 checks passed
wendyliu235 pushed a commit to wendyliu235/vllm-public that referenced this pull request Mar 18, 2026
khairulkabir1661 pushed a commit to khairulkabir1661/vllm that referenced this pull request Mar 27, 2026
mtparet pushed a commit to blackfuel-ai/vllm that referenced this pull request Apr 9, 2026
mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026

Labels

kv-connector, ready (ONLY add when PR is ready to merge/full CI is needed)
