[MP][optimize] optimize save when MLA enabled #2935
Conversation
Code Review
This pull request adds `use_mla` and `is_first_rank_of_pp_group` parameters to the `LMCacheMPWorkerAdapter` constructor. Feedback indicates that the use of `is_first_rank_of_pp_group` may be logically incorrect for KV cache storage and should likely be `is_first_rank_of_tp_group` to avoid data loss. Additionally, the new fields are currently unused in the class logic, and the constructor is missing a required docstring per the repository style guide.
ApostaC
left a comment
Quick comment: I think right now we have `world_size` and `kv_rank` in the initialization args, though their meaning is not super clear. How about we define a new dataclass to describe the parallel strategy and local rank information (and include the MLA flag there)?
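A minimal sketch of what such a dataclass might look like (the field and property names here are illustrative, not the merged API):

```python
from dataclasses import dataclass


@dataclass
class ParallelStrategy:
    """Describes the parallel topology and this worker's position in it.

    A sketch only; field names are illustrative, not the final API.
    """
    world_size: int        # total number of worker processes
    rank: int              # this worker's global rank
    tp_size: int = 1       # tensor-parallel group size
    pp_size: int = 1       # pipeline-parallel group size
    use_mla: bool = False  # whether the model uses MLA attention

    @property
    def kv_world_size(self) -> int:
        # With MLA, all TP ranks hold identical KV caches, so the
        # effective KV world size excludes the TP dimension.
        return self.world_size // self.tp_size if self.use_mla else self.world_size

    @property
    def kv_rank(self) -> int:
        return self.rank // self.tp_size if self.use_mla else self.rank
```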
@ApostaC Sorry, I just saw the message and I think it's okay, thanks for your review.
Force-pushed from 6f02316 to f5a3eb1
@ApostaC I have updated, could you help take a look? (the change of …)
ApostaC
left a comment
I like this PR! One thing on my mind is how to introduce the PRs on both the LMCache & vLLM side without breaking the CI/CD.
Potential idea:
- Step 1: introduce `ParallelStrategy` as an optional argument with default value None, and keep the old `world_size` and `kv_rank` parameters with default value None as well. (Probably we need some logic to handle both the old version and the new version — see the sketch below.) This will not break LMCache CI.
- Step 2: update the vLLM-side code to use `ParallelStrategy` and remove the old `world_size` and `kv_rank` arguments. Since in Step 1 we made `world_size` and `kv_rank` optional with a default value of None, it will not break LMCache CI either.
- Step 3: clean up the LMCache-side code to completely remove the old `world_size` and `kv_rank` related arguments.
I hate this complicated process, so let me know if you have better ideas @chunxiaozheng
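For Step 1, the dual-path handling could look roughly like this (a sketch against the hypothetical `ParallelStrategy` above, omitting the adapter's other constructor arguments):

```python
from typing import Optional


class LMCacheMPWorkerAdapter:
    def __init__(
        self,
        world_size: Optional[int] = None,  # legacy argument
        kv_rank: Optional[int] = None,     # legacy argument
        parallel_strategy: Optional[ParallelStrategy] = None,
    ):
        """Accept either the legacy (world_size, kv_rank) pair or the new
        parallel_strategy object while both vLLM versions are in the wild."""
        if parallel_strategy is not None:
            # New-style callers: derive the KV topology from the strategy.
            self._world_size = parallel_strategy.kv_world_size
            self._kv_rank = parallel_strategy.kv_rank
        elif world_size is not None and kv_rank is not None:
            # Old-style callers, e.g. the current vLLM connector.
            self._world_size = world_size
            self._kv_rank = kv_rank
        else:
            raise ValueError(
                "Either parallel_strategy or (world_size, kv_rank) is required")

    @property
    def world_size(self) -> int:
        return self._world_size

    @property
    def worker_id(self) -> int:
        return self._kv_rank
```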
> The world size, world_size may not be equal to the actual_world_size,
> in the case of MLA, it will 'exclude' the effect of TP.
What does it mean by "exclude" the effect of TP? Does that mean for the MLA TP=8 case, we will have:
`world_size` = 1, `actual_world_size` = 8; `worker_id` = 1, `actual_worker_id` = …
In this case, can we name them `kv_world_size` and `kv_worker_id`, which describe how we divide the KV cache? WDYT?
@ApostaC Good idea, I will update here.
@ApostaC The meaning of "exclude" the effect of TP is explained by the following comments in `LMCacheMPConnector`:
```python
def extract_world_size_and_kv_rank(
    world_size: int,
    rank: int,
    vllm_config: VllmConfig,
) -> tuple[int, int]:
    """Convert the world size and rank for MLA models."""
    use_mla = mla_enabled(vllm_config.model_config)
    if not use_mla:
        return world_size, rank
    else:
        # Tensor parallel does not change the KV caches for MLA models,
        # so we need to "exclude" the effect of TP on rank and world size.
        tp_size = vllm_config.parallel_config.tensor_parallel_size
        # vLLM constructs TP groups first, and then constructs other
        # parallel groups on top of the TP groups.
        # For example, with TP=4, PP=2:
        #   TP groups: [0, 1, 2, 3], [4, 5, 6, 7]
        #   PP groups: [0, 4], [1, 5], [2, 6], [3, 7]
        # So we can "exclude" the effect of TP by rank // tp_size.
        return world_size // tp_size, rank // tp_size
```
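Checking the TP=4, PP=2 example with plain values (a standalone illustration of the same rank arithmetic, with the `VllmConfig` lookup replaced by explicit arguments):

```python
def kv_topology(world_size: int, rank: int, tp_size: int, use_mla: bool):
    """Standalone version of the rank math above, for illustration."""
    if not use_mla:
        return world_size, rank
    return world_size // tp_size, rank // tp_size


# TP=4, PP=2 -> world_size=8; TP groups are [0-3] and [4-7].
for rank in range(8):
    print(rank, "->", kv_topology(8, rank, tp_size=4, use_mla=True))
# Ranks 0-3 collapse to kv_rank 0 and ranks 4-7 to kv_rank 1,
# i.e. one KV rank per TP group, with kv_world_size == 2.
```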
@ApostaC Thanks for your suggestion, let me think about this.
@ApostaC If the …
@chunxiaozheng I think the problem is the CI/CD. Right now, we are using the vLLM-side connector implementation to do the CI/CD. In that case, the CI/CD may still become red.
@ApostaC Okay, I think the suggestion you gave above is a good choice.
Signed-off-by: idellzheng <idellzheng@tencent.com>
Force-pushed from 3d4ba18 to 2aff145
@ApostaC I have updated this PR and it will be fully compatible with the previous …
Signed-off-by: idellzheng <idellzheng@tencent.com>
Cursor Bugbot has reviewed your changes and found 1 potential issue.
There are 3 total unresolved issues (including 2 from previous reviews).
Reviewed by Cursor Bugbot for commit ac60e13.
Signed-off-by: idellzheng <idellzheng@tencent.com>

When `MLA` is enabled, `store` or `retrieve` requests only need to be sent once across multiple workers, which can greatly reduce the number of requests to the server. The current PR only modifies `store` requests; `retrieve` requests will be modified in a follow-up PR.
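A sketch of the gating this implies on the worker side (the helper name is hypothetical; note the review feedback above suggests keying on the TP group, not the PP group):

```python
def should_send_store(use_mla: bool, is_first_rank_of_tp_group: bool) -> bool:
    """With MLA, every rank in a TP group holds an identical KV cache,
    so only one rank per TP group needs to issue the store request."""
    if not use_mla:
        return True  # non-MLA: each rank owns a distinct KV shard
    return is_first_rank_of_tp_group
```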
Note
Medium Risk
Changes how `world_size`, `tp_size`, and `worker_id` are derived for LMCache multiprocess requests, which can affect cache-keying and cross-worker coordination in distributed runs. Backward-compat defaults exist, but a misconfigured `parallel_strategy` could lead to incorrect sharing or runtime errors.
Overview
Adds a new `ParallelStrategy` dataclass and wires it into `LMCacheMPSchedulerAdapter` and `LMCacheMPWorkerAdapter` so cache-key fields (`world_size`, `worker_id`) and `tp_size` can be computed from the KV-parallel topology (e.g., MLA excluding TP) rather than the raw process rank.
Updates adapter constructors with defaults and new properties (`world_size`, `tp_size`, `worker_id`, plus `use_mla`/`is_first_rank_of_pp_group` on the worker) and adjusts `FREE_LOOKUP_LOCKS` tests to construct adapters via `parallel_strategy` instead of directly setting `world_size`/`tp_size`.