[LMCache][MP] optimize save when mla enabled #38810
ApostaC merged 6 commits into vllm-project:main
Conversation
Code Review
This pull request adds support for Multi-Head Latent Attention (MLA) to the KV cache transfer mechanism: because the latent KV cache is replicated across ranks, only the first rank of a tensor parallel group saves the cache, avoiding redundant store operations. The review feedback notes that the documentation swapped the tensor parallel (TP) and pipeline parallel (PP) group definitions, and that the variable is_first_rank_of_pp_group should be renamed to is_first_rank_of_tp_group to match the implementation logic. The reviewer also recommended using inspect.signature and getattr to remain backward compatible with older versions of the lmcache package and to prevent potential runtime errors.
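The backward-compatibility suggestion can be sketched as follows. The class and method names below (OldEngine, NewEngine, safe_store) are illustrative stand-ins, not the actual lmcache API; the point is the inspect.signature check before passing a newer keyword argument, and the getattr fallback for attributes that may not exist in older releases.

```python
import inspect

# Hypothetical stand-ins for two lmcache versions; names are
# illustrative, not the real lmcache interface.
class OldEngine:
    def store(self, key, value):
        return f"stored {key}"

class NewEngine:
    def store(self, key, value, is_first_rank_of_tp_group=False):
        return f"stored {key} (first_rank={is_first_rank_of_tp_group})"

def safe_store(engine, key, value, first_rank):
    """Pass the new keyword only if the installed version accepts it."""
    params = inspect.signature(engine.store).parameters
    if "is_first_rank_of_tp_group" in params:
        return engine.store(key, value, is_first_rank_of_tp_group=first_rank)
    # Older versions: fall back to the legacy call signature.
    return engine.store(key, value)

def mla_enabled(config) -> bool:
    # getattr guards against configs from releases that predate the flag.
    return getattr(config, "use_mla", False)
```

This way a single code path works against both old and new package versions instead of raising a TypeError on the unexpected keyword.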
Hi @ApostaC, could you take a look?
Hi @chunxiaozheng, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then commit the changes and push to your branch.
Signed-off-by: idellzheng <idellzheng@tencent.com>
Force-pushed from 21b91dc to 344d0aa
Hi @KuntaiDu, could you take another look?
Signed-off-by: idellzheng <idellzheng@tencent.com>
Co-authored-by: Yihua Cheng <yihua98@uchicago.edu>
Signed-off-by: Avinash Singh <avinashsingh.rcoem@gmail.com>
When MLA is enabled, store and retrieve requests only need to be sent once across the multiple workers, which greatly reduces the number of requests hitting the cache server. This PR only modifies store requests; retrieve requests will be modified in a follow-up PR.
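The core deduplication idea can be sketched as below. With MLA the latent KV cache is identical across tensor parallel ranks, so only rank 0 needs to issue the store request; with standard multi-head attention each rank holds its own KV shard and must store it. The function names and parameters here (should_save_kv, tp_rank, send_store) are assumptions for illustration, not the actual vLLM/LMCache code.

```python
from typing import Callable

def should_save_kv(is_mla: bool, tp_rank: int) -> bool:
    """Decide whether this TP rank should issue a KV cache store request."""
    if is_mla:
        # Latent KV cache is replicated across TP ranks: one save suffices.
        return tp_rank == 0
    # Standard MHA: the KV cache is sharded, so every rank must save.
    return True

def save_kv_cache(is_mla: bool, tp_rank: int, send_store: Callable[[], None]) -> bool:
    """Invoke the (hypothetical) store RPC only when this rank is responsible."""
    if should_save_kv(is_mla, tp_rank):
        send_store()
        return True
    return False
```

With TP size N, this drops the number of store requests per layer from N to 1 when MLA is enabled, which is the request reduction the PR description refers to.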