[LMCache][MP] optimize save when mla enabled#38810

Merged
ApostaC merged 6 commits into vllm-project:main from chunxiaozheng:lmcache-mp-mla-optimize
Apr 14, 2026
Conversation

@chunxiaozheng
Contributor

When MLA is enabled, a store or retrieve request only needs to be sent once across the multiple workers, which greatly reduces the number of requests to the server.
This PR modifies only store requests; retrieve requests will be modified in a follow-up PR.
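The idea above can be sketched as follows. With MLA, the latent KV cache is not sharded across tensor-parallel ranks, so one rank can store on behalf of the whole group. This is a minimal illustrative sketch, not the actual vLLM/LMCache code; the names `is_mla_enabled`, `tp_rank`, and `send_store_request` are hypothetical.

```python
def maybe_send_store(is_mla_enabled: bool, tp_rank: int, request: dict,
                     send_store_request) -> bool:
    """Send a store request only when this worker should do so.

    With MLA, every TP rank holds the same latent KV cache, so only the
    first rank of the TP group issues the store; without MLA each rank
    holds a distinct KV shard and must store its own slice.
    """
    if is_mla_enabled and tp_rank != 0:
        return False  # skip the redundant store on non-first ranks
    send_store_request(request)
    return True
```

Under this sketch, a TP group of size N sends one store request instead of N when MLA is on, which is the request reduction the PR description claims.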

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces support for Multi-Head Latent Attention (MLA) in the KV cache transfer mechanism by ensuring only the first rank of a tensor parallel group saves the cache to avoid redundant operations. The review feedback highlights that the documentation incorrectly swapped tensor parallel (TP) and pipeline parallel (PP) group definitions, and that the variable naming should be updated from is_first_rank_of_pp_group to is_first_rank_of_tp_group to accurately reflect the implementation logic. Additionally, the reviewer recommended using inspect.signature and getattr to maintain backward compatibility with older versions of the lmcache package and to prevent potential runtime errors.
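The reviewer's backward-compatibility suggestion can be illustrated with a small sketch: probe the lmcache entry point with `getattr` and `inspect.signature` before passing a newer keyword argument, so the connector degrades gracefully on older lmcache releases. The method name `store` and the argument `extra_config` are hypothetical stand-ins, not the actual lmcache API.

```python
import inspect

def call_store_compat(engine, tokens, extra_config=None):
    """Call engine.store, passing extra_config only if supported.

    getattr avoids an AttributeError on engines that lack the method;
    inspect.signature avoids a TypeError on older versions whose store()
    does not accept the newer keyword argument.
    """
    store = getattr(engine, "store", None)
    if store is None:
        raise RuntimeError("lmcache engine has no 'store' method")
    params = inspect.signature(store).parameters
    if "extra_config" in params:
        return store(tokens, extra_config=extra_config)  # newer version
    return store(tokens)  # older version: legacy signature
```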

Comment thread vllm/distributed/kv_transfer/kv_connector/v1/lmcache_mp_connector.py Outdated
@chunxiaozheng
Contributor Author

Hi @ApostaC, could you take a look?

Collaborator

@ApostaC ApostaC left a comment


LGTM!

@ApostaC ApostaC added the ready ONLY add when PR is ready to merge/full CI is needed label Apr 10, 2026
@mergify
Contributor

mergify Bot commented Apr 11, 2026

Hi @chunxiaozheng, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Signed-off-by: idellzheng <idellzheng@tencent.com>
@chunxiaozheng chunxiaozheng force-pushed the lmcache-mp-mla-optimize branch from 21b91dc to 344d0aa Compare April 11, 2026 07:06
@chunxiaozheng
Copy link
Copy Markdown
Contributor Author

Hi @KuntaiDu, could you take a look again?

@ApostaC ApostaC merged commit c687bf2 into vllm-project:main Apr 14, 2026
57 checks passed
whk-lab pushed a commit to whk-lab/vllm that referenced this pull request Apr 23, 2026
avinashsingh77 pushed a commit to avinashsingh77/vllm that referenced this pull request Apr 27, 2026
mystous pushed a commit to mystous/vllm_hybrid that referenced this pull request May 10, 2026

Labels

kv-connector ready ONLY add when PR is ready to merge/full CI is needed
