[compile] fuse rope and cache insertion for mla by ZJY0516 · Pull Request #38646 · vllm-project/vllm

ZJY0516 · 2026-03-31T16:38:04Z

Purpose

Add a compilation pass that fuses RoPE and KV-cache insertion for MLA.

Test Plan

vllm serve deepseek-ai/DeepSeek-V3 -tp 8 -cc.pass_config.fuse_rope_kvcache=True -cc.use_inductor_graph_partition=True

lm_eval --model local-completions --model_args "model=deepseek-ai/DeepSeek-V3,base_url=http://0.0.0.0:8000/v1/completions,tokenized_requests=False,tokenizer_backend=None,num_concurrent=256,timeout=5000,max_length=4096" --tasks gsm8k --num_fewshot 5
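The `vllm serve` command above sets two compilation options through dotted `-cc.` flags. As a rough illustration of how such dotted keys address nested fields, the sketch below folds them into one nested dictionary. It assumes `-cc` is shorthand for the compilation config; `parse_dotted_flags` is a hypothetical helper written for this note and is not vLLM's argument parser.

```python
def parse_dotted_flags(args, prefix="-cc."):
    """Fold `-cc.a.b=value` style arguments into one nested dict.

    Hypothetical helper for illustration only; not part of vLLM.
    """
    config = {}
    for arg in args:
        if not arg.startswith(prefix):
            continue  # ignore unrelated arguments such as `-tp`
        path, _, raw = arg[len(prefix):].partition("=")
        # Interpret the literal strings true/false as booleans, keep the rest as text.
        value = {"true": True, "false": False}.get(raw.lower(), raw)
        keys = path.split(".")
        node = config
        for key in keys[:-1]:
            node = node.setdefault(key, {})
        node[keys[-1]] = value
    return config


flags = [
    "-tp", "8",
    "-cc.pass_config.fuse_rope_kvcache=True",
    "-cc.use_inductor_graph_partition=True",
]
print(parse_dotted_flags(flags))
```

Under these assumptions, the fusion switch sits inside a nested `pass_config` object, while the graph-partition option is a top-level field of the same config.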

Test Result

|Tasks|Version|Filter          |n-shot|Metric     |   |Value |   |Stderr|
|-----|------:|----------------|-----:|-----------|---|-----:|---|-----:|
|gsm8k|      3|flexible-extract|     5|exact_match|↑  |0.9439|±  |0.0063|
|     |       |strict-match    |     5|exact_match|↑  |0.9424|±  |0.0064|
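As a sanity check on the reported error bars, the sketch below recomputes the standard error of an accuracy score under a simple binomial model. The sample count of 1319 (the size of the GSM8K test split) is an assumption, since the PR does not state how many problems were evaluated, and the exact estimator used by lm_eval may differ slightly.

```python
import math


def binomial_stderr(accuracy, n_samples):
    """Standard error of a proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_samples)


N_SAMPLES = 1319  # assumed: size of the GSM8K test split, not stated in the PR

for name, accuracy in [("flexible-extract", 0.9439), ("strict-match", 0.9424)]:
    stderr = binomial_stderr(accuracy, N_SAMPLES)
    print(f"{name}: {accuracy:.4f} +/- {stderr:.4f}")
```

Under that assumption both values round to the reported 0.0063 and 0.0064, so the two filters differ by much less than one standard error.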

TODO
perf test

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

gemini-code-assist

Code Review

This pull request introduces fusion for MLA (Multi-Head Latent Attention) RoPE and KV cache updates. It adds a new custom operator fused_rope_and_unified_mla_kv_cache_update, implements a pattern matcher for this fusion, and updates the RopeKVCacheFusionPass to handle MLA layers and FlashInfer chains. Additionally, platform checks for KV cache fusion were broadened from ROCm-only to all CUDA-like platforms. Feedback was provided regarding the consistency of the return value when the KV cache is empty.
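To make the idea behind the fusion concrete, here is a minimal pure-Python sketch of the key side only. It assumes an interleaved pair rotation for RoPE, a cache row that stores the compressed latent followed by the rotated positional part, and negative slot indices that mark padding tokens to skip. All function names (`rope_rotate`, `unfused_update`, `fused_update`) are hypothetical and do not mirror the operator added in this PR.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate adjacent pairs (vec[i], vec[i+1]) by a position-dependent angle."""
    dim = len(vec)
    out = [0.0] * dim
    for i in range(0, dim, 2):
        theta = position * base ** (-i / dim)
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        out[i] = vec[i] * cos_t - vec[i + 1] * sin_t
        out[i + 1] = vec[i] * sin_t + vec[i + 1] * cos_t
    return out


def unfused_update(kv_c, k_pe, positions, slot_mapping, cache):
    # Pass 1: materialise the rotated positional part for every token.
    rotated = [rope_rotate(pe, pos) for pe, pos in zip(k_pe, positions)]
    # Pass 2: concatenate latent and rotated parts and scatter into the cache.
    for latent, rot, slot in zip(kv_c, rotated, slot_mapping):
        if slot >= 0:  # assumed convention: negative slot means padding
            cache[slot] = latent + rot
    return cache


def fused_update(kv_c, k_pe, positions, slot_mapping, cache):
    # Single pass: rotate and write each token without an intermediate list.
    for latent, pe, pos, slot in zip(kv_c, k_pe, positions, slot_mapping):
        if slot >= 0:
            cache[slot] = latent + rope_rotate(pe, pos)
    return cache


# Toy inputs: three tokens, latent width 2, positional width 2, four cache slots.
kv_c = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
k_pe = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
positions = [0, 1, 2]
slot_mapping = [2, -1, 0]

cache_a = unfused_update(kv_c, k_pe, positions, slot_mapping, [None] * 4)
cache_b = fused_update(kv_c, k_pe, positions, slot_mapping, [None] * 4)
print(cache_a == cache_b)
```

Both paths write identical cache contents. The fused version simply avoids materialising the rotated tensor between the two steps, which is the kind of saving a fused kernel is meant to capture.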

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 54bfeb5f6c

mergify · 2026-04-01T15:53:47Z

Hi @ZJY0516, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

mergify · 2026-05-23T07:51:36Z

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @ZJY0516.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

init

59f30a6

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

gemini-code-assist Bot reviewed Mar 31, 2026

Comment thread vllm/compilation/passes/fusion/rope_kvcache_fusion.py

update

ed16288

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

ZJY0516 force-pushed the mla-rope-cache-fuse branch from bfb3a37 to ed16288 March 31, 2026 16:59

ZJY0516 added 3 commits April 1, 2026 13:21

update

8267cd1

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

update

a080f28

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

update

54bfeb5

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

ZJY0516 marked this pull request as ready for review April 1, 2026 08:49

ZJY0516 requested review from BoyuanFeng, ProExpertProg, WoosukKwon, hmellor, houseroad, mgoin, robertgshaw2-redhat, tlrmchlsmth, vadiklyutiy, yewentao256, youkaichao and zou3519 as code owners April 1, 2026 08:49

chatgpt-codex-connector Bot reviewed Apr 1, 2026

Comment thread vllm/compilation/passes/fusion/rope_kvcache_fusion.py

ZJY0516 added 3 commits April 1, 2026 17:25

update

1bf01a6

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

update

7569aaa

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

update

32225d6

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

ZJY0516 added 2 commits April 2, 2026 00:17

update

b119296

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

Merge branch 'main' into mla-rope-cache-fuse

e8757cc

This was referenced Apr 10, 2026

BaseKVCacheMethod.apply_kv_cache captainpete/vllm#2

Open

Deterministic Hadamard KQ rotation captainpete/vllm#1

Open

Rohan138 mentioned this pull request Apr 20, 2026

[Performance][DSR1]: Fused RoPE+KVCache+q_concat for MLA #40392

Merged

4 tasks

mergify Bot added the needs-rebase label May 23, 2026

zou3519 removed their request for review June 1, 2026 21:53

ZJY0516 wants to merge 10 commits into vllm-project:main from ZJY0516:mla-rope-cache-fuse