[compile] fuse rope and cache insertion for mla #38646

Open
ZJY0516 wants to merge 10 commits into vllm-project:main from ZJY0516:mla-rope-cache-fuse
@ZJY0516 ZJY0516 commented Mar 31, 2026

Member

Purpose

Add a compilation pass that fuses RoPE with KV cache insertion for MLA.

Test Plan

vllm serve deepseek-ai/DeepSeek-V3 -tp 8 -cc.pass_config.fuse_rope_kvcache=True -cc.use_inductor_graph_partition=True

lm_eval --model local-completions --model_args "model=deepseek-ai/DeepSeek-V3,base_url=http://0.0.0.0:8000/v1/completions,tokenized_requests=False,tokenizer_backend=None,num_concurrent=256,timeout=5000,max_length=4096" --tasks gsm8k --num_fewshot 5

Test Result

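| Tasks | Version | Filter           | n-shot | Metric      | Value  | Stderr   |
|-------|--------:|------------------|-------:|-------------|-------:|----------|
| gsm8k |       3 | flexible-extract |      5 | exact_match | 0.9439 | ± 0.0063 |
|       |         | strict-match     |      5 | exact_match | 0.9424 | ± 0.0064 |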
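
As a rough sanity check on the Stderr column, the reported values can be compared against a plain binomial standard error. This is a sketch under two assumptions that this PR does not state: that the run used the standard GSM8K test split of 1,319 problems, and that a simple `sqrt(p * (1 - p) / n)` estimate is close enough to whatever estimator lm_eval applies. The function name `binomial_stderr` is made up for illustration.

```python
import math

# Assumption: standard GSM8K test split size; the PR does not state it.
N_GSM8K_TEST = 1319


def binomial_stderr(accuracy: float, n: int) -> float:
    """Standard error of a proportion estimated from n independent trials."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n)


# Accuracy values copied from the results table in the PR description.
reported = {
    "flexible-extract": 0.9439,
    "strict-match": 0.9424,
}

for filter_name, accuracy in reported.items():
    stderr = binomial_stderr(accuracy, N_GSM8K_TEST)
    print(f"{filter_name}: {accuracy:.4f} +/- {stderr:.4f}")
```

Under these assumptions both estimates round to the reported 0.0063 and 0.0064, so the two filters differ by well under one standard error.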

TODO
perf test


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request introduces fusion for MLA (Multi-Head Latent Attention) RoPE and KV cache updates. It adds a new custom operator fused_rope_and_unified_mla_kv_cache_update, implements a pattern matcher for this fusion, and updates the RopeKVCacheFusionPass to handle MLA layers and FlashInfer chains. Additionally, platform checks for KV cache fusion were broadened from ROCm-only to all CUDA-like platforms. Feedback was provided regarding the consistency of the return value when the KV cache is empty.
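
To make the idea of the fusion concrete, here is a minimal pure-Python reference sketch of what "RoPE followed by cache insertion" computes and how a fused version produces the same result in one pass. It is not the PR's implementation and not the signature of `fused_rope_and_unified_mla_kv_cache_update`. All function names, the list-based cache layout (`[kv_c | rotated k_pe]` per slot), the pairwise rotation convention, and the empty-cache handling are assumptions chosen for illustration.

```python
import math


def rope_rotate(x, pos, base=10000.0):
    """Rotate each pair (x[2i], x[2i+1]) by the angle pos * base**(-2i/d)."""
    d = len(x)
    out = [0.0] * d
    for i in range(d // 2):
        theta = pos * base ** (-2.0 * i / d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * c - b * s
        out[2 * i + 1] = a * s + b * c
    return out


def rope_then_cache_unfused(kv_c, k_pe, positions, slot_mapping, cache):
    """Two passes: materialize rotated k_pe, then write rows into the cache."""
    rotated = [rope_rotate(k, p) for k, p in zip(k_pe, positions)]
    if cache:  # assumption: an empty cache means "skip the write"
        for c, r, slot in zip(kv_c, rotated, slot_mapping):
            cache[slot] = list(c) + r
    return rotated


def rope_and_cache_fused(kv_c, k_pe, positions, slot_mapping, cache):
    """One pass: rotate each token's k_pe and write its cache row right away."""
    rotated = []
    for c, k, p, slot in zip(kv_c, k_pe, positions, slot_mapping):
        r = rope_rotate(k, p)
        if cache:  # same empty-cache behaviour as the unfused path
            cache[slot] = list(c) + r
        rotated.append(r)  # returned whether or not the cache was written
    return rotated


# Tiny example: 2 tokens, latent dim 2, rope dim 4, 4 cache slots.
kv_c = [[1.0, 2.0], [3.0, 4.0]]
k_pe = [[1.0, 0.0, 0.0, 1.0], [0.5, -0.5, 2.0, 1.0]]
positions = [0, 3]
slot_mapping = [2, 0]

cache_unfused = [None] * 4
cache_fused = [None] * 4
out_unfused = rope_then_cache_unfused(kv_c, k_pe, positions, slot_mapping, cache_unfused)
out_fused = rope_and_cache_fused(kv_c, k_pe, positions, slot_mapping, cache_fused)

assert out_unfused == out_fused
assert cache_unfused == cache_fused
```

In this sketch both paths return the rotated values even when the cache is empty, which is one way to keep the return value consistent in that case. Whether the PR resolves the reviewer's feedback this way is not stated here.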

Comment thread vllm/compilation/passes/fusion/rope_kvcache_fusion.py
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
@ZJY0516 ZJY0516 force-pushed the mla-rope-cache-fuse branch from bfb3a37 to ed16288 Compare March 31, 2026 16:59
ZJY0516 added 3 commits April 1, 2026 13:21
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 54bfeb5f6c


Comment thread vllm/compilation/passes/fusion/rope_kvcache_fusion.py
ZJY0516 added 3 commits April 1, 2026 17:25
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

mergify Bot commented Apr 1, 2026

Contributor

Hi @ZJY0516, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

ZJY0516 added 2 commits April 2, 2026 00:17
Signed-off-by: zjy0516 <riverclouds.zhu@qq.com>

mergify Bot commented May 23, 2026

Contributor

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @ZJY0516.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify Bot added the needs-rebase label May 23, 2026
@zou3519 zou3519 removed their request for review June 1, 2026 21:53