
[ModelRunner V2] Share identical MTP weights#42538

Merged
njhill merged 2 commits into vllm-project:main from njhill:mrv2-mtp-weightshare on May 13, 2026

Conversation

njhill (Member) commented May 13, 2026

Sharing identical MTP weights is already done in V1 but wasn't covered in V2.

  • Dedup identical layer weights and topk_indices_buffer
  • Skip sharing in the pipeline-parallel (PP) case

Without this, CI fails one of the EAGLE tests, which was tight on GPU memory.
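The weight-dedup and PP-skip bullets can be illustrated with a framework-free sketch. This is not the code merged in this PR: plain Python lists stand in for GPU tensors, and the helper name `share_identical_weights`, its parameters, and the weight names in the example are all hypothetical. The PP guard only mirrors the "skip in PP case" bullet; the assumed reason (the target's copy may not be on the same rank) is a guess.

```python
from typing import Any


def share_identical_weights(
    target: dict[str, Any],
    draft: dict[str, Any],
    pp_world_size: int,
) -> list[str]:
    """Hypothetical helper: point draft entries at identical target entries.

    Plain Python lists stand in for GPU tensors. Returns the names that
    were deduplicated, in the draft's iteration order.
    """
    # Mirror the PR's "skip in PP case". Assumption: under pipeline
    # parallelism the target's copy may not live on this rank.
    if pp_world_size > 1:
        return []

    shared: list[str] = []
    for name, draft_value in list(draft.items()):
        target_value = target.get(name)
        # Nothing to share, or the draft already references the target's object.
        if target_value is None or target_value is draft_value:
            continue
        # Only dedup when the values are identical.
        if target_value == draft_value:
            draft[name] = target_value
            shared.append(name)
    return shared


# Hypothetical weight names, for illustration only.
target = {"embed.weight": [0.1, 0.2], "head.weight": [0.3, 0.4]}
draft = {"embed.weight": [0.1, 0.2], "head.weight": [0.9, 0.4]}
print(share_identical_weights(target, draft, pp_world_size=1))  # ['embed.weight']
print(draft["embed.weight"] is target["embed.weight"])  # True
```

After reassignment, both models reference a single object. The draft's former copy can then be freed once nothing else references it, and that is where the memory saving comes from.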

Signed-off-by: Nick Hill <nickhill123@gmail.com>
njhill requested a review from yewentao256 May 13, 2026 15:46
njhill requested a review from WoosukKwon as a code owner May 13, 2026 15:46

claude (Bot) left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request refactors the weight-sharing logic between draft and target models in EAGLE speculative decoding by introducing a _should_share utility and adding support for sharing the topk_indices_buffer. Feedback identifies a logic inconsistency in _should_share where it fails to share when the draft is None, and recommends performing tensor comparisons on the GPU to avoid the memory overhead and latency of moving data to the CPU. Additionally, it is suggested to apply the Pipeline Parallelism check to lm_head sharing to ensure consistency with the embedding sharing logic.
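Here is a minimal sketch of a sharing predicate that follows the three review suggestions: treat a missing draft weight as shareable, compare values without a host copy, and apply one PP guard to both embeddings and `lm_head`. The name `should_share`, its signature, and the use of lists in place of tensors are assumptions for illustration. This is not the `_should_share` in the PR, and the `draft is None` behavior reflects the reviewer's point rather than confirmed merged behavior.

```python
from typing import Optional


def should_share(
    target: Optional[list[float]],
    draft: Optional[list[float]],
    pp_world_size: int,
) -> bool:
    """Hypothetical predicate: can the draft reuse the target's weight?"""
    # One PP guard for every shared module (lm_head included), so that
    # embedding and lm_head sharing behave consistently.
    if pp_world_size > 1:
        return False
    # Nothing on the target side to share.
    if target is None:
        return False
    # The draft has no copy of its own, so reusing the target's is the only option.
    if draft is None:
        return True
    # Cheap shape check before comparing values.
    if len(target) != len(draft):
        return False
    # With real tensors, torch.equal(target, draft) would compare them on
    # their own device rather than copying both to the CPU first.
    return target == draft


print(should_share([1.0, 2.0], None, pp_world_size=1))  # True
print(should_share([1.0, 2.0], [1.0, 2.0], pp_world_size=2))  # False
```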

Comment thread vllm/v1/worker/gpu/spec_decode/eagle/utils.py (outdated)
Comment thread vllm/v1/worker/gpu/spec_decode/eagle/utils.py
mergify (Bot) added the v1 label May 13, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
njhill added the ready label ("ONLY add when PR is ready to merge/full CI is needed") May 13, 2026

yewentao256 (Member) left a comment

Thanks for the work!

Comment thread vllm/v1/worker/gpu/spec_decode/eagle/utils.py

yewentao256 (Member) left a comment

LGTM, thanks for the work!

njhill enabled auto-merge (squash) May 13, 2026 18:49
njhill merged commit a505cf8 into vllm-project:main May 13, 2026
71 checks passed
njhill deleted the mrv2-mtp-weightshare branch May 13, 2026 20:13
mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
rishitdholakia13 pushed a commit to rishitdholakia13/vllm that referenced this pull request May 19, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
h1t35h pushed a commit to h1t35h/vllm that referenced this pull request May 21, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
knight0528 pushed a commit to knight0528/vllm that referenced this pull request Jun 8, 2026
Signed-off-by: Nick Hill <nickhill123@gmail.com>

Labels

ready ("ONLY add when PR is ready to merge/full CI is needed"), v1

2 participants