[ModelRunner V2] Share identical MTP weights by njhill · Pull Request #42538 · vllm-project/vllm

njhill · 2026-05-13T15:46:33Z

This is already done in V1 but wasn't covered in V2.

- Dedup identical layer weights and topk_indices_buffer
- Skip in PP (pipeline parallelism) case

Without this, CI fails one of the EAGLE tests, which was tight on GPU memory.
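
The deduplication described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `Layer`, `share_identical_weights`, and `pp_size` are hypothetical names, and stdlib `array` objects stand in for GPU tensors so the sketch runs without any dependencies.

```python
from array import array
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical stand-in for a model layer holding one weight buffer."""
    weight: array


def share_identical_weights(target: dict, draft: dict, pp_size: int = 1) -> list:
    """Rebind draft weights to the target's buffers when the contents match.

    Returns the names of the layers that now share storage.
    """
    # Assumption: under pipeline parallelism sharing is skipped entirely,
    # mirroring the "Skip in PP case" note in the description.
    if pp_size > 1:
        return []

    shared = []
    for name, draft_layer in draft.items():
        target_layer = target.get(name)
        if target_layer is None:
            continue  # the target has no layer with this name
        if draft_layer.weight is target_layer.weight:
            continue  # already the same object, nothing to do
        if draft_layer.weight == target_layer.weight:
            # Point the draft at the target's buffer; the duplicate copy
            # becomes unreferenced and its memory can be reclaimed.
            draft_layer.weight = target_layer.weight
            shared.append(name)
    return shared


target = {"embed": Layer(array("f", [1.0, 2.0])), "head": Layer(array("f", [3.0]))}
draft = {"embed": Layer(array("f", [1.0, 2.0])), "head": Layer(array("f", [9.0]))}
shared = share_identical_weights(target, draft)
```

Here only `embed` ends up shared, because the two `head` buffers differ. With real tensors the saving comes from the same rebinding step: once the draft refers to the target's storage, the duplicate allocation is no longer held.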

Signed-off-by: Nick Hill <nickhill123@gmail.com>

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

gemini-code-assist

Code Review

This pull request refactors the weight-sharing logic between draft and target models in EAGLE speculative decoding by introducing a _should_share utility and adding support for sharing the topk_indices_buffer. Feedback identifies a logic inconsistency in _should_share where it fails to share when the draft is None, and recommends performing tensor comparisons on the GPU to avoid the memory overhead and latency of moving data to the CPU. Additionally, it is suggested to apply the Pipeline Parallelism check to lm_head sharing to ensure consistency with the embedding sharing logic.
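
One possible reading of the `None` feedback is sketched below. `should_share` here is a hypothetical helper, not the `_should_share` from this PR, and treating a missing draft weight as shareable is an assumption drawn from the reviewer's remark. Stdlib `array` objects again stand in for tensors.

```python
from array import array
from typing import Optional


def should_share(target: Optional[array], draft: Optional[array]) -> bool:
    """Decide whether the draft can reuse the target's buffer (hypothetical)."""
    if target is None:
        return False  # nothing to share from
    if draft is None:
        # Assumption: a draft without its own copy simply reuses the target's.
        return True
    if draft is target:
        return True  # already the same object
    # Cheap metadata checks first, so the full comparison runs only when needed.
    if draft.typecode != target.typecode or len(draft) != len(target):
        return False
    return draft == target


a = array("f", [1.0, 2.0])
results = [
    should_share(a, None),
    should_share(a, array("f", [1.0, 2.0])),
    should_share(a, array("f", [1.0, 5.0])),
    should_share(None, a),
]
```

With real tensors, the final element-wise comparison is the step the review suggests keeping on the GPU, so that neither tensor has to be copied to the CPU first.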


yewentao256

Thanks for the work!

yewentao256

LGTM, thanks for the work!


[ModelRunner V2] Share identical MTP weights

28c2b8d

Signed-off-by: Nick Hill <nickhill123@gmail.com>

njhill requested a review from yewentao256 May 13, 2026 15:46

njhill requested a review from WoosukKwon as a code owner May 13, 2026 15:46

claude Bot reviewed May 13, 2026


gemini-code-assist Bot reviewed May 13, 2026


Comment thread vllm/v1/worker/gpu/spec_decode/eagle/utils.py Outdated

Comment thread vllm/v1/worker/gpu/spec_decode/eagle/utils.py

mergify Bot added the v1 label May 13, 2026

improve sharing check

4c3a24b

Signed-off-by: Nick Hill <nickhill123@gmail.com>

njhill added the ready label (ONLY add when PR is ready to merge/full CI is needed) May 13, 2026

yewentao256 reviewed May 13, 2026


Comment thread vllm/v1/worker/gpu/spec_decode/eagle/utils.py

njhill mentioned this pull request May 13, 2026

[Model Runner v2] Oracle for model runner v2 - qwen3 dense model by default [1/N] #39337

Merged

yewentao256 mentioned this pull request May 13, 2026

[Feature]: Migration from Model Runner v1 to Model Runner v2 #41286

Open

30 tasks

yewentao256 approved these changes May 13, 2026


njhill enabled auto-merge (squash) May 13, 2026 18:49

njhill merged commit a505cf8 into vllm-project:main May 13, 2026
71 checks passed

njhill deleted the mrv2-mtp-weightshare branch May 13, 2026 20:13

mfylcek pushed a commit to mfylcek/vllm that referenced this pull request May 19, 2026

[ModelRunner V2] Share identical MTP weights (vllm-project#42538)

3853a59

Signed-off-by: Nick Hill <nickhill123@gmail.com>

rishitdholakia13 pushed a commit to rishitdholakia13/vllm that referenced this pull request May 19, 2026

[ModelRunner V2] Share identical MTP weights (vllm-project#42538)

7ce4499

Signed-off-by: Nick Hill <nickhill123@gmail.com>

jhu960213 pushed a commit to jhu960213/vllm that referenced this pull request May 20, 2026

[ModelRunner V2] Share identical MTP weights (vllm-project#42538)

8038958

Signed-off-by: Nick Hill <nickhill123@gmail.com>

h1t35h pushed a commit to h1t35h/vllm that referenced this pull request May 21, 2026

[ModelRunner V2] Share identical MTP weights (vllm-project#42538)

fde1f5a

Signed-off-by: Nick Hill <nickhill123@gmail.com>

pasta-paul mentioned this pull request May 23, 2026

DSV4-Pro MTP draft: stacked attn FP8 scale loader gap + MTP forward-path mainline-vs-fork divergence #43472

Closed

mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026

[ModelRunner V2] Share identical MTP weights (vllm-project#42538)

b647ee2

Signed-off-by: Nick Hill <nickhill123@gmail.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

This was referenced Jun 5, 2026

[RFC]: MTP speculative decoding under pipeline parallelism (PP>1) #44697

Open

[Spec][PP] Support MTP speculative decoding under pipeline parallelism (PP>1) #44698

Open

knight0528 pushed a commit to knight0528/vllm that referenced this pull request Jun 8, 2026

[ModelRunner V2] Share identical MTP weights (vllm-project#42538)

432ad38

Signed-off-by: Nick Hill <nickhill123@gmail.com>


njhill merged 2 commits into vllm-project:main from njhill:mrv2-mtp-weightshare
