
Eagle: MM Cuda Graphs with MRope#28896

Merged
benchislett merged 2 commits into vllm-project:main from IzzyPutterman:iputterman/mm-eagle-cg
Nov 19, 2025

Conversation

@IzzyPutterman
Contributor

@IzzyPutterman IzzyPutterman commented Nov 18, 2025

Purpose

Enabling MRope for Eagle in #22872 broke torch compile, because the positions tensor now has its dynamic shape in the last dim instead of dim 0. Porting the corresponding change from the other MM models gets torch compile + CUDA graphs running.
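The shape change is the crux of the fix: with standard RoPE the positions tensor is 1-D and the token count lives in dim 0, while M-RoPE uses a (3, num_tokens) layout, so the token count moves to the last dim. A minimal sketch of that distinction (illustrative names, not vLLM's actual API):

```python
# Hedged sketch: for standard RoPE, `positions` has shape (num_tokens,), so the
# dynamic (token) dimension is dim 0; with M-RoPE, `positions` has shape
# (3, num_tokens), so the token dimension moves to the last dim (dim 1).
# Function names here are illustrative, not vLLM's real helpers.

def dynamic_token_dim(uses_mrope: bool) -> int:
    """Return which dim of `positions` varies with the number of tokens."""
    return 1 if uses_mrope else 0

def positions_shape(uses_mrope: bool, num_tokens: int) -> tuple:
    """Shape of the positions tensor for a given token count."""
    return (3, num_tokens) if uses_mrope else (num_tokens,)
```

In practice this dynamic dim is what gets reported to torch.compile (e.g. via `torch._dynamo.mark_dynamic(positions, dim)`), so that varying token counts do not force recompilation or break CUDA graph capture.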

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enables CUDA graph compilation for multimodal Eagle models by specifying dynamic argument dimensions, which is a good improvement. However, the accompanying change to increase the size of the mrope_positions buffer by one seems to be an incomplete fix. Other related buffers used for CUDA graph inputs are not resized, posing a risk of out-of-bounds memory access. This should be addressed to ensure stability.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@@ -118,7 +118,7 @@ def __init__(
         if self.uses_mrope:
             # M-RoPE need (3, max_num_tokens)
             self.mrope_positions = torch.zeros(
-                (3, self.max_num_tokens), dtype=torch.int64, device=device
+                (3, self.max_num_tokens + 1), dtype=torch.int64, device=device
Collaborator


Context for this change is here:

# NOTE: `mrope_positions` is implemented with one additional dummy
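The NOTE being quoted refers to a "+1 dummy slot" pattern: the buffer is allocated one column longer than max_num_tokens so that padded entries (e.g. from CUDA graph capture at a fixed size) can be written to a harmless trailing index instead of clobbering a real token's slot. A hedged pure-Python sketch of the idea (not vLLM's actual code, which uses a (3, max_num_tokens + 1) torch tensor):

```python
# Hedged sketch of the "+1 dummy slot" pattern: one extra trailing slot
# absorbs writes from padded entries so live positions are never overwritten.

MAX_NUM_TOKENS = 4
DUMMY = MAX_NUM_TOKENS  # index of the extra trailing slot

buffer = [0] * (MAX_NUM_TOKENS + 1)

def write_position(slot: int, value: int) -> None:
    # Padded entries pass slot=DUMMY and land in the dummy slot.
    buffer[slot] = value

write_position(0, 10)      # a real token's position
write_position(DUMMY, 99)  # padding is routed to the dummy slot
```

The reviewer's point above is that any companion buffers consumed as CUDA graph inputs would need the same extra slot, or an out-of-bounds index into them becomes possible.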

Collaborator

@benchislett benchislett left a comment


LGTM

@github-project-automation github-project-automation bot moved this to In review in NVIDIA Nov 19, 2025
@benchislett benchislett added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 19, 2025
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
@benchislett benchislett merged commit 02f5903 into vllm-project:main Nov 19, 2025
51 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in NVIDIA Nov 19, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

Labels

  • llama (Related to Llama models)
  • nvidia
  • ready (ONLY add when PR is ready to merge/full CI is needed)
  • speculative-decoding
  • v1

Projects

Status: Done


3 participants