
Eagle: MM Cuda Graphs with MRope#28896

Merged
benchislett merged 2 commits into vllm-project:main from IzzyPutterman:iputterman/mm-eagle-cg
Nov 19, 2025

Conversation

@IzzyPutterman
Contributor

@IzzyPutterman IzzyPutterman commented Nov 18, 2025

Purpose

Enabling MRope for Eagle in #22872 broke torch compile, because the positions tensor now has its dynamic shape in the last dim instead of dim 0. Porting the corresponding change from the other MM models gets torch compile + CUDA graphs running.
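The shape change is the crux of the fix: with standard RoPE the positions tensor is 1-D and the token count lives in dim 0, while M-RoPE uses a (3, num_tokens) layout, so the token count moves to the last dim. A minimal sketch of that distinction (illustrative names, not vLLM's actual API):

```python
# Hedged sketch: for standard RoPE, `positions` has shape (num_tokens,), so the
# dynamic (token) dimension is dim 0; with M-RoPE, `positions` has shape
# (3, num_tokens), so the token dimension moves to the last dim (dim 1).
# Function names here are illustrative, not vLLM's real helpers.

def dynamic_token_dim(uses_mrope: bool) -> int:
    """Return which dim of `positions` varies with the number of tokens."""
    return 1 if uses_mrope else 0

def positions_shape(uses_mrope: bool, num_tokens: int) -> tuple:
    """Shape of the positions tensor for a given token count."""
    return (3, num_tokens) if uses_mrope else (num_tokens,)
```

In practice this dynamic dim is what gets reported to torch.compile (e.g. via `torch._dynamo.mark_dynamic(positions, dim)`), so that varying token counts do not force recompilation or break CUDA graph capture.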

Test Plan

Test Result


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enables CUDA graph compilation for multimodal Eagle models by specifying dynamic argument dimensions, which is a good improvement. However, the accompanying change to increase the size of the mrope_positions buffer by one seems to be an incomplete fix. Other related buffers used for CUDA graph inputs are not resized, posing a risk of out-of-bounds memory access. This should be addressed to ensure stability.


@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@@ -118,7 +118,7 @@ def __init__(
         if self.uses_mrope:
             # M-RoPE need (3, max_num_tokens)
             self.mrope_positions = torch.zeros(
-                (3, self.max_num_tokens), dtype=torch.int64, device=device
+                (3, self.max_num_tokens + 1), dtype=torch.int64, device=device
Collaborator


Context for this change is here:

# NOTE: `mrope_positions` is implemented with one additional dummy
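The NOTE being quoted refers to a "+1 dummy slot" pattern: the buffer is allocated one column longer than max_num_tokens so that padded entries (e.g. from CUDA graph capture at a fixed size) can be written to a harmless trailing index instead of clobbering a real token's slot. A hedged pure-Python sketch of the idea (not vLLM's actual code, which uses a (3, max_num_tokens + 1) torch tensor):

```python
# Hedged sketch of the "+1 dummy slot" pattern: one extra trailing slot
# absorbs writes from padded entries so live positions are never overwritten.

MAX_NUM_TOKENS = 4
DUMMY = MAX_NUM_TOKENS  # index of the extra trailing slot

buffer = [0] * (MAX_NUM_TOKENS + 1)

def write_position(slot: int, value: int) -> None:
    # Padded entries pass slot=DUMMY and land in the dummy slot.
    buffer[slot] = value

write_position(0, 10)      # a real token's position
write_position(DUMMY, 99)  # padding is routed to the dummy slot
```

The reviewer's point above is that any companion buffers consumed as CUDA graph inputs would need the same extra slot, or an out-of-bounds index into them becomes possible.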

Collaborator

@benchislett benchislett left a comment


LGTM

@github-project-automation github-project-automation bot moved this to In review in NVIDIA Nov 19, 2025
@benchislett benchislett added the ready ONLY add when PR is ready to merge/full CI is needed label Nov 19, 2025
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
@benchislett benchislett merged commit 02f5903 into vllm-project:main Nov 19, 2025
51 checks passed
@github-project-automation github-project-automation bot moved this from In review to Done in NVIDIA Nov 19, 2025
devpatelio pushed a commit to SumanthRH/vllm that referenced this pull request Nov 29, 2025
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
kitaekatt pushed a commit to kitaekatt/vllm that referenced this pull request Dec 1, 2025
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>

Labels

  • llama (Related to Llama models)
  • nvidia
  • ready (ONLY add when PR is ready to merge/full CI is needed)
  • speculative-decoding
  • v1

Projects

Status: Done


3 participants