Eagle: MM Cuda Graphs with MRope #28896
Code Review
This pull request enables CUDA graph compilation for multimodal Eagle models by specifying dynamic argument dimensions, which is a good improvement. However, the accompanying change to increase the size of the mrope_positions buffer by one seems to be an incomplete fix. Other related buffers used for CUDA graph inputs are not resized, posing a risk of out-of-bounds memory access. This should be addressed to ensure stability.
4635816 to 46f1237
@@ -118,7 +118,7 @@ def __init__(
         if self.uses_mrope:
             # M-RoPE need (3, max_num_tokens)
             self.mrope_positions = torch.zeros(
-                (3, self.max_num_tokens), dtype=torch.int64, device=device
+                (3, self.max_num_tokens + 1), dtype=torch.int64, device=device
Context for this change is here: vllm/vllm/v1/worker/gpu_model_runner.py, line 491 in 4c23690
46f1237 to 47491c7
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
47491c7 to 586192f
Signed-off-by: Izzy Putterman <iputterman@nvidia.com>
Co-authored-by: Cyrus Leung <tlleungac@connect.ust.hk>
Purpose
Enabling M-RoPE for Eagle in #22872 broke torch.compile because the positions tensor now has its dynamic shape in the last dimension instead of dim 0. Porting the corresponding change from other multimodal (MM) models gets torch.compile and CUDA graphs running.
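To illustrate why the dynamic dimension moves, here is a minimal pure-Python sketch. It is not vLLM code: `resolve_dynamic_dims` and the argument names are hypothetical, and the shapes are assumed from the diff above (`(num_tokens,)` for plain positions, `(3, num_tokens)` for M-RoPE). It only shows that marking the last dimension (`-1`) as dynamic covers both layouts, whereas a fixed `0` would point at the constant size-3 axis in the M-RoPE case.

```python
from typing import Dict, Tuple


def resolve_dynamic_dims(
    dynamic_arg_dims: Dict[str, int],
    shapes: Dict[str, Tuple[int, ...]],
) -> Dict[str, int]:
    """Hypothetical helper: map each argument's (possibly negative)
    dynamic dim to a concrete, non-negative axis index."""
    resolved = {}
    for name, dim in dynamic_arg_dims.items():
        ndim = len(shapes[name])
        if not -ndim <= dim < ndim:
            raise ValueError(f"dim {dim} out of range for {name} with {ndim} dims")
        resolved[name] = dim % ndim
    return resolved


num_tokens = 8  # illustrative value only

# Assumed convention: the token axis of `positions` is always the last one.
dims = {"input_ids": 0, "positions": -1}

# Text-only case: positions has shape (num_tokens,).
text_only = resolve_dynamic_dims(
    dims, {"input_ids": (num_tokens,), "positions": (num_tokens,)}
)

# M-RoPE case: positions has shape (3, num_tokens).
mrope = resolve_dynamic_dims(
    dims, {"input_ids": (num_tokens,), "positions": (3, num_tokens)}
)

print(text_only)
print(mrope)
```

In the text-only case the last dimension is also dim 0, so the same setting works for both; only the M-RoPE layout needs the token axis addressed from the end.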
Test Plan
Test Result