[Bugfix] [Model] Missing MRoPE function definition from KeyeForConditionalGeneration#27895
Conversation
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
There was a problem hiding this comment.
Code Review
This pull request correctly identifies and fixes a missing get_mrope_input_positions function for KeyeForConditionalGeneration by adding the function and the SupportsMRoPE interface. The changes also include a necessary ROCm-specific bugfix and refactoring of the attention backend selection, which improves code clarity. A new unit test is added to validate the fix. However, I've found a critical issue in the implementation of get_mrope_input_positions that could lead to a crash under certain conditions.
| if isinstance(video_grid_thw, list) and len(video_grid_thw) > 0: | ||
| video_grid_thw = video_grid_thw[0] |
There was a problem hiding this comment.
This conditional statement appears to incorrectly handle the case where video_grid_thw is a list[list[int]]. If video_grid_thw is a list of multiple video grids (e.g., [[t1, h1, w1], [t2, h2, w2]]), this line will slice it to just the first grid ([t1, h1, w1]). When this 1D list is passed to split_thw, it will be converted to a 1D tensor, causing an indexing error at grid_thw[:, 0] and crashing the execution. Since split_thw is already capable of handling a list[list[int]] by converting it to a 2D tensor, this slicing logic is both incorrect and unnecessary. Removing it will ensure correct behavior for all valid input types.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
| if current_platform.is_cuda(): | ||
| from vllm.vllm_flash_attn.layers.rotary import apply_rotary_emb | ||
| elif current_platform.is_rocm(): | ||
| from flash_attn.ops.triton.rotary import apply_rotary as apply_rotary_emb | ||
|
|
||
| q_embed = apply_rotary_emb(q.float(), cos.float(), sin.float()).type_as(q) | ||
| k_embed = apply_rotary_emb(k.float(), cos.float(), sin.float()).type_as(k) |
There was a problem hiding this comment.
Call ROCm rotary kernel with wrong signature
In apply_rotary_pos_emb_flashatt the ROCm branch imports flash_attn.ops.triton.rotary.apply_rotary, which expects both q and k tensors and returns the rotated pair. The current implementation calls this kernel twice with only (tensor, cos, sin) just like the CUDA wrapper. On ROCm this will raise a TypeError for the missing argument and prevents rotary embeddings from being applied. The ROCm path should invoke apply_rotary(q, k, cos, sin) once and unpack the returned tensors, mirroring the existing usage in layers/rotary_embedding/common.py.
Useful? React with 👍 / 👎.
Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
…tionalGeneration` (vllm-project#27895) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
…tionalGeneration` (vllm-project#27895) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
…tionalGeneration` (vllm-project#27895) (vllm-project#5) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: TJian <tunjian.tan@embeddedllm.com>
…tionalGeneration` (vllm-project#27895) Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com>
Purpose
This is a bugfix for
Kwai-Keye/Keye-VL-8B-Previewas an effort to complete the unit test for upcoming RFC ViT Attention Reorganization. The fix will be used as validation of RFC correctness.Ever since the shift of
get_mrope_input_positionsrole from the GPU runner into model definition file. This model is broken as it is missingget_mrope_input_positionsandSupportsMRoPE.I am not exactly sure how the exact
get_mrope_input_positionswould look like. I am referring toKeyeVL1_5ForConditionalGeneration'sget_mrope_input_positionsimplementation.I have also made a ROCm specific bugfix. But the major issue is the model is broken.
CC model author of PR #20126 , @Kwai-Keye .
Test Plan
Add a new unit test and ensure it pass the simple unit test where it outputs sensible data
tests/models/multimodal/generation/test_keye.pyChartQA lm eval
Test Result
ChartQA Lmeval score
Essential Elements of an Effective PR Description Checklist
supported_models.mdandexamplesfor a new model.