[CustomOp][MM] Extract MMEncoderAttention as CustomOp and replace the backend of QwenVisionAttention with it. by shen-shanshan · Pull Request #30125 · vllm-project/vllm

shen-shanshan · 2025-12-05T09:47:09Z

Purpose

To avoid maintaining a variety of modeling files in vllm-ascend, we propose removing all files in the models dir in vllm-ascend. After this, the only thing a vllm plugin needs to do when adding a new model is register its custom device-specific OOT ops with vllm. To achieve this, some refactoring needs to be done in both vllm and vllm-ascend, such as extracting some general layers as CustomOp; find more details at vllm-project/vllm-ascend#4084.

Following #27919 and #27147, this PR unifies the logic for getting vit_attn_backend and extracts MMEncoderAttention as a CustomOp.

To be specific, the vision attention backend should only be checked and overridden in the platform-specific implementation. We should not override this logic anywhere else, such as model_executor/models/<model_name>.py. In addition, I have moved the scattered forward dispatch logic into this CustomOp to avoid checking current_platform anywhere else.
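As a rough illustration of that idea (not the code in this PR), the sketch below shows an op that resolves its platform-specific forward method once, plus a registry that lets a plugin swap in its own subclass. Every name here (`ToyEncoderAttention`, `register_oot`, `build_op`, the `forward_<platform>` naming) is hypothetical and only mimics the general pattern.

```python
from typing import Callable, Dict, List

# Hypothetical registry of out-of-tree (OOT) overrides, keyed by op name.
_OOT_OPS: Dict[str, type] = {}


def register_oot(name: str) -> Callable[[type], type]:
    """Let a plugin swap in its own subclass for the op called `name`."""
    def decorator(cls: type) -> type:
        _OOT_OPS[name] = cls
        return cls
    return decorator


class ToyEncoderAttention:
    """Toy op: the platform check lives here, not in the model files."""

    name = "toy_encoder_attention"

    def __init__(self, platform: str) -> None:
        # Resolve the forward method once; fall back to the native path.
        self._forward = getattr(self, f"forward_{platform}", self.forward_native)

    def forward_native(self, x: List[float]) -> str:
        return "native"

    def forward_cuda(self, x: List[float]) -> str:
        return "cuda"

    def __call__(self, x: List[float]) -> str:
        return self._forward(x)


def build_op(platform: str) -> ToyEncoderAttention:
    """Model code calls this and never inspects the platform itself."""
    cls = _OOT_OPS.get(ToyEncoderAttention.name, ToyEncoderAttention)
    return cls(platform)


# A plugin adds a device-specific forward without touching the model code.
@register_oot("toy_encoder_attention")
class ToyNpuEncoderAttention(ToyEncoderAttention):
    def forward_npu(self, x: List[float]) -> str:
        return "npu"
```

With this layout, `build_op("npu")` picks the plugin's `forward_npu`, while a platform with no dedicated method falls back to `forward_native`.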

To minimize the impact, I only replaced the backend of QwenVisionAttention with this CustomOp and have tested this PR on both Ascend A2 NPU and NVIDIA A100 GPU (TODO). I will modify the other modeling files and delete the old MultiHeadAttention in a follow-up if this PR is merged.

Test Plan

Test this PR together with [CustomOp] Register AscendMMEncoderAttention CustomOp and remove related patch vllm-ascend#4750 on Ascend A2 NPU.

Test Result

✅ Ascend A2 NPU

Run Qwen2.5-VL:

vllm serve /root/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct \
--max_model_len 16384 \
--max-num-batched-tokens 16384 \
--tensor-parallel-size 2 \
--enforce-eager

Output:

{"id":"chatcmpl-b4e3053f30ab2442","object":"chat.completion","created":1764922950,"model":"/root/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"The text in the image is \"TONGYI Qwen.\" The word \"TONGYI\" is written in blue, and \"Qwen\" is written in gray. The font appears to be modern and clean, with \"TONGYI\" being slightly larger than \"Qwen.\" The design includes a geometric, abstract shape on the left side of the logo, which complements the text.","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":78,"total_tokens":162,"completion_tokens":84,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}

Run Qwen3-VL:

vllm serve /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-8B-Instruct \
--max_model_len 16384 \
--tensor-parallel-size 2 \
--enforce-eager

Output:

{"id":"chatcmpl-97571fbda8267bd1","object":"chat.completion","created":1764923306,"model":"/root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"The text in the illustration is **“TONGYI Qwen”**.\n\n### How it looks:\n- **“TONGYI”** is written in **uppercase letters** in a **bold, modern sans-serif font**, colored **blue**.\n- **“Qwen”** is written in **lowercase letters** in a **slightly thinner, elegant sans-serif font**, colored **dark gray**.\n- The two lines of text are stacked vertically, with “TONG","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":112,"total_tokens":212,"completion_tokens":100,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
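The Qwen3-VL reply above ends mid-sentence because its `finish_reason` is `length` rather than `stop`, which means generation hit a token limit. A minimal sketch for checking this on responses like the two above follows; the `summarize` helper and the trimmed JSON strings are illustrative, with the field values copied from the outputs.

```python
import json
from typing import Any, Dict


def summarize(raw: str) -> Dict[str, Any]:
    """Pull the fields that show whether a completion was cut short."""
    body = json.loads(raw)
    choice = body["choices"][0]
    usage = body["usage"]
    return {
        "finish_reason": choice["finish_reason"],
        # "length" means the token limit was reached before a natural stop.
        "truncated": choice["finish_reason"] == "length",
        "usage_consistent": usage["prompt_tokens"] + usage["completion_tokens"]
        == usage["total_tokens"],
    }


# Trimmed copies of the two responses (only the fields inspected here).
QWEN25_VL = (
    '{"choices":[{"finish_reason":"stop"}],'
    '"usage":{"prompt_tokens":78,"total_tokens":162,"completion_tokens":84}}'
)
QWEN3_VL = (
    '{"choices":[{"finish_reason":"length"}],'
    '"usage":{"prompt_tokens":112,"total_tokens":212,"completion_tokens":100}}'
)

if __name__ == "__main__":
    print(summarize(QWEN25_VL))
    print(summarize(QWEN3_VL))
```

The truncation is not a sign of a broken backend; the visible part of the Qwen3-VL answer is coherent and matches the image description from Qwen2.5-VL.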


Signed-off-by: shen-shanshan 467638484@qq.com
Co-authored-by: Isotr0py mozf@mail2.sysu.edu.cn
Co-authored-by: tjtanaa tunjian.tan@embeddedllm.com

gemini-code-assist

Code Review

This pull request is a good step towards refactoring the attention mechanisms and making the codebase more modular by introducing MMEncoderAttention as a CustomOp. The unification of the vision attention backend logic is also a welcome improvement.

I've found a critical bug in vllm/attention/layer.py where a variable was renamed but not all its usages were updated, which would cause a runtime error. I've also pointed out an instance of code duplication in the new mm_encoder_attention.py file that should be addressed to improve maintainability.

Once these issues are resolved, this PR will be a solid contribution to the project's architecture.

vllm/attention/layer.py

vllm/attention/layers/mm_encoder_attention.py

chatgpt-codex-connector

💡 Codex Review

https://github.com/vllm-project/vllm/blob/a995c1480683198b2f5ee9e5fcc8e149bdae8790/vllm/model_executor/models/paddleocr_vl.py#L612-L616
PaddleOCR vision attention calls helper with outdated signature

maybe_get_vit_flash_attn_backend now only accepts the backend and returns a single function, but the PaddleOCR vision attention still unpacks two return values and passes attn_backend_override. Instantiating this module will now raise TypeError: maybe_get_vit_flash_attn_backend() got an unexpected keyword argument 'attn_backend_override', preventing the model from loading.
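The failure mode described here can be reproduced in isolation. The sketch below uses two hypothetical stand-ins, `helper_old` and `helper_new`, whose shapes are inferred only from the review text above (they are not the real vLLM functions), and shows a stale call site raising `TypeError`.

```python
from typing import Callable, Optional, Tuple


def helper_old(
    backend: str, attn_backend_override: Optional[str] = None
) -> Tuple[str, Callable[[], str]]:
    """Old shape (hypothetical): accepts an override, returns (backend, fn)."""
    chosen = attn_backend_override or backend
    return chosen, (lambda: f"attn via {chosen}")


def helper_new(backend: str) -> Callable[[], str]:
    """New shape (hypothetical): backend in, single function out."""
    return lambda: f"attn via {backend}"


def stale_call_site() -> str:
    """A call written against the old signature, pointed at the new helper."""
    try:
        backend, fn = helper_new("flash_attn", attn_backend_override=None)
    except TypeError as exc:
        # The keyword argument is rejected before any unpacking happens.
        return str(exc)
    return fn()


def updated_call_site() -> str:
    """The same call adapted to the new signature."""
    fn = helper_new("flash_attn")
    return fn()


if __name__ == "__main__":
    print(stale_call_site())
    print(updated_call_site())
```

Because the error is raised at the call itself, it surfaces when the module is instantiated rather than at the first forward pass.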

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

vllm/attention/layer.py

mergify · 2025-12-08T11:44:34Z

This pull request has merge conflicts that must be resolved before it can be merged. Please rebase the PR, @shen-shanshan.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.


vllm/model_executor/models/vision.py

vllm/attention/layers/mm_encoder_attention.py

shen-shanshan · 2025-12-08T12:30:52Z

CC @Isotr0py @tjtanaa @DarkLight1337

Isotr0py

Thanks for this effort! I left some initial comments, and will further look into this tomorrow. PTAL!

vllm/platforms/interface.py

vllm/attention/layers/mm_encoder_attention.py

vllm/model_executor/models/qwen2_5_vl.py

vllm/platforms/tpu.py

Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: shen-shanshan <467638484@qq.com>

Signed-off-by: shen-shanshan <467638484@qq.com>

Fix for upstream PR: vllm-project/vllm#30125 Signed-off-by: Paweł Olejniczak <polejniczakx@habana.ai>

… backend of QwenVisionAttention with it. (vllm-project#30125) Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com>

… backend of QwenVisionAttention with it. (vllm-project#30125) Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: Joachim Studnia <joachim@mistral.ai>

JartX · 2025-12-16T14:41:13Z

Hi everyone,
@shen-shanshan @Isotr0py @tjtanaa this PR has broken TORCH.SDPA attention on ViT, and the model is hallucinating again:

#27744 (comment)

I have checked what changed:

https://github.com/vllm-project/vllm/pull/30125/changes#diff-65596b65d014c0a5be414881cb834532b23b593599ae1126aafd575af0218225

The contiguous call was removed there.
This means that when, for example, you ask the model about a bill, it might say that what it is looking at is a cake or a Chinese restaurant.

tjtanaa · 2025-12-16T14:51:03Z

@JartX let me fix it quickly. Thank you for pinging.

tjtanaa · 2025-12-16T15:17:34Z

@JartX This is the fix #30789

JartX · 2025-12-16T17:58:01Z

@tjtanaa many thanks :D

…project#718) Fix for upstream PR: vllm-project/vllm#30125 Signed-off-by: Paweł Olejniczak <polejniczakx@habana.ai> Signed-off-by: lvkaokao <kaokao.lv@intel.com>

AndreasKaratzas · 2025-12-20T02:29:25Z

@shen-shanshan Hello, I guess I am late for this one, but I would appreciate it if we could address the following issue. First of all, I see that all the tests proposed here in tests/models/multimodal/generation/test_vit_backend_functionality.py are skipped, and the reason given is "Broken test due to memory segmentation fault". So we should probably make those work. Furthermore, this PR breaks ROCm in several places:

FAILED models/multimodal/generation/test_pixtral.py::test_chat[bfloat16-8192-mistralai/Pixtral-12B-2409] - AssertionError: Test2:
FAILED models/multimodal/generation/test_pixtral.py::test_chat[bfloat16-65536-mistralai/Pixtral-12B-2409] - AssertionError: Test2:
FAILED models/multimodal/pooling/test_intern_vit.py::test_models[half-OpenGVLab/InternViT-300M-448px] - RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
FAILED models/multimodal/pooling/test_intern_vit.py::test_models[half-OpenGVLab/InternViT-6B-448px-V1-5] - RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
FAILED models/multimodal/pooling/test_radio.py::test_radio[half-nvidia/C-RADIOv2-H] - RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.
FAILED models/multimodal/pooling/test_radio.py::test_radio[bfloat16-nvidia/C-RADIOv2-H] - RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

I would appreciate some help here to get ROCm also functional wrt the recent SDPA changes.
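For context on the `view size is not compatible` errors above: a view can only flatten dimensions whose strides chain densely, and a transpose breaks that chain until the data is copied into a contiguous layout. The sketch below models only the stride bookkeeping in pure Python; the helper names and the example shape are illustrative, and this is not PyTorch's implementation.

```python
from typing import List, Sequence, Tuple


def contiguous_strides(shape: Sequence[int]) -> Tuple[int, ...]:
    """Row-major strides (in elements) for a densely packed tensor."""
    strides: List[int] = []
    step = 1
    for size in reversed(shape):
        strides.append(step)
        step *= size
    return tuple(reversed(strides))


def transpose(
    shape: Sequence[int], strides: Sequence[int], a: int, b: int
) -> Tuple[Tuple[int, ...], Tuple[int, ...]]:
    """Swap two dims without moving data, as a transposed view does."""
    new_shape, new_strides = list(shape), list(strides)
    new_shape[a], new_shape[b] = new_shape[b], new_shape[a]
    new_strides[a], new_strides[b] = new_strides[b], new_strides[a]
    return tuple(new_shape), tuple(new_strides)


def can_merge(
    shape: Sequence[int], strides: Sequence[int], first: int, last: int
) -> bool:
    """Dims first..last can be flattened by a view only if they chain densely."""
    return all(
        strides[i] == strides[i + 1] * shape[i + 1] for i in range(first, last)
    )


# Illustrative layout: (batch, seq, heads, head_dim).
shape = (2, 3, 4, 8)
strides = contiguous_strides(shape)           # (96, 32, 8, 1)
t_shape, t_strides = transpose(shape, strides, 1, 2)

flatten_ok_before = can_merge(shape, strides, 2, 3)        # heads x head_dim
flatten_ok_after = can_merge(t_shape, t_strides, 2, 3)     # seq x head_dim
# Copying to a dense layout (what a contiguous copy does) restores it.
flatten_ok_copied = can_merge(t_shape, contiguous_strides(t_shape), 2, 3)
```

In this model, flattening the last two dimensions works before the transpose, fails after it, and works again once the strides are recomputed for a dense copy, which is the role `.reshape(...)` or a contiguous copy plays in the error message's suggestion.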

cc @tjtanaa @JartX @Isotr0py

AndreasKaratzas · 2025-12-20T02:32:03Z

Btw, regarding the test tests/models/multimodal/generation/test_vit_backend_functionality.py, if I remove that pytest.skip line, everything passes on ROCm. But the current SDPA implementation is still broken on ROCm, which means that the proposed test does not properly cover the newly proposed changes.

AndreasKaratzas · 2025-12-20T20:54:36Z

After debugging this a bit, I realized that it's the flash attention that is broken on ROCm, not SDPA. But because Pixtral is probably skipped on NVIDIA due to large memory requirements, it was not migrated to the new MMEncoderAttention. I'll continue debugging flash attention on ROCm.

AndreasKaratzas · 2025-12-20T21:20:07Z

Edit: SDPA is also buggy on ROCm with bfloat16 or float16; the test (pytest -s -v tests/models/multimodal/generation/test_pixtral.py::test_chat[bfloat16-8192-mistralai/Pixtral-12B-2409]) only passed with float32.

AndreasKaratzas · 2025-12-20T21:42:23Z

Final Update: After playing more with the attention backends, I can report the following:

sdpa for vit with triton fails with float16 and bfloat16
flash attention for vit with triton fails with float16 and bfloat16
sdpa for vit with triton passes with float32
aiter flash attention for vit with triton backend fails with float16 and bfloat16
aiter flash attention for vit with aiter flash attention backend passes with both float16 and bfloat16
flash attention for vit with aiter flash attention backend passes (I only tested bfloat16)
sdpa for vit with aiter flash attention backend passes (I only tested bfloat16)

Personal verdict: Something is going on with the Triton backend when handling vision embeds in 16-bit mode on ROCm. Since this PR had nothing to do with the above, and only to do with vision backends, my initial claim that this PR broke ROCm is probably invalid. I apologize for this mistake. I will investigate anything related to my final observation in a different thread/issue.
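The observations above can be tabulated to check the verdict. In the sketch below, the short labels (`sdpa`, `flash_attn`, `aiter_flash_attn`, `triton`) are shorthand chosen for this example, and the records only restate the reported results, including the cases where only bfloat16 was tested.

```python
from typing import List, NamedTuple, Set


class Run(NamedTuple):
    vit_attn: str   # attention implementation used for the ViT
    backend: str    # attention backend
    dtype: str
    passed: bool


RUNS: List[Run] = [
    Run("sdpa", "triton", "float16", False),
    Run("sdpa", "triton", "bfloat16", False),
    Run("flash_attn", "triton", "float16", False),
    Run("flash_attn", "triton", "bfloat16", False),
    Run("sdpa", "triton", "float32", True),
    Run("aiter_flash_attn", "triton", "float16", False),
    Run("aiter_flash_attn", "triton", "bfloat16", False),
    Run("aiter_flash_attn", "aiter_flash_attn", "float16", True),
    Run("aiter_flash_attn", "aiter_flash_attn", "bfloat16", True),
    Run("flash_attn", "aiter_flash_attn", "bfloat16", True),  # only bfloat16 tested
    Run("sdpa", "aiter_flash_attn", "bfloat16", True),        # only bfloat16 tested
]

failing: List[Run] = [r for r in RUNS if not r.passed]
failing_backends: Set[str] = {r.backend for r in failing}
failing_dtypes: Set[str] = {r.dtype for r in failing}
failing_vit_attn: Set[str] = {r.vit_attn for r in failing}

if __name__ == "__main__":
    print(failing_backends, failing_dtypes, failing_vit_attn)
```

Every failing run shares the Triton backend and a 16-bit dtype, while the failures span all three ViT attention implementations, which is consistent with the conclusion that the vision backend choice is not the cause.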

…ted patch (#4750)

### What this PR does / why we need it?

Following vllm-project/vllm#30125, register `AscendMMEncoderAttention` CustomOp and remove related patch.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

✅ Run Qwen2.5-VL:

```bash
vllm serve /root/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct \
--max_model_len 16384
```

Output:

```
{"id":"chatcmpl-b4e3053f30ab2442","object":"chat.completion","created":1764922950,"model":"/root/.cache/modelscope/hub/models/Qwen/Qwen2.5-VL-7B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"The text in the image is \"TONGYI Qwen.\" The word \"TONGYI\" is written in blue, and \"Qwen\" is written in gray. The font appears to be modern and clean, with \"TONGYI\" being slightly larger than \"Qwen.\" The design includes a geometric, abstract shape on the left side of the logo, which complements the text.","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":78,"total_tokens":162,"completion_tokens":84,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
```

✅ Run Qwen3-VL:

```bash
vllm serve /root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-8B-Instruct \
--max_model_len 16384
```

Output:

```
{"id":"chatcmpl-97571fbda8267bd1","object":"chat.completion","created":1764923306,"model":"/root/.cache/modelscope/hub/models/Qwen/Qwen3-VL-8B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"The text in the illustration is **“TONGYI Qwen”**.\n\n### How it looks:\n- **“TONGYI”** is written in **uppercase letters** in a **bold, modern sans-serif font**, colored **blue**.\n- **“Qwen”** is written in **lowercase letters** in a **slightly thinner, elegant sans-serif font**, colored **dark gray**.\n- The two lines of text are stacked vertically, with “TONG","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"length","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":112,"total_tokens":212,"completion_tokens":100,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
```

- vLLM version: v0.12.0
- vLLM main: vllm-project/vllm@ad32e3e

---------

Signed-off-by: shen-shanshan <467638484@qq.com>
Co-authored-by: Yikun Jiang <yikunkero@gmail.com>

… backend of QwenVisionAttention with it. (vllm-project#30125) Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: Ubuntu <mjtaheri68@gmail.com>

… backend of QwenVisionAttention with it. (vllm-project#30125) Signed-off-by: shen-shanshan <467638484@qq.com> Signed-off-by: Isotr0py <mozf@mail2.sysu.edu.cn> Signed-off-by: tjtanaa <tunjian.tan@embeddedllm.com> Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: dsuhinin <suhinin.dmitriy@gmail.com>

shen-shanshan requested review from LucasWilkinson, NickLucche, jikunshang, sighingnow and tjtanaa as code owners December 5, 2025 09:47

mergify bot added the qwen, nvidia, rocm, and tpu labels Dec 5, 2025

github-project-automation bot added this to NVIDIA Dec 5, 2025

shen-shanshan mentioned this pull request Dec 5, 2025

[RFC]: Remove VL Modeling Files vllm-project/vllm-ascend#4084

Closed

17 tasks

gemini-code-assist bot reviewed Dec 5, 2025

vllm/attention/layer.py (resolved)

vllm/attention/layers/mm_encoder_attention.py (resolved)

chatgpt-codex-connector bot reviewed Dec 5, 2025

vllm/attention/layer.py (resolved)

shen-shanshan marked this pull request as draft December 5, 2025 09:51

mergify bot added the needs-rebase label Dec 8, 2025

shen-shanshan force-pushed the vit branch from a995c14 to 79733e7 Compare December 8, 2025 12:11

shen-shanshan marked this pull request as ready for review December 8, 2025 12:11

mergify bot removed the needs-rebase label Dec 8, 2025

chatgpt-codex-connector bot reviewed Dec 8, 2025

vllm/model_executor/models/vision.py (outdated, resolved)

vllm/attention/layers/mm_encoder_attention.py (outdated, resolved)

DarkLight1337 requested a review from Isotr0py December 8, 2025 12:33

Isotr0py reviewed Dec 8, 2025

vllm/platforms/interface.py (resolved)

vllm/attention/layers/mm_encoder_attention.py (outdated, resolved)

vllm/model_executor/models/qwen2_5_vl.py (resolved)

vllm/platforms/tpu.py (outdated, resolved)

Isotr0py assigned Isotr0py and tjtanaa Dec 8, 2025

shen-shanshan added 4 commits December 9, 2025 08:23

extract mm encoder attention as custom op.

82648be

Co-authored-by: Isotr0py <mozf@mail2.sysu.edu.cn> Co-authored-by: tjtanaa <tunjian.tan@embeddedllm.com> Signed-off-by: shen-shanshan <467638484@qq.com>

fix

3b6bf39

Signed-off-by: shen-shanshan <467638484@qq.com>

update

8676aa8

Signed-off-by: shen-shanshan <467638484@qq.com>

address comments

628dbfa

Signed-off-by: shen-shanshan <467638484@qq.com>

shen-shanshan force-pushed the vit branch from 626d3bb to 628dbfa Compare December 9, 2025 08:23

pawel-olejniczak mentioned this pull request Dec 15, 2025

[FIX_FOR_VLLM_LATEST] Remove use_data_parallel from qwen2_5_vl vllm-project/vllm-gaudi#718

Merged

iboiko-habana pushed a commit to vllm-project/vllm-gaudi that referenced this pull request Dec 15, 2025

[FIX_FOR_VLLM_LATEST] Remove use_data_parallel from qwen2_5_vl (#718)

262802a

Fix for upstream PR: vllm-project/vllm#30125 Signed-off-by: Paweł Olejniczak <polejniczakx@habana.ai>

NickLucche mentioned this pull request Dec 16, 2025

[Core] WhisperEncoder support torch.compile #30549

Open

tjtanaa mentioned this pull request Dec 16, 2025

[ROCm] [Bugfix] Fix torch sdpa hallucination #30789

Merged

5 tasks

shen-shanshan mentioned this pull request Dec 19, 2025

[CustomOp] Register AscendMMEncoderAttention CustomOp and remove related patch vllm-project/vllm-ascend#4750

Merged

shen-shanshan mentioned this pull request Dec 23, 2025

[Doc] Add developer guide for CustomOp #30886

Merged

5 tasks

Isotr0py mentioned this pull request Jan 5, 2026

[Models]: Use MMEncoderAttention for MoonViT #31738

Merged

5 tasks


6 participants
