[XPU] Whisper model support on XPU Platform #25123

Merged
jikunshang merged 1 commit into vllm-project:main from chaojun-zhang:whisper_model_support on Sep 18, 2025

Conversation

@chaojun-zhang (Contributor) commented Sep 18, 2025

Purpose

Add Whisper model support on the XPU platform.

Test Plan

VLLM_USE_V1=1 XPU_CCL_BACKEND=xccl CCL_ATL_SHM=1 VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 VLLM_WORKER_MULTIPROC_METHOD=spawn python3 -m vllm.entrypoints.openai.api_server --model openai/whisper-large-v3 --dtype=float16 --enforce-eager --port 8000 --trust-remote-code --max_num_batched_tokens 32768 --gpu-memory-util 0.85
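
Once the server is up, a quick transcription request can confirm the model actually serves audio. A minimal sketch, assuming vLLM's OpenAI-compatible /v1/audio/transcriptions endpoint and a local sample.wav (both placeholders of mine, not part of the original test plan):

```python
import requests

# Hypothetical smoke test: endpoint path and form fields follow the
# OpenAI-compatible transcription API; "sample.wav" is any local audio file.
with open("sample.wav", "rb") as f:
    resp = requests.post(
        "http://localhost:8000/v1/audio/transcriptions",
        files={"file": ("sample.wav", f, "audio/wav")},
        data={"model": "openai/whisper-large-v3"},
    )
resp.raise_for_status()
print(resp.json()["text"])  # the transcribed text
```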

Test Result

With this PR:

The server starts successfully.

Without this PR:

  1. raises the error "ViT attention hasn't supported _Backend.IPEX"
  2. raises a "_Backend.IPEX" error during bind_kv_cache

Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

Signed-off-by: chzhang <chaojun.zhang@intel.com>
@gemini-code-assist (Bot) left a comment

Code Review

This pull request adds support for the Whisper model on the XPU platform. The changes are minimal and confined to two files, vllm/attention/layer.py and vllm/v1/worker/utils.py. In both cases, the changes add current_platform.is_xpu() to existing platform-specific conditional logic to make XPU behave similarly to CUDA or ROCm. Specifically, it forces the use of the TORCH_SDPA attention backend for MultiHeadAttention and allows a simplified KV cache binding logic for encoder-decoder models. These changes appear to be a correct and standard approach for enabling a new hardware backend. I have not found any issues of high or critical severity.
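
For readers unfamiliar with the pattern, the shape of the change is roughly as follows. This is an illustrative sketch based on the review summary above, not the actual diff: _Backend here is a stand-in for vLLM's internal enum, and select_mha_backend is an invented helper name.

```python
from enum import Enum


class _Backend(Enum):  # stand-in for vLLM's internal backend enum
    TORCH_SDPA = "TORCH_SDPA"
    IPEX = "IPEX"


def select_mha_backend(current_platform) -> _Backend:
    """Hypothetical helper mirroring the check in vllm/attention/layer.py.

    XPU is added alongside CUDA/ROCm so that MultiHeadAttention falls
    back to PyTorch's scaled_dot_product_attention rather than the IPEX
    backend, which does not yet cover ViT-style attention.
    """
    if (current_platform.is_cuda()
            or current_platform.is_rocm()
            or current_platform.is_xpu()):  # is_xpu() is the new condition
        return _Backend.TORCH_SDPA
    raise NotImplementedError("other platforms not covered in this sketch")
```

Per the review summary, the change in vllm/v1/worker/utils.py follows the same pattern: XPU joins the existing platform check so encoder-decoder models take the simplified KV cache binding path already used on CUDA/ROCm.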

@jikunshang (Collaborator) left a comment

thanks for fixing.

@jikunshang jikunshang enabled auto-merge (squash) September 18, 2025 02:57
@github-actions (Bot) added the "ready" label (ONLY add when PR is ready to merge / full CI is needed) on Sep 18, 2025
@jikunshang jikunshang merged commit 3bc1812 into vllm-project:main Sep 18, 2025
58 checks passed
845473182 pushed a commit to dsxsteven/vllm_splitPR that referenced this pull request Sep 18, 2025
…litPR into model_register

* 'model_register' of https://github.com/dsxsteven/vllm_splitPR: (138 commits)
  Retrieve `sliding_window` from text config in Gemma3 MM (vllm-project#25085)
  [Docs] Fix API Reference (vllm-project#25140)
  [Kernel] Better inf handling for grouped topk cu (vllm-project#24886)
  [CLI] Use streaming in CLI chat and completion commands (vllm-project#23769)
  [benchmark] add peak throughput metrics and plot (vllm-project#23867)
  [Spec Decode] Efficient padded speculation (vllm-project#24539)
  [V0 Deprecation] Remove more V0 tests (vllm-project#25117)
  [EPLB] Add EPLB support for hunyuan_v1 (vllm-project#23078)
  [XPU] Whisper model support on XPU Platform (vllm-project#25123)
  Mark prompt logprobs as incompatible with prompt embeds at API level (vllm-project#25077)
  [Model] enable data parallel for InternVL vision encoder (vllm-project#23909)
  [Kernels] Overlap shared experts with combine instead of dispatch (vllm-project#24254)
  [Bugfix][Qwen3-Next] add prefixes to shared_expert in qwen3-next and mlp in qwen2moe to successfully load ignored params in quantized models (vllm-project#24960)
  [Core][MM] Cleanup `MultiModalCache` (vllm-project#25006)
  [Docs] Clean up the contributing README (vllm-project#25099)
  [MM Encoder] Apply DP ViT for Qwen3-VL model series (vllm-project#24955)
  [Kernels] Enable DeepGEMM by default (vllm-project#24462)
  [V0 Deprecation] Skip PP test (vllm-project#25128)
  [V0 Deprecation] Remove misc V0 tests (vllm-project#25118)
  [V0 Deprecation] Remove V0 Tracing & Metrics tests (vllm-project#25115)
  ...
debroy-rh pushed a commit to debroy-rh/vllm that referenced this pull request Sep 19, 2025
Signed-off-by: chzhang <chaojun.zhang@intel.com>
ABC12345anouys pushed a commit to ABC12345anouys/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: chzhang <chaojun.zhang@intel.com>
charlifu pushed a commit to ROCm/vllm that referenced this pull request Sep 25, 2025
Signed-off-by: chzhang <chaojun.zhang@intel.com>
Signed-off-by: charlifu <charlifu@amd.com>
choprahetarth pushed a commit to Tandemn-Labs/vllm that referenced this pull request Oct 11, 2025
Signed-off-by: chzhang <chaojun.zhang@intel.com>

Labels

ready (ONLY add when PR is ready to merge / full CI is needed), v1
