[XPU] Add XPU block-scaled W8A8 fp8 path by xwu-intel · Pull Request #39968 · vllm-project/vllm

xwu-intel · 2026-04-16T05:24:58Z

Purpose

This PR adds the XPU block-scaled W8A8 FP8 path and updates FP8 block kernel selection so XPU can fall back to Triton when the native XPU FP8 block kernel is unavailable.

Changes included in this update:

Make TritonFp8BlockScaledMMKernel.is_supported() report support on XPU, in addition to CUDA-like platforms.
Add TritonFp8BlockScaledMMKernel to the XPU FP8 block kernel candidate list as a fallback.
Add unit tests.
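A minimal sketch of the selection order described above. It assumes kernels are tried in list order and the first one whose is_supported() accepts the platform wins. TritonFp8BlockScaledMMKernel is the name used in this PR, but its body here is simplified. Platform, XpuFp8BlockScaledMMKernel, XPU_FP8_BLOCK_CANDIDATES and select_kernel are hypothetical stand-ins, not vLLM's actual classes or signatures.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Platform:
    # Hypothetical stand-in for the platform object.
    name: str              # e.g. "xpu", "cuda", "rocm"
    has_fp8_gemm_op: bool  # whether the native XPU fp8_gemm op was found


class XpuFp8BlockScaledMMKernel:
    # Hypothetical native XPU kernel, guarded on the fp8_gemm op being present.
    @staticmethod
    def is_supported(p: Platform) -> bool:
        return p.name == "xpu" and p.has_fp8_gemm_op


class TritonFp8BlockScaledMMKernel:
    # Name taken from the PR; body is a simplified assumption.
    # CUDA-like platforms (assumed here to be CUDA and ROCm) plus XPU.
    @staticmethod
    def is_supported(p: Platform) -> bool:
        return p.name in ("cuda", "rocm", "xpu")


# Native kernel first, Triton appended as the fallback.
XPU_FP8_BLOCK_CANDIDATES: Sequence[type] = (
    XpuFp8BlockScaledMMKernel,
    TritonFp8BlockScaledMMKernel,
)


def select_kernel(p: Platform, candidates: Sequence[type]) -> type:
    # Return the first candidate whose is_supported() accepts the platform.
    for kernel in candidates:
        if kernel.is_supported(p):
            return kernel
    raise RuntimeError(f"no FP8 block-scaled kernel supports {p.name!r}")
```

When the native op is missing, selection falls through to the Triton entry instead of raising; without that entry in the list, the same call would fail.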

Dependency:

[OneDNN] upgrade onednn to 3.12 and add fp8 block gemm vllm-xpu-kernels#173
Release of vllm_xpu_kernels-0.1.9 (https://github.com/vllm-project/vllm-xpu-kernels/releases)

Test Plan

gsm8k test on Qwen/Qwen3-4B-Instruct-2507-FP8 (see the comment below).

Test Result

gsm8k accuracy test passes on B60.

mergify · 2026-04-16T05:26:04Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @xwu-intel.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

gemini-code-assist

Code Review

This pull request implements FP8 block-scaled matrix multiplication support for XPU platforms. It adds a fake operator registration for fp8_gemm to support shape inference and introduces the _w8a8_xpu_block_scaled_mm utility along with a corresponding dispatch path in _dispatch_w8a8_blockscale_op. I have no feedback to provide.
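The fake operator registration mentioned in this review exists so that shape inference (for example under torch.compile) can determine fp8_gemm's output without launching the XPU kernel. The pure-Python sketch below shows only the shape logic such a fake implementation would perform. The function name, argument order and scale layouts ((M, ceil(K/block_k)) for activations, (ceil(N/block_n), ceil(K/block_k)) for weights, 128x128 blocks by default) are assumptions, not the actual signature of torch.ops._xpu_C.fp8_gemm. One commit in this PR transposes the weight scale, so the real layout may differ. In PyTorch, a real fake implementation would be attached with torch.library.register_fake and would return an empty tensor of this shape rather than a tuple.

```python
from typing import Tuple

Shape = Tuple[int, ...]


def cdiv(a: int, b: int) -> int:
    # Ceiling division for non-negative integers.
    return -(-a // b)


def fp8_gemm_fake_shape(
    a_shape: Shape,        # quantized activations, assumed (M, K)
    b_shape: Shape,        # quantized weight, assumed (N, K)
    a_scale_shape: Shape,  # assumed (M, ceil(K / block_k))
    b_scale_shape: Shape,  # assumed (ceil(N / block_n), ceil(K / block_k))
    block_n: int = 128,
    block_k: int = 128,
) -> Shape:
    """Validate input shapes and return the output shape without running a kernel."""
    m, k = a_shape
    n, k_b = b_shape
    if k != k_b:
        raise ValueError(f"K mismatch: {k} vs {k_b}")
    k_blocks = cdiv(k, block_k)
    n_blocks = cdiv(n, block_n)
    if tuple(a_scale_shape) != (m, k_blocks):
        raise ValueError(f"unexpected activation scale shape {a_scale_shape}")
    if tuple(b_scale_shape) != (n_blocks, k_blocks):
        raise ValueError(f"unexpected weight scale shape {b_scale_shape}")
    return (m, n)
```

For example, fp8_gemm_fake_shape((4, 256), (384, 256), (4, 2), (3, 2)) returns (4, 384): K = 256 gives two K blocks and N = 384 gives three N blocks.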

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

xwu-intel · 2026-05-15T15:04:32Z

Depends on fix: vllm-project/vllm-xpu-kernels#353

@zufangzhu please help review this. Thanks!

mergify · 2026-05-23T08:45:16Z

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @xwu-intel.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

jikunshang · 2026-05-29T04:09:47Z

Please rebase and add some unit tests/examples to CI if possible.

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

…xpu-kernels pr vllm-project#173 Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

- Pass the new dummy flags to _C.per_token_group_fp8_quant in XPU paths
- Remove obsolete scaled-mm kernel selection test file

Co-authored-by: GitHub Copilot <noreply@github.com> Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
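For context on the op touched by that commit: per-token-group FP8 quantization splits each activation row into groups of group_size elements and gives every (token, group) pair its own scale. The sketch below assumes the FP8 E4M3 format (largest finite value 448) and a scale of max|x| / 448. It clamps but does not round to the FP8 grid, and it does not model the dummy flags passed to _C.per_token_group_fp8_quant. per_token_group_quant_sketch is a hypothetical name, not a vLLM function.

```python
from typing import List, Tuple

FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value


def per_token_group_quant_sketch(
    x: List[List[float]], group_size: int, eps: float = 1e-10
) -> Tuple[List[List[float]], List[List[float]]]:
    """Return (q, scales) with x[m][k] ~= q[m][k] * scales[m][k // group_size]."""
    q: List[List[float]] = []
    scales: List[List[float]] = []
    for row in x:
        q_row: List[float] = []
        s_row: List[float] = []
        for start in range(0, len(row), group_size):
            group = row[start:start + group_size]
            # One scale per (token, group): map the group's max |x| to FP8_E4M3_MAX.
            amax = max(max(abs(v) for v in group), eps)
            scale = amax / FP8_E4M3_MAX
            s_row.append(scale)
            # Real kernels would also round to the FP8 grid; this sketch only clamps.
            q_row.extend(
                min(max(v / scale, -FP8_E4M3_MAX), FP8_E4M3_MAX) for v in group
            )
        q.append(q_row)
        scales.append(s_row)
    return q, scales
```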

Revert deletion by restoring tests/kernels/quantization/test_scaled_mm_kernel_selection.py to match origin/main.

Co-authored-by: GitHub Copilot <noreply@github.com> Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

Add test_xpu_fp8_scaled_mm.py to validate torch.ops._xpu_C.fp8_gemm correctness against native block-scaled matmul reference. Co-authored-by: GitHub Copilot Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com> Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
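The native block-scaled matmul reference that commit compares against can be sketched in pure Python as below. It assumes activations carry one scale per (token, K block) and weights one scale per (block_n x block_k) tile. block_scaled_mm_ref and these layouts are illustrative assumptions; the deleted test file itself is not shown on this page.

```python
from typing import List

Matrix = List[List[float]]


def block_scaled_mm_ref(
    a_q: Matrix,      # (M, K) quantized activations
    a_scale: Matrix,  # (M, ceil(K / block_k)) per-token-group scales
    b_q: Matrix,      # (N, K) quantized weight
    b_scale: Matrix,  # (ceil(N / block_n), ceil(K / block_k)) per-block scales
    block_n: int,
    block_k: int,
) -> Matrix:
    """out[m][n] = sum over K blocks of (block dot product) * a_scale * b_scale."""
    m_dim, k_dim, n_dim = len(a_q), len(a_q[0]), len(b_q)
    out = [[0.0] * n_dim for _ in range(m_dim)]
    for m in range(m_dim):
        for n in range(n_dim):
            acc = 0.0
            for kb_start in range(0, k_dim, block_k):
                kb = kb_start // block_k
                partial = sum(
                    a_q[m][k] * b_q[n][k]
                    for k in range(kb_start, min(kb_start + block_k, k_dim))
                )
                # Scales are applied once per K block, after the inner product.
                acc += partial * a_scale[m][kb] * b_scale[n // block_n][kb]
            out[m][n] = acc
    return out
```

Applying the scales per K block, rather than once at the end, is what makes the scheme block-scaled: each K block can have its own activation and weight scale.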

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

Delete tests/kernels/quantization/test_xpu_fp8_scaled_mm.py as requested.

Co-authored-by: GitHub Copilot <noreply@github.com> Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

mergify · 2026-06-02T05:50:49Z

Hi @xwu-intel, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?

mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:

# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com> Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com> Co-authored-by: Yuxiang <yuxiang.liang@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com> Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com> Co-authored-by: Yuxiang <yuxiang.liang@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: JisoLya <523420504@qq.com>

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com> Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com> Co-authored-by: Yuxiang <yuxiang.liang@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com> Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com> Co-authored-by: Yuxiang <yuxiang.liang@intel.com> Co-authored-by: Kunshang Ji <kunshang.ji@intel.com> Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>

mergify Bot added the intel-gpu Related to Intel GPU label Apr 16, 2026

mergify Bot added the needs-rebase label Apr 16, 2026

gemini-code-assist Bot reviewed Apr 16, 2026

This was referenced Apr 16, 2026

[OneDNN] upgrade onednn to 3.12 and add fp8 block gemm vllm-project/vllm-xpu-kernels#173

Merged

GLM-5 / 5.1 support and optimization plan vllm-project/vllm-xpu-kernels#224

Open

xwu-intel force-pushed the xwu/w8a8-xpu-blockscaled-mm branch from aca3738 to 841fee7 May 9, 2026 03:20

mergify Bot removed the needs-rebase label May 9, 2026

jikunshang reviewed May 11, 2026

Comment thread vllm/model_executor/kernels/linear/__init__.py

xwu-intel force-pushed the xwu/w8a8-xpu-blockscaled-mm branch 2 times, most recently from 5773a6c to 617e11b May 12, 2026 01:56

xwu-intel marked this pull request as ready for review May 12, 2026 02:08

xwu-intel requested review from WoosukKwon, mgoin, tlrmchlsmth, yewentao256 and zyongye as code owners May 12, 2026 02:08

claude Bot reviewed May 12, 2026

xwu-intel mentioned this pull request May 12, 2026

[XPU] Enable multiple key kernels for sparse attention #37888

Merged

2 tasks

xwu-intel force-pushed the xwu/w8a8-xpu-blockscaled-mm branch from dc316fe to 309cafa May 12, 2026 06:14

jikunshang reviewed May 13, 2026

Comment thread vllm/model_executor/kernels/linear/scaled_mm/xpu.py Outdated

jikunshang mentioned this pull request May 18, 2026

Add DeepSeek-V4 XPU support with FP8 KV cache #42919

Closed

mergify Bot added the needs-rebase label May 23, 2026

jikunshang mentioned this pull request May 26, 2026

v0.1.9 release tracker vllm-project/vllm-xpu-kernels#331

Closed

3 tasks

yuwenzho mentioned this pull request May 27, 2026

[XPU] Enable compressed-tensors FP8 block-scaled quantization #43657

Closed

4 tasks

xwu-intel force-pushed the xwu/w8a8-xpu-blockscaled-mm branch from 0ad1d7c to 4254caf May 29, 2026 07:22

xwu-intel requested a review from AndreasKaratzas as a code owner May 29, 2026 07:22

xwu-intel force-pushed the xwu/w8a8-xpu-blockscaled-mm branch from 4a99046 to 4d39fd9 June 1, 2026 13:25

jikunshang reviewed Jun 1, 2026

Comment thread tests/kernels/quantization/test_xpu_fp8_scaled_mm.py Outdated

jikunshang mentioned this pull request Jun 1, 2026

[XPU][Bugfix] Fix per_token_group_fp8_quant missing dummy args on XPU #43930

Merged

4 tasks

jikunshang reviewed Jun 1, 2026

Comment thread vllm/model_executor/layers/quantization/utils/fp8_utils.py Outdated

xwu-intel and others added 11 commits June 2, 2026 03:41

Add XPU block-scaled W8A8 fp8 dispatch path

6c98a44

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

Add XPU block-scaled W8A8 FP8 dispatch path via oneDNN based on vllm-…

469e26d

…xpu-kernels pr vllm-project#173 Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

add TritonFp8BlockScaledMMKernel for xpu

402d4c9

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

[XPU] Guard fp8 block-scaled support on fp8_gemm op

c7636d8

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

transpose weight scale

5fc01bb

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

fix pre-commit

626675e

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

Restore scaled MM kernel selection test from main

cb808da

Revert deletion by restoring tests/kernels/quantization/test_scaled_mm_kernel_selection.py to match origin/main.

Co-authored-by: GitHub Copilot <noreply@github.com> Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

nit

f5f4db2

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

Remove obsolete XPU FP8 scaled-mm test

40d2254

Delete tests/kernels/quantization/test_xpu_fp8_scaled_mm.py as requested.

Co-authored-by: GitHub Copilot <noreply@github.com> Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

xwu-intel force-pushed the xwu/w8a8-xpu-blockscaled-mm branch from 55d1d00 to 40d2254 June 2, 2026 03:43

xwu-intel requested a review from jikunshang June 2, 2026 05:31

jikunshang added the verified Run pre-commit for new contributors without triggering other tests label Jun 2, 2026

fix pre-commit

d41d7c8

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

jikunshang approved these changes Jun 3, 2026

jikunshang added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 3, 2026

jikunshang and others added 3 commits June 3, 2026 13:16

Merge branch 'main' into xwu/w8a8-xpu-blockscaled-mm

f7dfddd

Merge branch 'main' into xwu/w8a8-xpu-blockscaled-mm

050d6da

Merge branch 'main' into xwu/w8a8-xpu-blockscaled-mm

6488c5d

jikunshang merged commit e523267 into vllm-project:main Jun 3, 2026
49 checks passed

jikunshang merged 15 commits into vllm-project:main from xwu-intel:xwu/w8a8-xpu-blockscaled-mm
4 participants