
[XPU] Add XPU block-scaled W8A8 fp8 path #39968

Merged

jikunshang merged 15 commits into vllm-project:main from xwu-intel:xwu/w8a8-xpu-blockscaled-mm on Jun 3, 2026

xwu-intel (Contributor) commented Apr 16, 2026

Purpose

This PR adds the XPU block-scaled W8A8 FP8 path and updates FP8 block kernel selection so XPU can fall back to Triton when the native XPU FP8 block kernel is unavailable.

Changes included in this update:

  • Enable TritonFp8BlockScaledMMKernel.is_supported() on XPU (in addition to CUDA-like platforms).
  • Add TritonFp8BlockScaledMMKernel to the XPU FP8 block kernel candidate list as a fallback.
  • Add unit tests.
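The fallback behavior above amounts to walking an ordered candidate list and picking the first kernel whose support check passes. The following is a minimal pure-Python sketch of that idea, not vLLM's actual selection code: `KernelCandidate`, `select_kernel`, `NativeXPUFp8BlockKernel`, and the platform strings are hypothetical stand-ins, and only the name TritonFp8BlockScaledMMKernel comes from this PR.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# (supported, reason_if_unsupported) for a given platform name.
SupportCheck = Callable[[str], Tuple[bool, Optional[str]]]


@dataclass(frozen=True)
class KernelCandidate:
    # Hypothetical stand-in for a kernel class exposing is_supported().
    name: str
    is_supported: SupportCheck


def select_kernel(candidates: Sequence[KernelCandidate], platform: str) -> str:
    """Return the name of the first candidate that reports support, in list order."""
    reasons = []
    for cand in candidates:
        ok, why = cand.is_supported(platform)
        if ok:
            return cand.name
        reasons.append(f"{cand.name}: {why}")
    raise RuntimeError("no FP8 block kernel available: " + "; ".join(reasons))


def native_xpu_kernel(op_available: bool) -> KernelCandidate:
    # Hypothetical native kernel; op_available models whether the installed
    # XPU kernel library actually provides the FP8 block GEMM op.
    def check(platform: str) -> Tuple[bool, Optional[str]]:
        if platform != "xpu":
            return False, "not an XPU platform"
        if not op_available:
            return False, "native XPU FP8 block op unavailable"
        return True, None

    return KernelCandidate("NativeXPUFp8BlockKernel", check)


def triton_kernel() -> KernelCandidate:
    # Models the PR's change: Triton reports support on XPU as well as on
    # CUDA-like platforms (represented here as "cuda" and "rocm").
    def check(platform: str) -> Tuple[bool, Optional[str]]:
        if platform in ("cuda", "rocm", "xpu"):
            return True, None
        return False, f"unsupported platform {platform!r}"

    return KernelCandidate("TritonFp8BlockScaledMMKernel", check)


# XPU candidate list: native kernel first, Triton as the fallback.
xpu_candidates = [native_xpu_kernel(op_available=False), triton_kernel()]
print(select_kernel(xpu_candidates, "xpu"))  # TritonFp8BlockScaledMMKernel
```

Keeping the native kernel first means it wins whenever it is present, while the Triton entry only matters when the native op is missing from the installed kernel build.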

Dependency:

Test Plan

  • gsm8k test on Qwen/Qwen3-4B-Instruct-2507-FP8 (see the comment below)

Test Result

gsm8k accuracy test passes on B60.


Essential Elements of an Effective PR Description Checklist
  • The purpose of the PR, such as "Fix some issue (link existing issues this PR will resolve)".
  • The test plan, such as providing test command.
  • The test results, such as pasting the results comparison before and after, or e2e results.
  • (Optional) The necessary documentation update, such as updating supported_models.md and examples for a new model.
  • (Optional) Release notes update. If your change is user facing, please update the release notes draft in the Google Doc.

mergify bot added the intel-gpu label (Related to Intel GPU) Apr 16, 2026
mergify bot (Contributor) commented Apr 16, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @xwu-intel.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

mergify bot added the needs-rebase label Apr 16, 2026

gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request implements FP8 block-scaled matrix multiplication support for XPU platforms. It adds a fake operator registration for fp8_gemm to support shape inference and introduces the _w8a8_xpu_block_scaled_mm utility along with a corresponding dispatch path in _dispatch_w8a8_blockscale_op. I have no feedback to provide.
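For reference, the quantity a block-scaled W8A8 GEMM computes can be written as a dequantize-then-multiply loop. The sketch below is pure Python and illustrative only: it is not `_w8a8_xpu_block_scaled_mm` or the `fp8_gemm` op, and it assumes activations carry one scale per token per group of `bk` columns, weights are stored as [N, K] with one scale per (`bn` x `bk`) tile, and the quantized values are passed in as plain floats.

```python
from typing import List

Matrix = List[List[float]]


def block_scaled_mm_reference(
    a_q: Matrix,  # [M, K] quantized activations
    a_s: Matrix,  # [M, ceil(K / bk)] per-token-group activation scales
    b_q: Matrix,  # [N, K] quantized weights (layout is an assumption)
    b_s: Matrix,  # [ceil(N / bn), ceil(K / bk)] per-block weight scales
    bn: int,
    bk: int,
) -> Matrix:
    """Dequantize-and-multiply reference: C[m][n] = sum_k A[m][k] * B[n][k]."""
    m_dim, k_dim, n_dim = len(a_q), len(a_q[0]), len(b_q)
    out = [[0.0] * n_dim for _ in range(m_dim)]
    for m in range(m_dim):
        for n in range(n_dim):
            acc = 0.0
            for k in range(k_dim):
                g = k // bk  # K-group index shared by both scale tensors
                a = a_q[m][k] * a_s[m][g]  # dequantized activation
                b = b_q[n][k] * b_s[n // bn][g]  # dequantized weight
                acc += a * b
            out[m][n] = acc
    return out


# M=1, K=4, N=2 with bk=2 (two K-groups) and bn=1 (one scale row per output column).
c = block_scaled_mm_reference(
    a_q=[[1.0, 2.0, 3.0, 4.0]],
    a_s=[[0.5, 2.0]],
    b_q=[[1.0, 1.0, 1.0, 1.0], [1.0, 0.0, 1.0, 0.0]],
    b_s=[[1.0, 1.0], [2.0, 0.5]],
    bn=1,
    bk=2,
)
print(c)  # [[15.5, 4.0]]
```

An optimized kernel would typically accumulate each K-group's raw products and apply the two scales once per group rather than per element; mathematically the result is the same.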

Comment thread vllm/model_executor/kernels/linear/__init__.py
xwu-intel force-pushed the xwu/w8a8-xpu-blockscaled-mm branch 2 times, most recently from 5773a6c to 617e11b, May 12, 2026 01:56
xwu-intel marked this pull request as ready for review May 12, 2026 02:08

claude bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Comment thread vllm/model_executor/kernels/linear/scaled_mm/xpu.py Outdated
xwu-intel (Contributor, Author) commented May 15, 2026

Depends on fix: vllm-project/vllm-xpu-kernels#353

@zufangzhu, please help review this. Thanks!

mergify bot (Contributor) commented May 23, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @xwu-intel.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

jikunshang (Member) commented

Please rebase and add some unit tests/examples to CI if possible.

xwu-intel force-pushed the xwu/w8a8-xpu-blockscaled-mm branch from 0ad1d7c to 4254caf May 29, 2026 07:22
xwu-intel force-pushed the xwu/w8a8-xpu-blockscaled-mm branch from 4a99046 to 4d39fd9 June 1, 2026 13:25
Comment thread tests/kernels/quantization/test_xpu_fp8_scaled_mm.py Outdated
Comment thread vllm/model_executor/layers/quantization/utils/fp8_utils.py Outdated
xwu-intel and others added 11 commits June 2, 2026 03:41
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
…xpu-kernels pr vllm-project#173

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
- Pass the new dummy flags to _C.per_token_group_fp8_quant in XPU paths
- Remove obsolete scaled-mm kernel selection test file

Co-authored-by: GitHub Copilot <noreply@github.com>

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Revert deletion by restoring tests/kernels/quantization/test_scaled_mm_kernel_selection.py to match origin/main.

Co-authored-by: GitHub Copilot <noreply@github.com>

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Add test_xpu_fp8_scaled_mm.py to validate torch.ops._xpu_C.fp8_gemm
correctness against native block-scaled matmul reference.

Co-authored-by: GitHub Copilot
Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com>
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Delete tests/kernels/quantization/test_xpu_fp8_scaled_mm.py as requested.

Co-authored-by: GitHub Copilot <noreply@github.com>

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
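Several commits above touch `_C.per_token_group_fp8_quant`. As a rough illustration of per-token-group FP8 scaling, here is a pure-Python sketch that is not vLLM's kernel: it ignores the dummy flags the commit mentions (their meaning is not described here), uses 448.0 as the float8_e4m3fn maximum, and skips rounding to the e4m3 grid, so it models only the scaling step. `per_token_group_quant_sim` is a hypothetical name.

```python
from typing import List, Tuple

FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value


def per_token_group_quant_sim(
    x: List[List[float]], group_size: int, eps: float = 1e-10
) -> Tuple[List[List[float]], List[List[float]]]:
    """Scale each contiguous group of `group_size` values in a row into FP8 range.

    Returns (scaled values, one scale per group). Rounding to the e4m3 grid is
    deliberately omitted, so this only models the scaling step.
    """
    q_rows, s_rows = [], []
    for row in x:
        q_row, s_row = [], []
        for start in range(0, len(row), group_size):
            group = row[start:start + group_size]
            amax = max(max(abs(v) for v in group), eps)  # eps avoids divide-by-zero
            scale = amax / FP8_E4M3_MAX
            s_row.append(scale)
            for v in group:
                # Clamp to the representable FP8 range after scaling.
                q_row.append(min(max(v / scale, -FP8_E4M3_MAX), FP8_E4M3_MAX))
        q_rows.append(q_row)
        s_rows.append(s_row)
    return q_rows, s_rows


q, s = per_token_group_quant_sim([[1.0, -2.0, 4.0, 8.0]], group_size=2)
# Dequantizing (value * its group's scale) recovers the input up to float error.
restored = [[q[0][i] * s[0][i // 2] for i in range(4)]]
```

Each group's largest magnitude maps to about 448, so the full FP8 range is used per group; these per-group scales are the kind of activation scales a block-scaled GEMM consumes alongside the weight-block scales.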
xwu-intel force-pushed the xwu/w8a8-xpu-blockscaled-mm branch from 55d1d00 to 40d2254 June 2, 2026 03:43
xwu-intel requested a review from jikunshang June 2, 2026 05:31
jikunshang added the verified label (Run pre-commit for new contributors without triggering other tests) Jun 2, 2026
mergify bot (Contributor) commented Jun 2, 2026

Hi @xwu-intel, the pre-commit checks have failed. Please run:

uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files

Then, commit the changes and push to your branch.

For future commits, pre-commit will run automatically on changed files before each commit.

Tip

Is mypy failing?
mypy is run differently in CI. If the failure is related to this check, please use the following command to run it locally:
# For mypy (substitute "3.10" with the failing version if needed)
pre-commit run --hook-stage manual mypy-3.10

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
jikunshang added the ready label (ONLY add when PR is ready to merge/full CI is needed) Jun 3, 2026
jikunshang merged commit e523267 into vllm-project:main Jun 3, 2026
49 checks passed
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com>
Co-authored-by: Yuxiang <yuxiang.liang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
JisoLya pushed a commit to JisoLya/vllm that referenced this pull request Jun 5, 2026
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com>
Co-authored-by: Yuxiang <yuxiang.liang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: JisoLya <523420504@qq.com>
knight0528 pushed a commit to knight0528/vllm that referenced this pull request Jun 8, 2026
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com>
Co-authored-by: Yuxiang <yuxiang.liang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
waqahmed-amd-fi pushed a commit to waqahmed-amd-fi/vllm that referenced this pull request Jun 10, 2026
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com>
Co-authored-by: Yuxiang <yuxiang.liang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>

Labels

  • intel-gpu: Related to Intel GPU
  • ready: ONLY add when PR is ready to merge/full CI is needed
  • verified: Run pre-commit for new contributors without triggering other tests


4 participants