[XPU] Add XPU block-scaled W8A8 fp8 path #39968
Code Review
This pull request implements FP8 block-scaled matrix multiplication support for XPU platforms. It adds a fake operator registration for fp8_gemm to support shape inference and introduces the _w8a8_xpu_block_scaled_mm utility along with a corresponding dispatch path in _dispatch_w8a8_blockscale_op. I have no feedback to provide.
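As background for the `_w8a8_xpu_block_scaled_mm` path, here is what a block-scaled W8A8 matmul computes. This is a pure-Python reference built on stated assumptions and is not the PR's implementation. Activations are scaled per 1×G token group, and weights (stored N×K) are scaled per G×G block. The activation group size is assumed to equal the weight block size along K. The helper names (`quantize_act_per_token_group`, `quantize_weight_per_block`, `block_scaled_mm_ref`) are hypothetical. FP8 rounding is omitted: values are only scaled into the e4m3 range (±448), so the result should match a plain float matmul up to rounding error.

```python
# Illustrative block-scaled W8A8 reference (pure Python; FP8 rounding omitted).
FP8_E4M3_MAX = 448.0  # largest finite float8_e4m3fn value


def _scale(amax):
    # Map a group's absolute max onto the FP8 range; all-zero groups keep scale 1.
    return amax / FP8_E4M3_MAX if amax > 0 else 1.0


def quantize_act_per_token_group(a, group):
    """Scale each row of A (M x K) in 1 x `group` chunks; scales are M x ceil(K/group)."""
    q, scales = [], []
    for row in a:
        q_row, s_row = [], []
        for g0 in range(0, len(row), group):
            chunk = row[g0:g0 + group]
            s = _scale(max(abs(v) for v in chunk))
            s_row.append(s)
            q_row.extend(v / s for v in chunk)  # a real kernel would cast to FP8 here
        q.append(q_row)
        scales.append(s_row)
    return q, scales


def quantize_weight_per_block(w, block):
    """Scale W (N x K) in block x block tiles; scales are ceil(N/block) x ceil(K/block)."""
    n, k = len(w), len(w[0])
    nb, kb = -(-n // block), -(-k // block)  # ceiling division
    scales = [[1.0] * kb for _ in range(nb)]
    for bi in range(nb):
        for bj in range(kb):
            amax = 0.0
            for i in range(bi * block, min((bi + 1) * block, n)):
                for j in range(bj * block, min((bj + 1) * block, k)):
                    amax = max(amax, abs(w[i][j]))
            scales[bi][bj] = _scale(amax)
    q = [[w[i][j] / scales[i // block][j // block] for j in range(k)] for i in range(n)]
    return q, scales


def block_scaled_mm_ref(qa, sa, qw, sw, block):
    """C = A @ W^T, applying one activation scale and one weight scale per K-block."""
    m, k, n = len(qa), len(qa[0]), len(qw)
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            total = 0.0
            for k0 in range(0, k, block):
                # Accumulate the quantized partial product, then rescale once per block.
                partial = sum(qa[i][kk] * qw[j][kk] for kk in range(k0, min(k0 + block, k)))
                total += partial * sa[i][k0 // block] * sw[j // block][k0 // block]
            out[i][j] = total
    return out


# Tiny example: M=2, K=4, N=3, group/block size 2 (real models often use 128).
a = [[1.0, -2.0, 3.0, 0.5], [0.25, 4.0, -1.0, 2.0]]
w = [[0.5, 1.0, -1.5, 2.0], [3.0, -0.5, 0.0, 1.0], [1.0, 1.0, 1.0, 1.0]]
qa, sa = quantize_act_per_token_group(a, 2)
qw, sw = quantize_weight_per_block(w, 2)
out = block_scaled_mm_ref(qa, sa, qw, sw, 2)
ref = [[sum(x * y for x, y in zip(ra, rw)) for rw in w] for ra in a]
```

In a real kernel, the quantized operands are stored as FP8, and per-block partial sums are typically accumulated in higher precision before rescaling. Comparing against a plain matmul here only checks the scale bookkeeping.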
Depends on fix: vllm-project/vllm-xpu-kernels#353. @zufangzhu, please help review this. Thanks!
Please rebase and add some unit tests/examples in CI if possible.
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
…xpu-kernels pr vllm-project#173 Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
- Pass the new dummy flags to _C.per_token_group_fp8_quant in XPU paths
- Remove obsolete scaled-mm kernel selection test file

Co-authored-by: GitHub Copilot <noreply@github.com>
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

Revert deletion by restoring tests/kernels/quantization/test_scaled_mm_kernel_selection.py to match origin/main.

Co-authored-by: GitHub Copilot <noreply@github.com>
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

Add test_xpu_fp8_scaled_mm.py to validate torch.ops._xpu_C.fp8_gemm correctness against a native block-scaled matmul reference.

Co-authored-by: GitHub Copilot
Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com>
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>

Delete tests/kernels/quantization/test_xpu_fp8_scaled_mm.py as requested.

Co-authored-by: GitHub Copilot <noreply@github.com>
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Hi @xwu-intel, the pre-commit checks have failed. Please run:
uv pip install "pre-commit>=4.5.1"
pre-commit install
pre-commit run --all-files
Then, commit the changes and push to your branch. For future commits, …
Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com>
Co-authored-by: Yuxiang <yuxiang.liang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com>
Co-authored-by: Yuxiang <yuxiang.liang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: JisoLya <523420504@qq.com>

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com>
Co-authored-by: Yuxiang <yuxiang.liang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>

Signed-off-by: Wu, Xiaochang <xiaochang.wu@intel.com>
Signed-off-by: Xiaochang Wu <xiaochang.wu@intel.com>
Co-authored-by: Yuxiang <yuxiang.liang@intel.com>
Co-authored-by: Kunshang Ji <kunshang.ji@intel.com>
Signed-off-by: Waqar Ahmed <waqar.ahmed@amd.com>
Purpose
This PR adds the XPU block-scaled W8A8 FP8 path and updates FP8 block kernel selection so XPU can fall back to Triton when the native XPU FP8 block kernel is unavailable.
Changes included in this update:
- Enable `TritonFp8BlockScaledMMKernel.is_supported()` on XPU (in addition to CUDA-like platforms).
- Add `TritonFp8BlockScaledMMKernel` to the XPU FP8 block kernel candidate list as a fallback.

Dependency:
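The fallback described in the Purpose section amounts to ordered candidate selection: try the native XPU kernel first and fall back to Triton if it is rejected. Below is a minimal pure-Python sketch. It assumes that each kernel class exposes an `is_supported()` classmethod returning `(supported, reason)`. The class names `NativeXpuFp8BlockKernel` and `TritonBlockKernel`, the `select_kernel` helper, the platform strings, and the `native_xpu_fp8` flag are all hypothetical and do not mirror vLLM's actual kernel-selection API.

```python
from typing import List, Optional, Tuple, Type


class Fp8BlockKernel:
    """Hypothetical base class: each candidate reports whether it can run here."""

    name = "base"

    @classmethod
    def is_supported(cls, platform: str, native_xpu_fp8: bool) -> Tuple[bool, Optional[str]]:
        raise NotImplementedError


class NativeXpuFp8BlockKernel(Fp8BlockKernel):
    """Hypothetical stand-in for the native XPU FP8 block kernel."""

    name = "xpu-native"

    @classmethod
    def is_supported(cls, platform, native_xpu_fp8):
        if platform != "xpu":
            return False, "requires XPU"
        if not native_xpu_fp8:
            return False, "native XPU FP8 block kernel unavailable"
        return True, None


class TritonBlockKernel(Fp8BlockKernel):
    """Hypothetical stand-in for TritonFp8BlockScaledMMKernel after this change."""

    name = "triton"

    @classmethod
    def is_supported(cls, platform, native_xpu_fp8):
        # Assumed: CUDA-like platforms ("cuda", "rocm") were already accepted; XPU is new.
        if platform in ("cuda", "rocm", "xpu"):
            return True, None
        return False, f"platform {platform!r} not supported"


# Native kernel first, Triton last as the fallback.
XPU_FP8_BLOCK_CANDIDATES: List[Type[Fp8BlockKernel]] = [
    NativeXpuFp8BlockKernel,
    TritonBlockKernel,
]


def select_kernel(candidates, platform, native_xpu_fp8):
    """Return the first supported candidate, or raise with every rejection reason."""
    reasons = []
    for kernel in candidates:
        ok, why = kernel.is_supported(platform, native_xpu_fp8)
        if ok:
            return kernel
        reasons.append(f"{kernel.name}: {why}")
    raise RuntimeError("no FP8 block-scaled kernel available (" + "; ".join(reasons) + ")")


print(select_kernel(XPU_FP8_BLOCK_CANDIDATES, "xpu", native_xpu_fp8=False).name)  # prints triton
```

Because the native kernel comes first, Triton is chosen only when the XPU kernel rejects the configuration. Collecting every rejection reason makes it easier to diagnose why no kernel was selected.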
Test Plan
Test Result
GSM8K accuracy test on B60 passed.