[tools] add fp8 max/min constant in utils #3959
Merged
merrymercy merged 4 commits into sgl-project:main, Mar 13, 2025
Conversation
hebiao064 approved these changes, Mar 6, 2025
BBuf reviewed, Mar 6, 2025
BBuf reviewed, Mar 7, 2025
BBuf approved these changes, Mar 9, 2025
hebiao064 pushed a commit to hebiao064/sglang that referenced this pull request, Mar 13, 2025
Motivation
As suggested in PR #3702, AMD uses two FP8 formats, and there is a mismatch between the FP8 behavior of the Meta (PyTorch) implementation and the AMD implementation.
Traditionally, FP8 MAX in the torch.float8_e4m3fnuz format is represented with torch.finfo(torch.float8_e4m3fnuz).max, which is 240.
Currently on AMD, 224 is chosen as FP8 MAX.
But both values can be represented in AMD device functions. Here is the device function test with ROCm SDK 6.3:
From the above test, both 224 and 240 are valid numbers in the AMD FP8 E4M3 FNUZ format. So what's the problem?
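The HIP device-function test itself is not reproduced here, but a pure-Python bit-level decode of the E4M3 FNUZ layout (1 sign bit, 4 exponent bits, 3 mantissa bits, bias 8) illustrates the same point. This helper is a sketch for illustration, not SGLang or ROCm code:

```python
def decode_e4m3fnuz(byte: int) -> float:
    """Decode one FP8 E4M3 FNUZ byte.

    FNUZ: no infinities, and the only NaN encoding is 0x80
    (sign bit set, all other bits zero).
    """
    if byte == 0x80:
        return float("nan")
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0xF
    man = byte & 0x7
    if exp == 0:
        # subnormal: 2^(1 - bias) * (mantissa / 8)
        return sign * 2.0 ** (1 - 8) * (man / 8.0)
    return sign * 2.0 ** (exp - 8) * (1.0 + man / 8.0)

# Format max: exp=15, mantissa=7 -> 2^7 * 1.875 = 240.0
print(decode_e4m3fnuz(0x7F))
# 224 is also exactly representable: exp=15, mantissa=6 -> 2^7 * 1.75
print(decode_e4m3fnuz(0x7E))
# Smallest subnormal: 2^-7 * 1/8 = 2^-10 ~= 0.000976562
print(decode_e4m3fnuz(0x01))
```

This also shows where the min subnormal value mentioned below (2^-10 ≈ 0.000976562) comes from.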
The problem is round trip. Here is the torch test with AMD device :
Any value greater than FP8 MAX is represented as NaN in PyTorch rather than being clamped to a proper max value. In the AMD device implementation, however, it is clamped to 240.
In the current vLLM implementation, the AMD FP8 MAX is hard-coded as 224 for better precision, even though 240 is also valid in the ROCm 6.3 SDK.
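A safe pattern, sketched here in plain Python (the constant name and helper are illustrative, not the PR's exact code), is to clamp to the chosen FP8 MAX before casting, so out-of-range values saturate instead of hitting PyTorch's NaN path:

```python
# Conservative AMD choice discussed above; 240.0 is the format max.
FP8_E4M3_FNUZ_MAX = 224.0

def clamp_to_fp8_range(x: float, fp8_max: float = FP8_E4M3_FNUZ_MAX) -> float:
    """Clamp a value into [-fp8_max, fp8_max] before the FP8 cast,
    so values beyond FP8 MAX saturate instead of becoming NaN."""
    return max(-fp8_max, min(fp8_max, x))

print(clamp_to_fp8_range(1000.0))   # saturates to 224.0
print(clamp_to_fp8_range(3.5))      # in-range values pass through
```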
Besides, I also found that the minimum subnormal value (2^-10 ≈ 0.000976562) is not provided by default in torch, so I added it as well. It extends the representation to digits below the minimum normal, which can be useful when adjusting weights.
Group Quant efficiency
MXFP8 was introduced in the OCP project, is supported in NVIDIA Blackwell and AMD's quantization toolkit Quark, and is projected to be available in PyTorch in 2025 Q1. This enables native group quant over 32 consecutive elements instead of per-tensor quant. The immediate benefit brought by MXFP8 is that we can use FP8 E8M0 to store the scaling factor instead of FP32.
Currently SGLang implements group quant with group size 128; we can then adjust the implementation to support 32-element group quant on Blackwell and other chips that support OCP MXFP8.
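As a rough sketch of what group-size-parameterized scaling could look like (pure Python, hypothetical helper; SGLang's actual kernel-side implementation differs), the per-group absmax scale can be rounded up to a power of two so it fits an E8M0-style exponent instead of a full FP32 value:

```python
import math

def group_quant_scales(values, group_size=128, fp8_max=224.0):
    """Compute per-group power-of-two scales (E8M0-style).

    group_size=32 would match OCP MXFP8 granularity;
    group_size=128 matches SGLang's current group quant.
    """
    assert len(values) % group_size == 0
    scales = []
    for i in range(0, len(values), group_size):
        amax = max(abs(v) for v in values[i:i + group_size])
        # Round the scale up to a power of two so it can be stored as
        # a single E8M0 exponent byte instead of an FP32 scale.
        exp = math.ceil(math.log2(amax / fp8_max)) if amax > 0 else 0
        scales.append(2.0 ** exp)
    return scales

# One MXFP8-granularity group whose absmax equals fp8_max -> scale 1.0
print(group_quant_scales([224.0] * 32, group_size=32))
```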
Modifications
Add global constants. Usage example:
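The original usage example did not survive extraction; below is a hedged sketch of what using such constants could look like. The names and values are assumed from this PR's title and the discussion above, not copied from the merged code; check SGLang's utils module for the real definitions:

```python
# Illustrative constants only (names/values assumed, see PR discussion):
FP8_E4M3_MAX = 448.0        # torch.float8_e4m3fn max (non-AMD path)
FP8_E4M3_FNUZ_MAX = 224.0   # AMD choice above; 240.0 is also representable

def fp8_max(is_hip: bool) -> float:
    """Pick the FP8 MAX for the current platform (hypothetical helper)."""
    return FP8_E4M3_FNUZ_MAX if is_hip else FP8_E4M3_MAX

print(fp8_max(is_hip=True))
print(fp8_max(is_hip=False))
```

In real code the `is_hip` flag would come from a ROCm platform-detection helper rather than being passed in by hand.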
Note: FP8_E4M3_MIN_SUB_NORM and OCP_MXFP8_E8M0_GROUP_QUANT_GRANULARITY will be added in the relevant PRs.
Checklist