[AMD] fix the regression issue for DeepseekV3 on MI300 by yctseng0211 · Pull Request #14383 · sgl-project/sglang

yctseng0211 · 2025-12-04T02:34:51Z

Motivation

The accuracy regression appears to be caused by the commit from [PR-13960].
The critical code section is the following.
Before [PR-13960], both conditions, if _is_fp8_fnuz and if _use_aiter, were hit on MI300:

if _is_fp8_fnuz:
    ....
    ....
if _use_aiter:
    ....
    ....

With [PR-13960], the second condition, if _use_aiter, was changed to elif _use_aiter, so the code section under _use_aiter was skipped whenever _is_fp8_fnuz was true, causing the accuracy issue on MI300.
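The control-flow difference described above can be illustrated with a minimal sketch. The function names and the step labels "fnuz_path" and "aiter_path" are hypothetical placeholders; they stand in for the real code blocks, which are elided in this description, and do not reproduce the actual contents of fp8.py.

```python
def steps_before_pr_13960(is_fp8_fnuz: bool, use_aiter: bool) -> list[str]:
    """Two independent `if` statements: both branches can run."""
    steps = []
    if is_fp8_fnuz:
        steps.append("fnuz_path")   # placeholder for the elided _is_fp8_fnuz block
    if use_aiter:
        steps.append("aiter_path")  # placeholder for the elided _use_aiter block
    return steps


def steps_after_pr_13960(is_fp8_fnuz: bool, use_aiter: bool) -> list[str]:
    """`if` / `elif`: the second branch is skipped when the first is taken."""
    steps = []
    if is_fp8_fnuz:
        steps.append("fnuz_path")
    elif use_aiter:
        steps.append("aiter_path")  # never reached when is_fp8_fnuz is True
    return steps


# On MI300 both flags are true, per the description above.
before = steps_before_pr_13960(True, True)
after = steps_after_pr_13960(True, True)
print(before)
print(after)
```

The two versions differ only when both flags are true, which is the MI300 case reported here; for every other combination of flags they take the same path.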

command=python3 -m sglang.launch_server --model-path /models/deepseek-ai/DeepSeek-V3-0324 --tp 8 --trust-remote-code --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --mem-frac 0.7 --device cuda --host 127.0.0.1 --port 21000
[CI Test Method] TestDeepseekV3MTP.test_a_gsm8k
......
Accuracy: 0.000
Invalid: 0.690
Latency: 162.072 s
Output throughput: 631.816 token/s
metrics={'accuracy': 0.0, 'invalid': 0.69, 'latency': 162.0724634733051, 'output_throughput': 631.8161506619307}
avg_spec_accept_length=1.0063940886699507

Modifications

python/sglang/srt/layers/quantization/fp8.py

Accuracy Tests

device: MI300

model:DeepSeek-V3-0324
test file: test_deepseek_v3_mtp.py

Accuracy: 0.950
Invalid: 0.000
Latency: 43.894 s
Output throughput: 460.814 token/s
metrics={'accuracy': 0.95, 'invalid': 0.0, 'latency': 43.894099209457636, 'output_throughput': 460.81364840132755}
avg_spec_accept_length=3.00316551100

device: MI355

model:DeepSeek-V3-0324
test file: test_deepseek_v3_mtp.py

Accuracy: 0.955
Invalid: 0.000
Latency: 16.481 s
Output throughput: 1204.678 token/s
metrics={'accuracy': 0.955, 'invalid': 0.0, 'latency': 16.48075593682006, 'output_throughput': 1204.6777511973037}
avg_spec_accept_length=2.9998464137613268

device: MI300

model:DeepSeek-R1-0528
server script:

export SGLANG_USE_AITER=1
export RCCL_MSCCL_ENABLE=0

python3 -m sglang.launch_server \
--model-path /models/deepseek-ai/DeepSeek-R1-0528/ \
--host=0.0.0.0 \
--port 9000 \
--tensor-parallel-size 8 \
--trust-remote-code \
--chunked-prefill-size 196608 \
--mem-fraction-static 0.8 --disable-radix-cache \
--num-continuous-decode-steps 4 \
--max-prefill-tokens 196608 \
--cuda-graph-max-bs 128

python3 benchmark/gsm8k/bench_sglang.py --num-questions 2000 --parallel 2000 --port 9000
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [02:39<00:00,  8.24it/s]
Accuracy: 0.948
Invalid: 0.000
Latency: 160.393 s
Output throughput: 826.413 token/s

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.
Work with maintainers to merge your PR. See the PR Merge Process


HaiShaw · 2025-12-04T04:34:45Z

/tag-and-rerun-ci

root and others added 2 commits December 3, 2025 15:14

fix the accuracy error

5a375e4

add comment

4b89a86

yctseng0211 marked this pull request as ready for review December 4, 2025 03:47

yctseng0211 requested review from AniZpZ, BBuf, Edwardf0t1, FlamingoPg and ch-wan as code owners December 4, 2025 03:47

HaiShaw approved these changes Dec 4, 2025

HaiShaw added the run-ci label Dec 4, 2025

HaiShaw merged commit d6c4901 into sgl-project:main Dec 4, 2025
146 of 155 checks passed

tom-jerr pushed a commit to tom-jerr/sglang that referenced this pull request Dec 4, 2025

[AMD] fix the regression issue for DeepseekV3 on MI300 (sgl-project#1…

c661234

…4383)

yingluosanqian pushed a commit to yingluosanqian/sglang that referenced this pull request Dec 4, 2025

[AMD] fix the regression issue for DeepseekV3 on MI300 (sgl-project#1…

b7b700f

…4383)

tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025

[AMD] fix the regression issue for DeepseekV3 on MI300 (sgl-project#1…

36fbf1a

…4383)

tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025

[AMD] fix the regression issue for DeepseekV3 on MI300 (sgl-project#1…

4c5f09e

…4383)

yuchengz816-bot pushed a commit to yuchengz816-bot/sglang that referenced this pull request Dec 8, 2025

[AMD] fix the regression issue for DeepseekV3 on MI300 (sgl-project#1…

c551bb6

…4383)

Kevin-XiongC pushed a commit to novitalabs/sglang that referenced this pull request Dec 9, 2025

[AMD] fix the regression issue for DeepseekV3 on MI300 (sgl-project#1…

5c54fe1

…4383)

yctseng0211 mentioned this pull request Dec 17, 2025

[AMD] add unit-test-backend-8-gpu-amd back #15253

Merged

6 tasks

HaiShaw merged 2 commits into sgl-project:main from yctseng0211:fix_dsv3
