[AMD] fix the regression issue for DeepseekV3 on MI300#14383

Merged
HaiShaw merged 2 commits intosgl-project:mainfrom
yctseng0211:fix_dsv3
Dec 4, 2025
@yctseng0211 yctseng0211 commented Dec 4, 2025

Motivation

The root cause of the accuracy regression is most likely the commit from [PR-13960].
The critical code section is shown below.
Before [PR-13960], both of these conditions, if _is_fp8_fnuz and if _use_aiter, were hit on MI300:

if _is_fp8_fnuz:
    ....
    ....
if _use_aiter:
    ....
    ....

With [PR-13960], the second condition, if _use_aiter, was changed to elif _use_aiter. As a result, whenever _is_fp8_fnuz was true, the code under _use_aiter was skipped, which caused the accuracy issue on MI300.
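The control-flow difference can be illustrated with a minimal, self-contained sketch. It is not the code in fp8.py: the function name `process_weights`, the `chained` switch, and the step labels are hypothetical placeholders that only show why two independent `if` statements behave differently from an `if`/`elif` chain when both flags are true.

```python
def process_weights(is_fp8_fnuz: bool, use_aiter: bool, chained: bool) -> list[str]:
    """Return the labels of the branches that ran, in order.

    Hypothetical stand-in for the branches discussed above; the real
    branch bodies are elided in the PR description and are not reproduced.
    """
    steps = []
    if chained:
        # Shape after the regression: the second branch is an elif,
        # so it is skipped whenever the first condition is true.
        if is_fp8_fnuz:
            steps.append("fnuz branch")
        elif use_aiter:
            steps.append("aiter branch")
    else:
        # Shape before the regression (and after this fix):
        # two independent checks, so both branches can run.
        if is_fp8_fnuz:
            steps.append("fnuz branch")
        if use_aiter:
            steps.append("aiter branch")
    return steps


# Both flags true, as described for MI300 in this PR.
print(process_weights(True, True, chained=False))  # ['fnuz branch', 'aiter branch']
print(process_weights(True, True, chained=True))   # ['fnuz branch']
```

When only one of the two flags is true, both shapes behave the same. The difference appears only when both are true, which is the MI300 case described above.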

command=python3 -m sglang.launch_server --model-path /models/deepseek-ai/DeepSeek-V3-0324 --tp 8 --trust-remote-code --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --mem-frac 0.7 --device cuda --host 127.0.0.1 --port 21000
[CI Test Method] TestDeepseekV3MTP.test_a_gsm8k
......
Accuracy: 0.000
Invalid: 0.690
Latency: 162.072 s
Output throughput: 631.816 token/s
metrics={'accuracy': 0.0, 'invalid': 0.69, 'latency': 162.0724634733051, 'output_throughput': 631.8161506619307}
avg_spec_accept_length=1.0063940886699507

Modifications

python/sglang/srt/layers/quantization/fp8.py

Accuracy Tests

device: MI300

model: DeepSeek-V3-0324
test file: test_deepseek_v3_mtp.py

Accuracy: 0.950
Invalid: 0.000
Latency: 43.894 s
Output throughput: 460.814 token/s
metrics={'accuracy': 0.95, 'invalid': 0.0, 'latency': 43.894099209457636, 'output_throughput': 460.81364840132755}
avg_spec_accept_length=3.00316551100

device: MI355

model: DeepSeek-V3-0324
test file: test_deepseek_v3_mtp.py

Accuracy: 0.955
Invalid: 0.000
Latency: 16.481 s
Output throughput: 1204.678 token/s
metrics={'accuracy': 0.955, 'invalid': 0.0, 'latency': 16.48075593682006, 'output_throughput': 1204.6777511973037}
avg_spec_accept_length=2.9998464137613268

device: MI300

model: DeepSeek-R1-0528
server script:

export SGLANG_USE_AITER=1
export RCCL_MSCCL_ENABLE=0

python3 -m sglang.launch_server \
--model-path /models/deepseek-ai/DeepSeek-R1-0528/ \
--host=0.0.0.0 \
--port 9000 \
--tensor-parallel-size 8 \
--trust-remote-code \
--chunked-prefill-size 196608 \
--mem-fraction-static 0.8 --disable-radix-cache \
--num-continuous-decode-steps 4 \
--max-prefill-tokens 196608 \
--cuda-graph-max-bs 128

benchmark script:

python3 benchmark/gsm8k/bench_sglang.py --num-questions 2000 --parallel 2000 --port 9000
100%|██████████| 1319/1319 [02:39<00:00,  8.24it/s]
Accuracy: 0.948
Invalid: 0.000
Latency: 160.393 s
Output throughput: 826.413 token/s

Benchmarking and Profiling

Checklist

@yctseng0211 yctseng0211 marked this pull request as ready for review December 4, 2025 03:47
@HaiShaw HaiShaw added the run-ci label Dec 4, 2025

HaiShaw commented Dec 4, 2025

/tag-and-rerun-ci

@HaiShaw HaiShaw merged commit d6c4901 into sgl-project:main Dec 4, 2025
146 of 155 checks passed
tom-jerr pushed a commit to tom-jerr/sglang that referenced this pull request Dec 4, 2025
yingluosanqian pushed a commit to yingluosanqian/sglang that referenced this pull request Dec 4, 2025
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025
tonyluj pushed a commit to openanolis/sglang that referenced this pull request Dec 5, 2025
yuchengz816-bot pushed a commit to yuchengz816-bot/sglang that referenced this pull request Dec 8, 2025
Kevin-XiongC pushed a commit to novitalabs/sglang that referenced this pull request Dec 9, 2025