[ROCm] Enable MTP (NextN) on AMD GPU#4631

Merged
zhyncs merged 15 commits into sgl-project:main from alexsun07:amd-mtp
Mar 24, 2025
alexsun07 commented Mar 20, 2025

Motivation

Enable MTP (NextN) speculative decoding on AMD GPUs.

Modifications

Added the kernels required by MTP to torch_extension_rocm.cc and enabled the MTP test on AMD GPUs.

Benchmark result

# benchmark
python3 -m sglang.bench_one_batch_server --model None --base-url http://127.0.0.1:30000 --batch-size 1 --input-len 256 --output-len 256

# baseline on main branch
python3 -m sglang.launch_server \
--model-path deepseek-ai/DeepSeek-V3 \
--port 30000 \
--tp 8 \
--trust-remote-code \
--disable-radix-cache \
--chunked-prefill-size -1

# result
batch size: 1
latency: 11.84 s
output throughput: 21.61 token/s
(input + output) throughput: 43.23 token/s

# w/ nextn speculative decoding
python3 -m sglang.launch_server \
--model-path deepseek-ai/DeepSeek-V3 \
--port 30000 \
--tp 8 \
--trust-remote-code \
--disable-radix-cache \
--chunked-prefill-size -1 \
--speculative-algo EAGLE \
--speculative-draft SGLang/DeepSeek-V3-NextN \
--speculative-num-steps 2 \
--speculative-eagle-topk 4 \
--speculative-num-draft-tokens 4 \
--mem-fraction-static 0.6

# result
batch size: 1
latency: 5.59 s
output throughput: 45.80 token/s
(input + output) throughput: 91.61 token/s
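
The two result blocks above can be compared directly. The following is a minimal sketch, not part of the PR, that computes the speedup from the reported figures and cross-checks the reported output throughput against `--output-len 256`. The dictionary keys and the `speedup` helper are illustrative names chosen here, and the cross-check assumes output throughput is simply output length divided by latency.

```python
# Figures copied from the two "# result" blocks above (batch size 1).
OUTPUT_LEN = 256  # from --output-len in the benchmark command

baseline = {"latency_s": 11.84, "output_tok_per_s": 21.61}
nextn = {"latency_s": 5.59, "output_tok_per_s": 45.80}


def speedup(baseline_run, candidate_run):
    """Return (latency speedup, output-throughput gain) of candidate over baseline."""
    latency_speedup = baseline_run["latency_s"] / candidate_run["latency_s"]
    throughput_gain = (
        candidate_run["output_tok_per_s"] / baseline_run["output_tok_per_s"]
    )
    return latency_speedup, throughput_gain


def implied_output_throughput(run):
    """Output tokens per second implied by latency alone (assumed definition)."""
    return OUTPUT_LEN / run["latency_s"]


latency_speedup, throughput_gain = speedup(baseline, nextn)
print(f"latency speedup:        {latency_speedup:.2f}x")
print(f"output throughput gain: {throughput_gain:.2f}x")

# Cross-check: the reported throughput should roughly match OUTPUT_LEN / latency.
for name, run in (("baseline", baseline), ("nextn", nextn)):
    implied = implied_output_throughput(run)
    print(f"{name}: reported {run['output_tok_per_s']:.2f}, implied {implied:.2f} token/s")
```

Both ratios come out at roughly 2.12x, so at batch size 1 the NextN configuration about doubles decoding speed relative to the baseline on this setup.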

Correctness check

python3 benchmark/gsm8k/bench_sglang.py --num-questions 2000 --parallel 2000
100%|███████████████████████████████████████████████████████████████| 1319/1319 [12:03<00:00,  1.82it/s]
Accuracy: 0.942
Invalid: 0.000
Latency: 728.192 s
Output throughput: 183.007 token/s

alexsun07 commented Mar 20, 2025

I closed #3670 in favor of this PR, because I used the main branch for the previous PR. Apologies for that.

zhyncs commented Mar 20, 2025

@HaiShaw

HaiShaw left a comment

LG.

HaiShaw commented Mar 23, 2025

@alexsun07 let's fix CI and update.

HaiShaw removed the wip label Mar 23, 2025
zhyncs merged commit af6535e into sgl-project:main Mar 24, 2025
zhyncs pushed a commit that referenced this pull request Mar 24, 2025
alexsun07 deleted the amd-mtp branch March 24, 2025 06:14

4 participants