[ROCm] Enable MTP (NextN) on AMD GPU #3670
alexsun07 wants to merge 7 commits into sgl-project:main from
Conversation
Tested on 8*MI308X GPUs.

```shell
# server
python3 -m sglang.launch_server --host 0.0.0.0 --trust-remote-code --tp 8 --model-path deepseek-ai/DeepSeek-V3 --speculative-algo NEXTN --speculative-draft SGLang/DeepSeek-V3-NextN --speculative-num-steps 2 --speculative-eagle-topk 4 --speculative-num-draft-tokens 4 --mem-fraction-static 0.5

# bench
python3 -m sglang.bench_one_batch_server --model None --base-url http://127.0.0.1:30000 --batch-size 1 --input-len 256 --output-len 256
```
@HaiShaw Hi, could you take a look? Thanks!
I can run it on MI300X; however, it got lower perf than without it. I used 0.4.3post2-rocm630. Serve cmdline - Client - Perf: Perf without it: Do you get better perf with it on AMD GPU?
@andyluo7 Yes. With
@yiakwy-xpu-ml-framework-team @HaiShaw Could you take a look at this? Thanks!
Thank you @zhaochenyang20 for including me. Yes, I am watching it, really great job @alexsun07. I am having a look at the algorithm design, if that is helpful to facilitate merging. vLLM is also working on this important feature with a different approach.
@yiakwy-xpu-ml-framework-team, what shall we do next?
@zhaochenyang20 I am looking into it. |
Hit an out-of-bounds error when running the GSM8K tests.
Sure, thanks!
This issue has been fixed by merging upstream.
HaiShaw
left a comment
Glad that issue was addressed.
@saienduri can we have test/srt/test_mla_deepseek_v3.py covered?
Added the test. @zhyncs Would you please review again?
Didn't notice that I shouldn't use the main branch. Closing this one; tracking with the new PR: #4631


Motivation
To support MTP (NextN) on AMD GPUs.
Modifications
MTP (NextN) relies on `build_tree_kernel` and `build_tree_kernel_efficient`. Add them into `torch_extension_rocm.cc`.
Checklist
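
For illustration, registering the two kernels in the ROCm extension file would look roughly like the sketch below, mirroring how the CUDA-side extension exposes them. This is a hypothetical fragment: the function names come from this PR, but the parameter lists and the binding style (`PYBIND11_MODULE` vs. `TORCH_LIBRARY`) are assumptions, not the actual sgl-kernel signatures.

```cpp
#include <torch/extension.h>

// Forward declarations of the tree-building kernels to expose.
// NOTE: these argument lists are illustrative assumptions only.
void build_tree_kernel(at::Tensor parent_list, at::Tensor selected_index,
                       at::Tensor verified_seq_len, at::Tensor tree_mask,
                       at::Tensor positions, at::Tensor retrive_index,
                       int64_t topk, int64_t depth, int64_t draft_token_num);

void build_tree_kernel_efficient(at::Tensor parent_list, at::Tensor selected_index,
                                 at::Tensor verified_seq_len, at::Tensor tree_mask,
                                 at::Tensor positions, at::Tensor retrive_index,
                                 int64_t topk, int64_t depth, int64_t draft_token_num);

// Bind both kernels so the Python side can build the speculative
// draft-token tree on ROCm, just as it does on CUDA.
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("build_tree_kernel", &build_tree_kernel,
        "Build the draft token tree for MTP (NextN) speculative decoding");
  m.def("build_tree_kernel_efficient", &build_tree_kernel_efficient,
        "Efficient variant of the draft token tree builder");
}
```

Since the kernel sources are shared, the ROCm build only needs these registrations plus HIP compilation of the same kernel files; no kernel logic changes are implied by this PR.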