[Feature] Support DeepSeek MTP on NPU by iforgetmyname · Pull Request #11897 · sgl-project/sglang

iforgetmyname · 2025-10-21T07:01:44Z

Motivation

This PR primarily aims to support DeepSeek's MTP (multi-token prediction) on Ascend NPUs.

Modifications

Introduces NPU support for the newest EAGLE framework
Includes Ascend-specific ops for draft tree build/verify

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.

Alcanderian · 2025-10-23T02:52:51Z

Can we add a PR test for NPU MTP?

Yellowhappy · 2025-10-23T10:41:14Z

Hi, which DeepSeek model is this, and is it running on one machine or two?

iforgetmyname · 2025-10-25T03:05:25Z

> Hi, which DeepSeek model is this, and is it running on one machine or two?

Hi, this supports both V3 and V3.2, and it can run on one machine if HBM capacity allows.

iforgetmyname · 2025-10-25T10:21:26Z

> Can we add a PR test for NPU MTP?

For sure, we now have test_ascend_deepseek_mtp.py in the PR test suite.

sglang-bot · 2025-11-02T03:10:47Z

          export PATH="/usr/local/Ascend/8.3.RC1/compiler/bishengir/bin:${PATH}"
          cd test/srt
-          python3 run_suite.py --suite per-commit-16-ascend-a3 --timeout-per-file 3600
+          python3 run_suite.py --suite per-commit-16-ascend-a3 --timeout-per-file 3600 --auto-partition-id ${{ matrix.part }} --auto-partition-size 2


sglang/docs/developer_guide/contribution_guide.md

Line 86 in 95191eb

- If a single test file run longer than 500 seconds, split it into multiple smaller files (e.g., `test_eagle_infer_a.py`, `test_eagle_infer_b.py`).
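The diff above splits the `per-commit-16-ascend-a3` suite across two matrix jobs via `--auto-partition-id` and `--auto-partition-size`. How `run_suite.py` assigns files to partitions is not shown in this thread; the following is only a minimal sketch of one common approach (greedy longest-first assignment by estimated runtime). The function name, file names, and timings are hypothetical.

```python
import heapq
from typing import Dict, List


def partition_tests(est_seconds: Dict[str, float], num_parts: int) -> List[List[str]]:
    """Assign test files to partitions, longest first, always to the lightest partition."""
    # Min-heap of (accumulated_seconds, partition_index).
    heap = [(0.0, i) for i in range(num_parts)]
    heapq.heapify(heap)
    parts: List[List[str]] = [[] for _ in range(num_parts)]
    # Sort by descending runtime; break ties by name for determinism.
    for name, secs in sorted(est_seconds.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        parts[idx].append(name)
        heapq.heappush(heap, (total + secs, idx))
    return parts


# Hypothetical estimates, not taken from the sglang test suite.
estimates = {
    "test_a.py": 400.0,
    "test_b.py": 300.0,
    "test_c.py": 200.0,
    "test_d.py": 100.0,
}
parts = partition_tests(estimates, num_parts=2)
totals = [sum(estimates[name] for name in part) for part in parts]
```

With these made-up numbers both partitions end up at 500 seconds, which is the kind of balance that keeps each matrix job under the per-file and per-job time limits.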

sglang-bot · 2025-11-02T03:12:21Z

+    if not _is_npu:
+        device: str = "cuda"
+    else:
+        device: str = "npu"


You should set this value to npu when you create it instead of adding an if/else here.

Or do this:
device: str = "cuda" if not is_npu else "npu"
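One way to read this suggestion is to resolve the device string once and reuse it as a default, rather than branching at each call site. The sketch below assumes a boolean platform flag like the `_is_npu` in the diff; `default_device`, `_DEVICE`, and `make_positions` are hypothetical names, and a plain dict stands in for a tensor so the example runs without torch.

```python
def default_device(is_npu: bool) -> str:
    """Map the platform flag to a device string with a single conditional expression."""
    return "npu" if is_npu else "cuda"


# Assumption: stand-in for the module-level platform check used in the PR.
_is_npu = False

# Resolved once at import time; call sites read this instead of re-branching.
_DEVICE = default_device(_is_npu)


def make_positions(num_tokens: int, device: str = _DEVICE) -> dict:
    # Stand-in for something like torch.arange(num_tokens, device=device).
    return {"device": device, "positions": list(range(num_tokens))}


batch = make_positions(4)
```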

sglang-bot · 2025-11-02T03:15:57Z

        self.lm_head.weight = head
-        torch.cuda.empty_cache()
-        torch.cuda.synchronize()
+        if not _is_npu:


remove if/else

sglang-bot · 2025-11-02T03:18:25Z

+        if not _is_npu:
+            device = "cuda"
+        else:
+            device = "npu"


read from the global variable?

sglang-bot · 2025-11-02T03:20:25Z

            )

-        if is_all_greedy or not TREE_SPEC_KERNEL_AVAILABLE:
+        if is_all_greedy or not TREE_SPEC_KERNEL_AVAILABLE or _is_npu:


Style: use a more general field instead of is_npu
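A possible interpretation of this comment: fold the platform check into a capability flag that is computed once, so the sampling condition only asks "is the tree kernel usable?" and never names a specific backend. The sketch below is an assumption about what that could look like; `SpecCapabilities`, `detect_capabilities`, and `should_use_greedy_verify` are hypothetical and not part of sglang.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpecCapabilities:
    """Hypothetical container for backend capabilities relevant to speculative decoding."""
    tree_spec_kernel_available: bool


def detect_capabilities(is_npu: bool, kernel_importable: bool) -> SpecCapabilities:
    # The platform check lives only here: in this sketch, NPU is treated as
    # not having the tree sampling kernel, mirroring the `or _is_npu` in the diff.
    return SpecCapabilities(
        tree_spec_kernel_available=kernel_importable and not is_npu
    )


def should_use_greedy_verify(is_all_greedy: bool, caps: SpecCapabilities) -> bool:
    # Same shape as the original condition, with no backend name in it.
    return is_all_greedy or not caps.tree_spec_kernel_available


npu_caps = detect_capabilities(is_npu=True, kernel_importable=True)
gpu_caps = detect_capabilities(is_npu=False, kernel_importable=True)
```

The design choice is that adding another backend later changes only `detect_capabilities`, not every condition that consults the flag.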

sglang-bot · 2025-11-02T03:21:35Z


        # Sample tokens
-        if sampling_info.is_all_greedy:
+        if sampling_info.is_all_greedy or _is_npu:


do not use is_npu

sglang-bot · 2025-11-02T03:23:42Z

+            bs,
+        )
+    else:
+        sgl_build_tree_kernel_efficient(


the GPU code should be in the first branch of if/else

sglang-bot · 2025-11-02T03:26:13Z

+    if _is_cuda or _is_hip:
+        from sgl_kernel import verify_tree_greedy
+
+        verify_tree_greedy(


you can try to add more arguments to sgl kernel and remove these
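The last two comments (GPU path first, fewer per-call branches) can be illustrated by selecting the verify function once and calling it through a single name. This is a sketch under stated assumptions: it does not reproduce the real `sgl_kernel.verify_tree_greedy` signature, it handles only a single draft chain rather than a tree, and both backend functions are hypothetical stand-ins that delegate to a pure-Python reference.

```python
from typing import Callable, List

VerifyFn = Callable[[List[int], List[int]], List[int]]


def _reference_verify(draft: List[int], target: List[int]) -> List[int]:
    """Greedy verification of one draft chain.

    `target` holds the target model's argmax at each draft position plus one
    extra position, so len(target) == len(draft) + 1.
    """
    accepted: List[int] = []
    for d, t in zip(draft, target):
        if d != t:
            break
        accepted.append(d)
    # Always emit one token from the target model: the correction at the
    # first mismatch, or the bonus token if every draft token matched.
    accepted.append(target[len(accepted)])
    return accepted


def _verify_gpu(draft: List[int], target: List[int]) -> List[int]:
    # Hypothetical stand-in for a CUDA/HIP kernel call.
    return _reference_verify(draft, target)


def _verify_npu(draft: List[int], target: List[int]) -> List[int]:
    # Hypothetical stand-in for an Ascend-specific op.
    return _reference_verify(draft, target)


def select_verify_fn(platform: str) -> VerifyFn:
    # GPU branch first, as requested in the review.
    if platform in ("cuda", "hip"):
        return _verify_gpu
    if platform == "npu":
        return _verify_npu
    raise ValueError(f"unsupported platform: {platform}")


verify = select_verify_fn("npu")
partial = verify([5, 6, 7], [5, 6, 9, 1])
full = verify([5, 6, 7], [5, 6, 7, 8])
```

Here `partial` accepts two draft tokens and then takes the target's correction, while `full` accepts all three and appends the bonus token.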

wangtiance · 2025-11-26T11:30:46Z

Hello, is DeepSeek the only model that supports speculative decoding on NPU? Will Qwen3 and others be supported?

sglang-bot added the run-ci label Oct 21, 2025

liupeng374 force-pushed the feature/mtp branch 3 times, most recently from 43b19de to 009918a Compare October 22, 2025 07:27

iforgetmyname marked this pull request as ready for review October 22, 2025 11:39

iforgetmyname requested review from Ying1123, hnyls2002, ispobock, kssteven418, merrymercy, ping1jing2, xiezhq-hermann and zhyncs as code owners October 22, 2025 11:39

iforgetmyname closed this Oct 22, 2025

iforgetmyname reopened this Oct 22, 2025

npu support mtp and mtp(beta)

4537d03

liupeng374 force-pushed the feature/mtp branch from 009918a to 4537d03 Compare October 22, 2025 11:46

iforgetmyname changed the title ~~[Feature] Support MTP on NPU~~ [Feature] Support DeepSeek MTP on NPU Oct 22, 2025

Merge branch 'main' into feature/mtp

4f773bd

liupeng374 added 3 commits October 23, 2025 16:03

npu support mtp and mtp(beta)

9eb2ea8

Merge branch 'main' into feature/mtp

e5582d1

Merge branch 'main' into feature/mtp

56e47be

iforgetmyname added 3 commits October 24, 2025 09:09

add deepseek mtp testcase

e4cea7b

add deepseek mtp testcase

118313a

Merge branch 'main' into feature/mtp

cfd1942

iforgetmyname marked this pull request as draft October 25, 2025 02:14

ignore modelscope import error

10af834

iforgetmyname marked this pull request as ready for review October 25, 2025 03:04

Merge branch 'main' into feature/mtp

d7516b8

add partition for 16-a3 testcases

c7a39ec

ping1jing2 approved these changes Oct 26, 2025

Alcanderian reviewed Oct 28, 2025

iforgetmyname and others added 7 commits October 29, 2025 09:53

fix comments

7dd6355

abstract assign_req_to_token_pool caller

52d5c4d

unify verify_tree_greedy_func & assign_extend_cache_locs_func

ff81742

fix is_npu()

721d12c

Merge branch 'main' into feature/mtp

345f3a8

Merge branch 'main' into feature/mtp

0af006d

Merge branch 'main' into feature/mtp

6425c57

hnyls2002 approved these changes Oct 30, 2025

hnyls2002 merged commit ce6b17c into sgl-project:main Oct 30, 2025
55 of 71 checks passed

sglang-bot reviewed Nov 2, 2025

iforgetmyname deleted the feature/mtp branch November 3, 2025 01:03

iforgetmyname mentioned this pull request Jan 23, 2026

[Roadmap] Ascend NPU Development (2026 Q1) #13664

Open

28 tasks


8 participants
