[Feature] Support DeepSeek MTP on NPU #11897
Conversation
Can we add a PR test for NPU MTP?

Hi, which DeepSeek model is this, and is it running on one machine or two?

Hi, this supports both V3 and V3.2, and it can run on a single machine if HBM capacity allows.

For sure, we now have test_ascend_deepseek_mtp.py in the PR test suite.
```diff
  export PATH="/usr/local/Ascend/8.3.RC1/compiler/bishengir/bin:${PATH}"
  cd test/srt
- python3 run_suite.py --suite per-commit-16-ascend-a3 --timeout-per-file 3600
+ python3 run_suite.py --suite per-commit-16-ascend-a3 --timeout-per-file 3600 --auto-partition-id ${{ matrix.part }} --auto-partition-size 2
```
```python
if not _is_npu:
    device: str = "cuda"
else:
    device: str = "npu"
```
- you should set this value to `npu` when you create it instead of adding an if/else here
- or do this: `device: str = "cuda" if not _is_npu else "npu"`
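The suggestion above can be sketched as follows; `_is_npu` and the config class here are assumptions for illustration, not SGLang's actual definitions:

```python
from dataclasses import dataclass

# Assumption for this sketch: a module-level platform flag,
# computed once at import time rather than per call site.
_is_npu = False


@dataclass
class ForwardConfig:
    # A single conditional expression replaces the if/else block.
    device: str = "cuda" if not _is_npu else "npu"
```

The conditional expression is evaluated once when the class is created, so callers that need a different device still pass it explicitly.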
```python
self.lm_head.weight = head
torch.cuda.empty_cache()
torch.cuda.synchronize()
if not _is_npu:
```
```python
if not _is_npu:
    device = "cuda"
else:
    device = "npu"
```
Could this read from the global variable instead?
```diff
- if is_all_greedy or not TREE_SPEC_KERNEL_AVAILABLE:
+ if is_all_greedy or not TREE_SPEC_KERNEL_AVAILABLE or _is_npu:
```
Style: use a more general field instead of `_is_npu`.
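One way to realize that style suggestion is to fold the platform check into the existing capability flag so call sites stay device-agnostic. A minimal sketch, with all names assumed for illustration:

```python
# All names here are assumptions for illustration, not the actual SGLang code.
_is_npu = False          # platform probe, computed once at import
_kernel_imported = True  # whether `from sgl_kernel import ...` succeeded

# One general flag answers "can we use the tree speculative kernel here?"
# so call sites no longer need an `or _is_npu` suffix.
TREE_SPEC_KERNEL_AVAILABLE = _kernel_imported and not _is_npu


def choose_verify_path(is_all_greedy: bool) -> str:
    # The branch now reads as a capability check, not a device check.
    if is_all_greedy or not TREE_SPEC_KERNEL_AVAILABLE:
        return "greedy_fallback"
    return "tree_spec_kernel"
```

Adding a new platform later then only requires updating the flag's definition, not every branch that consults it.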
```diff
  # Sample tokens
- if sampling_info.is_all_greedy:
+ if sampling_info.is_all_greedy or _is_npu:
```
```python
    bs,
)
else:
    sgl_build_tree_kernel_efficient(
```
The GPU code should be in the first branch of the if/else.
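The reordering the reviewer asks for might look like this sketch (function and kernel names are assumptions, not the actual code):

```python
# Sketch only: names are assumptions, not the actual SGLang code.
def build_tree(is_all_greedy: bool, is_npu: bool) -> str:
    # GPU fast path first: the common case reads at the top,
    # and the NPU/greedy fallback becomes the `else` branch.
    if not (is_all_greedy or is_npu):
        return "sgl_build_tree_kernel_efficient"
    else:
        return "greedy_sampling_fallback"
```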
```python
if _is_cuda or _is_hip:
    from sgl_kernel import verify_tree_greedy

verify_tree_greedy(
```
You could try adding more arguments to the sgl kernel and removing these branches.
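For context, the core of what a greedy verification kernel like `verify_tree_greedy` computes for a single draft chain can be sketched in pure Python. The real kernel operates on batched token trees on device; this simplified reference is an assumption about its per-chain semantics:

```python
from typing import List


def verify_greedy_chain(draft_tokens: List[int], target_argmax: List[int]) -> List[int]:
    """Accept draft tokens while they match the target model's greedy
    (argmax) predictions; on the first mismatch, take the target's
    token as the correction and stop."""
    accepted: List[int] = []
    for draft, target in zip(draft_tokens, target_argmax):
        if draft == target:
            accepted.append(draft)
        else:
            # Mismatch: emit the target model's token instead and stop.
            accepted.append(target)
            break
    return accepted
```

Every accepted draft token saves one full target-model forward pass, which is the point of speculative decoding.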
Hello, is DeepSeek the only model supporting speculative decoding on NPU? Will Qwen3 etc. be supported?
Motivation
This PR primarily aims to support DeepSeek's MTP (multi-token prediction) on Ascend NPUs.
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist