[Fix] Fix raw_bs bug when using flashinfer mla and eagle #4557

Merged
zhyncs merged 1 commit into sgl-project:main from Fridge003:deepseek
Mar 19, 2025
Fridge003 commented on Mar 18, 2025

Motivation

Fix the bug reported in #4536: when EAGLE replays a CUDA graph with bs != raw_bs, the seq_len_cpu required by FlashInfer MLA must also be handled.
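
To illustrate the mismatch, here is a minimal sketch and not the code changed in this PR. It assumes that CUDA graph replay rounds the real batch size (raw_bs) up to the nearest captured batch size (bs), so any CPU-side sequence-length array must be padded to bs as well. The helper names `pick_graph_bs` and `pad_seq_lens_for_replay`, the captured sizes, and the fill value of 1 are all hypothetical.

```python
import bisect


def pick_graph_bs(raw_bs, captured_bs):
    """Return the smallest captured batch size that can hold raw_bs requests.

    captured_bs must be sorted ascending (hypothetical list of graph sizes).
    """
    idx = bisect.bisect_left(captured_bs, raw_bs)
    if idx == len(captured_bs):
        raise ValueError(f"raw_bs={raw_bs} exceeds the largest captured batch size")
    return captured_bs[idx]


def pad_seq_lens_for_replay(seq_lens_cpu, bs, fill_value=1):
    """Pad per-request sequence lengths from raw_bs entries up to bs entries.

    fill_value is an assumed placeholder length for the padded (dummy) slots.
    """
    raw_bs = len(seq_lens_cpu)
    if raw_bs > bs:
        raise ValueError(f"raw_bs={raw_bs} is larger than bs={bs}")
    return list(seq_lens_cpu) + [fill_value] * (bs - raw_bs)


if __name__ == "__main__":
    captured = [1, 2, 4, 8]        # hypothetical captured graph batch sizes
    seq_lens = [17, 5, 42]         # raw_bs == 3 real requests
    bs = pick_graph_bs(len(seq_lens), captured)
    print(bs, pad_seq_lens_for_replay(seq_lens, bs))
```

If the CPU-side lengths were left at raw_bs entries while the graph ran with bs slots, a consumer that expects bs entries would see a shape mismatch, which is the kind of case the motivation above describes.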

Modifications

Handle seq_len_cpu when bs != raw_bs during EAGLE CUDA graph replay.

Accuracy

Launch

python3 -m sglang.launch_server --model deepseek-ai/DeepSeek-V3 --speculative-algo EAGLE --speculative-draft lmsys/DeepSeek-V3-NextN --speculative-num-steps 4 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --trust-remote --tp 8 --enable-flashinfer-mla

GSM8K

python3 benchmark/gsm8k/bench_sglang.py --num-shots 8 --num-questions 200 --parallel 128
Accuracy: 0.965
Invalid: 0.000
Latency: 81.523 s
Output throughput: 252.357 token/s

MMLU

bash benchmark/mmlu/download_data.sh
python3 benchmark/mmlu/bench_sglang.py --nsub 100 --ntrain 5 --parallel 128
Total latency: 185.320
Average accuracy: 0.878

Benchmark

Benchmark results for this PR on 8*H200, measured as throughput (tokens/s). Results from before this PR can be found in #4218 for reference. The launch command is the same as in the accuracy tests.

Input-4000-Output-200

python3 -m sglang.bench_serving --backend sglang --dataset-name random --random-input 4000 --random-output 200 --num-prompt 32
| Prefill (tokens/s) | Decode (tokens/s) | Total (tokens/s) |
| --- | --- | --- |
| 5482.19 | 221.07 | 5703.26 |

Input-128-Output-128

python3 -m sglang.bench_serving --backend sglang --dataset-name random --random-input 128 --random-output 128 --num-prompt 32
| Prefill (tokens/s) | Decode (tokens/s) | Total (tokens/s) |
| --- | --- | --- |
| 455.14 | 446.89 | 902.03 |
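
In both runs above, the Total figure equals Prefill plus Decode. The short check below confirms this using only the numbers reported here; the dictionary layout and the helper name `total_matches` are illustrative assumptions, not part of the benchmark tooling.

```python
# Reported throughputs in tokens/s: (prefill, decode, total)
runs = {
    "Input-4000-Output-200": (5482.19, 221.07, 5703.26),
    "Input-128-Output-128": (455.14, 446.89, 902.03),
}


def total_matches(prefill, decode, total, tol=0.005):
    """Return True if total equals prefill + decode within rounding tolerance."""
    return abs(prefill + decode - total) <= tol


if __name__ == "__main__":
    for name, (prefill, decode, total) in runs.items():
        print(name, total_matches(prefill, decode, total))
```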

One Prompt

python3 -m sglang.test.send_one

Acc Length = 3.03
Throughput = 74.70 tokens/s

Fridge003 self-assigned this on Mar 18, 2025
zhyncs merged commit 90532b7 into sgl-project:main on Mar 19, 2025
Fridge003 deleted the deepseek branch on March 23, 2025 15:26