
docs: enable MiMo V2.5 MTP cookbook path #23945

Merged
wisclmy0611 merged 1 commit into sgl-project:main from JustinTong0323:docs/mimo-v25-mtp-cookbook
Apr 28, 2026

Conversation

@JustinTong0323
Collaborator

Summary

  • Enable EAGLE MTP for MiMo-V2.5 in the cookbook command generator.
  • Update the MiMo-V2.5 deployment notes to describe the checkpoint MTP path and the required Hopper flags.
  • Replace the MiMo-V2.5 speed benchmark results with runs collected from the EAGLE MTP configuration.

Serving configuration used for the MiMo-V2.5 benchmark

  • Image: lmsysorg/sglang:dev-mimo-v2.5
  • Model: XiaomiMiMo/MiMo-V2.5
  • Parallelism: --tp 8 --dp 2 --enable-dp-attention --enable-dp-lm-head --mm-enable-dp-encoder
  • MTP: SGLANG_ENABLE_SPEC_V2=1 --speculative-algorithm EAGLE --speculative-num-steps 3 --speculative-eagle-topk 1 --speculative-num-draft-tokens 4 --enable-multi-layer-eagle
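Assembled into a single command, the configuration above would look roughly like the following. This is a sketch, not the exact command used for the benchmark runs: it assumes the standard `python -m sglang.launch_server` entry point and that the model is referenced by its Hugging Face ID.

```shell
# Sketch of the serving command assembled from the flags listed above.
# Assumes the lmsysorg/sglang:dev-mimo-v2.5 image with 8 GPUs visible;
# adjust --model-path if serving from a local checkpoint directory.
SGLANG_ENABLE_SPEC_V2=1 python -m sglang.launch_server \
  --model-path XiaomiMiMo/MiMo-V2.5 \
  --tp 8 --dp 2 \
  --enable-dp-attention --enable-dp-lm-head --mm-enable-dp-encoder \
  --speculative-algorithm EAGLE \
  --speculative-num-steps 3 \
  --speculative-eagle-topk 1 \
  --speculative-num-draft-tokens 4 \
  --enable-multi-layer-eagle
```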

Benchmark results

  • Latency, 10 prompts, concurrency 1: 0.68 req/s, 190.09 output tok/s, accept length 3.08.
  • Throughput, 1000 prompts, concurrency 100: 10.71 req/s, 2095.97 output tok/s, accept length 2.95.
  • Image, 10 prompts, 2 random 720p images per request: 0.39 req/s, 164.03 output tok/s, accept length 2.94.

Validation

  • pre-commit run --files docs_new/src/snippets/autoregressive/mimo-v25-deployment.jsx docs_new/cookbook/autoregressive/Xiaomi/MiMo-V2.5.mdx
  • cd docs_new && mint validate
  • git diff --check

@JustinTong0323 force-pushed the docs/mimo-v25-mtp-cookbook branch from 89226e7 to e701d30 on April 28, 2026 at 15:01.
Contributor

@gemini-code-assist (bot) left a comment


Code Review

This pull request updates the Xiaomi MiMo-V2.5 documentation and deployment logic to reflect that EAGLE speculative decoding is supported on both the base and Pro variants. It also updates benchmark results for the H200 GPU and corrects the speculative algorithm flag name. Feedback was provided regarding the clarity of the DP configuration command and the validity of the multimodal benchmark data, which currently contains zeroed-out metrics.


**MiMo-V2.5 (310B):**

Before:
- The checkpoint has a TP=4-interleaved fused `qkv_proj`; attention-TP per DP group **must** be 4. So DP-attention is always required (`--dp = TP / 4`), and total GPUs must be a multiple of 4. A bare `--tp 8` without `--dp 2` will fail to load with `MiMoV2Omni fused qkv_proj checkpoint is TP=4-interleaved; got tp_size=8`.

After:
- The checkpoint has a TP=4-interleaved fused `qkv_proj`; attention-TP per DP group **must** be 4. Use `--dp = TP / 4`; for TP > 4 this also requires DP-attention. Total GPUs must be a multiple of 4. A bare `--tp 8` without `--dp 2` will fail to load with `MiMoV2 fused qkv_proj checkpoint is TP=4-interleaved; got attention tp_size=8`.
Contributor


medium

The expression --dp = TP / 4 might be misinterpreted as a literal command-line argument including the equals sign and spaces. It is clearer to state "Set --dp to TP / 4" or use a placeholder like --dp <TP/4> to avoid confusion.

- The checkpoint has a TP=4-interleaved fused qkv_proj; attention-TP per DP group must be 4. Set --dp to TP / 4; for TP > 4 this also requires DP-attention. Total GPUs must be a multiple of 4. A bare --tp 8 without --dp 2 will fail to load with MiMoV2 fused qkv_proj checkpoint is TP=4-interleaved; got attention tp_size=8.
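The arithmetic behind this constraint can be sketched as a small validator. This helper is hypothetical and not part of the SGLang codebase; it only illustrates the rule the note describes (attention-TP per DP group must be 4, so `--dp` is TP divided by 4):

```python
# Hypothetical helper illustrating the TP=4-interleaved qkv_proj constraint
# for MiMo-V2.5; not part of SGLang itself.
def dp_for_mimo_v25(tp_size: int) -> int:
    """Return the --dp value that keeps attention-TP per DP group at 4."""
    if tp_size % 4 != 0:
        # Mirrors the checkpoint-load failure described in the review note.
        raise ValueError(
            f"MiMoV2 fused qkv_proj checkpoint is TP=4-interleaved; "
            f"got attention tp_size={tp_size}"
        )
    return tp_size // 4  # e.g. --tp 8 requires --dp 2


print(dp_for_mimo_v25(8))  # -> 2
```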

Comment on lines +601 to +628
```
Total generated tokens (retokenized): 0
Request throughput (req/s): 0.39
Input token throughput (tok/s): 25.69
Output token throughput (tok/s): 164.03
Peak output token throughput (tok/s): 1.00
Peak concurrent requests: 2
Total token throughput (tok/s): 542.76
Total token throughput (tok/s): 189.73
Concurrency: 1.00
Accept length: 2.94
----------------End-to-End Latency----------------
Mean E2E Latency (ms): 4186.79
Median E2E Latency (ms): 3366.20
P90 E2E Latency (ms): 7545.54
P99 E2E Latency (ms): 9180.85
Mean E2E Latency (ms): 2570.74
Median E2E Latency (ms): 2411.92
P90 E2E Latency (ms): 3711.62
P99 E2E Latency (ms): 4949.74
---------------Time to First Token----------------
Mean TTFT (ms): 1284.90
Median TTFT (ms): 622.81
P99 TTFT (ms): 5030.79
Mean TTFT (ms): 0.00
Median TTFT (ms): 0.00
P99 TTFT (ms): 0.00
-----Time per Output Token (excl. 1st token)------
Mean TPOT (ms): 7.36
Median TPOT (ms): 8.45
P99 TPOT (ms): 10.94
Mean TPOT (ms): 7.31
Median TPOT (ms): 6.17
P99 TPOT (ms): 17.18
---------------Inter-Token Latency----------------
Mean ITL (ms): 9.54
Median ITL (ms): 9.45
P95 ITL (ms): 9.58
P99 ITL (ms): 11.12
Max ITL (ms): 37.67
Mean ITL (ms): 0.00
Median ITL (ms): 0.00
P95 ITL (ms): 0.00
P99 ITL (ms): 0.00
Max ITL (ms): 0.00
```
Contributor


medium

The benchmark results for the multimodal image run (Section 5.3.3) contain several 0.00 or 0 values for critical metrics such as Total generated tokens (retokenized), Mean TTFT, and Inter-Token Latency. Additionally, the Peak output token throughput is reported as 1.00. These values suggest that the benchmark data was not captured correctly or is incomplete. Please update this section with valid benchmark results.

@wisclmy0611 merged commit e458a92 into sgl-project:main on Apr 28, 2026.
42 checks passed
vguduruTT pushed a commit to vguduruTT/sglang that referenced this pull request May 2, 2026
