[MiMoV2Flash] [feat]: support two batch overlap by TZHelloWorld · Pull Request #17634 · sgl-project/sglang

TZHelloWorld · 2026-01-23T06:52:02Z

Motivation

Support two-batch overlap for mimo_v2_flash. Launch commands used:

Prefill (p):

python3 -m sglang.launch_server \
    --model-path /mnt/mify-gw-model-alicn3/models/global_step_84-FP8-Block \
    --pp-size 1 --dp-size 2 --tp-size 8 \
    --enable-dp-attention \
    --moe-a2a-backend deepep \
    --deepep-mode normal \
    --disaggregation-mode prefill \
    --page-size 1 \
    --host 0.0.0.0 \
    --port 30010 \
    --trust-remote-code \
    --moe-dense-tp-size 1 \
    --enable-dp-lm-head \
    --mem-fraction-static 0.7 \
    --max-running-requests 32 \
    --reasoning-parser qwen3 \
    --tool-call-parser mimo \
    --context-length 262144 \
    --model-loader-extra-config '{"enable_multithread_load": "true","num_threads": 64}' \
    --attention-backend fa3 \
    --allow-auto-truncate \
    --chunked-prefill-size 16384 \
    --enable-two-batch-overlap

Decode (d):

SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK=1024 \
python3 -m sglang.launch_server \
    --model-path /mnt/mify-gw-model-alicn3/models/global_step_84-FP8-Block \
    --pp-size 1 --dp-size 2 --tp-size 8 \
    --enable-dp-attention \
    --moe-a2a-backend deepep \
    --deepep-mode low_latency \
    --decode-log-interval 1 \
    --page-size 1 \
    --host 0.0.0.0 --port 30020 \
    --trust-remote-code \
    --watchdog-timeout 1000000 \
    --mem-fraction-static 0.7 \
    --max-running-requests 32 \
    --reasoning-parser qwen3 \
    --tool-call-parser mimo \
    --context-length 262144 \
    --model-loader-extra-config '{"enable_multithread_load": "true","num_threads": 64}' \
    --attention-backend fa3 \
    --disaggregation-mode decode \
    --moe-dense-tp-size 1 \
    --enable-dp-lm-head \
    --enable-two-batch-overlap
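
The prefill and decode commands above share most flags. A small sketch like the following can make the differences explicit. The two flag strings are abbreviated copies of the commands (only a subset of flags is included), and `parse_flags` and `diff_flags` are hypothetical helpers for illustration, not part of SGLang.

```python
import shlex
from typing import Dict, Optional, Tuple, Union

FlagValue = Union[str, bool]

# Abbreviated copies of the prefill and decode flag lists shown above.
PREFILL = (
    "--enable-dp-attention --deepep-mode normal --disaggregation-mode prefill "
    "--port 30010 --allow-auto-truncate --chunked-prefill-size 16384 "
    "--enable-two-batch-overlap"
)
DECODE = (
    "--enable-dp-attention --deepep-mode low_latency --disaggregation-mode decode "
    "--port 30020 --decode-log-interval 1 --watchdog-timeout 1000000 "
    "--enable-two-batch-overlap"
)


def parse_flags(cmd: str) -> Dict[str, FlagValue]:
    """Map each --flag to its value, or to True for a bare switch."""
    tokens = shlex.split(cmd)
    flags: Dict[str, FlagValue] = {}
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("--"):
            has_value = i + 1 < len(tokens) and not tokens[i + 1].startswith("--")
            flags[tok] = tokens[i + 1] if has_value else True
            i += 2 if has_value else 1
        else:
            i += 1
    return flags


def diff_flags(
    a: Dict[str, FlagValue], b: Dict[str, FlagValue]
) -> Dict[str, Tuple[Optional[FlagValue], Optional[FlagValue]]]:
    """Return flags whose values differ; None marks a flag absent on that side."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    for flag, (p, d) in diff_flags(parse_flags(PREFILL), parse_flags(DECODE)).items():
        print(f"{flag}: prefill={p} decode={d}")
```

Among the flags included here, the DeepEP mode, the disaggregation mode, and the port differ, while `--enable-two-batch-overlap` is set on both sides.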

Load balancer (lb):

python -m sglang_router.launch_router \
    --pd-disaggregation \
    --prefill http://127.x.x.1:30010 \
    --decode http://127.x.x.1:30020 \
    --host 0.0.0.0 \
    --port 30000 \
    --mini-lb
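
Once the three processes are up, a quick smoke test is to send one request through the load balancer. The sketch below is a minimal example under assumptions: it assumes the router is reachable at `http://localhost:30000` (the `--port` above) and that it forwards SGLang's native `/generate` endpoint. `build_generate_request` and `send` are hypothetical helper names.

```python
import json
import urllib.request

# Assumption: the router started above is reachable locally on port 30000.
ROUTER_URL = "http://localhost:30000"


def build_generate_request(
    prompt: str, max_new_tokens: int = 32, base_url: str = ROUTER_URL
) -> urllib.request.Request:
    """Build a POST request for the /generate endpoint (nothing is sent here)."""
    payload = {
        "text": prompt,
        "sampling_params": {"max_new_tokens": max_new_tokens, "temperature": 0},
    }
    return urllib.request.Request(
        url=f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires the prefill server, decode server, and router to be running.
    print(send(build_generate_request("The capital of France is")))
```

Building the request separately from sending it keeps the payload easy to inspect before any server is running.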

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
After green CI and required approvals, ask Merge Oncalls to merge.

Copilot

Pull request overview

This PR adds support for the two-batch overlap (TBO) optimization to the MiMoV2Flash model. TBO splits a batch into two micro-batches so that the computation of one can overlap with the communication of the other, which is intended to improve throughput in disaggregated serving scenarios.

Changes:

- Added TBO operation methods to the MiMoV2MoE, MiMoV2Attention, and MiMoV2DecoderLayer classes for batch overlap support.
- Integrated TBO into MiMoV2Model's forward pass, with conditional execution based on the can_run_tbo flag.
- Added MiMoV2DecoderLayer-specific operation strategies for prefill and decode modes in the batch overlap system.
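
To illustrate the general idea behind these changes, here is a toy sketch of two-batch overlap. It is not the code from this PR: `split_batch`, `layer_ops`, and `run_layer` are hypothetical names, the stage list is simplified, and the `can_run_tbo` argument only mimics the flag mentioned above. Each `yield` marks a point where one micro-batch would be waiting on communication, so the other micro-batch gets to run.

```python
from typing import Iterator, List, Sequence, Tuple

TraceEntry = Tuple[str, str, int]  # (micro-batch name, stage, token count)


def split_batch(tokens: Sequence[int]) -> Tuple[Sequence[int], Sequence[int]]:
    """Split one batch into two micro-batches of near-equal size."""
    mid = (len(tokens) + 1) // 2
    return tokens[:mid], tokens[mid:]


def layer_ops(name: str, tokens: Sequence[int], trace: List[TraceEntry]) -> Iterator[None]:
    """Walk one micro-batch through a simplified MoE layer, yielding at comm points."""
    n = len(tokens)
    trace.append((name, "attention", n))
    trace.append((name, "dispatch", n))  # all-to-all send would start here
    yield  # hand control to the other micro-batch
    trace.append((name, "experts", n))
    trace.append((name, "combine", n))  # all-to-all receive would start here
    yield
    trace.append((name, "output", n))


def run_layer(tokens: Sequence[int], can_run_tbo: bool) -> List[TraceEntry]:
    """Run one layer, interleaving two micro-batches when TBO is allowed."""
    trace: List[TraceEntry] = []
    if not can_run_tbo or len(tokens) < 2:
        for _ in layer_ops("batch", tokens, trace):
            pass  # no overlap: run every stage back to back
        return trace
    first, second = split_batch(tokens)
    active = [layer_ops("mb0", first, trace), layer_ops("mb1", second, trace)]
    while active:
        for gen in list(active):  # iterate over a copy so removal is safe
            try:
                next(gen)
            except StopIteration:
                active.remove(gen)
    return trace


if __name__ == "__main__":
    for entry in run_layer([1, 2, 3, 4, 5], can_run_tbo=True):
        print(entry)
```

In the interleaved trace, `mb1` runs its attention stage right after `mb0` has issued its dispatch, which is the overlap the optimization aims for. In a real system the stages would run on separate compute and communication resources rather than in a single Python loop.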

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

File	Description
python/sglang/srt/models/mimo_v2_flash.py	Implements TBO operation methods and integrates TBO into model forward pass
python/sglang/srt/batch_overlap/operations_strategy.py	Defines operation scheduling strategies for MiMoV2 layers in TBO mode

TZHelloWorld · 2026-01-23T09:03:41Z

/tag-and-rerun-ci

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

acelyc111 · 2026-01-26T10:01:18Z

Add a new test to enable TBO in test/srt/models/test_mimo_models.py?

Copilot

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

TZHelloWorld · 2026-01-26T11:27:08Z

/tag-and-rerun-ci

acelyc111 · 2026-01-26T13:07:11Z

/tag-and-rerun-ci

TZHelloWorld · 2026-01-27T02:30:49Z

/rerun-failed-ci

Kangyan-Zhou · 2026-02-01T07:36:07Z

@TZHelloWorld could you please resolve conflicts?

acelyc111 · 2026-02-02T07:44:46Z

/rerun-failed-ci

TZHelloWorld · 2026-02-02T12:17:48Z

/rerun-failed-ci

acelyc111 requested a review from Copilot January 23, 2026 06:57

Copilot started reviewing on behalf of acelyc111 January 23, 2026 06:57

Copilot AI reviewed Jan 23, 2026

Comment thread python/sglang/srt/models/mimo_v2_flash.py Outdated

acelyc111 requested a review from Copilot January 23, 2026 09:05

Copilot started reviewing on behalf of acelyc111 January 23, 2026 09:05

Copilot AI reviewed Jan 23, 2026

Comment thread python/sglang/srt/models/mimo_v2_flash.py Outdated

acelyc111 changed the title from "[feat] support mimo_v2_flash two batch overlap" to "[MiMoV2Flash] [feat]: support two batch overlap" Jan 26, 2026

acelyc111 reviewed Jan 26, 2026

Comment thread python/sglang/srt/batch_overlap/operations_strategy.py Outdated

acelyc111 requested a review from Copilot January 26, 2026 09:55

Copilot started reviewing on behalf of acelyc111 January 26, 2026 09:56

acelyc111 reviewed Jan 26, 2026

Comment thread python/sglang/srt/models/mimo_v2_flash.py Outdated

acelyc111 mentioned this pull request Jan 26, 2026

[Tracking] MiMo-V2-Flash Day 0 Support and Continuous Optimization #15263

Open

17 tasks

Copilot AI reviewed Jan 26, 2026

TZHelloWorld force-pushed the dev/support_mimo_v2_flash_tbo branch from 298cace to 5ac9cc0 Compare January 26, 2026 10:09

TZHelloWorld force-pushed the dev/support_mimo_v2_flash_tbo branch from cd5b227 to 32abf33 Compare January 26, 2026 12:13

acelyc111 approved these changes Jan 26, 2026

acelyc111 added the run-ci label Jan 26, 2026

TZHelloWorld force-pushed the dev/support_mimo_v2_flash_tbo branch from 20f597d to 5f12e0f Compare February 2, 2026 02:14

TZHelloWorld added 3 commits February 2, 2026 20:52

[feat] support mimo_v2_flash two batch overlap

ecac401

fix tbo pp

5dde718

rename tbo start layer

c1f60f9

TZHelloWorld added 5 commits February 2, 2026 20:52

add mimo_v2_flash tbo ci test

4946f30

remove todo

abf7337

fix input_data_scatter_mode when pp

45395e8

remove unless & reformat

69577f8

remove mimo_v2_flash tbo test ci

5edf37c

TZHelloWorld force-pushed the dev/support_mimo_v2_flash_tbo branch from 94aca22 to 5edf37c Compare February 2, 2026 12:52

Kangyan-Zhou merged commit cbf1500 into sgl-project:main Feb 2, 2026
147 of 161 checks passed

charlesHsuGG pushed a commit to charlesHsuGG/sglang that referenced this pull request Feb 5, 2026

[MiMoV2Flash] [feat]: support two batch overlap (sgl-project#17634)

44351ca

sfiisf pushed a commit to sfiisf/sglang that referenced this pull request Feb 5, 2026

[MiMoV2Flash] [feat]: support two batch overlap (sgl-project#17634)

98cd9be

Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026

[MiMoV2Flash] [feat]: support two batch overlap (sgl-project#17634)

96c5ac8

Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026

[MiMoV2Flash] [feat]: support two batch overlap (sgl-project#17634)

77692af
