[MiMoV2Flash] [feat]: support two batch overlap#17634

Merged
Kangyan-Zhou merged 8 commits into sgl-project:main from TZHelloWorld:dev/support_mimo_v2_flash_tbo on Feb 2, 2026

@TZHelloWorld
Contributor

Motivation

Support two-batch overlap (TBO) for mimo_v2_flash. Launch commands for a prefill/decode-disaggregated deployment:
Prefill (p):

python3 -m sglang.launch_server \
    --model-path /mnt/mify-gw-model-alicn3/models/global_step_84-FP8-Block \
    --pp-size 1 --dp-size 2 --tp-size 8 \
    --enable-dp-attention \
    --moe-a2a-backend deepep \
    --deepep-mode normal \
    --disaggregation-mode prefill \
    --page-size 1 \
    --host 0.0.0.0 \
    --port 30010 \
    --trust-remote-code \
    --moe-dense-tp-size 1 \
    --enable-dp-lm-head \
    --mem-fraction-static 0.7 \
    --max-running-requests 32 \
    --reasoning-parser qwen3 \
    --tool-call-parser mimo \
    --context-length 262144 \
    --model-loader-extra-config '{"enable_multithread_load": "true","num_threads": 64}' \
    --attention-backend fa3 \
    --allow-auto-truncate \
    --chunked-prefill-size 16384 \
    --enable-two-batch-overlap

Decode (d):

SGLANG_DEEPEP_NUM_MAX_DISPATCH_TOKENS_PER_RANK=1024 \
python3 -m sglang.launch_server \
    --model-path /mnt/mify-gw-model-alicn3/models/global_step_84-FP8-Block \
    --pp-size 1 --dp-size 2 --tp-size 8 \
    --enable-dp-attention \
    --moe-a2a-backend deepep \
    --deepep-mode low_latency \
    --decode-log-interval 1 \
    --page-size 1 \
    --host 0.0.0.0 --port 30020 \
    --trust-remote-code \
    --watchdog-timeout 1000000 \
    --mem-fraction-static 0.7 \
    --max-running-requests 32 \
    --reasoning-parser qwen3 \
    --tool-call-parser mimo \
    --context-length 262144 \
    --model-loader-extra-config '{"enable_multithread_load": "true","num_threads": 64}' \
    --attention-backend fa3 \
    --disaggregation-mode decode \
    --moe-dense-tp-size 1 \
    --enable-dp-lm-head \
    --enable-two-batch-overlap

Load balancer (lb):

python -m sglang_router.launch_router \
    --pd-disaggregation \
    --prefill http://127.x.x.1:30010 \
    --decode http://127.x.x.1:30020 \
    --host 0.0.0.0 \
    --port 30000 \
    --mini-lb
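
Once the three processes above are up, a quick smoke test can be sent through the load balancer. The following is a minimal sketch, not part of this PR: it assumes the router on port 30000 forwards SGLang's native `/generate` endpoint, and the helper names `build_generate_request` and `send_generate` are hypothetical. The router address is left as a parameter because the host is elided in the commands above.

```python
import json
import urllib.request


def build_generate_request(prompt: str, max_new_tokens: int = 32) -> dict:
    """Build a JSON body for the native /generate endpoint (hypothetical helper)."""
    return {
        "text": prompt,
        "sampling_params": {
            "max_new_tokens": max_new_tokens,
            "temperature": 0.0,  # greedy decoding keeps smoke tests repeatable
        },
    }


def send_generate(base_url: str, payload: dict, timeout: float = 60.0) -> dict:
    """POST the payload to <base_url>/generate and return the decoded JSON reply.

    Assumption: base_url points at the router started with --port 30000.
    """
    request = urllib.request.Request(
        f"{base_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

To use it, call `send_generate("http://<router-host>:30000", build_generate_request("Hello"))` with the real router address substituted in.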

Modifications

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

Copilot AI left a comment

Pull request overview

This PR adds support for the two-batch overlap (TBO) optimization to the MiMoV2Flash model. TBO splits a batch into two micro-batches so that one micro-batch's computation can overlap with the other's communication, improving throughput in disaggregated serving scenarios.

Changes:

  • Added TBO operation methods to MiMoV2MoE, MiMoV2Attention, and MiMoV2DecoderLayer classes for batch overlap support
  • Integrated TBO into MiMoV2Model's forward pass with conditional execution based on can_run_tbo flag
  • Added MiMoV2DecoderLayer-specific operation strategies for prefill and decode modes in the batch overlap system
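
The scheduling idea behind these changes can be illustrated with a toy model. This is a conceptual sketch only, assuming TBO runs each layer as a list of operations with explicit yield points at which control passes to the other micro-batch. The names `YIELD`, `run_ops`, and `interleave`, and the operation labels, are hypothetical and are not taken from the SGLang code.

```python
from typing import Iterator, List

YIELD = object()  # hypothetical marker: hand control to the other micro-batch


def run_ops(name: str, ops: list, log: List[str]) -> Iterator[None]:
    """Run one micro-batch's operations in order, pausing at each YIELD marker."""
    for op in ops:
        if op is YIELD:
            yield
        else:
            log.append(f"{name}:{op}")


def interleave(ops: list) -> List[str]:
    """Alternate between two micro-batches, switching at every yield point."""
    log: List[str] = []
    active = [run_ops("mb0", ops, log), run_ops("mb1", ops, log)]
    while active:
        for runner in list(active):  # iterate over a copy so removal is safe
            try:
                next(runner)
            except StopIteration:
                active.remove(runner)
    return log


# Hypothetical per-layer schedule: yield right after starting communication.
LAYER_OPS = [
    "attn", "dispatch_start", YIELD,
    "dispatch_wait", "experts", "combine_start", YIELD,
    "combine_wait",
]

schedule = interleave(LAYER_OPS)
```

In the resulting `schedule`, `mb1:attn` appears between `mb0:dispatch_start` and `mb0:dispatch_wait`, which is the window in which a real implementation could hide communication latency behind the other micro-batch's compute.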

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| python/sglang/srt/models/mimo_v2_flash.py | Implements TBO operation methods and integrates TBO into model forward pass |
| python/sglang/srt/batch_overlap/operations_strategy.py | Defines operation scheduling strategies for MiMoV2 layers in TBO mode |

Comment thread python/sglang/srt/models/mimo_v2_flash.py Outdated
@TZHelloWorld
Contributor Author

/tag-and-rerun-ci

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.


Comment thread python/sglang/srt/models/mimo_v2_flash.py Outdated
@acelyc111 changed the title from "[feat] support mimo_v2_flash two batch overlap" to "[MiMoV2Flash] [feat]: support two batch overlap" on Jan 26, 2026
Comment thread python/sglang/srt/batch_overlap/operations_strategy.py Outdated
Comment thread python/sglang/srt/models/mimo_v2_flash.py Outdated
@acelyc111
Collaborator

Add a new test to enable TBO in test/srt/models/test_mimo_models.py?

Copilot AI left a comment

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.


Comment thread python/sglang/srt/batch_overlap/operations_strategy.py
Comment thread python/sglang/srt/batch_overlap/operations_strategy.py Outdated
Comment thread python/sglang/srt/batch_overlap/operations_strategy.py Outdated
Comment thread python/sglang/srt/models/mimo_v2_flash.py Outdated
Comment thread python/sglang/srt/models/mimo_v2_flash.py Outdated
@TZHelloWorld force-pushed the dev/support_mimo_v2_flash_tbo branch from 298cace to 5ac9cc0 on January 26, 2026 10:09
@TZHelloWorld
Contributor Author

/tag-and-rerun-ci

@TZHelloWorld force-pushed the dev/support_mimo_v2_flash_tbo branch from cd5b227 to 32abf33 on January 26, 2026 12:13
@acelyc111
Collaborator

/tag-and-rerun-ci

@TZHelloWorld
Contributor Author

TZHelloWorld commented Jan 27, 2026

/rerun-failed-ci

@Kangyan-Zhou
Collaborator

@TZHelloWorld could you please resolve conflicts?

@TZHelloWorld force-pushed the dev/support_mimo_v2_flash_tbo branch from 20f597d to 5f12e0f on February 2, 2026 02:14
@acelyc111
Collaborator

/rerun-failed-ci

@TZHelloWorld
Contributor Author

/rerun-failed-ci

@TZHelloWorld force-pushed the dev/support_mimo_v2_flash_tbo branch from 94aca22 to 5edf37c on February 2, 2026 12:52
@Kangyan-Zhou merged commit cbf1500 into sgl-project:main on Feb 2, 2026
147 of 161 checks passed
charlesHsuGG pushed a commit to charlesHsuGG/sglang that referenced this pull request Feb 5, 2026
sfiisf pushed a commit to sfiisf/sglang that referenced this pull request Feb 5, 2026
Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026
Wangzheee pushed a commit to Wangzheee/sglang that referenced this pull request Mar 21, 2026