Checklist
Motivation
MiMo-V2-Flash is a Mixture-of-Experts (MoE) language model with 309B total parameters and 15B active parameters. Designed for high-speed reasoning and agentic workflows, it utilizes a novel hybrid attention architecture and Multi-Token Prediction (MTP) to achieve state-of-the-art performance while significantly reducing inference costs.
See it on HF: https://huggingface.co/XiaomiMiMo/MiMo-V2-Flash
LMSys blog: https://lmsys.org/blog/2025-12-16-mimo-v2-flash/
Installation
Docker
# Pull the docker image
docker pull lmsysorg/sglang:dev-pr-15207
# Launch the container
docker run -it --gpus all \
--shm-size=32g \
--ipc=host \
--network=host \
lmsysorg/sglang:dev-pr-15207 bash
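# Inside the container, a quick sanity check before launching the server
# (a minimal sketch; the exact version string depends on the image):
nvidia-smi
python3 -c "import sglang; print(sglang.__version__)"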
Pip Installation
# Run this on a machine with SGLang dependencies installed,
# or start from an SGLang nightly container:
docker run -it --gpus all \
--shm-size=32g \
--ipc=host \
--network=host \
lmsysorg/sglang:nightly-dev-20251215-4449c170 bash
# If you already have SGLang installed, uninstall the current SGLang version
pip uninstall sglang -y
# Install the PyPI Package
pip install sglang==0.5.6.post2.dev8005+pr.15207.g39d5bd57a \
--index-url https://sgl-project.github.io/whl/pr/ \
--extra-index-url https://pypi.org/simple
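# Confirm the PR wheel is the version in use
# (a minimal check; the version should match the package pinned above)
pip show sglang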
Launch Command
SGLANG_ENABLE_SPEC_V2=1 python3 -m sglang.launch_server \
--model-path XiaomiMiMo/MiMo-V2-Flash \
--dp-size 2 \
--enable-dp-attention \
--tp-size 8 \
--trust-remote-code \
--mem-fraction-static 0.75 \
--max-running-requests 128 \
--chunked-prefill-size 16384 \
--reasoning-parser qwen3 \
--tool-call-parser mimo \
--model-loader-extra-config '{"enable_multithread_load": "true","num_threads": 64}' \
--attention-backend fa3 \
--speculative-algorithm EAGLE \
--speculative-num-steps=3 \
--speculative-eagle-topk=1 \
--speculative-num-draft-tokens=4 \
--enable-mtp
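Once the server reports ready (by default on http://127.0.0.1:30000, since no --port is passed above), it exposes an OpenAI-compatible API. A minimal smoke test with curl (the prompt and max_tokens below are placeholders):
# Health check, then a simple chat completion request
curl http://127.0.0.1:30000/health
curl http://127.0.0.1:30000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "XiaomiMiMo/MiMo-V2-Flash",
    "messages": [{"role": "user", "content": "Hello"}],
    "max_tokens": 64
  }'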
Future Plan
MiMo-V2-Flash day0 support #15207
MiMo-V2-Flash Optimization #15208
Related resources
No response