[Tracking] SGLang CI/CD Test Coverage Improvements - Q2 2026 Roadmap

# SGLang CI/CD Test Coverage Improvements - Q2 2026 Roadmap

This issue tracks the Q2 2026 (April–June) work plan for improving SGLang's CI test coverage on Blackwell GPUs. It is part of the broader CI improvement initiative tracked in #20514.

## Summary

The Q2 plan focuses on three workstreams:

- **Full E2E Accuracy + Disagg**: Build disaggregated inference accuracy testing from scratch on GB200
- **Full E2E Accuracy + Agg**: Ensure aggregated accuracy tests cover key models/configs on B200/GB200
- **Reduced Layers Functional Tests**: Introduce lightweight reduced-layers tests for broad config coverage

## Q2 Roadmap

### Test Hierarchy

<img width="916" height="672" alt="Image" src="https://github.com/user-attachments/assets/49d2e616-eb62-4f65-b7f8-145027c76651" />

### Full E2E Accuracy + Disagg

We can extend the existing [single-node Disagg tests](https://github.com/sgl-project/sglang/blob/main/test/registered/distributed/test_disaggregation_different_tp.py) and run them with the key models/configs listed below.

| Month | Milestone | Models | Configs | Status |
|---|---|---|---|---|
| April 2026 | First Disagg accuracy test in nightly GB200 | DSR1 | FP8, 1P1D, DEP8, no-MTP | 🔲 NOT STARTED |
| May 2026 | Expand to more features | DSR1 | {FP8, NVFP4} x {1P1D} x {wideEP, narrowEP, low-latency} x {no-MTP, MTPv2} | 🔲 NOT STARTED |
| June 2026 | Expand to more models | {DSR1, Qwen3.5, GLM5} | {FP8, NVFP4} x {1P1D} x {wideEP, narrowEP, low-latency} x {no-MTP, MTPv2} x {Mooncake, NIXL} | 🔲 NOT STARTED |
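To get a feel for how quickly the Disagg matrix grows, the `{...} x {...}` notation in the table can be expanded as a Cartesian product. This is a minimal sketch, not part of the SGLang test suite: the axis values are copied from the May and June rows, while the variable names (`EP_MODES`, `TRANSFER_BACKENDS`, etc.) and the `expand` helper are hypothetical labels chosen for illustration.

```python
from itertools import product

# Axis values copied from the May/June 2026 rows of the table above.
# The axis names are illustrative labels, not SGLang identifiers.
MODELS = ["DSR1", "Qwen3.5", "GLM5"]
PRECISIONS = ["FP8", "NVFP4"]
TOPOLOGIES = ["1P1D"]
EP_MODES = ["wideEP", "narrowEP", "low-latency"]
MTP_MODES = ["no-MTP", "MTPv2"]
TRANSFER_BACKENDS = ["Mooncake", "NIXL"]


def expand(*axes):
    """Return every combination of the given axes as a list of tuples."""
    return list(product(*axes))


# May: DSR1 only, no transfer-backend axis yet.
may_matrix = expand(["DSR1"], PRECISIONS, TOPOLOGIES, EP_MODES, MTP_MODES)

# June: three models, plus the {Mooncake, NIXL} axis.
june_matrix = expand(
    MODELS, PRECISIONS, TOPOLOGIES, EP_MODES, MTP_MODES, TRANSFER_BACKENDS
)

print(len(may_matrix))   # 12
print(len(june_matrix))  # 72
```

If every combination were run, the June milestone would be six times the size of the May one, which is worth keeping in mind when sizing nightly GB200 capacity.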

### Full E2E Accuracy + Agg

Capacity is limited, so only a sparse selection of models/configs can be added.

Some tests already exist (e.g. [Qwen3.5](https://github.com/sgl-project/sglang/blob/main/test/registered/8-gpu-models/test_qwen35.py), which covers only TP8 and TP8+MTPv2). We need to ensure coverage for different precisions and DEP.

| Month | Milestone | Models | Configs | Status |
|---|---|---|---|---|
| April 2026 | Key models/configs on B200/GB200 | {DSR1, Qwen3.5, GLM5} | {FP8, NVFP4} x {max-tput, low-latency} x {no-MTP, MTPv2} | 🔲 NOT STARTED |
| May 2026 | More models on B200/GB200 | {Minimax-M2.5, Qwen3-Coder} | {FP8, NVFP4} x {max-tput, low-latency} x {no-MTP, MTPv2 if applicable} | 🔲 NOT STARTED |
| June 2026 | TBD — new models if any | TBD | TBD | 🔲 NOT STARTED |
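Because capacity only allows a sparse selection, one possible way to choose configs is "each-choice" coverage: pick the smallest set in which every value of every axis appears at least once. The sketch below is an assumption about how such a selection could be made, not the method this roadmap commits to; the `each_choice` helper and the axis names are hypothetical.

```python
def each_choice(axes):
    """Pick a small set of configs in which every value of every axis
    appears at least once (each-choice coverage)."""
    n = max(len(values) for values in axes.values())
    return [
        {name: values[i % len(values)] for name, values in axes.items()}
        for i in range(n)
    ]


# Axis values copied from the April 2026 row; axis names are illustrative.
AGG_AXES = {
    "precision": ["FP8", "NVFP4"],
    "mode": ["max-tput", "low-latency"],
    "mtp": ["no-MTP", "MTPv2"],
}

selected = each_choice(AGG_AXES)
for cfg in selected:
    print(cfg)
```

For the April axes this yields 2 configs per model instead of the full 8. It does not cover interactions between axes, which is the trade-off against the full product.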

### Reduced Layers Functional Tests

Coverage-driven methodology: lightweight tests (ideally ~100x cheaper than full E2E per model/config) that enable broad config sweeps.

Please refer to https://github.com/sgl-project/sglang/issues/20512 for details about the new "Reduced Layers Tests".
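As a rough illustration of why the cost ratio matters, the arithmetic below sizes the May sweep from the table that follows. It assumes the "~100x cheaper" figure holds exactly, which the roadmap states only as an ideal, and it ignores the open-ended "etc." axis; `COST_RATIO` and the axis names are hypothetical.

```python
from math import prod

# Assumed from "~100x cheaper"; an ideal target, not a measurement.
COST_RATIO = 100

# Axis values copied from the May 2026 row; the "etc." axis is omitted.
may_axes = {
    "models": ["DSR1", "Qwen3.5", "GLM5"],
    "precision": ["FP8", "NVFP4"],
    "parallelism": ["DEP", "DTP", "TP", "TEP"],
    "mtp": ["no-MTP", "MTPv2"],
}

num_configs = prod(len(values) for values in may_axes.values())
full_e2e_equivalent = num_configs / COST_RATIO

print(num_configs)          # 48
print(full_e2e_equivalent)  # 0.48
```

Under that assumption, the whole 48-config sweep costs less than half of one full E2E run, which is what makes a broad sweep affordable.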

| Month | Milestone | Models | Configs | Status |
|---|---|---|---|---|
| April 2026 | Proof-of-concept in nightly H200/B200 | DSR1 (4 layers: 2 MLP + 2 MoE) | Baseline | 🔲 NOT STARTED |
| May 2026 | Expand to key models with many configs | {DSR1, Qwen3.5, GLM5} | {FP8, NVFP4} x {DEP, DTP, TP, TEP} x {no-MTP, MTPv2} x etc. | 🔲 NOT STARTED |
| June 2026 | Expand to more models | {Minimax-M2.5, Qwen3-Coder} | TBD | 🔲 NOT STARTED |
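The "4 layers: 2 MLP + 2 MoE" shape in the April row could be expressed as a model config override. The sketch below assumes the DeepSeek-V3 style Hugging Face config fields `num_hidden_layers` and `first_k_dense_replace`; whether the reduced-layers tests in #20512 use this mechanism is not stated here, and `reduced_layers_override` is a hypothetical helper. How the override reaches the server is left open.

```python
import json


def reduced_layers_override(num_dense, num_moe):
    """Build a config override that keeps `num_dense` dense (MLP) layers
    followed by `num_moe` MoE layers.

    Field names are an assumption based on DeepSeek-V3 style configs.
    """
    return {
        "num_hidden_layers": num_dense + num_moe,
        "first_k_dense_replace": num_dense,
    }


override = reduced_layers_override(num_dense=2, num_moe=2)
print(json.dumps(override))
```

Keeping at least one layer of each type means both the dense and the MoE code paths are exercised, while weight loading and warmup shrink with the layer count.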

### Kernel-Level Tests

No Q2 plan. Known Hopper vs Blackwell gaps are tracked in #20507. No high-priority kernel tests need to be ported to Blackwell immediately.

## Status Legend

- 🔲 NOT STARTED — not yet started
- 🔄 IN PROGRESS — work in progress
- ✅ DONE — completed
- ⏳ BLOCKED — waiting on a dependency
- ❌ SKIP — determined to not be applicable

---

## Weekly Progress

### 2026-03-18

- Created Q2 2026 roadmap tracking issue

---

## Related Issues

- #20514 — SGLang CI/CD Test Coverage Improvements (initiative)
- #20507 — Add Existing Hopper Tests to Blackwell CI (Part 1)
- #20510 — Add Kernel-Level Tests for External Backends (Part 2)
- #20512 — Add Reduced-Layers E2E Tests (Part 3)
- #20513 — Ensure Full E2E Tests for Key Models/Configs (Part 4)

[WIP slides](https://github.com/user-attachments/files/26207786/SGLang.CI.Improvement.-.Q2.2026.Roadmap.pdf)

