SGLang CI/CD Test Coverage Improvements - Q2 2026 Roadmap
This issue tracks the Q2 2026 (April–June) work plan for improving SGLang's CI test coverage on Blackwell GPUs. Part of the broader CI improvement initiative tracked in #20514.
Summary
The Q2 plan focuses on three workstreams:
- Full E2E Accuracy + Disagg: Build disaggregated inference accuracy testing from scratch on GB200
- Full E2E Accuracy + Agg: Ensure aggregated accuracy tests cover key models/configs on B200/GB200
- Reduced Layers Functional Tests: Introduce lightweight reduced-layers tests for broad config coverage
Q2 Roadmap
Test Hierarchy
Full E2E Accuracy + Disagg
These tests can build on the existing single-node Disagg tests, but should run with the "key models/configs" listed below.
| Month | Milestone | Models | Configs | Status |
|---|---|---|---|---|
| April 2026 | First Disagg accuracy test in nightly GB200 | DSR1 | FP8, 1P1D, DEP8, no-MTP | 🔲 NOT STARTED |
| May 2026 | Expand to more features | DSR1 | {FP8, NVFP4} x {1P1D} x {wideEP, narrowEP, low-latency} x {no-MTP, MTPv2} | 🔲 NOT STARTED |
| June 2026 | Expand to more models | {DSR1, Qwen3.5, GLM5} | {FP8, NVFP4} x {1P1D} x {wideEP, narrowEP, low-latency} x {no-MTP, MTPv2} x {Mooncake, NIXL} | 🔲 NOT STARTED |
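The `{...} x {...}` notation in the Configs column is a cross product. As a rough sketch of how the June 2026 row could be expanded into individual test cases, the snippet below enumerates every combination. The axis names (`precision`, `topology`, `ep_mode`, `mtp`, `transfer`) and the `expand_matrix` helper are illustrative assumptions, not part of SGLang's CI; only the values come from the table.

```python
from itertools import product

# Values copied from the June 2026 row; axis names are hypothetical labels.
MODELS = ["DSR1", "Qwen3.5", "GLM5"]
AXES = {
    "precision": ["FP8", "NVFP4"],
    "topology": ["1P1D"],
    "ep_mode": ["wideEP", "narrowEP", "low-latency"],
    "mtp": ["no-MTP", "MTPv2"],
    "transfer": ["Mooncake", "NIXL"],
}


def expand_matrix(models, axes):
    """Return one dict per (model, config) combination in the cross product."""
    names = list(axes)
    return [
        {"model": model, **dict(zip(names, values))}
        for model in models
        for values in product(*(axes[name] for name in names))
    ]


cases = expand_matrix(MODELS, AXES)
print(len(cases))  # 3 models x (2 x 1 x 3 x 2 x 2) configs = 72
print(cases[0])
```

Under these assumptions the full June matrix is 72 nightly cases, which is a useful number to keep in mind when budgeting GB200 capacity.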
Full E2E Accuracy + Agg
Capacity is limited, so only a sparse selection of models/configs can be added.
Some tests already exist (e.g. Qwen3.5, but only with TP8 and TP8+MTPv2). We need to ensure coverage of different precisions and DEP.
| Month | Milestone | Models | Configs | Status |
|---|---|---|---|---|
| April 2026 | Key models/configs on B200/GB200 | {DSR1, Qwen3.5, GLM5} | {FP8, NVFP4} x {max-tput, low-latency} x {no-MTP, MTPv2} | 🔲 NOT STARTED |
| May 2026 | More models on B200/GB200 | {Minimax-M2.5, Qwen3-Coder} | {FP8, NVFP4} x {max-tput, low-latency} x {no-MTP, MTPv2 if applicable} | 🔲 NOT STARTED |
| June 2026 | TBD (new models, if any) | TBD | TBD | 🔲 NOT STARTED |
Reduced Layers Functional Tests
Coverage-driven methodology: lightweight tests (ideally ~100x cheaper than a full E2E run per model/config) that enable broad config sweeps.
Please refer to #20512 for details about the new "Reduced Layers Tests".
| Month | Milestone | Models | Configs | Status |
|---|---|---|---|---|
| April 2026 | Proof-of-concept in nightly H200/B200 | DSR1 (4 layers: 2 MLP + 2 MoE) | Baseline | 🔲 NOT STARTED |
| May 2026 | Expand to key models with many configs | {DSR1, Qwen3.5, GLM5} | {FP8, NVFP4} x {DEP, DTP, TP, TEP} x {no-MTP, MTPv2} x etc. | 🔲 NOT STARTED |
| June 2026 | Expand to more models | {Minimax-M2.5, Qwen3-Coder} | TBD | 🔲 NOT STARTED |
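To make the "~100x cheaper" target concrete, here is a back-of-the-envelope estimate for the May 2026 sweep. It is only a sketch: `COST_RATIO` takes the aspirational 100x figure at face value (it is not a measurement), the axis names are illustrative, and the trailing "etc." axes in the table are left out, so the real run count would be higher.

```python
from math import prod

# Assumed: the "~100x cheaper" goal stated above, not a measured ratio.
COST_RATIO = 100

# Values from the May 2026 row; the "etc." axes are omitted.
models = ["DSR1", "Qwen3.5", "GLM5"]
axes = {
    "precision": ["FP8", "NVFP4"],
    "parallelism": ["DEP", "DTP", "TP", "TEP"],
    "mtp": ["no-MTP", "MTPv2"],
}

# Number of reduced-layers runs in the full cross product.
runs = len(models) * prod(len(values) for values in axes.values())

# Cost of the whole sweep, expressed in units of one full E2E run.
full_e2e_equivalent = runs / COST_RATIO

print(runs)                 # 3 x (2 x 4 x 2) = 48
print(full_e2e_equivalent)  # 0.48
```

If the 100x target holds, the listed 48-run sweep would cost roughly half of one full E2E accuracy run, which is what makes a broad config sweep plausible within limited capacity.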
Kernel-Level Tests
No Q2 plan. Known Hopper vs Blackwell gaps are tracked in #20507. No high-priority kernel tests need to be ported to Blackwell immediately.
Status Legend
- 🔲 NOT STARTED — not yet started
- 🔄 IN PROGRESS — work in progress
- ✅ DONE — completed
- ⏳ BLOCKED — waiting on a dependency
- ❌ SKIP — determined to not be applicable
Weekly Progress
2026-03-18
- Created Q2 2026 roadmap tracking issue
Related Issues
WIP slides