[Tracking] SGLang CI/CD Test Coverage Improvements - Q2 2026 Roadmap #20847

@nvpohanh

SGLang CI/CD Test Coverage Improvements - Q2 2026 Roadmap

This issue tracks the Q2 2026 (April–June) work plan for improving SGLang's CI test coverage on Blackwell GPUs. It is part of the broader CI improvement initiative tracked in #20514.

Summary

The Q2 plan focuses on three workstreams:

  • Full E2E Accuracy + Disagg: Build disaggregated inference accuracy testing from scratch on GB200
  • Full E2E Accuracy + Agg: Ensure aggregated accuracy tests cover key models/configs on B200/GB200
  • Reduced Layers Functional Tests: Introduce lightweight reduced-layers tests for broad config coverage

Q2 Roadmap

Test Hierarchy

[Image: test hierarchy diagram]

Full E2E Accuracy + Disagg

We can extend the existing single-node Disagg tests, running them with the "key models/configs".

| Month | Milestone | Models | Configs | Status |
| --- | --- | --- | --- | --- |
| April 2026 | First Disagg accuracy test in nightly GB200 | DSR1 | FP8, 1P1D, DEP8, no-MTP | 🔲 NOT STARTED |
| May 2026 | Expand to more features | DSR1 | {FP8, NVFP4} x {1P1D} x {wideEP, narrowEP, low-latency} x {no-MTP, MTPv2} | 🔲 NOT STARTED |
| June 2026 | Expand to more models | {DSR1, Qwen3.5, GLM5} | {FP8, NVFP4} x {1P1D} x {wideEP, narrowEP, low-latency} x {no-MTP, MTPv2} x {Mooncake, NIXL} | 🔲 NOT STARTED |
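To get a feel for how quickly the `{...} x {...}` notation grows, here is a minimal Python sketch that expands the May and June Disagg rows into individual test configurations. The axis names (`model`, `precision`, `topology`, `ep_mode`, `mtp`, `transfer_backend`) and the helpers `expand_matrix` and `config_id` are illustrative labels chosen for this sketch; they are not SGLang flags or existing CI code.

```python
from itertools import product

# Axis values are copied from the June 2026 row; the axis names are
# hypothetical labels for this sketch, not SGLang options.
JUNE_DISAGG_AXES = {
    "model": ["DSR1", "Qwen3.5", "GLM5"],
    "precision": ["FP8", "NVFP4"],
    "topology": ["1P1D"],
    "ep_mode": ["wideEP", "narrowEP", "low-latency"],
    "mtp": ["no-MTP", "MTPv2"],
    "transfer_backend": ["Mooncake", "NIXL"],
}


def expand_matrix(axes):
    """Return one dict per combination in the Cartesian product of the axes."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in product(*(axes[n] for n in names))]


def config_id(config):
    """Build a readable identifier such as 'DSR1-FP8-1P1D-wideEP-no-MTP-Mooncake'."""
    return "-".join(config.values())


june_configs = expand_matrix(JUNE_DISAGG_AXES)

# The May row is the same matrix restricted to DSR1 and without the
# transfer-backend axis.
may_axes = {**JUNE_DISAGG_AXES, "model": ["DSR1"]}
may_axes.pop("transfer_backend")
may_configs = expand_matrix(may_axes)

if __name__ == "__main__":
    print(len(may_configs), len(june_configs))  # 12 72
    print(config_id(june_configs[0]))  # DSR1-FP8-1P1D-wideEP-no-MTP-Mooncake
```

Under this reading, the May row corresponds to 12 configurations and the June row to 72, assuming every combination in the product is actually run.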

Full E2E Accuracy + Agg

Capacity is limited, so only a sparse selection of models/configs can be added.

Some tests already exist (e.g. Qwen3.5, but only with TP8 and TP8+MTPv2). We need to ensure coverage for different precisions and DEP.

| Month | Milestone | Models | Configs | Status |
| --- | --- | --- | --- | --- |
| April 2026 | Key models/configs on B200/GB200 | {DSR1, Qwen3.5, GLM5} | {FP8, NVFP4} x {max-tput, low-latency} x {no-MTP, MTPv2} | 🔲 NOT STARTED |
| May 2026 | More models on B200/GB200 | {Minimax-M2.5, Qwen3-Coder} | {FP8, NVFP4} x {max-tput, low-latency} x {no-MTP, MTPv2 if applicable} | 🔲 NOT STARTED |
| June 2026 | TBD (new models, if any) | TBD | TBD | 🔲 NOT STARTED |
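Since capacity is the constraint here, it may help to count what the rows above imply. The sketch below expands the Agg matrix and handles the "MTPv2 if applicable" qualifier by taking the set of MTP-capable models as an input. The function `agg_configs` and its `mtp_capable` parameter are hypothetical names for this sketch. Which of the May models support MTPv2 is not stated in this issue, so the example only computes the lower and upper bounds.

```python
from itertools import product

PRECISIONS = ["FP8", "NVFP4"]
SCENARIOS = ["max-tput", "low-latency"]


def agg_configs(models, mtp_capable):
    """Expand models x precisions x scenarios x MTP options.

    Models not listed in `mtp_capable` only get the 'no-MTP' variant,
    which is one way to read "MTPv2 if applicable".
    """
    configs = []
    for model, precision, scenario in product(models, PRECISIONS, SCENARIOS):
        mtp_options = ["no-MTP", "MTPv2"] if model in mtp_capable else ["no-MTP"]
        for mtp in mtp_options:
            configs.append((model, precision, scenario, mtp))
    return configs


# April row: MTPv2 is listed without a qualifier, so all three models get it.
april_models = ["DSR1", "Qwen3.5", "GLM5"]
april = agg_configs(april_models, mtp_capable=set(april_models))

# May row: MTPv2 support is unknown here, so compute both extremes.
may_models = ["Minimax-M2.5", "Qwen3-Coder"]
may_min = agg_configs(may_models, mtp_capable=set())
may_max = agg_configs(may_models, mtp_capable=set(may_models))

if __name__ == "__main__":
    print(len(april))  # 24
    print(len(may_min), len(may_max))  # 8 16
```

Read this way, April implies 24 full E2E configurations and May adds between 8 and 16 more, which gives a concrete number to weigh against the available B200/GB200 capacity.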

Reduced Layers Functional Tests

Coverage-driven methodology: lightweight tests (ideally ~100x cheaper than full E2E per model/config) that enable broad config sweeps.

Please refer to #20512 for details about the new "Reduced Layers Tests".

| Month | Milestone | Models | Configs | Status |
| --- | --- | --- | --- | --- |
| April 2026 | Proof-of-concept in nightly H200/B200 | DSR1 (4 layers: 2 MLP + 2 MoE) | Baseline | 🔲 NOT STARTED |
| May 2026 | Expand to key models with many configs | {DSR1, Qwen3.5, GLM5} | {FP8, NVFP4} x {DEP, DTP, TP, TEP} x {no-MTP, MTPv2} x etc. | 🔲 NOT STARTED |
| June 2026 | Expand to more models | {Minimax-M2.5, Qwen3-Coder} | TBD | 🔲 NOT STARTED |
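As a rough illustration of why the ~100x target matters, the following Python sketch estimates the cost of the May sweep in units of "one full E2E run". The 1/100 ratio is the aspirational figure from the text above, not a measurement, and the axis sizes cover only the axes named explicitly in the May row. The trailing "x etc." means the real count would be larger.

```python
from fractions import Fraction
from math import prod

# Assumed cost of one reduced-layers test relative to one full E2E test.
# This is the "~100x cheaper" target quoted above, not a measured value.
REDUCED_COST_RATIO = Fraction(1, 100)

# Number of values per axis named in the May 2026 row.
MAY_AXIS_SIZES = {
    "model": 3,        # DSR1, Qwen3.5, GLM5
    "precision": 2,    # FP8, NVFP4
    "parallelism": 4,  # DEP, DTP, TP, TEP
    "mtp": 2,          # no-MTP, MTPv2
}

num_configs = prod(MAY_AXIS_SIZES.values())
sweep_cost = num_configs * REDUCED_COST_RATIO  # in full-E2E-run equivalents

if __name__ == "__main__":
    print(num_configs)        # 48
    print(float(sweep_cost))  # 0.48
```

Under these assumptions, sweeping all 48 reduced-layers configurations would cost roughly half of a single full E2E run, which is what makes broad config coverage plausible at this tier.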

Kernel-Level Tests

No Q2 plan. Known Hopper vs Blackwell gaps are tracked in #20507. No high-priority kernel tests need to be ported to Blackwell immediately.

Status Legend

  • 🔲 NOT STARTED — not yet started
  • 🔄 IN PROGRESS — work in progress
  • ✅ DONE — completed
  • ⏳ BLOCKED — waiting on a dependency
  • ❌ SKIP — determined to not be applicable

Weekly Progress

2026-03-18

  • Created Q2 2026 roadmap tracking issue

Related Issues

WIP slides
