Sandbox: verify full main CI is green on latest main (do not merge) #25647
fzyzcjy wants to merge 1 commit
Conversation
/tag-and-rerun-ci |
CI failure:
|
|
/rerun-test test/registered/spec/eagle/test_eagle_infer_b.py |
|
|
/rerun-test test/registered/lora/test_lora_qwen3_8b_logprob_diff.py |
|
/rerun-test test/registered/core/test_srt_endpoint.py |
|
|
|
| File | Original lane | Rerun verdict |
|---|---|---|
| test/registered/spec/eagle/test_eagle_infer_b.py (test_radix_attention) | base-b-test-1-gpu-large (1) | ✅ PASS (flake) |
| test/registered/core/test_srt_endpoint.py (test_get_server_info_concurrent) | base-b-test-1-gpu-small (5) | ✅ PASS (flake) |
| test/registered/lora/test_lora_qwen3_8b_logprob_diff.py (test_lora_qwen3_8b_logprob_accuracy) | extra-a-test-1-gpu-large (0) | ❌ FAIL: same CUDBG_EXCEPTION_WARP_ILLEGAL_ADDRESS during CUDA graph capture; real bug (bisecting next) |
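The triage rule implied by the table above can be sketched roughly as follows. This is an illustrative helper, not code from the repository: the function name `classify_rerun`, the `"inconclusive"` outcome, and the `"timeout"` placeholder fingerprint are assumptions made for the example.

```python
def classify_rerun(original_fingerprint, rerun_passed, rerun_fingerprint=None):
    """Rule of thumb: a failure that passes on rerun is a flake; one that
    fails again with the same fingerprint is treated as a real bug."""
    if rerun_passed:
        return "flake"
    if rerun_fingerprint == original_fingerprint:
        return "real bug"
    # Failed again, but with a different symptom: needs a closer look.
    return "inconclusive"


ILLEGAL_ADDRESS = "CUDBG_EXCEPTION_WARP_ILLEGAL_ADDRESS"

# "timeout" is a made-up placeholder fingerprint for a test that passed on rerun.
print(classify_rerun("timeout", rerun_passed=True))
print(classify_rerun(ILLEGAL_ADDRESS, rerun_passed=False,
                     rerun_fingerprint=ILLEGAL_ADDRESS))
```

A single passing rerun only suggests a flake; it does not rule out a low-probability real failure.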
Bisect probe:
|
Bisect probes:
|
| SHA | Date | Subject | rerun-test verdict |
|---|---|---|---|
| ba214ef3d3 | 2026-05-14 | tag-gated nightly migration, 40 whole-file moves | PASS |
| 229cadec04 | 2026-05-16 | logging update for inplace setting in MoE layer | PASS |
| c58b47bc86 | 2026-05-18 | PoolStats dataclass move | (in flight) |
| d90bc65e30 | 2026-05-19 | [NPU] Fix TypeError in MLA index_head_dim | FAIL |
| current HEAD | 2026-05-19 | (Tom's chain + 5 unrelated) | FAIL |
Bisect probe:
|
| SHA | Date | Verdict |
|---|---|---|
| ba214ef3d3 | 2026-05-14 | PASS |
| 229cadec04 | 2026-05-16 | PASS |
| c58b47bc86 | 2026-05-18 | PASS ✅ |
| f04c522534 | 2026-05-18 | (in flight) |
| d90bc65e30 | 2026-05-19 | FAIL |
Bisect probe:
|
| SHA | Date | Verdict |
|---|---|---|
| ba214ef3d3 | 2026-05-14 | PASS |
| 229cadec04 | 2026-05-16 | PASS |
| c58b47bc86 | 2026-05-18 | PASS |
| f04c522534 | 2026-05-18 | PASS ✅ |
| f5049709b3 | 2026-05-18 | (in flight) |
| d90bc65e30 | 2026-05-19 | FAIL |
Bisect probe:
|
| SHA | Date | Verdict |
|---|---|---|
| f5049709b3 | 2026-05-18 | PASS ✅ (last good lower bound) |
| 878e6b8886 | 2026-05-18 | (in flight) |
| d90bc65e30 | 2026-05-19 | FAIL (first bad upper bound) |
Bisect probe:
|
| SHA | Date | Verdict |
|---|---|---|
| 878e6b8886 | 2026-05-18 | PASS ✅ (last good) |
| b79e4b1e68 | 2026-05-18 | (in flight, prime suspect) |
| d90bc65e30 | 2026-05-19 | FAIL (first bad) |
Bisect probe:
|
| SHA | Date | Verdict |
|---|---|---|
| 878e6b8886 | 2026-05-18 | PASS ✅ (last good) |
| 745abd6cc0 | 2026-05-18 | (untested) |
| 314dedf7c6 | 2026-05-18 | (in flight) |
| b79e4b1e68 | 2026-05-18 | FAIL ❌ (first bad upper bound) |
| d90bc65e30 | 2026-05-19 | FAIL |
Bisect result:
|
| SHA | Date | Subject | Verdict |
|---|---|---|---|
| ba214ef3d3 | 2026-05-14 | tag-gated nightly migration, 40 whole-file moves | PASS |
| 229cadec04 | 2026-05-16 | logging update for inplace setting in MoE layer | PASS |
| c58b47bc86 | 2026-05-18 | PoolStats dataclass move | PASS |
| f04c522534 | 2026-05-18 | [PD] Add conclude_state to fake KV backend | PASS |
| f5049709b3 | 2026-05-18 | eagle3 aux-layer-ids +1 offset fix | PASS |
| 878e6b8886 | 2026-05-18 | [SP] Fix runtime_max_tokens_per_rank | PASS |
| 314dedf7c6 | 2026-05-18 | Use SGLANG_CACHE_DIR env for gpu_p2p_access_cache path | PASS ✅ (last good) |
| b79e4b1e68 | 2026-05-18 | [Fix] Try to fix error caused by latest cutedsl packages (#25690) | FAIL ❌ (first bad) |
| d90bc65e30 | 2026-05-19 | [NPU] Fix TypeError in MLA index_head_dim | FAIL |
| current HEAD | 2026-05-19 | (Tom's chain + a handful of unrelated) | FAIL |
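For readers who want the narrowing logic spelled out, here is a minimal binary-search sketch over the commits in the result table. It is not the exact sequence of probes that was dispatched in this thread; `probe` is a stand-in that looks up the recorded verdicts instead of launching a `rerun-test` job, and the function names are illustrative.

```python
# Commits in merge order, with the verdicts recorded in the result table above.
VERDICTS = [
    ("ba214ef3d3", "PASS"),
    ("229cadec04", "PASS"),
    ("c58b47bc86", "PASS"),
    ("f04c522534", "PASS"),
    ("f5049709b3", "PASS"),
    ("878e6b8886", "PASS"),
    ("314dedf7c6", "PASS"),
    ("b79e4b1e68", "FAIL"),
    ("d90bc65e30", "FAIL"),
]


def probe(sha):
    """Stand-in for dispatching a rerun-test job: look up the recorded verdict."""
    return dict(VERDICTS)[sha] == "PASS"


def bisect_commits(shas, is_good):
    """Return (last_good, first_bad).

    Assumes shas[0] is good, shas[-1] is bad, and the history flips from
    good to bad exactly once.
    """
    lo, hi = 0, len(shas) - 1  # invariant: shas[lo] is good, shas[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(shas[mid]):
            lo = mid
        else:
            hi = mid
    return shas[lo], shas[hi]


shas = [sha for sha, _ in VERDICTS]
last_good, first_bad = bisect_commits(shas, probe)
print(last_good, first_bad)  # 314dedf7c6 b79e4b1e68
```

The single-flip assumption matters: if a flaky test produced a stray FAIL on a good commit, the search would converge on the wrong boundary, which is why the flake-versus-real-bug classification comes first.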
Offending change
- PR: #25690, [Fix] Try to fix error caused by latest cutedsl packages
- Author: @Fridge003 (Co-authored-by @hnyls2002)
- Merged: 2026-05-18 23:51 UTC
- Diff: 21 +, 4 -. Touches `python/pyproject.toml` (switches `flashinfer_python` and `nvidia-cutlass-dsl` to the `[cu13]` extras variant) and `scripts/ci/cuda/ci_install_dependency.sh` (regex-update for `[extras]` notation + new `purge_cutlass_libs_base()` step that uninstalls `nvidia-cutlass-dsl-libs-base` then force-reinstalls `nvidia-cutlass-dsl-libs-cu13`).
The PR's own commit message explains the original bug it was fixing:
> nvidia-cutlass-dsl[cu13] extras are additive on PyPI: requires_dist always pulls -libs-base AND -libs-cu13 when [cu13] is requested. Both wheels write to the same site-packages paths with different content, leaving the wrapper (cutlass.py, cu13 style) mismatched with the binding (_gpu_ops_gen.py, base style) -> GPUModuleOp signature TypeError.
The fix correctly purges -libs-base in the install script, but the LoRA Qwen3-8B forward path with CUDA graph capture now hits a kernel-side illegal address — so either the cu13 wheel's compiled kernel is broken for this path, or the purge_cutlass_libs_base step doesn't actually win in all install orderings.
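One cheap way to test the second hypothesis on a runner is to check which of the two libs distributions the environment still reports as installed. The sketch below uses only `importlib.metadata` from the standard library; the distribution names are taken from the PR description above, while the helper names and the status strings are illustrative assumptions.

```python
from importlib import metadata

# Distribution names as they appear in the PR description above.
BASE = "nvidia-cutlass-dsl-libs-base"
CU13 = "nvidia-cutlass-dsl-libs-cu13"


def installed_version(name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def check_cutlass_libs(lookup=installed_version):
    """Summarise which of the two libs wheels the environment reports."""
    base, cu13 = lookup(BASE), lookup(CU13)
    if base and cu13:
        return "conflict: both -libs-base and -libs-cu13 are installed"
    if cu13:
        return "ok: only -libs-cu13 is installed"
    if base:
        return "unexpected: only -libs-base is installed"
    return "neither libs wheel is installed"


if __name__ == "__main__":
    print(check_cutlass_libs())
```

This only inspects package metadata. Because both wheels write to the same site-packages paths, an "ok" result does not prove that the files on disk come from the cu13 wheel; it just rules out the simplest case where the purge step never ran.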
Failure fingerprint (every FAIL probe + current HEAD)
coredump: Detected an exception of type CUDBG_EXCEPTION_WARP_ILLEGAL_ADDRESS (14)
Fatal Python error: Aborted
RuntimeError: Rank 0 scheduler died during initialization (exit code: -6).
Python call stack at the abort thread:
File ".../python/sglang/srt/layers/quantization/unquant.py", line 161 in apply
File ".../python/sglang/srt/lora/layers.py", line 724 in forward
...
File ".../python/sglang/srt/model_executor/cuda_graph_runner.py", line 1112 in run_once
File ".../python/sglang/srt/model_executor/cuda_graph_runner.py", line 1134 in capture_one_batch_size
File ".../python/sglang/srt/model_executor/cuda_graph_runner.py", line 707 in __init__
File ".../python/sglang/srt/model_executor/model_runner.py", line 2776 in init_device_graphs
Reproduce
# Probe latest good (PASS):
git push upstream 314dedf7c6:refs/heads/tmp-good
gh workflow run rerun-test.yml --repo sgl-project/sglang --ref tmp-good \
-f mode=cuda -f test_command="registered/lora/test_lora_qwen3_8b_logprob_diff.py" \
-f runs_on="1-gpu-h100" -f install_script="scripts/ci/cuda/ci_install_dependency.sh"
# Probe first bad (FAIL):
git push upstream b79e4b1e68:refs/heads/tmp-bad
gh workflow run rerun-test.yml --repo sgl-project/sglang --ref tmp-bad \
-f mode=cuda -f test_command="registered/lora/test_lora_qwen3_8b_logprob_diff.py" \
-f runs_on="1-gpu-h100" -f install_script="scripts/ci/cuda/ci_install_dependency.sh"

cc @Fridge003 @hnyls2002, could you take a look? This regression has been on main since 2026-05-18 and is currently surfacing as extra-a-test-1-gpu-large (0) on the main-CI sandbox.
Diagnostic revert PR opened for verification: #25743 — /rerun-test of the failing LoRA file is pending there.
Bisect confirmed via paired diagnostic PRs
Two sibling PRs were opened to nail down
Together with the per-commit bisect probes above, that's three independent lines of evidence:
The regression is unambiguously
cc @Fridge003 @hnyls2002, could you take a look? Closing the two diagnostic PRs now. |
Flake confirmed:
|
Non-CUDA lane:
|
✅ CUDA gate GREEN: main verification complete
Head SHA
The only CUDA red was a confirmed flake:
Remaining red lanes are non-gating (non-CUDA / chronic / cascade), none related to the landed chain:
Conclusion: the KV-canary feature, landed on |
c6e27e0 to 96c5c6e
Round status (head
Remaining ~95 jobs still running; will batch any reruns after the round lands. |
Other reds this round: AMD lane (27 jobs — ongoing repo-wide AMD outage), NPU a2 (recurring perf flake), XPU (chronic runner infra). None CUDA, none code-related. Plan: wait for the ~13 still-running jobs to land, then |
Round summary (running=0): CUDA reds = this + Next: one batched |
|
/rerun-failed-ci |
96c5c6e to ffbe2e8
stage-a-test-1-gpu-xpu / finish (job): runner-level infra failure during workspace cleanup, before any test ran: Classification: infra (self-hosted XPU runner permission residue), non-CUDA lane, unrelated to main's code. Not chasing per babysit policy; CUDA lanes remain the hard gate. |
Non-CUDA failures (not chasing per babysit policy — none CUDA, none related to the merged code):
The merged PRs touch no XPU/NPU/AMD/Xeon code, no sampling backends, no quantization or fused-residual kernels. Continuing to watch CUDA lanes (the hard gate) to completion. |
CUDA failure:
|
| Branch | Run | test_mimo_v2 | Fingerprint |
|---|---|---|---|
| sandbox (main 0a190d1c9 + sentinel) | 27088945685 | ✗ FAIL | VocabParallelEmbedding input id out of range |
| main scheduled (a07d813ec, independent runner) | 27091400009 | ✗ FAIL | byte-identical |
| main pre-#27445/#27446 (a39c428d3) | 27093698014 (probe dispatched) | pending | - |
Classification
Pre-existing main regression, deterministic (2/2 independent runs), unrelated to #27445/#27446: the merged PRs touch only scripted-runtime test harness files and PP idle-gating in is_fully_idle (short-circuited at pp_size==1; this server is pp=1). The failing path is the model-side out-of-range-token-id async assert (same family as the tp=1 fix in #27482) on MiMo-V2.5's first warmup forward. Will report the pre-merge probe result when it completes.
ffbe2e8 to d164810
CI triage — head
|
CI triage round 2 — head
|
Round 3 — head
|
Round 4 — head
|
Round 1 complete — head
|
| Lane / test | Failure | Path |
|---|---|---|
| base-c-b200 (3) · test_gpt_oss_4gpu_mxfp4.py | RuntimeError: shape '[4096, 3072]' is invalid for input of size 11796480 (gpt_oss.py:320) | core gpt-oss MXFP4 |
| extra-b-b200 (0) · test_lora_gpt_oss_20b_logprob_diff.py | AttributeError: 'FusedMoEWithLoRA' object has no attribute 'hidden_size' (gpt_oss.py:294) | gpt-oss + LoRA |
Both reported on #27063. They will re-fail on rerun (deterministic).
🟡 Transient install 504 — 11 CUDA jobs whose tests never ran: extra-a, base-b-small (0/4/5/7), base-b-large (8), base-b-b200 (1), base-c-b200 (1), base-c-h100 (0), base-c-gb300 (0), extra-b-b200 (1). Same Failed to download sglang-kernel==0.4.3+cu130 → 504 across all; 107 other jobs pulled the same wheel fine, so the CDN incident is over. Re-running these to actually exercise their suites.
⚪ h20 ignored. 🔵 Non-CUDA (not ours): Xeon req_lens int64, XPU exit-137, NPU container, AMD mi325 / mi35x / mi35x-disagg. ⚫ Cancelled (not failures): base-c-h100 (4), notify-pr-states.
Triggering /rerun-failed-ci next. After the rerun settles I'll confirm the 504 suites pass, leave the two #27063 regressions as the standing verdict, and close this sandbox PR (do not merge).
|
/rerun-failed-ci |
Final verdict — head
|
| Lane / test | Failure | Confirmed |
|---|---|---|
| base-c-b200 (3) · test_gpt_oss_4gpu_mxfp4.py | RuntimeError: shape '[4096, 3072]' is invalid for input of size 11796480 (gpt_oss.py:320), core gpt-oss MXFP4 path | failed 2/2 runs |
| extra-b-b200 (0) · test_lora_gpt_oss_20b_logprob_diff.py | AttributeError: 'FusedMoEWithLoRA' object has no attribute 'hidden_size' (gpt_oss.py:294), gpt-oss + LoRA | failed 2/2 runs |
Both reported on #27063. Root cause: #27063's hidden_dim_unpadded = self.experts.hidden_size assumes that attribute is the experts' unpadded output width, but it's the padded width for the B200 MXFP4 FusedMoE (3072 vs 2880), and the LoRA wrapper (FusedMoEWithLoRA) doesn't expose it at all. pr-test-finish (Base) and pr-test-extra-finish are red only because of these two.
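The numbers in the first error message line up with that explanation, as a quick element-count check shows. This is a plain-Python sketch using only the figures quoted above; `can_view` is an illustrative helper, not a function from the codebase.

```python
import math

# Figures taken from the error message and the root-cause note above.
INPUT_NUMEL = 11796480
ROWS = 4096
PADDED_WIDTH = 3072    # what experts.hidden_size holds on the B200 MXFP4 path
UNPADDED_WIDTH = 2880  # the width the reshape actually needed


def can_view(numel, shape):
    """A reshape is only valid when the element counts match."""
    return math.prod(shape) == numel


print(can_view(INPUT_NUMEL, (ROWS, PADDED_WIDTH)))    # False
print(can_view(INPUT_NUMEL, (ROWS, UNPADDED_WIDTH)))  # True
print(INPUT_NUMEL // ROWS)                            # 2880
```

So the tensor being reshaped holds 4096 rows of 2880 values, and asking for 3072 columns is what triggers the `RuntimeError`.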
✅ Transient infra, fully resolved: 11 CUDA jobs initially died at install with Failed to download sglang-kernel==0.4.3+cu130 → 504 Gateway Timeout (GitHub-releases CDN incident). All 11 passed on rerun (including base-c-h20), confirming their suites are green on main and the 504 was purely transient.
🔵 Non-CUDA lanes (pre-existing, not gpt-oss-related, not chased):
- PR Test (Xeon) `base-b-test-cpu`: `RuntimeError: decode: expect req_lens to be int64, got Int` (`test_external_models.py`).
- PR Test (NPU) `stage-b-test-1-npu-a2`: self-hosted-runner/container failure.
- PR Test (AMD) `stage-c mi325` + `stage-c mi35x` + `stage-b mi35x-disaggregation` (+ AMD/NPU finish aggregates).
Net: main CUDA is green except the 2 #27063 B200 regressions. Closing this sandbox PR — not merged.
d164810 to 03cde0e
Summary
Sandbox PR: do not merge. Touches `python/sglang/version.py` with a no-op comment so paths-filter flips `main_package=true` and the full PR Test Base + PR Test Extra matrix dispatches.

Carries three labels so the workflow gates all pass:
- `run-ci`: satisfies `pr-gate.yml`'s `require-run-ci` gate
- `run-ci-extra`: allows `pr-test-extra.yml` to run on this `pull_request` event
- `bypass-fastfail`: makes the `check-pr-test-health` action a no-op (no cascade fast-fail when a single sibling fails on infra flake)

Purpose: verify upstream/main (`f04c522534`) is green end-to-end with the full CI surface (base stages + extra stages, no fast-fail cascade). This is the PR-side equivalent of the dispatched main CI; it is cleaner than `gh workflow run` because the dispatch interface cannot pass `skip_pr_test_health_check`.

Close this PR after the run completes; no source change is intended to land.
Test plan
- `pre-commit run --files python/sglang/version.py`
- `check-pr-test-health` cascade failures
CI States
Latest PR Test (Base): ⏳ Run #27205975942
Latest PR Test (Extra): ✅ Run #27205975115