Revert #23533 (Hy3 preview) + re-enable test_nvidia_nemotron_3_nano #23758

Closed

alisonshao wants to merge 2 commits into main from revert-hy3-preview-reenable-nemotron

Conversation

@alisonshao (Collaborator) commented Apr 26, 2026

Reverts #23533 and re-enables test_nvidia_nemotron_3_nano, which #23720 disabled as a stop-gap when scheduled pr-test started failing.

Bisected the failure (`Fatal Python error: Aborted` from `piecewise_cuda_graph_runner.py:794` during FP8 Nemotron decode, surfacing as `Triton Error [CUDA]: an illegal memory access` in `_static_quant_fp8`) on 2x H200 against `TestNvidiaNemotron3Nano30BFP8.test_lm_eval`. The first bad commit is 6d0386147 (#23533); its parent (6344b546c) ran cleanly with gsm8k=0.850. Failure example: https://github.com/sgl-project/sglang/actions/runs/24936337295/job/73022450777.

#23533 added a new `grouped_topk_single_group_kernel` and wired it in for any single-group MoE with ≤512 experts and topk ≤ 8 (`python/sglang/srt/layers/moe/topk.py`); Nemotron-3-Nano-A3B falls into that gate. The kernel corrupts CUDA state, and the next sync point (`_static_quant_fp8` in the FP8 path) surfaces the illegal access. #23533's own CI was green because its branch predated #22218 (Breakable Piecewise Cuda Graph): each PR works alone, but the combination on main does not.

Conflicts during revert:

Reland #23533 once the new kernel is audited against the breakable-PCG runner.

Test plan

  • stage-b-test-2-gpu-large (2) runs the re-enabled test and reports gsm8k ≈ 0.85.
  • Other partitions stay green; no Hy3 imports remain (grepped `hunyuan_v3`, `hunyuan_v3_nextn`, `grouped_topk`, `hunyuan_detector`).

Removes the `disabled="Temporarily disabled; failing on main."` flag
added in #23720. The underlying crash is gone now that #23533 has been
reverted in the previous commit, so the test should once again run as
part of stage-b-test-2-gpu-large.
@gemini-code-assist (Bot) left a comment

Code Review

This pull request removes the implementation and support for the Hunyuan-V3 (HYV3) model architecture, including its specialized MoE routing kernels, function call detectors, reasoning parsers, and model configurations. Additionally, it re-enables a test for the Nemotron-3-Nano model that was previously disabled. I have no feedback to provide as there are no review comments to assess.

@alisonshao (Collaborator, Author)

/rerun-test test_nvidia_nemotron_3_nano.py

@github-actions

2-gpu-h100 (1 test): View workflow run

`cd test/ && python3 registered/models/test_nvidia_nemotron_3_nano.py`

Kangyan-Zhou added a commit that referenced this pull request Apr 26, 2026
…n test

The Phase-3 renormalize block in `grouped_topk_single_group_kernel` called
`warp_sum_f32` (which uses `__shfl_xor_sync(0xffffffff, ...)`) from inside
`if (lane_id < topk)`. With `topk` < 32 (e.g. nemotron-3-nano: topk=6), only
lanes 0..topk-1 reached the intrinsic, but the mask 0xffffffff named all 32
lanes. CUDA spec: every lane named in the mask must execute the intrinsic
at the same site, otherwise the result is undefined.
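
A minimal sketch of the divergent shape described above (illustrative names, not the literal kernel source; only `warp_sum_f32`, the mask, and the `if (lane_id < topk)` gate come from the description):

```cuda
// Butterfly warp reduction: the full-warp mask 0xffffffff promises that
// all 32 lanes execute every __shfl_xor_sync at this site.
__device__ float warp_sum_f32(float v) {
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_xor_sync(0xffffffff, v, offset);
    return v;
}

// Buggy shape: with topk = 6, lanes 6..31 never reach the intrinsic,
// yet the mask names all 32 lanes -> the result is undefined.
if (lane_id < topk) {
    float sum = warp_sum_f32(weight);  // UB: masked-in lanes are absent
    weight /= sum;                     // some lanes see a garbage sum
}
```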

Empirically the UB returned values from the absent lanes' registers,
producing wrong renormalized weights — 2 of 6 weights per token were
unnormalized (~1.5x too large). The wrong values were tolerated in eager
inference, but under piecewise CUDA graph replay they cascaded into a
downstream OOB that surfaced as IMA at `piecewise_cuda_graph_runner.py:794`
on `TestNvidiaNemotron3Nano30BFP8.test_lm_eval`.

Fix: move the warp_sum out of the divergent `if`, have all 32 lanes
participate, with inactive lanes contributing the additive identity (0).
Output writes remain gated by `if (lane_id < topk)`.
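
A hedged sketch of that fixed shape, reusing `warp_sum_f32` from the snippet above (again with illustrative names):

```cuda
// Fixed shape: every lane executes the shuffle at the same site; lanes
// >= topk contribute the additive identity 0, so the sum is unchanged.
float contrib = (lane_id < topk) ? weight : 0.0f;
float sum = warp_sum_f32(contrib);  // uniform across all 32 lanes

if (lane_id < topk) {
    weight /= sum;  // output writes stay gated, as before
}
```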

Validated:
- Unit sweep across E in {16..512}, K in {1..8}, N in {1..128}: matches
  reference biased_grouped_topk_impl with max diff < 1e-7.
- 2x H200 e2e: TestNvidiaNemotron3Nano30BFP8.test_lm_eval passes
  (gsm8k strict=0.839, flexible=0.542, both within rtol=0.08).
- Buggy kernel + eager (no graphs) also passes — confirming the kernel
  itself doesn't fault, only the cascade-under-graph-replay does.

This is the surgical alternative to #23758, which reverts the entire
#23533 (~4000 lines). The model code, tool/reasoning parsers, and tuned
MoE configs from #23533 are not part of the bug.

Also re-enables `test_nvidia_nemotron_3_nano` (the stop-gap disable was
added in #23720 when this IMA started showing up).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>