
fix gpt-oss launch failure with piecewise cuda graph #17532

Merged
hebiao064 merged 2 commits into sgl-project:main from zminglei:fix-gpt-pcg on Jan 23, 2026

Conversation

@zminglei
Collaborator

Motivation

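Recent NPU support introduced a small bug that makes gpt-oss fail to launch when piecewise CUDA graph is enabled: the Triton fused-MoE path receives the NPU-specific activation 'npu_swiglu_oai', which it does not support, and raises a ValueError (see the traceback below).
This one-line change fixes the bug.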
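To illustrate the failure mode, here is a minimal, hypothetical sketch. It assumes the activation name is chosen per device and then validated by the Triton fused-MoE path. The helpers select_moe_activation and fused_experts_impl_stub, the supported-activation set, and the base name "swiglu_oai" are illustrative assumptions, not SGLang's actual code or the change made in this PR. The stub only reproduces the ValueError message seen in the traceback under Accuracy Tests.

```python
# Hypothetical sketch: names and the supported set are assumptions, not SGLang's API.
SUPPORTED_TRITON_ACTIVATIONS = {"silu", "gelu", "swiglu_oai"}  # assumed set


def fused_experts_impl_stub(activation: str, is_gated: bool) -> str:
    """Mimic the validation in the Triton path that raised in the traceback."""
    if activation not in SUPPORTED_TRITON_ACTIVATIONS:
        raise ValueError(f"Unsupported activation: {activation=}, with {is_gated=}")
    return activation


def select_moe_activation(base: str, device: str) -> str:
    """Use the NPU-prefixed kernel name only when actually running on an NPU."""
    if device == "npu":
        return f"npu_{base}"
    # On CUDA, keep the plain name that the Triton kernels recognize.
    return base


if __name__ == "__main__":
    # CUDA path with device-aware selection: accepted.
    print(fused_experts_impl_stub(select_moe_activation("swiglu_oai", "cuda"), True))
    # Leaking the NPU name into the CUDA path reproduces the launch failure.
    try:
        fused_experts_impl_stub("npu_swiglu_oai", True)
    except ValueError as err:
        print(err)
```

Gating the NPU-specific name behind an explicit device check keeps the CUDA path from ever seeing a name its kernels do not handle.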

Modifications

Accuracy Tests

python3 -m sglang.launch_server --model-path /shared/public/elr-models/openai/gpt-oss-120b-new/ --trust-remote-code --tp 4 --reasoning-parser gpt-oss --enable-piecewise-cuda-graph

Before:

    combine_input = self.run_moe_core(
  File "/home/jobuser/zminglei/sglang/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 980, in run_moe_core
    return self.quant_method.apply(
  File "/home/jobuser/zminglei/sglang/python/sglang/srt/layers/quantization/mxfp4.py", line 716, in apply
    return self.runner.run(dispatch_output, quant_info)
  File "/home/jobuser/zminglei/sglang/python/sglang/srt/layers/moe/moe_runner/runner.py", line 78, in run
    return self.fused_func(dispatch_output, quant_info, self.config)
  File "/home/jobuser/zminglei/sglang/python/sglang/srt/layers/moe/moe_runner/triton.py", line 339, in fused_experts_none_to_triton
    output = fused_experts(
  File "/home/jobuser/zminglei/sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 213, in fused_experts
    inplace_fused_experts(
  File "/home/jobuser/zminglei/sglang/venv/lib/python3.10/site-packages/torch/_ops.py", line 1255, in __call__
    return self._op(*args, **kwargs)
  File "/home/jobuser/zminglei/sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 91, in inplace_fused_experts
    fused_experts_impl(
  File "/home/jobuser/zminglei/sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 526, in fused_experts_impl
    raise ValueError(f"Unsupported activation: {activation=}, with {is_gated=}")
ValueError: Unsupported activation: activation='npu_swiglu_oai', with is_gated=True

After:

[2026-01-21 22:09:40] INFO:     Application startup complete.
[2026-01-21 22:09:40] INFO:     Uvicorn running on http://127.0.0.1:30000 (Press CTRL+C to quit)
[2026-01-21 22:09:41] INFO:     127.0.0.1:54606 - "GET /model_info HTTP/1.1" 200 OK
[2026-01-21 22:09:41 TP0] Prefill batch, #new-seq: 1, #new-token: 6, #cached-token: 0, full token usage: 0.00, swa token usage: 0.00, #running-req: 0, #queue-req: 0,
[2026-01-21 22:09:42] INFO:     127.0.0.1:54614 - "POST /generate HTTP/1.1" 200 OK
[2026-01-21 22:09:42] The server is fired up and ready to roll!
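The Before/After comparison can be automated with a simple log check. The sketch below is a hypothetical helper (not part of SGLang) that treats a launch as successful when the log contains the ready message shown in the After output and no "Unsupported activation" error from the Before traceback.

```python
# Hypothetical helper, not part of SGLang: classify a server launch log.
READY_MARKER = "The server is fired up and ready to roll!"
FAILURE_MARKER = "Unsupported activation"


def launch_succeeded(log_text: str) -> bool:
    """Return True if the log shows the ready marker and no activation error."""
    if FAILURE_MARKER in log_text:
        return False
    return READY_MARKER in log_text


if __name__ == "__main__":
    before = "ValueError: Unsupported activation: activation='npu_swiglu_oai', with is_gated=True"
    after = "[2026-01-21 22:09:42] The server is fired up and ready to roll!"
    print(launch_succeeded(before), launch_succeeded(after))
```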

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@zminglei zminglei marked this pull request as ready for review January 21, 2026 22:27

@zminglei
Collaborator Author

zminglei commented Jan 21, 2026

/tag-and-rerun-ci retry again

@yuan-luo yuan-luo self-requested a review January 22, 2026 02:50
@zminglei
Collaborator Author

/rerun-stage stage-b-test-large-1-gpu

@github-actions
Contributor

✅ Triggered stage-b-test-large-1-gpu to run independently (skipping dependencies).

@github-actions
Contributor

🔗 View workflow run

@hebiao064 hebiao064 merged commit 2b2f317 into sgl-project:main Jan 23, 2026
300 of 339 checks passed
caitengwei pushed a commit to caitengwei/sglang that referenced this pull request Jan 30, 2026
Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026



4 participants