fix gpt-oss launch failure with piecewise cuda graph by zminglei · Pull Request #17532 · sgl-project/sglang

zminglei · 2026-01-21T22:26:52Z

Motivation

Recent NPU support introduced a small bug that makes gpt-oss fail to launch when piecewise CUDA graph is enabled.
This one-line change fixes the bug.

Modifications

Accuracy Tests

python3 -m sglang.launch_server --model-path /shared/public/elr-models/openai/gpt-oss-120b-new/ --trust-remote-code --tp 4 --reasoning-parser gpt-oss --enable-piecewise-cuda-graph

Before:

    combine_input = self.run_moe_core(
  File "/home/jobuser/zminglei/sglang/python/sglang/srt/layers/moe/fused_moe_triton/layer.py", line 980, in run_moe_core
    return self.quant_method.apply(
  File "/home/jobuser/zminglei/sglang/python/sglang/srt/layers/quantization/mxfp4.py", line 716, in apply
    return self.runner.run(dispatch_output, quant_info)
  File "/home/jobuser/zminglei/sglang/python/sglang/srt/layers/moe/moe_runner/runner.py", line 78, in run
    return self.fused_func(dispatch_output, quant_info, self.config)
  File "/home/jobuser/zminglei/sglang/python/sglang/srt/layers/moe/moe_runner/triton.py", line 339, in fused_experts_none_to_triton
    output = fused_experts(
  File "/home/jobuser/zminglei/sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 213, in fused_experts
    inplace_fused_experts(
  File "/home/jobuser/zminglei/sglang/venv/lib/python3.10/site-packages/torch/_ops.py", line 1255, in __call__
    return self._op(*args, **kwargs)
  File "/home/jobuser/zminglei/sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 91, in inplace_fused_experts
    fused_experts_impl(
  File "/home/jobuser/zminglei/sglang/python/sglang/srt/layers/moe/fused_moe_triton/fused_moe.py", line 526, in fused_experts_impl
    raise ValueError(f"Unsupported activation: {activation=}, with {is_gated=}")
ValueError: Unsupported activation: activation='npu_swiglu_oai', with is_gated=True

After:

[2026-01-21 22:09:40] INFO:     Application startup complete.
[2026-01-21 22:09:40] INFO:     Uvicorn running on http://127.0.0.1:30000 (Press CTRL+C to quit)
[2026-01-21 22:09:41] INFO:     127.0.0.1:54606 - "GET /model_info HTTP/1.1" 200 OK
[2026-01-21 22:09:41 TP0] Prefill batch, #new-seq: 1, #new-token: 6, #cached-token: 0, full token usage: 0.00, swa token usage: 0.00, #running-req: 0, #queue-req: 0,
[2026-01-21 22:09:42] INFO:     127.0.0.1:54614 - "POST /generate HTTP/1.1" 200 OK
[2026-01-21 22:09:42] The server is fired up and ready to roll!
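
The last line of the "Before" traceback shows an NPU-specific activation name (`npu_swiglu_oai`) reaching the Triton fused-MoE path, which rejects it. The PR text does not include the diff, so the following is only a minimal sketch of that failure mode and of one way to guard against it. Every name here (`select_activation`, `fused_experts_stub`, `SUPPORTED_ACTIVATIONS`, and the `"silu"` fallback) is hypothetical and is not taken from SGLang's code.

```python
# Hypothetical sketch: none of these names come from the SGLang code base.

# Assumed set of activation names understood by a generic GPU kernel path.
SUPPORTED_ACTIVATIONS = {"silu", "gelu"}


def select_activation(is_npu: bool) -> str:
    """Pick the NPU-specific activation name only when running on an NPU."""
    return "npu_swiglu_oai" if is_npu else "silu"


def fused_experts_stub(activation: str, is_gated: bool = True) -> str:
    """Stand-in for a kernel dispatcher that rejects unknown activation names."""
    if activation not in SUPPORTED_ACTIVATIONS:
        raise ValueError(f"Unsupported activation: {activation=}, with {is_gated=}")
    return f"ran fused experts with {activation}"


# An NPU-only name leaking into the generic path reproduces the shape of the error.
try:
    fused_experts_stub("npu_swiglu_oai")
except ValueError as err:
    print(err)

# With the name gated on the platform, the generic path receives a name it accepts.
print(fused_experts_stub(select_activation(is_npu=False)))
```

The point of the sketch is only that a backend-specific name should be chosen behind a platform check, so that non-NPU code paths never see it.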

Benchmarking and Profiling

Checklist

Format your code according to the Format code with pre-commit.
Add unit tests according to the Run and add unit tests.
Update documentation according to Write documentations.
Provide accuracy and speed benchmark results according to Test the accuracy and Benchmark the speed.
Follow the SGLang code style guidance.

Review Process

Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
Get approvals from CODEOWNERS and other reviewers.
Trigger CI tests with comments or contact authorized users to do so.
- /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
After green CI and required approvals, ask Merge Oncalls to merge.

gemini-code-assist · 2026-01-21T22:26:56Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

gemini-code-assist · 2026-01-21T22:27:34Z

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

zminglei · 2026-01-21T22:27:39Z

/tag-and-rerun-ci retry again

zminglei · 2026-01-22T18:57:53Z

/rerun-stage stage-b-test-large-1-gpu

github-actions · 2026-01-22T18:58:15Z

✅ Triggered stage-b-test-large-1-gpu to run independently (skipping dependencies).

github-actions · 2026-01-22T18:58:21Z

🔗 View workflow run

fix gpt-oss launch failure with piecewise cuda graph

1d7096c

zminglei marked this pull request as ready for review January 21, 2026 22:27

github-actions Bot added the run-ci label Jan 21, 2026

Merge branch 'main' into fix-gpt-pcg

f1a442b

yuan-luo self-requested a review January 22, 2026 02:50

yuan-luo approved these changes Jan 22, 2026


hebiao064 approved these changes Jan 22, 2026


ispobock approved these changes Jan 23, 2026


hebiao064 merged commit 2b2f317 into sgl-project:main Jan 23, 2026
300 of 339 checks passed

caitengwei pushed a commit to caitengwei/sglang that referenced this pull request Jan 30, 2026

fix gpt-oss launch failure with piecewise cuda graph (sgl-project#17532)

a52c9c4

Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026

fix gpt-oss launch failure with piecewise cuda graph (sgl-project#17532)

bb7a341

hebiao064 merged 2 commits into sgl-project:main from zminglei:fix-gpt-pcg

4 participants
