Add aiter bias moe support in gpt-oss mxfp4 model #17735

Merged
HaiShaw merged 4 commits into sgl-project:main from HaiShaw:gpt-oss-with-aiter-moe
Jan 29, 2026
@kkHuang-amd
Collaborator

@kkHuang-amd kkHuang-amd commented Jan 26, 2026

Motivation

Optimize MoE performance on the ROCm platform when running the gpt-oss mxfp4 model.

Modifications

mxfp4.py: add an aiter path for weight processing, and use the aiter fused MoE function to handle the MoE op.
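For readers unfamiliar with what a "bias MoE" op computes, here is a minimal pure-Python reference sketch. It is not the aiter kernel and not the code in mxfp4.py. It assumes each expert is a single linear layer with its own bias (real gpt-oss experts use two projections with a gated activation), that the gates are a softmax over the top-k router logits, and that weights are plain floats rather than mxfp4. All names (`moe_with_bias`, `linear`, `softmax`) are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def linear(w: Matrix, b: Vector, x: Vector) -> Vector:
    """Compute y = W x + b for one expert (the bias is the point of interest)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]


def softmax(v: Vector) -> Vector:
    """Numerically stable softmax over a short list."""
    m = max(v)
    exps = [math.exp(e - m) for e in v]
    total = sum(exps)
    return [e / total for e in exps]


def moe_with_bias(
    x: Vector,
    router_logits: Vector,
    expert_weights: List[Matrix],
    expert_biases: List[Vector],
    top_k: int,
) -> Vector:
    """Route one token to its top_k experts and mix their biased outputs."""
    # Pick the indices of the top_k largest router logits.
    order = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)
    selected = order[:top_k]
    # Assumption: gates are a softmax over the selected logits only.
    gates = softmax([router_logits[i] for i in selected])

    out = [0.0] * len(expert_biases[0])
    for gate, e in zip(gates, selected):
        y = linear(expert_weights[e], expert_biases[e], x)
        out = [o + gate * yi for o, yi in zip(out, y)]
    return out
```

A fused kernel performs the routing, the per-expert matrix multiplies, the bias add, and the weighted sum in one pass instead of looping over experts as above; a sketch like this is mainly useful as a small correctness reference.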

Accuracy Tests

sglang# python3 benchmark/gsm8k/bench_sglang.py --num-questions 2000 --parallel 2000 --port 8000
100%|██████████| 1319/1319 [00:44<00:00, 29.78it/s]
Accuracy: 0.882
Invalid: 0.007
Latency: 44.526 s
Output throughput: 9263.069 token/s

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@kkHuang-amd kkHuang-amd added run-ci amd aiter AI Tensor Engine ROCm labels Jan 26, 2026
@HaiShaw
Collaborator

HaiShaw commented Jan 26, 2026

/tag-and-rerun-ci

@kkHuang-amd kkHuang-amd marked this pull request as ready for review January 29, 2026 02:36
@HaiShaw HaiShaw merged commit ef1c512 into sgl-project:main Jan 29, 2026
95 of 102 checks passed
charlesHsuGG pushed a commit to charlesHsuGG/sglang that referenced this pull request Jan 30, 2026
Chen-0210 pushed a commit to Chen-0210/sglang that referenced this pull request Jan 30, 2026
sfiisf pushed a commit to sfiisf/sglang that referenced this pull request Feb 5, 2026
Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026