
Support gpt oss moe lora #21375

Open
gongyisheng wants to merge 4 commits into sgl-project:main from gongyisheng:sglang-gpt-oss-moe-lora

Conversation

@gongyisheng
Contributor

@gongyisheng gongyisheng commented Mar 25, 2026

Motivation

Support MoE LoRA for gpt-oss in miles.

Modifications

This PR includes two parts:

  • Adapt to gpt-oss-specific weight names
    • python/sglang/srt/server_args.py: use Triton as the MoE backend when a gpt-oss model is served with a LoRA adapter
    • python/sglang/srt/lora/mem_pool.py: support the gpt-oss MoE architecture, which has no shared experts
    • python/sglang/srt/lora/utils.py: add the gpt-oss MoE layer names
    • python/sglang/srt/utils/hf_transformers_utils.py: add gpt-oss-specific configs
  • Fixes for load_lora
    • python/sglang/srt/managers/tp_worker.py: support loading LoRA in a TP environment
    • python/sglang/srt/managers/io_struct.py: change a type hint, since serialized_tensors is already a list[str]
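The backend override in server_args.py described above can be sketched roughly as follows. This is a hedged illustration, not the actual sglang code: the function name `select_moe_backend` and the architecture string `GptOssForCausalLM` are assumptions for the example; the real check lives inside the server-args validation logic.

```python
# Hypothetical sketch of the MoE-backend override the PR describes:
# when a gpt-oss model is combined with LoRA adapters, fall back to the
# Triton MoE backend, since other MoE backends lack LoRA support.
# select_moe_backend and GptOssForCausalLM are illustrative names only.
from typing import Optional


def select_moe_backend(
    model_arch: str,
    lora_paths: Optional[list],
    requested_backend: str,
) -> str:
    # Force Triton for gpt-oss + LoRA; otherwise honor the user's choice.
    if model_arch == "GptOssForCausalLM" and lora_paths:
        return "triton"
    return requested_backend
```

Usage would mirror server startup: with `--lora-paths` set on a gpt-oss model, `select_moe_backend("GptOssForCausalLM", ["adapter_a"], "flashinfer")` returns `"triton"`, while any other model keeps its requested backend.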

TODO:

  • add unit tests

Accuracy Tests

N/A

Benchmarking and Profiling

N/A

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@gemini-code-assist
Contributor

Warning

You have reached your daily quota limit. Please wait up to 24 hours and I will start processing your requests again!

@gongyisheng
Contributor Author

@yushengsu-thu review request

@yushengsu-thu yushengsu-thu self-assigned this Apr 9, 2026
