
[MUSA][1/N] sglang.check_env #16959

Merged
Kangyan-Zhou merged 1 commit into sgl-project:main from yeahdongcn:xd/musa_check_env on Jan 23, 2026

Conversation

@yeahdongcn (Collaborator) commented Jan 12, 2026

Motivation

This PR is the first in a series of pull requests (tracked in #16565) to add full support for Moore Threads GPUs, leveraging MUSA (Meta-computing Unified System Architecture) to accelerate LLM inference.

Modifications

  1. Added is_musa to check the basic runtime environment (a rough sketch of such a check is shown below)
  2. Updated check_env.py to fetch the device info, driver version, topology, etc.
  3. Added bidict to both pyproject_other.toml and pyproject.toml for further handling of cuda_wrapper.py and pynccl_wrapper.py
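
As a rough illustration (not the exact code in this PR), an is_musa() check might look like the sketch below. The torch_musa import and the torch.musa namespace are assumptions based on how the torch_musa extension mirrors torch.cuda:

# Hypothetical sketch only; the actual is_musa() helper in sglang may differ.
from functools import lru_cache

import torch


@lru_cache(maxsize=1)
def is_musa() -> bool:
    """Return True if the MUSA (Moore Threads) runtime is importable and a device is visible."""
    try:
        import torch_musa  # noqa: F401  # assumed to register the torch.musa namespace
    except ImportError:
        return False
    return hasattr(torch, "musa") and torch.musa.is_available()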

Testing Done

Tested in a clean torch_musa container.

root@worker3218:/ws# rm -f python/pyproject.toml && mv python/pyproject_other.toml python/pyproject.toml
root@worker3218:/ws# pip install -e "python[all_musa]"
root@worker3218:/ws# python3 -m sglang.check_env
Python: 3.10.12 (main, Nov  4 2025, 08:48:33) [GCC 11.4.0]
MUSA available: True
GPU 0,1,2,3,4,5,6,7: MTT S5000
GPU 0,1,2,3,4,5,6,7 Compute Capability: 3.1
MUSA_HOME: /usr/local/musa
MCC: mcc version 4.3.4
MUSA Driver Version: 3.3.3-server
PyTorch: 2.7.1
sglang: 0.1.dev8833+g2dadf6356.d20260112
sgl_kernel: Module Not Found
flashinfer_python: Module Not Found
flashinfer_cubin: Module Not Found
flashinfer_jit_cache: Module Not Found
triton: 3.1.0
transformers: 4.57.1
torchao: 0.9.0
numpy: 1.26.4
aiohttp: 3.13.3
fastapi: 0.123.5
hf_transfer: 0.1.9
huggingface_hub: 0.36.0
interegular: 0.3.3
modelscope: 1.33.0
orjson: 3.11.5
outlines: 0.1.11
packaging: 25.0
psutil: 7.2.1
pydantic: 2.12.5
python-multipart: 0.0.21
pyzmq: 27.1.0
uvicorn: 0.38.0
uvloop: 0.22.1
vllm: Module Not Found
xgrammar: 0.1.27
openai: 2.6.1
tiktoken: 0.12.0
anthropic: 0.75.0
litellm: Module Not Found
decord2: 3.0.0
MTHREADS Topology: 
         GPU0     GPU1     GPU2     GPU3     GPU4     GPU5     GPU6     GPU7     NIC0     NIC1     NIC2     NIC3     NIC4     NIC5     NIC6     NIC7     NIC8     NIC9     NIC10    CPU Affinity   NUMA Affinity  
GPU0     X        MT2      MT2      MT2      MT2      MT2      MT2      MT2      MPB      MPB      NODE     NODE     SYS      SYS      SYS      SYS      SYS      SYS      NODE     0-31,64-95     0              
GPU1     MT2      X        MT2      MT2      MT2      MT2      MT2      MT2      NODE     NODE     NODE     NODE     SYS      SYS      SYS      SYS      SYS      SYS      NODE     0-31,64-95     0              
GPU2     MT2      MT2      X        MT2      MT2      MT2      MT2      MT2      NODE     NODE     MPB      MPB      SYS      SYS      SYS      SYS      SYS      SYS      NODE     0-31,64-95     0              
GPU3     MT2      MT2      MT2      X        MT2      MT2      MT2      MT2      NODE     NODE     NODE     NODE     SYS      SYS      SYS      SYS      SYS      SYS      NODE     0-31,64-95     0              
GPU4     MT2      MT2      MT2      MT2      X        MT2      MT2      MT2      SYS      SYS      SYS      SYS      NODE     NODE     MPB      MPB      NODE     NODE     SYS      32-63,96-127   1              
GPU5     MT2      MT2      MT2      MT2      MT2      X        MT2      MT2      SYS      SYS      SYS      SYS      NODE     NODE     NODE     NODE     NODE     NODE     SYS      32-63,96-127   1              
GPU6     MT2      MT2      MT2      MT2      MT2      MT2      X        MT2      SYS      SYS      SYS      SYS      NODE     NODE     NODE     NODE     MPB      MPB      SYS      32-63,96-127   1              
GPU7     MT2      MT2      MT2      MT2      MT2      MT2      MT2      X        SYS      SYS      SYS      SYS      NODE     NODE     NODE     NODE     NODE     NODE     SYS      32-63,96-127   1              
NIC0     MPB      NODE     NODE     NODE     SYS      SYS      SYS      SYS      X        SPB      NODE     NODE     SYS      SYS      SYS      SYS      SYS      SYS      NODE     
NIC1     MPB      NODE     NODE     NODE     SYS      SYS      SYS      SYS      SPB      X        NODE     NODE     SYS      SYS      SYS      SYS      SYS      SYS      NODE     
NIC2     NODE     NODE     MPB      NODE     SYS      SYS      SYS      SYS      NODE     NODE     X        SPB      SYS      SYS      SYS      SYS      SYS      SYS      NODE     
NIC3     NODE     NODE     MPB      NODE     SYS      SYS      SYS      SYS      NODE     NODE     SPB      X        SYS      SYS      SYS      SYS      SYS      SYS      NODE     
NIC4     SYS      SYS      SYS      SYS      NODE     NODE     NODE     NODE     SYS      SYS      SYS      SYS      X        SPB      NODE     NODE     NODE     NODE     SYS      
NIC5     SYS      SYS      SYS      SYS      NODE     NODE     NODE     NODE     SYS      SYS      SYS      SYS      SPB      X        NODE     NODE     NODE     NODE     SYS      
NIC6     SYS      SYS      SYS      SYS      MPB      NODE     NODE     NODE     SYS      SYS      SYS      SYS      NODE     NODE     X        SPB      NODE     NODE     SYS      
NIC7     SYS      SYS      SYS      SYS      MPB      NODE     NODE     NODE     SYS      SYS      SYS      SYS      NODE     NODE     SPB      X        NODE     NODE     SYS      
NIC8     SYS      SYS      SYS      SYS      NODE     NODE     MPB      NODE     SYS      SYS      SYS      SYS      NODE     NODE     NODE     NODE     X        SPB      SYS      
NIC9     SYS      SYS      SYS      SYS      NODE     NODE     MPB      NODE     SYS      SYS      SYS      SYS      NODE     NODE     NODE     NODE     SPB      X        SYS      
NIC10    NODE     NODE     NODE     NODE     SYS      SYS      SYS      SYS      NODE     NODE     NODE     NODE     SYS      SYS      SYS      SYS      SYS      SYS      X        

Legend:
    X = Self
  SYS = Topology path that contains PCIe switches/bridges as well as multiple host bridges across NUMA nodes.
 NODE = Topology path that contains PCIe switches/bridges as well as multiple host bridges within a NUMA node.
  HPB = Topology path that contains PCIe switches/bridges as well as a single host bridge.
  MPB = Topology path that contains multiple PCIe switches/bridges (but no host bridge).
  SPB = Topology path that contains at most one PCIe switch/bridge.
  INT = Topology path that is created internally, for example 2 devices on a single S2000 card.
  MTx = Topology path that is a bonded set of x MTLinks.

NIC Legend:
  NIC0: mlx5_0
  NIC1: mlx5_1
  NIC2: mlx5_2
  NIC3: mlx5_3
  NIC4: mlx5_4
  NIC5: mlx5_5
  NIC6: mlx5_6
  NIC7: mlx5_7
  NIC8: mlx5_8
  NIC9: mlx5_9
 NIC10: mlx5_bond_0


ulimit soft: 1048576
root@worker3218:/ws#

Accuracy Tests

Benchmarking and Profiling

Checklist

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

github-actions bot added the documentation (Improvements or additions to documentation) and dependencies (Pull requests that update a dependency file) labels on Jan 12, 2026
@gemini-code-assist (Contributor) commented:

Summary of Changes

Hello @yeahdongcn, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request lays the groundwork for integrating Moore Threads GPUs into the SGLang ecosystem by establishing robust environment detection and dependency management for MUSA. It ensures that SGLang can correctly identify and configure itself for MUSA-enabled hardware, which is a critical prerequisite for future work on accelerating large language model inference on these new GPU architectures. The changes are focused on system-level compatibility rather than direct performance optimizations at this stage.

Highlights

  • MUSA GPU Support Initialization: This pull request introduces the foundational elements for supporting Moore Threads (MUSA) GPUs within SGLang, marking the first step in a series to enable LLM inference acceleration on this new hardware platform.
  • Enhanced Environment Checking: The sglang.check_env utility has been significantly updated to detect and report detailed MUSA environment information, including GPU device specifics, the MUSA_HOME path, MCC version, MUSA driver version, and a comprehensive GPU topology (a hedged sketch of gathering such information follows this list).
  • New MUSA Detection Utility: A new is_musa() function has been added to sglang.srt.utils.common.py to provide a programmatic way to determine if a MUSA environment is active and available.
  • Dependency Updates for MUSA: The pyproject.toml files have been modified to include bidict and a new srt_musa dependency group, which specifies essential packages like torch_musa and mthreads-ml-py required for MUSA compatibility.
  • MUSA Installation Documentation: New documentation has been created at docs/platforms/mthreads_gpu.md to guide users through the process of installing SGLang from source with MUSA support.
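
As a rough illustration of how such a device report could be assembled (not the PR's actual implementation), the sketch below assumes torch_musa exposes a torch.musa namespace mirroring torch.cuda; the attribute names are assumptions:

# Hedged sketch of collecting MUSA device info for an environment report.
import os

import torch
import torch_musa  # noqa: F401  # assumed to register torch.musa


def collect_musa_info() -> dict:
    """Gather a few MUSA facts similar to the fields printed by sglang.check_env."""
    info = {"MUSA available": torch.musa.is_available()}
    if info["MUSA available"]:
        count = torch.musa.device_count()
        names = {torch.musa.get_device_name(i) for i in range(count)}
        info[f"GPU 0-{count - 1}"] = ", ".join(sorted(names))
    # MUSA_HOME appears in the report above; reading it from the environment is an assumption.
    info["MUSA_HOME"] = os.environ.get("MUSA_HOME", "Not found")
    return info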


@gemini-code-assist bot left a comment

Code Review

This pull request introduces support for Moore Threads GPUs (MUSA) by adding environment checking capabilities and updating dependency configurations. The changes are well-structured and follow the existing pattern for platform support.

My review includes suggestions to improve security by avoiding shell=True in subprocess calls, enhance code clarity by removing unused variables and redundant checks, and fix a potential UnboundLocalError in the environment check script. I've also suggested an improvement to the new documentation to make it clearer for users.
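
For context on the shell=True point, a safer subprocess pattern passes the command as an argument list so nothing is shell-interpreted. The sketch below is illustrative only; the tool name "mcc" comes from the report above, but the --version flag and the exact commands check_env.py runs are assumptions:

# Hedged sketch of invoking an external tool without shell=True.
import subprocess


def run_tool(args: list[str]) -> str:
    """Run an external command from an argument list and return its stdout, or a placeholder."""
    try:
        result = subprocess.run(args, capture_output=True, text=True, check=True, timeout=10)
        return result.stdout.strip()
    except (OSError, subprocess.SubprocessError):
        return "Not found"


# Example (hypothetical flag): query the MUSA compiler version.
print(run_tool(["mcc", "--version"]))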

Review comment threads:
  • python/sglang/check_env.py (4 threads, 1 marked outdated)
  • docs/platforms/mthreads_gpu.md (outdated)
@ispobock (Collaborator) commented:

/tag-and-rerun-ci

@yeahdongcn (Collaborator, Author) commented:

@ispobock Thanks for reviewing this! I noticed there are 7 failing cases. After checking the logs, the failures are mainly due to OOM, timeouts, and connection issues, which don’t appear to be related to this PR.

@yeahdongcn (Collaborator, Author) commented:

Rebased onto upstream/main to resolve conflicts.

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
@yeahdongcn (Collaborator, Author) commented:

Rebased onto upstream/main.

Kangyan-Zhou merged commit a77729a into sgl-project:main on Jan 23, 2026
105 of 109 checks passed
Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request on Feb 14, 2026
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>

Labels

dependencies (Pull requests that update a dependency file), documentation (Improvements or additions to documentation), mthreads, run-ci


3 participants