[MUSA][2/N] sgl-kernel build #17053

Merged
Kangyan-Zhou merged 1 commit into sgl-project:main from yeahdongcn:xd/musa_sgl_kernel
Jan 23, 2026

@yeahdongcn
Collaborator

Motivation

This PR is the second in a series of pull requests (tracked in #16565) to add full support for Moore Threads GPUs, leveraging MUSA (Meta-computing Unified System Architecture) to accelerate LLM inference.

Modifications

Following the AMD approach, we add a small set of MUSA-specific files:

  1. pyproject_musa.toml: used later during the Docker build.
  2. setup_musa.py: builds the MUSA extension.
  3. common_extension_musa.cc: provides Python bindings for the C++ sources.
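
As a rough illustration of what a build script like setup_musa.py has to assemble, the sketch below collects the source files and a deduplicated set of the `mcc` flags that appear in the build log under Testing Done. The names `mcc_flags` and `MUSA_SOURCES` are hypothetical and are not taken from the PR, and treating the FP8 defines as optional is an assumption made for the example.

```python
from typing import List

# Source files compiled in the build log, relative to the sgl-kernel root.
MUSA_SOURCES: List[str] = [
    "csrc_musa/common_extension_musa.cc",
    "third_party/flashinfer/csrc_musa/norm.mu",
    "third_party/flashinfer/csrc_musa/renorm.mu",
    "third_party/flashinfer/csrc_musa/sampling.mu",
]


def mcc_flags(arch: str = "mp_31", enable_fp8: bool = True) -> List[str]:
    """Return mcc flags mirroring the build log (hypothetical helper)."""
    flags = [
        "-fPIC",
        "-DNDEBUG",
        "-DOPERATOR_NAMESPACE=sgl_kernel",
        "-O3",
        "-std=c++17",
        f"--offload-arch={arch}",
        "-x", "musa",
        "-mtgpu",
        "-Od3",
        "-ffast-math",
        "-fmusa-flush-denormals-to-zero",
        "-fno-strict-aliasing",
        "-DUSE_MUSA",
        "-DENABLE_BF16",
        "-DFLASHINFER_ENABLE_F16",
        "-DFLASHINFER_ENABLE_BF16",
    ]
    if enable_fp8:
        # Assumption: FP8 support is modeled as optional here; the log
        # shows all four defines enabled.
        flags += [
            "-DENABLE_FP8",
            "-DFLASHINFER_ENABLE_FP8",
            "-DFLASHINFER_ENABLE_FP8_E4M3",
            "-DFLASHINFER_ENABLE_FP8_E5M2",
        ]
    return flags


if __name__ == "__main__":
    print(" ".join(mcc_flags()))
```

The real script may construct these differently; the point is only that the include paths, defines, and target architecture seen in the log come from one place in the build configuration.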

Testing Done

Tested in a clean torch_musa container:

root@worker3218:/ws/sgl-kernel# python setup_musa.py install
2026-01-14 10:33:33 | dist | 140527655466112 | INFO : running install
2026-01-14 10:33:33 | warnings | 140527655466112 | WARNING : /usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py:66: SetuptoolsDeprecationWarning: setup.py install is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` directly.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://blog.ganssle.io/articles/2021/10/setup-py-deprecated.html for details.
        ********************************************************************************

!!
  self.initialize_options()

2026-01-14 10:33:33 | warnings | 140527655466112 | WARNING : /usr/local/lib/python3.10/dist-packages/setuptools/_distutils/cmd.py:66: EasyInstallDeprecationWarning: easy_install command is deprecated.
!!

        ********************************************************************************
        Please avoid running ``setup.py`` and ``easy_install``.
        Instead, use pypa/build, pypa/installer or other
        standards-based tools.

        See https://github.com/pypa/setuptools/issues/917 for details.
        ********************************************************************************

!!
  self.initialize_options()

2026-01-14 10:33:33 | dist | 140527655466112 | INFO : running bdist_egg
2026-01-14 10:33:33 | dist | 140527655466112 | INFO : running egg_info
2026-01-14 10:33:33 | egg_info | 140527655466112 | INFO : writing python/sgl_kernel.egg-info/PKG-INFO
2026-01-14 10:33:33 | egg_info | 140527655466112 | INFO : writing dependency_links to python/sgl_kernel.egg-info/dependency_links.txt
2026-01-14 10:33:33 | egg_info | 140527655466112 | INFO : writing top-level names to python/sgl_kernel.egg-info/top_level.txt
2026-01-14 10:33:33 | egg_info | 140527655466112 | INFO : adding license file 'LICENSE'
2026-01-14 10:33:33 | util | 140527655466112 | INFO : writing manifest file 'python/sgl_kernel.egg-info/SOURCES.txt'
2026-01-14 10:33:33 | bdist_egg | 140527655466112 | INFO : installing library code to build/bdist.linux-x86_64/egg
2026-01-14 10:33:33 | dist | 140527655466112 | INFO : running install_lib
2026-01-14 10:33:33 | dist | 140527655466112 | INFO : running build_py
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/attention.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/spatial.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/gemm.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/__init__.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/sampling.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/sparse_flash_attn.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/cutlass_moe.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/test_utils.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/memory.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/fused_moe.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/elementwise.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/flash_mla.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/_fa4_interface.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/expert_specialization.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/marlin.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/flash_attn.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/utils.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/kvcacheio.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/hadamard.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/version.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/mamba.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/top_k.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/moe.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/allreduce.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/load_utils.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/grammar.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/speculative.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/scalar_type.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/testing/rotary_embedding.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel/testing
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/testing/__init__.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel/testing
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/quantization/__init__.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel/quantization
2026-01-14 10:33:33 | file_util | 140527655466112 | INFO : copying python/sgl_kernel/quantization/gguf.py -> build/lib.linux-x86_64-cpython-310/sgl_kernel/quantization
2026-01-14 10:33:33 | dist | 140527655466112 | INFO : running build_ext
Cloning third-party repositories...
Fetching origin
HEAD is now at 3abd6a72 update minimum compiler version
Fetching origin
HEAD is now at bc29697b ci: collect module status and update flashinfer-cli (#1676)
Third-party repositories ready.
Emitting ninja build file /ws/sgl-kernel/build/temp.linux-x86_64-cpython-310/build.ninja...
Compiling objects...
Using envvar MAX_JOBS (128) as the number of workers...
[1/4] /usr/local/musa/bin/mcc -MD -MF /ws/sgl-kernel/build/temp.linux-x86_64-cpython-310/ws/sgl-kernel/third_party/flashinfer/csrc_musa/norm.o.d -I/ws/sgl-kernel/include_musa -I/ws/sgl-kernel/include -I/ws/sgl-kernel/include/impl -I/ws/sgl-kernel/csrc_musa -I/ws/sgl-kernel/csrc -I/ws/sgl-kernel/third_party/flashinfer/include_musa -I/ws/sgl-kernel/third_party/flashinfer/include -I/ws/sgl-kernel/third_party/flashinfer/csrc_musa -I/ws/sgl-kernel/third_party/flashinfer/csrc -I/ws/sgl-kernel/third_party/mutlass/include_musa -I/ws/sgl-kernel/third_party/mutlass/include -I/usr/local/musa/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/aten/src -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/torch_musa_codegen -I/usr/local/lib/python3.10/dist-packages -I/usr/local/musa/include -I/usr/include/python3.10 -c -c /ws/sgl-kernel/third_party/flashinfer/csrc_musa/norm.mu -o /ws/sgl-kernel/build/temp.linux-x86_64-cpython-310/ws/sgl-kernel/third_party/flashinfer/csrc_musa/norm.o -fPIC -DNDEBUG -DOPERATOR_NAMESPACE=sgl_kernel -O3 -fPIC -std=c++17 --cuda-gpu-arch=mp_31 -x musa -mtgpu -Od3 -ffast-math -fmusa-flush-denormals-to-zero -fno-strict-aliasing -DUSE_MUSA -DENABLE_BF16 -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DENABLE_FP8 -DFLASHINFER_ENABLE_FP8 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 --offload-arch=mp_31 -march=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=common_ops -D_GLIBCXX_USE_CXX11_ABI=1
[2/4] /usr/local/musa/bin/mcc -MD -MF /ws/sgl-kernel/build/temp.linux-x86_64-cpython-310/ws/sgl-kernel/third_party/flashinfer/csrc_musa/renorm.o.d -I/ws/sgl-kernel/include_musa -I/ws/sgl-kernel/include -I/ws/sgl-kernel/include/impl -I/ws/sgl-kernel/csrc_musa -I/ws/sgl-kernel/csrc -I/ws/sgl-kernel/third_party/flashinfer/include_musa -I/ws/sgl-kernel/third_party/flashinfer/include -I/ws/sgl-kernel/third_party/flashinfer/csrc_musa -I/ws/sgl-kernel/third_party/flashinfer/csrc -I/ws/sgl-kernel/third_party/mutlass/include_musa -I/ws/sgl-kernel/third_party/mutlass/include -I/usr/local/musa/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/aten/src -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/torch_musa_codegen -I/usr/local/lib/python3.10/dist-packages -I/usr/local/musa/include -I/usr/include/python3.10 -c -c /ws/sgl-kernel/third_party/flashinfer/csrc_musa/renorm.mu -o /ws/sgl-kernel/build/temp.linux-x86_64-cpython-310/ws/sgl-kernel/third_party/flashinfer/csrc_musa/renorm.o -fPIC -DNDEBUG -DOPERATOR_NAMESPACE=sgl_kernel -O3 -fPIC -std=c++17 --cuda-gpu-arch=mp_31 -x musa -mtgpu -Od3 -ffast-math -fmusa-flush-denormals-to-zero -fno-strict-aliasing -DUSE_MUSA -DENABLE_BF16 -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DENABLE_FP8 -DFLASHINFER_ENABLE_FP8 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 --offload-arch=mp_31 -march=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=common_ops -D_GLIBCXX_USE_CXX11_ABI=1
[3/4] /usr/local/musa/bin/mcc -x musa -MMD -MF /ws/sgl-kernel/build/temp.linux-x86_64-cpython-310/ws/sgl-kernel/csrc_musa/common_extension_musa.o.d -I/ws/sgl-kernel/include_musa -I/ws/sgl-kernel/include -I/ws/sgl-kernel/include/impl -I/ws/sgl-kernel/csrc_musa -I/ws/sgl-kernel/csrc -I/ws/sgl-kernel/third_party/flashinfer/include_musa -I/ws/sgl-kernel/third_party/flashinfer/include -I/ws/sgl-kernel/third_party/flashinfer/csrc_musa -I/ws/sgl-kernel/third_party/flashinfer/csrc -I/ws/sgl-kernel/third_party/mutlass/include_musa -I/ws/sgl-kernel/third_party/mutlass/include -I/usr/local/musa/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/aten/src -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/torch_musa_codegen -I/usr/local/lib/python3.10/dist-packages -I/usr/local/musa/include -I/usr/include/python3.10 -c -c /ws/sgl-kernel/csrc_musa/common_extension_musa.cc -o /ws/sgl-kernel/build/temp.linux-x86_64-cpython-310/ws/sgl-kernel/csrc_musa/common_extension_musa.o -fPIC -DNDEBUG -DOPERATOR_NAMESPACE=sgl_kernel -O3 -fPIC -std=c++17 --cuda-gpu-arch=mp_31 -x musa -mtgpu -Od3 -ffast-math -fmusa-flush-denormals-to-zero -fno-strict-aliasing -DUSE_MUSA -DENABLE_BF16 -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DENABLE_FP8 -DFLASHINFER_ENABLE_FP8 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 --offload-arch=mp_31 -march=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=common_ops -D_GLIBCXX_USE_CXX11_ABI=1
[4/4] /usr/local/musa/bin/mcc -MD -MF /ws/sgl-kernel/build/temp.linux-x86_64-cpython-310/ws/sgl-kernel/third_party/flashinfer/csrc_musa/sampling.o.d -I/ws/sgl-kernel/include_musa -I/ws/sgl-kernel/include -I/ws/sgl-kernel/include/impl -I/ws/sgl-kernel/csrc_musa -I/ws/sgl-kernel/csrc -I/ws/sgl-kernel/third_party/flashinfer/include_musa -I/ws/sgl-kernel/third_party/flashinfer/include -I/ws/sgl-kernel/third_party/flashinfer/csrc_musa -I/ws/sgl-kernel/third_party/flashinfer/csrc -I/ws/sgl-kernel/third_party/mutlass/include_musa -I/ws/sgl-kernel/third_party/mutlass/include -I/usr/local/musa/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/aten/src -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/generated_cuda_compatible/include/torch/csrc/api/include -I/usr/local/lib/python3.10/dist-packages/torch_musa/share/torch_musa_codegen -I/usr/local/lib/python3.10/dist-packages -I/usr/local/musa/include -I/usr/include/python3.10 -c -c /ws/sgl-kernel/third_party/flashinfer/csrc_musa/sampling.mu -o /ws/sgl-kernel/build/temp.linux-x86_64-cpython-310/ws/sgl-kernel/third_party/flashinfer/csrc_musa/sampling.o -fPIC -DNDEBUG -DOPERATOR_NAMESPACE=sgl_kernel -O3 -fPIC -std=c++17 --cuda-gpu-arch=mp_31 -x musa -mtgpu -Od3 -ffast-math -fmusa-flush-denormals-to-zero -fno-strict-aliasing -DUSE_MUSA -DENABLE_BF16 -DFLASHINFER_ENABLE_F16 -DFLASHINFER_ENABLE_BF16 -DENABLE_FP8 -DFLASHINFER_ENABLE_FP8 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 --offload-arch=mp_31 -march=native -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1016"' -DTORCH_EXTENSION_NAME=common_ops -D_GLIBCXX_USE_CXX11_ABI=1
2026-01-14 10:35:12 | easy_install | 140527655466112 | INFO : Adding sgl-kernel 0.3.20 to easy-install.pth file
2026-01-14 10:35:12 | easy_install | 140527655466112 | INFO : 
Installed /usr/local/lib/python3.10/dist-packages/sgl_kernel-0.3.20-py3.10-linux-x86_64.egg
2026-01-14 10:35:12 | easy_install | 140527655466112 | INFO : Processing dependencies for sgl-kernel==0.3.20
2026-01-14 10:35:12 | easy_install | 140527655466112 | INFO : Finished processing dependencies for sgl-kernel==0.3.20
root@worker3218:/ws/sgl-kernel# 
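
A quick way to confirm that the install step above left an importable package is to ask Python whether it can locate the module. This is a minimal sketch using only the standard library; the helper name `is_installed` is hypothetical, and the check does not load the compiled extension, so it does not prove that the kernels run on a MUSA device.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # "sgl_kernel" is the package name shown in the install log.
    status = "found" if is_installed("sgl_kernel") else "missing"
    print(f"sgl_kernel: {status}")
```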

Review Process

  1. Ping Merge Oncalls to start the PR flow. See the PR Merge Process.
  2. Get approvals from CODEOWNERS and other reviewers.
  3. Trigger CI tests with comments or contact authorized users to do so.
    • /tag-run-ci-label, /rerun-failed-ci, /tag-and-rerun-ci
  4. After green CI and required approvals, ask Merge Oncalls to merge.

@github-actions (bot) added the documentation, dependencies, and sgl-kernel labels on Jan 14, 2026
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
@sglang-bot
Member

/tag-and-rerun-ci

@Kangyan-Zhou merged commit 628ab5d into sgl-project:main on Jan 23, 2026
133 of 145 checks passed
Johnsonms pushed a commit to Johnsonms/sglang that referenced this pull request Feb 14, 2026
Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
Labels

dependencies, documentation, mthreads, run-ci, sgl-kernel