Skip to content

Disable compiling arch below sm_90 in aarch64 by default#6380

Merged
zhyncs merged 4 commits intosgl-project:mainfrom
Qiaolin-Yu:kernel_build
May 27, 2025
Merged

Disable compiling arch below sm_90 in aarch64 by default#6380
zhyncs merged 4 commits intosgl-project:mainfrom
Qiaolin-Yu:kernel_build

Conversation

@Qiaolin-Yu
Copy link
Copy Markdown
Collaborator

@Qiaolin-Yu Qiaolin-Yu commented May 18, 2025

Motivation

For aarch64, it doesn't need to use arch like sm_7x or sm_8x in most cases, so we disable it by default to make the compile more fast.

Modifications

Checklist

Comment thread sgl-kernel/CMakeLists.txt Outdated
@zhyncs zhyncs self-assigned this May 18, 2025
@Qiaolin-Yu Qiaolin-Yu marked this pull request as ready for review May 27, 2025 21:59
@zhyncs
Copy link
Copy Markdown
Collaborator

zhyncs commented May 27, 2025

/gemini review

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new CMake option ENABLE_BELOW_SM90 to control the compilation of CUDA architectures older than sm_90. This is primarily aimed at speeding up compilation times on aarch64 platforms by disabling these older architectures by default. The changes are well-implemented and the logic is clear.

Key changes include:

  • Introduction of the ENABLE_BELOW_SM90 option, defaulting to ON but set to OFF for aarch64.
  • Conditional inclusion of -gencode flags for sm_7x and sm_8x architectures in SGL_KERNEL_CUDA_FLAGS and SGL_FLASH_KERNEL_CUDA_FLAGS based on this new option.
  • Conditional inclusion of FA3_SM8X_GEN_SRCS (sources for sm8x FlashAttention3) based on this option.
  • Conditional definition of FLASHATTENTION_DISABLE_SM8x based on this option, which is a good safeguard.
  • A nice refactoring of how compile definitions for flash_ops are handled, improving clarity.

The motivation to improve build times on aarch64 is well-addressed. The code appears correct and robust for its intended purpose.

Summary of Findings

  • New Configurable Build Option: A new CMake option ENABLE_BELOW_SM90 has been introduced. It defaults to ON, but is set to OFF on aarch64 systems to disable compilation for CUDA architectures below sm_90, thereby speeding up builds.
  • Conditional CUDA Architecture Compilation: The PR correctly modifies CMake logic to conditionally include CUDA gencode flags and relevant source files (for FlashAttention3 SM8x) based on the ENABLE_BELOW_SM90 option.
  • Improved Compile Definition Clarity: The management of compile definitions for the flash_ops target has been refactored using an intermediate variable FLASH_OPS_COMPILE_DEFS, which makes the logic for conditionally adding FLASHATTENTION_DISABLE_SM8x clearer and more maintainable.
  • Documentation for New Build Option: A suggestion was made to document the new ENABLE_BELOW_SM90 option, either via a comment in the CMakeLists.txt or in external build documentation, to guide users who might want to override its default behavior when compiling from source. (Severity: medium)

Merge Readiness

The pull request is well-implemented and achieves its stated goal of optimizing build times for aarch64 by default, while providing flexibility through the new ENABLE_BELOW_SM90 option. The code changes are clear, correct, and include a nice refactoring for compile definitions.

One medium severity suggestion has been made regarding the documentation of the new CMake option to assist users building from source. Addressing this would enhance the user experience.

Overall, the PR significantly improves the build process for aarch64 and is in good shape. As a reviewer, I am not authorized to approve pull requests. This PR should be reviewed and approved by authorized maintainers before merging.

Comment thread sgl-kernel/CMakeLists.txt
endif()

# Enable gencode below SM90
option(ENABLE_BELOW_SM90 "Enable below SM90" ON)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This new option ENABLE_BELOW_SM90 is a useful addition for controlling build configurations. Its default behavior (ON, but OFF for aarch64) is clear from the subsequent logic (lines 89-92).

To further help users who compile from source and might want to customize this behavior (e.g., an aarch64 user needing older SM targets, or an x86_64 user wanting to disable them for faster builds), would it be beneficial to add a comment here or nearby, briefly mentioning that this option can be overridden via the CMake command line (e.g., cmake -DENABLE_BELOW_SM90=OFF ..)?

Alternatively, ensuring this option is documented in any relevant build-from-source guides would also be valuable for discoverability.

@zhyncs zhyncs merged commit 0b9557f into sgl-project:main May 27, 2025
18 of 24 checks passed
Layssy pushed a commit to Layssy/sglang-iaas that referenced this pull request Jun 9, 2025
xwu-intel pushed a commit to xwu-intel/sglang that referenced this pull request Jun 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants