Disable compiling arch below sm_90 in aarch64 by default#6380
Disable compiling arch below sm_90 in aarch64 by default#6380zhyncs merged 4 commits intosgl-project:mainfrom
Conversation
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a new CMake option ENABLE_BELOW_SM90 to control the compilation of CUDA architectures older than sm_90. This is primarily aimed at speeding up compilation times on aarch64 platforms by disabling these older architectures by default. The changes are well-implemented and the logic is clear.
Key changes include:
- Introduction of the
ENABLE_BELOW_SM90option, defaulting toONbut set toOFFforaarch64. - Conditional inclusion of
-gencodeflags for sm_7x and sm_8x architectures inSGL_KERNEL_CUDA_FLAGSandSGL_FLASH_KERNEL_CUDA_FLAGSbased on this new option. - Conditional inclusion of
FA3_SM8X_GEN_SRCS(sources for sm8x FlashAttention3) based on this option. - Conditional definition of
FLASHATTENTION_DISABLE_SM8xbased on this option, which is a good safeguard. - A nice refactoring of how compile definitions for
flash_opsare handled, improving clarity.
The motivation to improve build times on aarch64 is well-addressed. The code appears correct and robust for its intended purpose.
Summary of Findings
- New Configurable Build Option: A new CMake option
ENABLE_BELOW_SM90has been introduced. It defaults toON, but is set toOFFonaarch64systems to disable compilation for CUDA architectures below sm_90, thereby speeding up builds. - Conditional CUDA Architecture Compilation: The PR correctly modifies CMake logic to conditionally include CUDA gencode flags and relevant source files (for FlashAttention3 SM8x) based on the
ENABLE_BELOW_SM90option. - Improved Compile Definition Clarity: The management of compile definitions for the
flash_opstarget has been refactored using an intermediate variableFLASH_OPS_COMPILE_DEFS, which makes the logic for conditionally addingFLASHATTENTION_DISABLE_SM8xclearer and more maintainable. - Documentation for New Build Option: A suggestion was made to document the new
ENABLE_BELOW_SM90option, either via a comment in theCMakeLists.txtor in external build documentation, to guide users who might want to override its default behavior when compiling from source. (Severity: medium)
Merge Readiness
The pull request is well-implemented and achieves its stated goal of optimizing build times for aarch64 by default, while providing flexibility through the new ENABLE_BELOW_SM90 option. The code changes are clear, correct, and include a nice refactoring for compile definitions.
One medium severity suggestion has been made regarding the documentation of the new CMake option to assist users building from source. Addressing this would enhance the user experience.
Overall, the PR significantly improves the build process for aarch64 and is in good shape. As a reviewer, I am not authorized to approve pull requests. This PR should be reviewed and approved by authorized maintainers before merging.
| endif() | ||
|
|
||
| # Enable gencode below SM90 | ||
| option(ENABLE_BELOW_SM90 "Enable below SM90" ON) |
There was a problem hiding this comment.
This new option ENABLE_BELOW_SM90 is a useful addition for controlling build configurations. Its default behavior (ON, but OFF for aarch64) is clear from the subsequent logic (lines 89-92).
To further help users who compile from source and might want to customize this behavior (e.g., an aarch64 user needing older SM targets, or an x86_64 user wanting to disable them for faster builds), would it be beneficial to add a comment here or nearby, briefly mentioning that this option can be overridden via the CMake command line (e.g., cmake -DENABLE_BELOW_SM90=OFF ..)?
Alternatively, ensuring this option is documented in any relevant build-from-source guides would also be valuable for discoverability.
Motivation
For aarch64, it doesn't need to use arch like sm_7x or sm_8x in most cases, so we disable it by default to make the compile more fast.
Modifications
Checklist