Documentation for FLASH_ATTENTION_SKIP_CUDA_BUILD is misleading and causes silent installation of broken packages. #17794
Summary
The documentation for installing flash-attn (specifically the section on extra-build-variables) is misleading and can lead to a "silent failure" state where uv reports a successful installation, but the installed package is a "hollow" shell containing no compiled CUDA extensions.
The documentation states:
"The FLASH_ATTENTION_SKIP_CUDA_BUILD environment variable ensures that flash-attn is installed from a compatible, pre-built wheel..."
However, this variable only disables local compilation. If uv resolves to a version combination (e.g., latest Torch + Flash Attn) for which no official pre-built wheel exists, passing this variable causes setup.py to simply skip compilation and install a pure-Python package without errors. This results in a broken runtime environment.
Reproduction
I have created a minimal reproduction case that demonstrates how following the documentation can lead to a broken installation when version pinning is not strict.
1. pyproject.toml
Note: I am intentionally NOT pinning versions to simulate a scenario where uv picks a newer Torch version that flash-attn does not yet have a wheel for.
```toml
[project]
name = "flash-attn-repro"
version = "0.1.0"
requires-python = ">=3.10,<3.13"
dependencies = [
    "torch>=2.4.0",
    "flash-attn",
]

[tool.uv.extra-build-dependencies]
flash-attn = [{ requirement = "torch", match-runtime = true }]

[tool.uv.extra-build-variables]
# The docs suggest this ensures a wheel install, but it actually
# forces a hollow install if no wheel is found.
flash-attn = { FLASH_ATTENTION_SKIP_CUDA_BUILD = "TRUE" }
```
2. Commands run
```shell
uv sync -v
uv run python -c "import flash_attn"
```
Output
uv sync completes successfully, giving a false sense of security:
```
DEBUG Guessing wheel URL: https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.10cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
DEBUG Precompiled wheel not found. Building from source...
...
Installed 2 packages in 10ms
 + flash-attn==2.8.3
 + torch==2.10.0
```
However, running the verification script reveals the package is broken:
```
ModuleNotFoundError: No module named 'flash_attn_2_cuda'
```
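Because the installer's exit code cannot be trusted here, a post-install sanity check is the only reliable way to detect a hollow install. A minimal sketch (the module name `flash_attn_2_cuda` is taken from the error above; the helper name is mine):

```python
# Check whether flash-attn's compiled CUDA extension is actually
# importable, instead of trusting "uv sync" succeeding.
import importlib.util


def has_cuda_extension(module_name: str = "flash_attn_2_cuda") -> bool:
    """Return True if the compiled extension module can be located."""
    return importlib.util.find_spec(module_name) is not None


if not has_cuda_extension():
    print("WARNING: flash-attn is installed without its CUDA extension "
          "(hollow install)")
```

Running this in the broken environment above prints the warning, turning the late runtime failure back into an early, explicit one.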
Analysis
This issue seems related to the edge cases of the build dependency functionality introduced in #13959 and #6437.
Upon analyzing flash-attn's setup.py, it appears that the current documentation advises an anti-pattern that makes the installation process less robust.
The FLASH_ATTENTION_SKIP_CUDA_BUILD variable is redundant for success and harmful for failure:
- Redundant when things work: flash-attn's setup.py (via CachedWheelsCommand) already prioritizes downloading wheels before attempting any compilation. If a valid wheel exists, it is installed regardless of this variable, so setting it provides no benefit.
- Harmful when things fail: if uv resolves to a version combination (e.g., a newer Torch) for which no pre-built wheel exists:
  - Without this variable: the setup falls back to local compilation, fails due to missing CUDA/nvcc (in a clean build env), and raises a loud, helpful error. This is the desired fail-fast behavior.
  - With this variable: the setup falls back to local compilation, sees the flag, silently skips all CUDA extensions, and successfully installs a broken, pure-Python "hollow" package.
The current recommended configuration creates a dangerous trap where a "Build Failure" (which is easy to diagnose) is suppressed and converted into a "Runtime Failure" (which is confusing and occurs later).
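The decision logic described above can be sketched as a small truth table. This is not flash-attn's actual setup.py code, only a model of the behavior described in this report (the function name and return labels are mine):

```python
# Sketch of the install-path decision described above: a pre-built
# wheel always wins; otherwise the skip flag decides between a loud
# compile failure and a silent hollow install.
def resolve_install(wheel_available: bool, skip_cuda_build: bool) -> str:
    """Return which install path the described setup.py logic takes."""
    if wheel_available:
        # A matching wheel is used regardless of the flag, so the
        # flag adds nothing in the success case.
        return "prebuilt-wheel"
    if skip_cuda_build:
        # No wheel and flag set: CUDA extensions are skipped entirely,
        # and a pure-Python "hollow" package installs without error.
        return "hollow-install"
    # No wheel and no flag: compilation is attempted and fails loudly
    # if CUDA/nvcc is missing -- the fail-fast behavior.
    return "local-compile-or-error"


for wheel, skip in [(True, True), (True, False),
                    (False, True), (False, False)]:
    print(f"wheel={wheel} skip={skip} -> {resolve_install(wheel, skip)}")
```

The table makes the asymmetry plain: the flag never changes the outcome when a wheel exists, and only ever converts a loud failure into a silent one when it doesn't.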
Expected Behavior
- Documentation update: the documentation should stop recommending FLASH_ATTENTION_SKIP_CUDA_BUILD as a standard practice for ensuring wheel installation, since it doesn't actually "ensure" anything the script doesn't already do by default.
- Warning added: if the variable is mentioned, the docs should warn that enabling it disables the safety mechanism (a loud compilation failure) and can result in silent installation of non-functional packages when version pinning is not strict.
- Best practice: the docs should instead emphasize explicitly pinning both torch and flash-attn to known-good combinations, so the resolver picks versions that have matching wheels.
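As an illustration of the pinning approach, the reproduction's pyproject.toml could pin both packages. The version numbers below are illustrative only, not a verified known-good combination:

```toml
[project]
name = "flash-attn-repro"
version = "0.1.0"
requires-python = ">=3.10,<3.13"
dependencies = [
    # Pin both packages to a combination known to ship a matching
    # pre-built wheel (versions shown are examples, not a tested pair).
    "torch==2.4.0",
    "flash-attn==2.8.3",
]
```

With strict pins, the resolver cannot drift to a Torch release that flash-attn has no wheel for, which removes the scenario where the skip flag does damage.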
Platform
Linux 5.15.0-160-generic x86_64 GNU/Linux
Version
uv 0.9.28
Python version
Python 3.12.12