
Documentation for FLASH_ATTENTION_SKIP_CUDA_BUILD is misleading and causes silent installation of broken packages. #17794

@lingkerio

Description


Summary

The documentation for installing flash-attn (specifically the section on extra-build-variables) is misleading and can lead to a "silent failure" state where uv reports a successful installation, but the installed package is a "hollow" shell containing no compiled CUDA extensions.

The documentation states:

"The FLASH_ATTENTION_SKIP_CUDA_BUILD environment variable ensures that flash-attn is installed from a compatible, pre-built wheel..."

However, this variable only disables local compilation. If uv resolves to a version combination (e.g., latest Torch + Flash Attn) for which no official pre-built wheel exists, passing this variable causes setup.py to simply skip compilation and install a pure-Python package without errors. This results in a broken runtime environment.
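The gate in flash-attn's setup.py boils down to a plain environment-variable check along these lines (a simplified sketch, approximated from the real setup.py rather than copied from it):

```python
import os

def cuda_build_enabled() -> bool:
    # Approximation of the check in flash-attn's setup.py: when the
    # variable is "TRUE", the CUDA extension modules are simply omitted
    # from the build -- no error, no warning.
    return os.getenv("FLASH_ATTENTION_SKIP_CUDA_BUILD", "FALSE") != "TRUE"
```

With the variable set, the build "succeeds" and produces a package that lacks the compiled extension entirely.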

Reproduction

I have created a minimal reproduction case that demonstrates how following the documentation can lead to a broken installation when version pinning is not strict.

1. pyproject.toml

Note: I am intentionally NOT pinning versions to simulate a scenario where uv picks a newer Torch version that flash-attn does not yet have a wheel for.

[project]
name = "flash-attn-repro"
version = "0.1.0"
requires-python = ">=3.10,<3.13"
dependencies = [
    "torch>=2.4.0",
    "flash-attn",
]

[tool.uv.extra-build-dependencies]
flash-attn = [{ requirement = "torch", match-runtime = true }]

[tool.uv.extra-build-variables]
# The docs suggest this ensures a wheel install, but it actually forces a hollow install if no wheel is found.
flash-attn = { FLASH_ATTENTION_SKIP_CUDA_BUILD = "TRUE" }

2. Commands run

uv sync -v
uv run python -c "import flash_attn"

Output

uv sync completes successfully, giving a false sense of security:

DEBUG Guessing wheel URL:  https://github.com/Dao-AILab/flash-attention/releases/download/v2.8.3/flash_attn-2.8.3+cu12torch2.10cxx11abiTRUE-cp312-cp312-linux_x86_64.whl
DEBUG Precompiled wheel not found. Building from source...
...
Installed 2 packages in 10ms
 + flash-attn==2.8.3
 + torch==2.10.0

However, running the verification script reveals the package is broken:

ModuleNotFoundError: No module named 'flash_attn_2_cuda'
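A quick way to detect this hollow state early, rather than waiting for a runtime failure, is to probe for the compiled extension module directly (a minimal check script; `flash_attn_2_cuda` is the module name from the traceback above):

```python
import importlib.util

def has_cuda_ext(name: str = "flash_attn_2_cuda") -> bool:
    # A hollow install may still appear valid to the resolver; only the
    # compiled extension module is missing, so check for it explicitly.
    return importlib.util.find_spec(name) is not None

print("CUDA extension present:", has_cuda_ext())
```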

Analysis

This issue seems related to the edge cases of the build dependency functionality introduced in #13959 and #6437.

Upon analyzing flash-attn's setup.py, it appears that the current documentation advises an anti-pattern that makes the installation process less robust.

The FLASH_ATTENTION_SKIP_CUDA_BUILD variable is redundant when installation succeeds and harmful when it fails:

  1. Redundant when things work: flash-attn's setup.py (via CachedWheelsCommand) already prioritizes downloading wheels before attempting any compilation. If a valid wheel exists, it is installed regardless of this variable. Setting it provides no benefit here.
  2. Harmful when things fail: If uv resolves to a version combination (e.g., a newer Torch) for which no pre-built wheel exists:
  • Without this variable: The setup falls back to local compilation, fails due to missing CUDA/nvcc (in a clean build env), and raises a loud, helpful error. This is the desired fail-fast behavior.
  • With this variable: The setup falls back to local compilation, sees the flag, silently skips all CUDA extensions, and successfully installs a broken, pure-Python "hollow" package.
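The two paths above can be modeled as a small decision function (a simplified sketch of the flow, not the actual setup.py code):

```python
import os

def install_plan(wheel_available: bool) -> str:
    # A matching pre-built wheel always wins, with or without the
    # environment variable -- which is why setting it is redundant
    # on the success path.
    if wheel_available:
        return "install pre-built wheel"
    # No wheel: the variable now decides between fail-fast and silent failure.
    if os.getenv("FLASH_ATTENTION_SKIP_CUDA_BUILD", "FALSE") == "TRUE":
        return "install hollow pure-Python package"  # silent failure
    return "compile CUDA extension locally"  # loud error without CUDA/nvcc
```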

The current recommended configuration creates a dangerous trap where a "Build Failure" (which is easy to diagnose) is suppressed and converted into a "Runtime Failure" (which is confusing and occurs later).

Expected Behavior

  1. Documentation Update: The documentation should stop recommending FLASH_ATTENTION_SKIP_CUDA_BUILD as a standard practice for ensuring wheel installation, as it doesn't actually "ensure" anything that the script doesn't already do by default.
  2. Warning Added: If the variable is mentioned, the docs must warn that enabling it disables the safety mechanism (compilation failure) and can result in silent installation of non-functional packages if version pinning is not strict.
  3. Best Practice: The docs should instead emphasize explicitly pinning both torch and flash-attn to known-good combinations to ensure the resolver picks versions that have matching wheels.
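A pinned configuration might look like the following (the specific versions are illustrative placeholders, not a verified known-good pair; check flash-attn's release assets for a wheel matching your Torch/CUDA/Python combination):

```toml
[project]
dependencies = [
    # Illustrative pins -- substitute a combination for which a
    # pre-built flash-attn wheel actually exists.
    "torch==2.4.0",
    "flash-attn==2.6.3",
]
```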

Platform

Linux 5.15.0-160-generic x86_64 GNU/Linux

Version

uv 0.9.28

Python version

Python 3.12.12


Labels

documentation (Improvements or additions to documentation)
