Add Flash Attention support to FlexAttention#161118

Closed

drisspg wants to merge 34 commits into gh/drisspg/187/base from gh/drisspg/187/head

Contributor

drisspg commented Aug 21, 2025 (edited by pytorch-bot bot)

Stack from ghstack (oldest at bottom):

Relies on this PR in Flash Attention: Dao-AILab/flash-attention#1840

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben


          Update

5fcfe9d

[ghstack-poisoned]

drisspg requested review from albanD, jbschlosser and mikaylagawarecki as code owners

August 21, 2025 00:07

pytorch-bot bot commented Aug 21, 2025 (edited)

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/161118

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit a092ae4 with merge base 086dec3:

BROKEN TRUNK - The following job failed but was also present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

inductor / inductor-test / test (inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu) (gh) (trunk failure)
vision_maskrcnn

This comment was automatically generated by Dr. CI and updates every 15 minutes.

drisspg added a commit that referenced this pull request


          Add Flash Attention support to FlexAttention

4126d61

ghstack-source-id: 437b74d
Pull-Request: #161118

pytorch-bot bot added ciflow/inductor module: inductor labels

drisspg mentioned this pull request

Updates to CuTe DSL template renderer #161117

Closed

drisspg marked this pull request as draft

August 21, 2025 00:08


          Update

9ed7a24

[ghstack-poisoned]

drisspg added a commit that referenced this pull request


          Add Flash Attention support to FlexAttention

be1e718

ghstack-source-id: 4051212
Pull-Request: #161118


          Update

ca5b586

[ghstack-poisoned]

drisspg added a commit that referenced this pull request


          Add Flash Attention support to FlexAttention

1c536d8

ghstack-source-id: c6e7e71
Pull-Request: #161118


          Update

4be3cd1

[ghstack-poisoned]

drisspg added a commit that referenced this pull request


          Add Flash Attention support to FlexAttention

a3477cd

ghstack-source-id: 25d3c8b
Pull-Request: #161118


          Update

ff6fd62

[ghstack-poisoned]

drisspg added a commit that referenced this pull request


          Add Flash Attention support to FlexAttention

f0c5c4a

ghstack-source-id: c534d0f
Pull-Request: #161118


          Update on "Add Flash Attention support to FlexAttention"

bcd5779


Working Branch: drisspg#2

[ghstack-poisoned]

drisspg added a commit that referenced this pull request


          Add Flash Attention support to FlexAttention

6601f8a

ghstack-source-id: 969df14
Pull-Request: #161118


          Update on "Add Flash Attention support to FlexAttention"

7c86c31


Working Branch: drisspg#2

[ghstack-poisoned]

drisspg added a commit that referenced this pull request


          Add Flash Attention support to FlexAttention

534dc6f

ghstack-source-id: 2670b86
Pull-Request: #161118


          Update on "Add Flash Attention support to FlexAttention"

40b47f6


Working Branch: drisspg#2

[ghstack-poisoned]

drisspg added a commit that referenced this pull request


          Add Flash Attention support to FlexAttention

da5b7ce

ghstack-source-id: d30fd54
Pull-Request: #161118


          Update on "Add Flash Attention support to FlexAttention"

85256a4


Working Branch: drisspg#2

[ghstack-poisoned]

drisspg added a commit that referenced this pull request


          Add Flash Attention support to FlexAttention

15cf5d1

ghstack-source-id: 54d1393
Pull-Request: #161118


          Update on "Add Flash Attention support to FlexAttention"

4f546cf


Working Branch: drisspg#2

[ghstack-poisoned]

drisspg added a commit that referenced this pull request


          Add Flash Attention support to FlexAttention

89691e1

ghstack-source-id: a6b6061
Pull-Request: #161118


          Update on "Add Flash Attention support to FlexAttention"

c39b1e5


Working Branch: drisspg#2

[ghstack-poisoned]

drisspg added a commit that referenced this pull request


          Add Flash Attention support to FlexAttention

9885b2e

ghstack-source-id: d6d6b61
Pull-Request: #161118

drisspg added 6 commits

September 11, 2025 16:26


          Update

33086b9

[ghstack-poisoned]


          Update

5bda8a6

[ghstack-poisoned]


          Update

b6abfe6

[ghstack-poisoned]


          Update

5bd41fc

[ghstack-poisoned]


          Update

faba0ae

[ghstack-poisoned]


          Update

104ee89

[ghstack-poisoned]

drisspg requested review from Chillee and removed request for albanD, jbschlosser and mikaylagawarecki

September 20, 2025 16:32

drisspg commented

Review comment on torch/_inductor/kernel/flex/flex_flash_attention.py (outdated)

drisspg added 2 commits

October 2, 2025 19:39


          Update

[ghstack-poisoned]


          Update

7d65b7b

[ghstack-poisoned]

drisspg marked this pull request as ready for review

October 7, 2025 03:28


          Update

c554174

[ghstack-poisoned]

drisspg requested a review from v0i0

October 8, 2025 17:47

v0i0 approved these changes

Review comment on torch/_inductor/kernel/flex/flex_flash_attention.py (outdated)

drisspg added 2 commits

October 8, 2025 22:34


          Update

87bb9b0

[ghstack-poisoned]


          Update

a90bade

[ghstack-poisoned]

Collaborator

pytorchmergebot commented Oct 9, 2025

Starting merge as part of PR stack under #162031

drisspg added 2 commits

October 9, 2025 19:15


          Update

[ghstack-poisoned]


          Update

a092ae4

[ghstack-poisoned]

Collaborator

pytorchmergebot commented Oct 10, 2025

Starting merge as part of PR stack under #162031

pytorchmergebot closed this in 0a2cde2

pytorchmergebot added the Merged label

pytorchmergebot pushed a commit that referenced this pull request


          Add Loads from fixed inputs (#162031)

0747d95

## TODO
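Check how loads that use multiple indices are handled.
```python
import cutlass
import cutlass.cute as cute

@cute.jit
def score_mod(tSrS_ssa, b_idx, h_idx, q_idx, kv_idx, buffers):
    in_ptr4 = buffers[0]  # captured (fixed) input buffer
    tmp0 = tSrS_ssa       # scores (S) for the current tile
    tmp1 = b_idx
    tmp2 = h_idx
    # Store the flat offset 32*b_idx + h_idx in a 1-element Int32 fragment
    tmp3 = cute.make_fragment(1, cutlass.Int32)
    tmp4 = tmp3.store(32*tmp1 + tmp2)
    # Load the BFloat16 value at that offset into a 1-element fragment
    tmp5 = cute.make_fragment(1, cutlass.BFloat16)
    tmp6 = tmp3[0]
    tmp7 = tmp5[0] = (in_ptr4[tmp6])
    # Upcast the loaded value to Float32 and add it to the scores
    tmp8 = (tmp5.load()).to(cutlass.Float32)
    tmp9 = (tmp0 + tmp8)
    tSrS_ssa = tmp9

    return tSrS_ssa
```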
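As a reading aid, here is a pure-Python sketch of what the generated kernel above appears to compute. It assumes `buffers[0]` is a contiguous, row-major bias of shape (B, 32) addressed as a flat 1-D array, and that the hard-coded 32 is the number of heads; both are assumptions, not something the commit states. The name `reference_score_mod` and the `num_heads` parameter are hypothetical, and the BFloat16-to-Float32 upcast is modeled as a plain `float()`.

```python
def reference_score_mod(score, b_idx, h_idx, q_idx, kv_idx, bias_flat, num_heads=32):
    """Hypothetical reference for the generated CuTe score_mod.

    Assumes bias_flat is a (B, num_heads) row-major buffer flattened to 1-D.
    q_idx and kv_idx are unused, as in the generated kernel.
    """
    # tmp3.store(32*tmp1 + tmp2): row-major flat offset of bias[b_idx, h_idx]
    offset = num_heads * b_idx + h_idx
    # tmp5[0] = in_ptr4[tmp6], then .to(Float32): read and upcast the value
    bias_value = float(bias_flat[offset])
    # tmp9 = tmp0 + tmp8: add the bias to the score
    return score + bias_value


B, H = 2, 32  # hypothetical sizes
bias_flat = [float(i) for i in range(B * H)]
print(reference_score_mod(1.5, 1, 2, 0, 0, bias_flat))  # offset 34 -> 35.5
```

Under this reading the load is only correct if `in_ptr4` is addressed as a 1-D buffer; whether it is presented that way is the open question in the note that follows.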

I don't think the following is right:
```python
tmp4 = tmp3.store(32*tmp1 + tmp2)
tmp5 = cute.make_fragment(1, cutlass.BFloat16)
tmp6 = tmp3[0]
tmp7 = tmp5[0] = (in_ptr4[tmp6])
```

because the value of tmp6 can be larger than the size of the dimension being indexed (in this case B). See whether it is possible to index the buffer as 1-D.

Pull Request resolved: #162031
Approved by: https://github.com/v0i0
ghstack dependencies: #161118
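The indexing concern in the TODO above can be illustrated without a GPU. The sketch below is plain Python with hypothetical sizes (B = 2, H = 32) and hypothetical helper names; it does not model CuTe tensor semantics, only the arithmetic. The flat offset `32*b_idx + h_idx` is a valid index into a 1-D, row-major view of a (B, H) buffer, but it runs past the end as soon as it is treated as a coordinate along the leading (B) dimension.

```python
B, H = 2, 32  # hypothetical sizes; H matches the 32 in the generated kernel

# 2-D bias buffer bias_2d[b][h], plus its row-major 1-D flattening
bias_2d = [[float(b * 100 + h) for h in range(H)] for b in range(B)]
bias_1d = [x for row in bias_2d for x in row]


def flat_offset(b, h, num_heads=H):
    # Same expression the generated code stores into tmp3: 32*b_idx + h_idx
    return num_heads * b + h


def load_flat(b, h):
    # Correct when the buffer is addressed as a 1-D, row-major array
    return bias_1d[flat_offset(b, h)]


def load_first_dim(b, h):
    # The failure mode from the TODO: the flat offset used as an index
    # into the leading (B) dimension of the 2-D buffer
    return bias_2d[flat_offset(b, h)]


# 1-D addressing agrees with bias[b][h] for every (b, h)
assert all(load_flat(b, h) == bias_2d[b][h] for b in range(B) for h in range(H))

# The flat offset quickly exceeds B, so using it as a B index fails
try:
    load_first_dim(1, 0)
except IndexError:
    print("offset", flat_offset(1, 0), "is out of range for a leading dim of size", B)
```

This is why indexing the captured buffer as 1-D, as the TODO suggests, sidesteps the problem: the flat offset is then always in range for valid (b, h).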

Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request


          Add Flash Attention support to FlexAttention (pytorch#161118)

15bf9bc

Relies on this PR in Flash Attention: Dao-AILab/flash-attention#1840

Pull Request resolved: pytorch#161118
Approved by: https://github.com/v0i0

Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request


          Add Loads from fixed inputs (pytorch#162031)

3792fed


github-actions bot deleted the gh/drisspg/187/head branch

November 9, 2025 02:18


Labels: ciflow/inductor, Merged, module: inductor, topic: not user facing