
[Submodule] Turning flash-attention integration into 3rd party submod#144120

Closed
drisspg wants to merge 22 commits into gh/drisspg/111/base from gh/drisspg/111/head

Conversation

@drisspg
Contributor

@drisspg drisspg commented Jan 3, 2025

Stack from ghstack (oldest at bottom):

Summary

Sticky points

CUDA graph RNG handling has changed and deviated from the original implementation. We will be left with a dangling 'offset' value and confusing naming due to backward compatibility (BC).

Dependencies

Other Points

  • The BC linter is complaining about losing generate.py and its functions, which is not a real BC surface
    cc @albanD

Differential Revision: D68502879

[ghstack-poisoned]
@drisspg drisspg mentioned this pull request Jan 3, 2025
@pytorch-bot

pytorch-bot bot commented Jan 3, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/144120

Note: Links to docs will display an error until the docs builds have been completed.

❌ 14 New Failures, 8 Unrelated Failures

As of commit 3ab6395 with merge base 40e27fb:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@drisspg drisspg marked this pull request as draft January 3, 2025 00:40
[ghstack-poisoned]
drisspg added a commit that referenced this pull request Jan 3, 2025
ghstack-source-id: 0494ca4
Pull Request resolved: #144120
@drisspg drisspg added the topic: not user facing (topic category) and module: sdpa (All things related to torch.nn.functional.scaled_dot_product_attention) labels Jan 3, 2025
@drisspg drisspg changed the title Trying to reduce flash-deps [Submodule] Turning flash-attention integration into 3rd party submod Jan 7, 2025
[ghstack-poisoned]
drisspg added a commit that referenced this pull request Jan 7, 2025
ghstack-source-id: ab6ce91
Pull Request resolved: #144120
[ghstack-poisoned]
drisspg added a commit that referenced this pull request Jan 7, 2025
ghstack-source-id: 4d9655d
Pull Request resolved: #144120
[ghstack-poisoned]
[ghstack-poisoned]
[ghstack-poisoned]
drisspg added a commit that referenced this pull request Jan 22, 2025
ghstack-source-id: 393f416
Pull Request resolved: #144120
@drisspg
Contributor Author

drisspg commented Jan 22, 2025

@drisspg has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

pytorchmergebot pushed a commit that referenced this pull request Jan 24, 2025
…145502)

# Context

Prototyped in #144120: we are going to make flash-attention a third-party submodule, taking its C++ sources and including them in our build of libtorch.so.

This requires various changes, both external and internal. Because internal changes are needed, we have to co-develop, and in the co-dev environment I haven't found a way to sync submodule changes together with internal-only changes.

This is unused for now

Pull Request resolved: #145502
Approved by: https://github.com/Skylion007
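For background, the general mechanics of wiring a project in as a third-party submodule look roughly like the sketch below. This uses a local stand-in repository instead of the real flash-attention remote, and the paths are illustrative, not the actual ones used in this PR:

```shell
# Sketch only: simulate adding a third-party submodule under third_party/,
# using a local stand-in repo in place of the real flash-attention remote.
demo=$(mktemp -d)

# Stand-in for the upstream flash-attention repository.
git init -q "$demo/flash-attention-upstream"
git -C "$demo/flash-attention-upstream" \
    -c user.email=demo@example.com -c user.name=demo \
    commit -q --allow-empty -m "initial commit"

# Stand-in for the superproject (e.g. pytorch).
git init -q "$demo/superproject"
cd "$demo/superproject"

# Newer git versions require explicitly allowing the file:// transport
# when cloning submodules from local paths.
git -c protocol.file.allow=always \
    submodule add "$demo/flash-attention-upstream" third_party/flash-attention

# Shows the pinned submodule commit and path.
git submodule status
```

From there, consumers build the submodule's sources as part of the superproject's own build, and updating the dependency means committing a new pinned submodule SHA.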
```cmake
# These features are disabled by default. We don't currently document this
# because we don't suspect users building from source will need it.
add_definitions(-DFLASHATTENTION_DISABLE_ALIBI)
add_definitions(-DFLASHATTENTION_DISABLE_SOFTCAP)
```
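If source builds ever do need these features, one way to expose them would be to gate the definitions behind CMake options. This is a hypothetical sketch; the option names here are not from this PR:

```cmake
# Hypothetical: gate the disabled flash-attention features behind options
# so users building from source could re-enable them at configure time.
option(FLASH_ATTENTION_ENABLE_ALIBI "Build flash-attention with ALiBi support" OFF)
option(FLASH_ATTENTION_ENABLE_SOFTCAP "Build flash-attention with softcap support" OFF)

if(NOT FLASH_ATTENTION_ENABLE_ALIBI)
  add_definitions(-DFLASHATTENTION_DISABLE_ALIBI)
endif()
if(NOT FLASH_ATTENTION_ENABLE_SOFTCAP)
  add_definitions(-DFLASHATTENTION_DISABLE_SOFTCAP)
endif()
```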
Contributor Author


move these here:

```cmake
target_compile_definitions(torch_cuda PRIVATE USE_MEM_EFF_ATTENTION)
```

pytorch-bot bot pushed a commit that referenced this pull request Feb 4, 2025
…#144120)

Summary:
Pull Request resolved: #144120

# Summary

### Sticky points

CUDA graph RNG handling has changed and deviated from the original implementation. We will be left with a dangling 'offset' value and confusing naming due to backward compatibility (BC).

## Dependencies
- Flash PR: Dao-AILab/flash-attention#1419

### Other Points
- The BC linter is complaining about losing generate.py and its functions, which is not a real BC surface
cc albanD

imported-using-ghimport

Test Plan:
Imported from OSS

Building in dev
`buck build @//mode/dev-nosan -c fbcode.nvcc_arch=h100a  //caffe2:ATen-cu --show-full-output    `

Running `nm` on the .so, I see that the flash symbols are correctly named:
```
0000000001c3dfb0 t pytorch_flash::run_mha_bwd(pytorch_flash::Flash_bwd_params&, CUstream_st*)::$_0::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#7}::operator()() const
0000000001c36080 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}::operator()() const::{lambda()#6}::operator()() const
0000000001c360e0 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()#2}::operator()() const::{lambda()#1}::operator()() const::{lambda()#7}::operator()() const
0000000001c35fc0 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#6}::operator()() const
0000000001c36020 t pytorch_flash::run_mha_fwd(pytorch_flash::Flash_fwd_params&, CUstream_st*, bool)::$_0::operator()() const::{lambda()#1}::operator()() const::{lambda()#1}::operator()() const::{lambda()#7}::operator()() const
```

Reviewed By: vkuzo

Differential Revision: D68502879

Pulled By: drisspg
drisspg added a commit to drisspg/pytorch that referenced this pull request Feb 5, 2025
…pytorch#146372)

Summary:
Pull Request resolved: pytorch#146372

Pull Request resolved: pytorch#144120


Labels

ci-no-td (Do not run TD on this PR), ciflow/inductor, ciflow/rocm (Trigger "default" config CI on ROCm), ciflow/trunk (Trigger trunk jobs on your pull request), module: sdpa (All things related to torch.nn.functional.scaled_dot_product_attention), skip-pr-sanity-checks, suppress-bc-linter (Suppresses the failures of API backward-compatibility linter (Lint/bc_linter)), topic: not user facing (topic category)


4 participants