[WIP][cuDNN][cuDNN V8 API] Add experimental cuDNN MHA/Flash Attention support #101916
eqy wants to merge 8 commits into pytorch:main from
Conversation
Do you have any benchmarks of this compared to any of the other implementations?
@eqy I was also curious whether you managed to get any benchmarks for this work?
Sorry, I haven't had a chance to benchmark this because other cuDNN issues are being prioritized at the moment, but it is not expected to offer higher performance for now.
@eqy no worries. I saw that the cuDNN implementation was mentioned here: https://developer.nvidia.com/blog/breaking-mlperf-training-records-with-nvidia-h100-gpus/ and was very curious how it compares.
Looks like this PR hasn't been updated in a while, so we're going to go ahead and mark this as stale.
Initial implementation of forward-pass cuDNN Flash Attention; the current major restrictions are:
- Gated by the `TORCH_CUDNN_MHA_ENABLED=1` environment variable.

The plan is to eventually avoid pattern-matching against strides as the support matrix on the cuDNN side improves.
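A minimal sketch of the env-var gating described above, assuming the backend is only tried when `TORCH_CUDNN_MHA_ENABLED=1` is set; the helper name `use_cudnn_mha` is illustrative and not the PR's actual API:

```python
import os

def use_cudnn_mha() -> bool:
    """Hypothetical gate: try the experimental cuDNN MHA path only when
    TORCH_CUDNN_MHA_ENABLED=1 is set in the environment (default: off)."""
    return os.environ.get("TORCH_CUDNN_MHA_ENABLED", "0") == "1"

# Opt in explicitly, as the PR description requires:
os.environ["TORCH_CUDNN_MHA_ENABLED"] = "1"
print(use_cudnn_mha())  # True
```

A caller would fall back to the existing SDPA implementations whenever this gate (or the stride pattern-match mentioned above) rejects the inputs.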
CC @ngimel @ptrblck
cc @csarofeen @ptrblck @xwang233 @ngimel