Added benchmark file for FA3 SDPA #173026
howardzhang-cv wants to merge 4 commits into gh/howardzhang-cv/9/base from
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/173026
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 2216e3a with merge base cee5acf.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
drisspg left a comment:
Can we just update this one, https://github.com/pytorch/pytorch/blob/main/benchmarks/transformer/sdpa.py, to support impls for FA, e.g. fa2, fa3, and fa4?
Yeah, I originally made a new file because I didn't want to modify existing benchmarks, but this makes more sense for DRY. I just added flash_impl as a config option to sdpa.py. I could not add FA4 as an option because there seem to be some backward-pass issues; not sure if it's just my build.
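For readers following along, a `flash_impl` config option of this kind could look roughly like the sketch below. It is an illustration only: the class name `ExperimentConfig`, its other fields, the helper `generate_configs`, and the string values `"FA2"` and `"FA3"` are assumptions, not the actual contents of `sdpa.py`. Dtypes are kept as plain strings so the sketch runs without torch.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical set of selectable implementations. FA4 is left out because
# backward-pass issues were reported for it in this thread.
SUPPORTED_FLASH_IMPLS = ("FA2", "FA3")


@dataclass(frozen=True)
class ExperimentConfig:
    """One benchmark configuration (hypothetical field names)."""

    batch_size: int
    seq_len: int
    dtype: str       # e.g. "fp16" or "bf16"
    flash_impl: str  # which flash attention implementation to benchmark

    def __post_init__(self):
        # Reject implementations the sweep does not know how to run.
        if self.flash_impl not in SUPPORTED_FLASH_IMPLS:
            raise ValueError(f"unsupported flash_impl: {self.flash_impl!r}")


def generate_configs(batch_sizes, seq_lens, dtypes, flash_impls):
    """Build the full cross product of the given sweep dimensions."""
    return [
        ExperimentConfig(b, s, d, impl)
        for b, s, d, impl in product(batch_sizes, seq_lens, dtypes, flash_impls)
    ]


configs = generate_configs([8], [1024, 4096], ["fp16", "bf16"], ["FA2", "FA3"])
print(f"{len(configs)} configs, first: {configs[0]}")
```

Treating the implementation as one more sweep dimension keeps a single benchmark file, which is the DRY argument made in the review comment.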
@pytorchbot merge
Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Stack from ghstack (oldest at bottom):
Summary:
Added benchmark file for FA3 SDPA
Compares FA3 to FA2 in fp16 and bf16
Compares FA3 bf16 to FA3 fp8
Test Plan: python benchmarks/transformer/sdpa_fa3.py
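
The comparisons listed in the summary come down to timing two variants on the same inputs and reporting a ratio. A minimal, CPU-only sketch of that pattern is shown here; `time_fn`, `speedup`, and the two stand-in workloads are hypothetical and are not taken from the benchmark file. A real GPU benchmark would also need to synchronize the device before reading the clock.

```python
import time
from statistics import median


def time_fn(fn, *args, warmup=3, iters=20):
    """Return the median wall-clock time of fn(*args) in microseconds."""
    for _ in range(warmup):  # warm-up runs are not measured
        fn(*args)
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args)
        samples.append(time.perf_counter() - start)
    return median(samples) * 1e6


def speedup(baseline_us, candidate_us):
    """Ratio above 1.0 means the candidate is faster than the baseline."""
    return baseline_us / candidate_us


# Stand-in workloads (assumptions): both compute the same sum of squares.
def baseline(n):
    total = 0
    for i in range(n):
        total += i * i
    return total


def candidate(n):
    return sum(i * i for i in range(n))


base_us = time_fn(baseline, 50_000)
cand_us = time_fn(candidate, 50_000)
print(f"baseline {base_us:.1f} us, candidate {cand_us:.1f} us, "
      f"ratio {speedup(base_us, cand_us):.2f}x")
```

Using the median rather than the mean makes the reported ratio less sensitive to occasional slow iterations.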