[FlexFlash] Blackwell fwd support #167040
drisspg wants to merge 26 commits into gh/drisspg/218/base from
Conversation
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/167040
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (2 unrelated failures) As of commit 9308f29 with merge base 39ebab1.
BROKEN TRUNK - The following job failed but was present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
```diff
 def _supports_nontrivial_mask_graphs() -> bool:
     """Currently only supported on Hopper (SM90) GPUs."""
-    return torch.cuda.get_device_capability()[0] == 9
+    return torch.cuda.get_device_capability()[0] in [9, 10]
```
What about consumer Blackwell, i.e. 12? Guessing no, since CUDA8 isn't supported either.
Just these 2; I thought about allowing A100.
@driss btw CUDNN_FRONTEND just came out with proper block mask bindings, so we may end up supporting that soon too
@pytorchbot merge
Merge started: Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
ghstack-source-id: 1db64e0 Pull-Request: pytorch/pytorch#167040
Stack from ghstack (oldest at bottom):
Need to land: Dao-AILab/flash-attention#1985
^^First^^
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben