[export] set enable_gqa in export flash->math decomp by pianpwk · Pull Request #158604 · pytorch/pytorch

pianpwk · 2025-07-17T23:14:58Z

Differential Revision: D78524147

For `scaled_dot_product_attention(..., enable_gqa=True)`:

- the Math backend passes the flag through, performing the extra KV broadcast when it is set to True;
- the Flash backend has no such flag and relies on correct indexing in the C++ kernel.

Export used to default to Math for `enable_gqa=True`, but [CPU] Support GQA for flash attention #157893 landed and enabled Flash. Meanwhile, an export-only decomp redirects flash -> math and calls Math with `enable_gqa` unset, because the flash op carries no such flag. This caused the crash reported in https://fb.workplace.com/groups/1028545332188949/posts/1264609398582540: the decomp called the non-GQA Math variant with GQA inputs.

This PR makes the export decomp assume GQA whenever the query and key/value seqlens differ, setting `enable_gqa = <q seqlen> != <kv seqlen>`. It relies on earlier backend checks to raise on invalid input shapes.

pytorch-bot · 2025-07-17T23:15:02Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/158604

✅ You can merge normally! (2 Unrelated Failures)

As of commit ebeb0a9 with merge base 82f8e04:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

trunk / win-vs2022-cpu-py3 / test (default, 1, 3, windows.4xlarge.nonephemeral) (gh) (similar failure)
'Test'

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge, unstable) (gh) (#158876)
sccache: error: couldn't connect to server

facebook-github-bot · 2025-07-17T23:15:10Z

This pull request was exported from Phabricator. Differential Revision: D78524147

Summary: For `scaled_dot_product_attention(..., enable_gqa=True)`:
- the Math backend passes the flag through, performing the extra [KV broadcast](https://github.com/pytorch/pytorch/blob/6e07d6a0ff386d99d8c2f1d25978b0683988a4cb/aten/src/ATen/native/transformers/attention.cpp#L902) if set to True
- the Flash backend has no flag, and relies on correct indexing in the C++ kernel
- Export used to default to Math for `enable_gqa=True`, but #157893 landed and enabled Flash. At the same time, there's an export-only [decomp](https://github.com/pytorch/pytorch/blob/6e07d6a0ff386d99d8c2f1d25978b0683988a4cb/torch/_decomp/decompositions.py#L4968) redirecting flash -> math, calling with `enable_gqa` unset, because that info isn't available. This led to https://fb.workplace.com/groups/1028545332188949/posts/1264609398582540 crashing, calling the Math non-GQA variant, with GQA inputs.

This assumes GQA for seqlen mismatches in the export decomp, setting `enable_gqa = <q seqlen> != <kv seqlen>`, relying on prior backend checks to raise on invalid input shapes.

Test Plan: test_export

Rollback Plan:

Differential Revision: D78524147

facebook-github-bot · 2025-07-18T18:38:33Z

This pull request was exported from Phabricator. Differential Revision: D78524147

Test Plan: test_export

Rollback Plan:

Reviewed By: angelayi

Differential Revision: D78524147

facebook-github-bot · 2025-07-23T21:59:42Z

This pull request was exported from Phabricator. Differential Revision: D78524147

facebook-github-bot · 2025-07-24T14:38:28Z

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

pytorchmergebot · 2025-07-24T14:40:28Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Differential Revision: D78524147

Pull Request resolved: #158604
Approved by: https://github.com/angelayi, https://github.com/drisspg

pytorch-bot bot added the ciflow/inductor label Jul 17, 2025

facebook-github-bot added the fb-exported label Jul 17, 2025

pianpwk added the release notes: export label Jul 18, 2025

pianpwk requested review from StellarrZ, angelayi, digantdesai, drisspg and larryliu0820 July 18, 2025 18:37

facebook-github-bot force-pushed the export-D78524147 branch from 92e4661 to 8e1d5d9 July 18, 2025 18:38

angelayi approved these changes Jul 23, 2025

pytorch-bot bot added the ciflow/trunk label Jul 23, 2025

facebook-github-bot force-pushed the export-D78524147 branch from 8e1d5d9 to ebeb0a9 July 23, 2025 21:59

drisspg approved these changes Jul 24, 2025

pytorchmergebot added the merging label Jul 24, 2025

pytorchmergebot closed this in 48fe4ff Jul 24, 2025

pytorchmergebot added Merged and removed merging labels Jul 24, 2025

github-actions bot deleted the export-D78524147 branch August 24, 2025 02:20

pianpwk wants to merge 1 commit into main from export-D78524147