
[flex attention][triton pin] use new TMA API#155771

Closed
davidberard98 wants to merge 3 commits into gh/davidberard98/374/base from gh/davidberard98/374/head

Conversation

@davidberard98
Contributor

@davidberard98 davidberard98 commented Jun 12, 2025

Stack from ghstack (oldest at bottom):

Triton 3.4 will remove the experimental TMA APIs: triton-lang/triton#6488. Ahead of this, we are replacing the experimental TMA API usage with the stable TMA API in flex attention. This means that flex attention TMA will stop working with Triton 3.2 or Triton 3.3/3.3.1 for now (but it should work with Triton 3.4 in the PyTorch 2.8 release, and with the Meta-internal Triton 3.3.1fb build, both of which have the new TMA API).

This PR does the following:

  • replace the experimental TMA APIs with the stable TMA APIs
  • remove the workspace args.
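As a rough illustration of why the workspace args can go away: with the stable API, a tensor descriptor bundles the base pointer, shape, strides, and block shape up front, so each load only takes block offsets. The sketch below is a hypothetical pure-Python analogy, not real Triton code (the actual entry point is `tl.make_tensor_descriptor` inside a Triton kernel, which needs a CUDA-capable build to run):

```python
# Hypothetical pure-Python analogy of a TMA-style tensor descriptor, for
# illustration only -- the real API is triton.language.make_tensor_descriptor.
class TensorDescriptorSketch:
    def __init__(self, data, shape, strides, block_shape):
        self.data = data            # flat buffer standing in for a device pointer
        self.shape = shape          # logical tensor shape
        self.strides = strides      # element strides per dimension
        self.block_shape = block_shape  # tile size fetched per load

    def load(self, offsets):
        """Return the block at the given element offsets as nested lists."""
        (bm, bn), (om, on) = self.block_shape, offsets
        sm, sn = self.strides
        return [[self.data[(om + i) * sm + (on + j) * sn]
                 for j in range(bn)]
                for i in range(bm)]

# 4x4 row-major matrix holding 0..15; fetch the 2x2 block starting at (2, 2).
desc = TensorDescriptorSketch(list(range(16)), (4, 4), (4, 1), (2, 2))
block = desc.load([2, 2])  # → [[10, 11], [14, 15]]
```

Because all the layout information lives in the descriptor, the kernel signature no longer needs a separate host-allocated workspace argument for descriptor storage.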

Testing: I ran test/inductor/test_flex_attention.py on an H100 with @mandroid6's PR #153662 patched in to turn on TMA. [TODO: confirm results once all the local tests pass, but of the first 100 tests I ran locally, all the failing tests were also failing on #153662 alone.]

Note: when #153662 lands, turning on TMA support by default, it should check specifically for stable TMA API support (commented on the PR).

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @chauhang @aakhundov

@pytorch-bot

pytorch-bot bot commented Jun 12, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/155771

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 9c7b915 with merge base 132babe:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

davidberard98 added a commit that referenced this pull request Jun 12, 2025
ghstack-source-id: 288a925
Pull Request resolved: #155771
@davidberard98 davidberard98 changed the title [flex attention][triton pin] use new TMA AP [flex attention][triton pin] use new TMA API Jun 12, 2025
@davidberard98 davidberard98 marked this pull request as draft June 12, 2025 05:02
davidberard98 added a commit that referenced this pull request Jun 12, 2025
ghstack-source-id: 4ca8938
Pull Request resolved: #155771
@davidberard98 davidberard98 requested review from drisspg and mandroid6 and removed request for mandroid6 June 12, 2025 22:06
@davidberard98 davidberard98 marked this pull request as ready for review June 13, 2025 03:17
@davidberard98
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Jun 13, 2025
@pytorchmergebot
Collaborator

Merge failed

Reason: Approvers from one of the following sets are needed:

  • superuser (pytorch/metamates)
  • Core Reviewers (mruberry, lezcano, Skylion007, ngimel, peterbell10, ...)
  • Core Maintainers (soumith, gchanan, ezyang, dzhulgakov, malfet, ...)
Details for Dev Infra team: raised by workflow job.

Failing merge rule: Core Maintainers

Contributor

@nmacchioni nmacchioni left a comment


LGTM

@davidberard98
Contributor Author

@pytorchbot merge

@pytorchmergebot
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

