Add Triton CPU as an Inductor backend #133408
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133408
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 2 Unrelated Failures
As of commit 2c1a9b7 with merge base 156ca01:
NEW FAILURE - The following job has failed:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base:
👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang
torch/_inductor/graph.py (outdated)

    def is_unspec_arg(self, name: str) -> bool:
        # dynamo wraps unspec variable as 0d CPU tensor,
        # need to convert to scalar during codegen (triton only)
        return False  # FIXME

Looking for ideas on how to tackle this; I don't really understand why unspec variables need to be represented like so
This was introduced in #87329; cc @jansel @yanboliang
This is because an unspecialized scalar is passed as a single-element CPU tensor input after Dynamo. It should be OK to always return False for CPU-Triton.
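
A minimal sketch of the rule discussed in this thread, not the actual Inductor implementation. `GraphInput` and the `kernel_device` parameter are hypothetical names introduced only for illustration. The sketch assumes that a 0-d CPU tensor needs unwrapping to a scalar only when the kernel runs on a different device, which is one possible reading of why always returning False is acceptable for CPU-Triton.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class GraphInput:
    """Hypothetical stand-in for the metadata tracked per graph input."""

    device: str            # e.g. "cpu" or "cuda"
    size: Tuple[int, ...]  # () for a 0-d tensor


def is_unspec_arg(inp: GraphInput, kernel_device: str) -> bool:
    """Illustrative check: does `inp` look like a Dynamo-wrapped scalar
    that a kernel on `kernel_device` would need converted to a scalar?"""
    if kernel_device == "cpu":
        # Per the review comment: always False for CPU-Triton.
        return False
    # Assumption: a 0-d CPU tensor feeding a non-CPU kernel is a wrapped scalar.
    return inp.device == "cpu" and inp.size == ()


wrapped_scalar = GraphInput(device="cpu", size=())
needs_unwrap_on_gpu = is_unspec_arg(wrapped_scalar, kernel_device="cuda")
needs_unwrap_on_cpu = is_unspec_arg(wrapped_scalar, kernel_device="cpu")
```

In this sketch the same wrapped scalar is flagged for a `"cuda"` kernel but not for a `"cpu"` kernel, mirroring the suggestion to hard-code False on the CPU path.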

I plan to add some tests as well before getting a full review, but I would love to get some feedback on my overall approach first.

Breaking this down into a stack of smaller PRs would definitely make the review easier.

I think the changes in this PR look large because of indentation changes; it looks better when you ignore whitespace: https://github.com/pytorch/pytorch/pull/133408/files?w=1. But if you still think it's too large, I can break it down further. Anyway, marking as draft for now while I fix things up.

The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang
The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend.

cc XilunWu H-Huang awgu kwen2501 wanchaol fegin fduwjj wz337 wconstab d4l3k c-p-i-o voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang rec

Differential Revision: [D63298968](https://our.internmc.facebook.com/intern/diff/D63298968)

@pytorchbot merge

Merge failed
Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/rexporting the PR!
Details for Dev Infra team: Raised by workflow job

@pytorchbot merge

Merge started
Your change will be merged once all checks pass (ETA 0-4 Hours).
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchbot revert

❌ 🤖 pytorchbot command failed: Try

@pytorchbot --help

PyTorchBot Help: Merge, Revert, Rebase, Label, Dr CI, cherry-pick, Close
usage: @pytorchbot close
Close a PR [Can be used on issues]

@pytorchbot revert -m "internal tests failing" -c ghfirst

@pytorchbot successfully started a revert job. Check the current status here.

@int3 your PR has been successfully reverted.

@int3 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

This pull request was exported from Phabricator. Differential Revision: D63298968

@pytorchbot merge -f "test failure looks unrelated"

Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team

Merge failed
Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/rexporting the PR!
Details for Dev Infra team: Raised by workflow job

@pytorchbot merge -f "test failure looks unrelated"

Merge started
Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use
Learn more about merging in the wiki.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@int3 I am trying out this PR experimentally. Could you tell me the exact triton-cpu build you used?

Stack from ghstack (oldest at bottom):
The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend.
cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @rec
Differential Revision: D63298968