Add Triton CPU as an Inductor backend by int3 · Pull Request #133408 · pytorch/pytorch

int3 · 2024-08-14T05:15:17Z

Stack from ghstack (oldest at bottom):

-> Add Triton CPU as an Inductor backend #133408

The goal is to use Inductor-generated kernels to stress test the new Triton CPU backend.

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @rec

Differential Revision: D63298968

[ghstack-poisoned]

pytorch-bot · 2024-08-14T05:15:20Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/133408

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Unrelated Failures

As of commit 2c1a9b7 with merge base 156ca01:

NEW FAILURE - The following job has failed:

Lint / Test run_test.py is usable without boto3/rockset (gh)
ERROR: Could not find a version that satisfies the requirement torch (from versions: none)

FLAKY - The following job failed but was likely due to flakiness present on trunk:

inductor-periodic / cuda12.1-py3.10-gcc9-sm80 / test (inductor_torchbench_smoketest_perf, 1, 1, linux.gcp.a100) (gh) (similar failure)
moco

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

inductor / cuda12.1-py3.10-gcc9-sm86 / test (aot_inductor_torchbench, 2, 2, lf.linux.g5.4xlarge.nvidia.gpu) (gh) (trunk failure)
sam_fast

This comment was automatically generated by Dr. CI and updates every 15 minutes.

int3 · 2024-08-14T05:26:58Z

torch/_inductor/graph.py

    def is_unspec_arg(self, name: str) -> bool:
        # dynamo wraps unspec variable as 0d CPU tensor,
        # need to convert to scalar during codegen (triton only)
+        return False  # FIXME


Looking for ideas on how to tackle this; I don't really understand why unspec variables need to be represented this way.

This was introduced in #87329; cc @jansel @yanboliang

This is because, after Dynamo, an unspecialized scalar is passed as a single-element CPU tensor input. It should be OK to always return False for CPU Triton.
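
A minimal sketch of the check discussed above, written in plain Python so it runs without PyTorch. The names `InputInfo`, `graph_inputs`, and `kernel_device` are hypothetical and do not come from Inductor; the sketch only illustrates the idea that a scalar wrapped as a 0-d CPU tensor needs unwrapping when the kernel runs on another device, and that a CPU kernel can skip the conversion.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class InputInfo:
    """Hypothetical stand-in for per-input metadata (not an Inductor class)."""

    device_type: str  # e.g. "cpu" or "cuda"
    ndim: int  # 0 for a scalar wrapped as a 0-d tensor


def is_unspec_arg(name: str, graph_inputs: Dict[str, InputInfo], kernel_device: str) -> bool:
    """Return True if `name` should be converted to a scalar during codegen."""
    if kernel_device == "cpu":
        # Assumption taken from the reply above: for Triton on CPU it should be
        # fine to never treat an input as an unspec arg.
        return False
    info = graph_inputs.get(name)
    # A 0-d tensor living on the CPU is how the wrapped scalar shows up.
    return info is not None and info.device_type == "cpu" and info.ndim == 0


graph_inputs = {
    "scale": InputInfo(device_type="cpu", ndim=0),  # wrapped Python scalar
    "x": InputInfo(device_type="cuda", ndim=2),  # ordinary tensor input
}
```

With these inputs, `is_unspec_arg("scale", graph_inputs, "cuda")` is true, while the same call with `kernel_device="cpu"` is false, which matches the `return False` placeholder in the diff.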

int3 · 2024-08-14T05:32:06Z

I plan to add some tests as well before getting a full review, but would love to get some feedback on my overall approach first

desertfire · 2024-08-14T14:29:41Z

> I plan to add some tests as well before getting a full review, but would love to get some feedback on my overall approach first

It will definitely make the review easier by breaking down into a stack of smaller PRs.

int3 · 2024-08-15T02:38:41Z

> It will definitely make the review easier by breaking down into a stack of smaller PRs.

I think the changes in this PR look large because of indentation changes; it looks better when you ignore whitespace: https://github.com/pytorch/pytorch/pull/133408/files?w=1. But if you still think it's too large I can break it down further.

Anyway, marking as draft for now while I fix things up.

int3 · 2024-09-26T14:21:53Z

@pytorchbot merge

pytorchmergebot · 2024-09-26T14:23:38Z

Merge failed

Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/rexporting the PR!

Details for Dev Infra team

Raised by workflow job

int3 · 2024-09-26T15:27:53Z

@pytorchbot merge

pytorchmergebot · 2024-09-26T15:29:58Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

int3 · 2024-09-27T16:41:51Z

@pytorchbot revert

pytorch-bot · 2024-09-27T16:41:53Z

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: the following arguments are required: -m/--message, -c/--classification

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Try @pytorchbot --help for more info.
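
The error above is the kind of message an argparse-style parser produces when required options are missing. The following is a minimal sketch, assuming a parser modeled only on the usage line shown in this thread; it is not the bot's actual source, and `RaisingParser`, `build_revert_parser`, and `parse_revert` are hypothetical names.

```python
import argparse
import shlex

# Classification values copied from the usage line in the bot's reply.
CLASSIFICATIONS = ("nosignal", "ignoredsignal", "landrace", "weird", "ghfirst")


class RaisingParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of exiting the process."""

    def error(self, message):
        raise ValueError(message)


def build_revert_parser():
    parser = RaisingParser(prog="@pytorchbot revert", add_help=False)
    # Both options are required, so a bare "revert" is rejected.
    parser.add_argument("-m", "--message", required=True)
    parser.add_argument(
        "-c", "--classification", required=True, choices=CLASSIFICATIONS
    )
    return parser


def parse_revert(comment_line):
    """Parse one comment line of the form '@pytorchbot revert ...'."""
    tokens = shlex.split(comment_line)
    if tokens[:2] != ["@pytorchbot", "revert"]:
        raise ValueError("not a revert command")
    return build_revert_parser().parse_args(tokens[2:])
```

Calling `parse_revert("@pytorchbot revert")` raises `ValueError` because both options are missing, whereas `parse_revert('@pytorchbot revert -m "internal tests failing" -c ghfirst')` returns a namespace with `message` and `classification` filled in.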

int3 · 2024-09-27T16:52:11Z

@pytorchbot --help

pytorch-bot · 2024-09-27T16:52:13Z

PyTorchBot Help

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick,close} ...

In order to invoke the bot on your PR, include a line that starts with
@pytorchbot anywhere in a comment. That line will form the command; no
multi-line commands are allowed. Some commands may be used on issues as specified below.

Example:
    Some extra context, blah blah, wow this PR looks awesome

    @pytorchbot merge

optional arguments:
  -h, --help            Show this help message and exit.

command:
  {merge,revert,rebase,label,drci,cherry-pick,close}
    merge               Merge a PR
    revert              Revert a PR
    rebase              Rebase a PR
    label               Add label to a PR
    drci                Update Dr. CI
    cherry-pick         Cherry pick a PR onto a release branch
    close               Close a PR

Merge

usage: @pytorchbot merge [-f MESSAGE | -i] [-ic] [-r [{viable/strict,main}]]

Merge an accepted PR, subject to the rules in .github/merge_rules.json.
By default, this will wait for all required checks (lint, pull) to succeed before merging.

optional arguments:
  -f MESSAGE, --force MESSAGE
                        Merge without checking anything. This requires a reason for auditing purposes, for example:
                        @pytorchbot merge -f 'Minor update to fix lint. Expecting all PR tests to pass'
                        
                        Please use `-f` as last resort, prefer `--ignore-current` to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.
  -i, --ignore-current  Merge while ignoring the currently failing jobs.  Behaves like -f if there are no pending jobs.
  -ic                   Old flag for --ignore-current. Deprecated in favor of -i.
  -r [{viable/strict,main}], --rebase [{viable/strict,main}]
                        Rebase the PR to re run checks before merging.  Accepts viable/strict or main as branch options and will default to viable/strict if not specified.
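
The merge usage line above (`[-f MESSAGE | -i]` plus an optional `-r` value) maps naturally onto an argparse mutually exclusive group. This is a hedged sketch built only from that help text, not the bot's real implementation; `RaisingParser` and `build_merge_parser` are hypothetical names, and the deprecated `-ic` flag is left out for brevity.

```python
import argparse


class RaisingParser(argparse.ArgumentParser):
    """ArgumentParser that raises instead of exiting the process."""

    def error(self, message):
        raise ValueError(message)


def build_merge_parser():
    parser = RaisingParser(prog="@pytorchbot merge", add_help=False)

    # -f and -i cannot be combined, mirroring "[-f MESSAGE | -i]".
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-f", "--force", metavar="MESSAGE")
    group.add_argument("-i", "--ignore-current", action="store_true")

    # -r takes an optional branch and falls back to viable/strict when
    # the flag is given without a value.
    parser.add_argument(
        "-r",
        "--rebase",
        nargs="?",
        const="viable/strict",
        choices=["viable/strict", "main"],
    )
    return parser


merge_parser = build_merge_parser()
forced = merge_parser.parse_args(["-f", "test failure looks unrelated"])
rebased = merge_parser.parse_args(["-r"])
```

Here `forced.force` holds the audit message and `rebased.rebase` is `"viable/strict"`; passing `-f` and `-i` together raises `ValueError` in this sketch.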

Revert

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Revert a merged PR. This requires that you are a Meta employee.

Example:
  @pytorchbot revert -m="This is breaking tests on trunk. hud.pytorch.org/" -c=nosignal

optional arguments:
  -m MESSAGE, --message MESSAGE
                        The reason you are reverting, will be put in the commit message. Must be longer than 3 words.
  -c {nosignal,ignoredsignal,landrace,weird,ghfirst}, --classification {nosignal,ignoredsignal,landrace,weird,ghfirst}
                        A machine-friendly classification of the revert reason.

Rebase

usage: @pytorchbot rebase [-s | -b BRANCH]

Rebase a PR. Rebasing defaults to the stable viable/strict branch of pytorch.
Repeat contributors may use this command to rebase their PR.

optional arguments:
  -s, --stable          [DEPRECATED] Rebase onto viable/strict
  -b BRANCH, --branch BRANCH
                        Branch you would like to rebase to

Label

usage: @pytorchbot label labels [labels ...]

Adds label to a PR or Issue [Can be used on Issues]

positional arguments:
  labels  Labels to add to given Pull Request or Issue [Can be used on Issues]

Dr CI

usage: @pytorchbot drci 

Update Dr. CI. Updates the Dr. CI comment on the PR in case it's gotten out of sync with actual CI results.

cherry-pick

usage: @pytorchbot cherry-pick --onto ONTO [--fixes FIXES] -c
                               {regression,critical,fixnewfeature,docs,release}

Cherry pick a pull request onto a release branch for inclusion in a release

optional arguments:
  --onto ONTO           Branch you would like to cherry pick onto (Example: release/2.1)
  --fixes FIXES         Link to the issue that your PR fixes (Example: https://github.com/pytorch/pytorch/issues/110666)
  -c {regression,critical,fixnewfeature,docs,release}, --classification {regression,critical,fixnewfeature,docs,release}
                        A machine-friendly classification of the cherry-pick reason.

Close

usage: @pytorchbot close

Close a PR [Can be used on issues]

int3 · 2024-09-27T16:52:49Z

@pytorchbot revert -m "internal tests failing" -c ghfirst

pytorchmergebot · 2024-09-27T16:54:18Z

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot · 2024-09-27T16:54:30Z

@int3 your PR has been successfully reverted.

int3 · 2024-09-27T18:06:33Z

@int3 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

int3 · 2024-09-30T04:37:18Z

@int3 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot · 2024-09-30T04:41:41Z

This pull request was exported from Phabricator. Differential Revision: D63298968

int3 · 2024-09-30T12:16:12Z

@pytorchbot merge -f "test failure looks unrelated"

pytorchmergebot · 2024-09-30T12:17:55Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2024-09-30T12:18:05Z

Merge failed

Reason: This PR has internal changes and must be landed via Phabricator! Please try reimporting/rexporting the PR!

Details for Dev Infra team

Raised by workflow job

int3 · 2024-09-30T20:22:57Z

@pytorchbot merge -f "test failure looks unrelated"

pytorchmergebot · 2024-09-30T20:24:44Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

ThisaraWeerakoon · 2025-12-15T09:16:46Z

@int3 I am trying out this PR experimentally. Could you tell me the exact triton-cpu build you used?

[prototype] Add Triton CPU as an Inductor backend

cd603ce

[ghstack-poisoned]

pytorch-bot bot added ciflow/inductor module: dynamo module: inductor labels Aug 14, 2024

Update on "[prototype] Add Triton CPU as an Inductor backend"

e6b5c89

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]

int3 commented Aug 14, 2024

torch/_inductor/config.py (outdated)

Update on "[prototype] Add Triton CPU as an Inductor backend"

846415f

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang [ghstack-poisoned]

int3 added a commit that referenced this pull request Aug 14, 2024

[prototype] Add Triton CPU as an Inductor backend

cf6304d

ghstack-source-id: cce3dd0 Pull Request resolved: #133408

int3 commented Aug 14, 2024

torch/_inductor/codegen/common.py (outdated)

int3 commented Aug 14, 2024

int3 marked this pull request as draft August 14, 2024 05:30

int3 requested review from desertfire and jansel August 14, 2024 05:30

int3 marked this pull request as ready for review August 14, 2024 05:30

int3 self-assigned this Aug 15, 2024

int3 marked this pull request as draft August 15, 2024 02:38

int3 added a commit that referenced this pull request Aug 15, 2024

[prototype] Add Triton CPU as an Inductor backend

912aa32

ghstack-source-id: c60fcfa Pull Request resolved: #133408

int3 added a commit that referenced this pull request Aug 15, 2024

[prototype] Add Triton CPU as an Inductor backend

fe9fb7f

ghstack-source-id: aaee653 Pull Request resolved: #133408

int3 mentioned this pull request Sep 25, 2024

[inductor] Reduce block sizes when using Triton CPU backend #136612

Closed

int3 added 2 commits September 25, 2024 01:56

malfet approved these changes Sep 26, 2024

pytorchbot mentioned this pull request Sep 26, 2024

Make test_skip_data_serialization regex more flexible #136710

Merged

guangyey mentioned this pull request Sep 28, 2024

Use torch.Stream&torch.Event for Dynamo capature #134850

Closed

This was referenced Oct 3, 2024

Have Triton custom extension test use privateuseone device #137273

Closed

Have Triton custom extension test use privateuseone device #137611

Closed
