[halide-backend] Initial implementation of HalideKernel and HalideScheduling #126417
Closed
jansel wants to merge 46 commits into gh/jansel/335/base from
Conversation
This was referenced May 16, 2024
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126417
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 20c7437 with merge base bc8883a.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
pytorchmergebot pushed a commit that referenced this pull request on Jun 22, 2024
This puts the halide runtime in a global shared object, rather than copying it to each kernel. Having many copies of the runtime causes many issues with cuda.
Pull Request resolved: #129025
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #126417
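For context, a minimal sketch of the idea, not the actual Inductor loading code: build each kernel library against an external Halide runtime (Halide's `no_runtime` target feature) and load a single shared runtime with `RTLD_GLOBAL`, so every kernel resolves the `halide_*` runtime symbols from the same copy. The file names below are hypothetical.

```py
import ctypes

# Hedged sketch, assuming hypothetical file names: load one copy of the Halide
# runtime globally so its symbols are visible to every kernel library loaded
# afterwards.
runtime = ctypes.CDLL("libhalide_runtime.so", mode=ctypes.RTLD_GLOBAL)

# Kernel libraries built with the "no_runtime" target feature carry no runtime
# of their own; their halide_* symbols bind to the single copy loaded above.
kernel_a = ctypes.CDLL("halide_kernel_a.so")
kernel_b = ctypes.CDLL("halide_kernel_b.so")
```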
Contributor
@pytorchbot revert -m "breaking internal builds" -c ghfirst
Collaborator
@pytorchbot successfully started a revert job. Check the current status here.
pytorchmergebot added a commit that referenced this pull request on Jun 24, 2024
Revert "[halide-backend] Initial implementation of HalideKernel and HalideScheduling (#126417)"
This reverts commit 4f9399b.
Reverted #126417 on behalf of https://github.com/fbgheith due to breaking internal builds ([comment](#126417 (comment)))
Collaborator
@jansel your PR has been successfully reverted.
This was referenced Jun 25, 2024
pytorchmergebot pushed a commit that referenced this pull request on Jun 29, 2024
This puts the halide runtime in a global shared object, rather than copying it to each kernel. Having many copies of the runtime causes many issues with cuda.
Pull Request resolved: #129025
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #126417
pytorchmergebot pushed a commit that referenced this pull request on Jun 29, 2024
Prior to this, the generated Halide code was a rather literal translation of the Triton code, with XBLOCK/YBLOCK/RBLOCK and 1D inputs. Halide prefers explicit dimensions, and the 1D indexing triggered a number of bugs and performance issues. This PR infers dimensions and changes the indexing in the generated code.
Before
```py
@hl.generator(name="kernel")
class Kernel:
in_ptr0 = hl.InputBuffer(hl.Float(32), 1)
out_ptr3 = hl.OutputBuffer(hl.Float(32), 2)
def generate(g):
in_ptr0 = g.in_ptr0
out_ptr3 = g.out_ptr3
xindex = hl.Var('xindex')
rindex = hl.Var('rindex')
r1 = rindex
x0 = xindex
idom = hl.RDom([hl.Range(0, 16), hl.Range(0, 32)])
odom = hl.RDom([hl.Range(0, 16)])
rdom = hl.RDom([hl.Range(0, 32)])
xindex_idom = idom.x
xindex_odom = odom.x
rindex_idom = idom.y
r1_idom = rindex_idom
x0_idom = xindex_idom
x0_odom = xindex_odom
tmp0 = hl.Func('tmp0')
tmp0[rindex, xindex] = in_ptr0[r1 + (32*x0)]
tmp1 = hl.Func('tmp1')
tmp1[xindex] = hl.maximum(rdom, tmp0[rdom, xindex])
tmp2 = hl.Func('tmp2')
tmp2[rindex, xindex] = tmp0[rindex, xindex] - tmp1[xindex]
tmp3 = hl.Func('tmp3')
tmp3[rindex, xindex] = hl.fast_exp(hl.cast(hl.Float(32), tmp2[rindex, xindex])) if tmp2.type().bits() <= 32 else hl.exp(tmp2[rindex, xindex])
tmp4 = hl.Func('tmp4')
tmp4[xindex] = hl.sum(rdom, tmp3[rdom, xindex])
tmp5 = hl.Func('tmp5')
tmp5[rindex, xindex] = tmp3[rindex, xindex] / tmp4[xindex]
out_ptr3_i0 = hl.Var('out_ptr3_i0')
out_ptr3_i1 = hl.Var('out_ptr3_i1')
out_ptr3[out_ptr3_i0, out_ptr3_i1] = hl.cast(out_ptr3.type(), tmp5[out_ptr3_i0, out_ptr3_i1])
assert g.using_autoscheduler()
in_ptr0.set_estimates([hl.Range(0, 512)])
out_ptr3.set_estimates([hl.Range(0, 32), hl.Range(0, 16)])
```
After
```py
@hl.generator(name="kernel")
class Kernel:
in_ptr0 = hl.InputBuffer(hl.Float(32), 2)
out_ptr3 = hl.OutputBuffer(hl.Float(32), 2)
def generate(g):
in_ptr0 = g.in_ptr0
out_ptr3 = g.out_ptr3
h0 = hl.Var('h0')
h1 = hl.Var('h1')
rdom = hl.RDom([hl.Range(0, 32)])
hr1 = rdom[0]
tmp0 = hl.Func('tmp0')
tmp0[h0, h1] = in_ptr0[h0, h1,]
tmp1 = hl.Func('tmp1')
tmp1[h1] = hl.maximum(rdom, tmp0[hr1, h1])
tmp2 = hl.Func('tmp2')
tmp2[h0, h1] = tmp0[h0, h1] - tmp1[h1]
tmp3 = hl.Func('tmp3')
tmp3[h0, h1] = hl.fast_exp(hl.cast(hl.Float(32), tmp2[h0, h1])) if tmp2.type().bits() <= 32 else hl.exp(tmp2[h0, h1])
tmp4 = hl.Func('tmp4')
tmp4[h1] = hl.sum(rdom, tmp3[hr1, h1])
tmp5 = hl.Func('tmp5')
tmp5[h0, h1] = tmp3[h0, h1] / tmp4[h1]
out_ptr3[h0, h1,] = hl.cast(hl.Float(32), tmp5[h0, h1])
assert g.using_autoscheduler()
in_ptr0.dim(0).set_min(0)
in_ptr0.dim(0).set_stride(1)
in_ptr0.dim(0).set_extent(32)
in_ptr0.dim(1).set_min(0)
in_ptr0.dim(1).set_stride(32)
in_ptr0.dim(1).set_extent(16)
in_ptr0.set_estimates([hl.Range(0, 32), hl.Range(0, 16)])
out_ptr3.set_estimates([hl.Range(0, 32), hl.Range(0, 16)])
```
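As a side note (not part of the PR), both versions address the same memory: the Before kernel reads `in_ptr0[r1 + (32*x0)]` from a flat 512-element buffer, while the After kernel reads `in_ptr0[h0, h1]` from a buffer with extent 32 / stride 1 on dim 0 and extent 16 / stride 32 on dim 1. A small NumPy check of that equivalence:

```py
import numpy as np

# Illustration only: the old codegen used a flat 512-element view indexed as
# r1 + 32*x0; the new codegen sees the same memory as 16 rows of 32 contiguous
# elements (dim 0 stride 1, dim 1 stride 32).
flat = np.arange(512, dtype=np.float32)
as_2d = flat.reshape(16, 32)   # as_2d[x0, r1] lives at flat offset r1 + 32*x0

x0, r1 = 7, 5
assert flat[r1 + 32 * x0] == as_2d[x0, r1]
```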
Pull Request resolved: #129026
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #126417, #129025
pytorchmergebot pushed a commit that referenced this pull request on Jun 29, 2024
Pull Request resolved: #127506
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #126417, #129025, #129026
pytorchmergebot pushed a commit that referenced this pull request on Jun 29, 2024
Requires halide/Halide#8255
Pull Request resolved: #129036
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #126417, #129025, #129026, #127506
pytorchmergebot pushed a commit that referenced this pull request on Jun 29, 2024
In theory Halide doesn't need the split reduction stuff we do for Triton since it can generate multiple kernels.
Pull Request resolved: #129320
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #126417, #129025, #129026, #127506, #129036
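For illustration only (NumPy, not Inductor or Halide code), a split reduction computes a large reduction in two stages: per-block partial results first, then a small reduction over the partials. Halide can stage reductions like this itself, which is why the Triton-style split-reduction machinery may not be needed:

```py
import numpy as np

# Hedged sketch of a split (two-stage) reduction. BLOCK is an arbitrary
# illustrative block size, not a value taken from the PR.
x = np.random.rand(1 << 20).astype(np.float32)
BLOCK = 1 << 10

partial = x.reshape(-1, BLOCK).sum(axis=1)   # stage 1: one partial sum per block
total = partial.sum()                        # stage 2: reduce the partials

assert np.isclose(total, x.sum(), rtol=1e-4)
```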
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang