[halide-backend] Dimension-based indexing by jansel · Pull Request #129026 · pytorch/pytorch

jansel · 2024-06-19T01:53:00Z

Stack from ghstack (oldest at bottom):

Prior to this PR, the generated Halide code was a rather literal translation of the Triton code, with XBLOCK/YBLOCK/RBLOCK and 1D inputs. Halide prefers multi-dimensional buffers, and flat 1D indexing triggers a lot of bugs and perf issues. This PR infers dimensions and changes the indexing in the generated code to use them.

Before

@hl.generator(name="kernel")
class Kernel:
    in_ptr0 = hl.InputBuffer(hl.Float(32), 1)
    out_ptr3 = hl.OutputBuffer(hl.Float(32), 2)

    def generate(g):
        in_ptr0 = g.in_ptr0
        out_ptr3 = g.out_ptr3
        xindex = hl.Var('xindex')
        rindex = hl.Var('rindex')
        r1 = rindex
        x0 = xindex
        idom = hl.RDom([hl.Range(0, 16), hl.Range(0, 32)])
        odom = hl.RDom([hl.Range(0, 16)])
        rdom = hl.RDom([hl.Range(0, 32)])
        xindex_idom = idom.x
        xindex_odom = odom.x
        rindex_idom = idom.y
        r1_idom = rindex_idom
        x0_idom = xindex_idom
        x0_odom = xindex_odom
        tmp0 = hl.Func('tmp0')
        tmp0[rindex, xindex] = in_ptr0[r1 + (32*x0)]
        tmp1 = hl.Func('tmp1')
        tmp1[xindex] = hl.maximum(rdom, tmp0[rdom, xindex])
        tmp2 = hl.Func('tmp2')
        tmp2[rindex, xindex] = tmp0[rindex, xindex] - tmp1[xindex]
        tmp3 = hl.Func('tmp3')
        tmp3[rindex, xindex] = hl.fast_exp(hl.cast(hl.Float(32), tmp2[rindex, xindex])) if tmp2.type().bits() <= 32 else hl.exp(tmp2[rindex, xindex])
        tmp4 = hl.Func('tmp4')
        tmp4[xindex] = hl.sum(rdom, tmp3[rdom, xindex])
        tmp5 = hl.Func('tmp5')
        tmp5[rindex, xindex] = tmp3[rindex, xindex] / tmp4[xindex]
        out_ptr3_i0 = hl.Var('out_ptr3_i0')
        out_ptr3_i1 = hl.Var('out_ptr3_i1')
        out_ptr3[out_ptr3_i0, out_ptr3_i1] = hl.cast(out_ptr3.type(), tmp5[out_ptr3_i0, out_ptr3_i1])

        assert g.using_autoscheduler()
        in_ptr0.set_estimates([hl.Range(0, 512)])
        out_ptr3.set_estimates([hl.Range(0, 32), hl.Range(0, 16)])

After

@hl.generator(name="kernel")
class Kernel:
    in_ptr0 = hl.InputBuffer(hl.Float(32), 2)
    out_ptr3 = hl.OutputBuffer(hl.Float(32), 2)

    def generate(g):
        in_ptr0 = g.in_ptr0
        out_ptr3 = g.out_ptr3
        h0 = hl.Var('h0')
        h1 = hl.Var('h1')
        rdom = hl.RDom([hl.Range(0, 32)])
        hr1 = rdom[0]
        tmp0 = hl.Func('tmp0')
        tmp0[h0, h1] = in_ptr0[h0, h1,]
        tmp1 = hl.Func('tmp1')
        tmp1[h1] = hl.maximum(rdom, tmp0[hr1, h1])
        tmp2 = hl.Func('tmp2')
        tmp2[h0, h1] = tmp0[h0, h1] - tmp1[h1]
        tmp3 = hl.Func('tmp3')
        tmp3[h0, h1] = hl.fast_exp(hl.cast(hl.Float(32), tmp2[h0, h1])) if tmp2.type().bits() <= 32 else hl.exp(tmp2[h0, h1])
        tmp4 = hl.Func('tmp4')
        tmp4[h1] = hl.sum(rdom, tmp3[hr1, h1])
        tmp5 = hl.Func('tmp5')
        tmp5[h0, h1] = tmp3[h0, h1] / tmp4[h1]
        out_ptr3[h0, h1,] = hl.cast(hl.Float(32), tmp5[h0, h1])

        assert g.using_autoscheduler()
        in_ptr0.dim(0).set_min(0)
        in_ptr0.dim(0).set_stride(1)
        in_ptr0.dim(0).set_extent(32)
        in_ptr0.dim(1).set_min(0)
        in_ptr0.dim(1).set_stride(32)
        in_ptr0.dim(1).set_extent(16)
        in_ptr0.set_estimates([hl.Range(0, 32), hl.Range(0, 16)])
        out_ptr3.set_estimates([hl.Range(0, 32), hl.Range(0, 16)])
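
To make the change concrete, here is a minimal pure-Python sketch (not the Inductor implementation) of why the two forms address the same element. It assumes the sizes and strides shown in the generated code above; the helper names `flat_offset` and `check_equivalence` are hypothetical. The old kernel computed the flat offset `r1 + (32*x0)` into a 1D buffer, while the new kernel indexes `in_ptr0[h0, h1]` and declares strides 1 and 32 through `dim(i).set_stride(...)`.

```python
# Extents and strides copied from the generated "After" code:
# dim 0 has extent 32 and stride 1, dim 1 has extent 16 and stride 32.
EXTENTS = (32, 16)
STRIDES = (1, 32)


def flat_offset(index, strides=STRIDES):
    """Map a multi-dimensional index to its offset in the flat buffer."""
    return sum(i * s for i, s in zip(index, strides))


def check_equivalence():
    """Check that [h0, h1] with strides (1, 32) matches r1 + 32*x0."""
    for x0 in range(EXTENTS[1]):
        for r1 in range(EXTENTS[0]):
            assert flat_offset((r1, x0)) == r1 + 32 * x0
    # Every element of the 512-element 1D buffer is reached exactly once.
    offsets = {
        flat_offset((h0, h1))
        for h1 in range(EXTENTS[1])
        for h0 in range(EXTENTS[0])
    }
    return offsets == set(range(512))
```

The 512 in the old `set_estimates([hl.Range(0, 512)])` call is simply 32 * 16, the product of the two extents that the new code states explicitly.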

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @voznesenskym @penguinwu @EikanWang @Guobing-Chen @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

pytorch-bot · 2024-06-19T01:53:03Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/129026

✅ No Failures

As of commit ddfa273 with merge base bc8883a:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

shunting314 · 2024-06-21T22:04:57Z

torch/_inductor/codegen/halide.py

+
+    def __init__(self, expr, size, stride):
+        super().__init__()
+        if V.graph.sizevars.statically_known_leq(stride, 0):


hmm, when do we get a negative stride?

shunting314 · 2024-06-21T22:11:50Z

torch/_inductor/codegen/halide.py

+        )
+        eq = V.graph.sizevars.statically_known_equals
+        lt = V.graph.sizevars.statically_known_lt
+        size_hint = functools.partial(V.graph.sizevars.size_hint, fallback=inf)


Should we use an integer rather than a float for the fallback value?

jansel: I want it to go last, and there is no such thing as a max int in Python.

eellison: For our purposes, int64 max would work, right?

jansel: Yeah I suppose, seems like a style preference.
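
The trade-off discussed in this thread can be illustrated with a small sketch. It assumes the size hints are used as sort keys, as the "go last" reply suggests; `size_hint_or_fallback`, `order_by_hint`, and the sample sizes are hypothetical stand-ins, not Inductor APIs.

```python
import functools
from math import inf

INT64_MAX = 2**63 - 1


def size_hint_or_fallback(size, *, fallback):
    """Hypothetical stand-in: return the size if known, else the fallback."""
    return size if size is not None else fallback


def order_by_hint(sizes, fallback):
    """Sort sizes ascending; unknown (None) sizes sort at the fallback value."""
    hint = functools.partial(size_hint_or_fallback, fallback=fallback)
    return sorted(sizes, key=hint)


sizes = [32, None, 16, 2**40]
with_inf = order_by_hint(sizes, inf)
with_int64 = order_by_hint(sizes, INT64_MAX)
```

Both fallbacks place the unknown size last for these inputs. Python compares `int` with `float('inf')` directly, so `inf` sorts after any integer, while `INT64_MAX` does so only as long as real sizes stay below it.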

eellison · 2024-06-24T19:35:22Z

torch/_inductor/codegen/halide.py

+                        try:
+                            code.writeline(
+                                f"{arg.name}.dim({i}).set_stride({int(dim.stride)})"
+                            )
+                        except TypeError:
+                            pass  # not integer
+                        try:
+                            code.writeline(
+                                f"{arg.name}.dim({i}).set_extent({int(dim.size)})"
+                            )
+                        except TypeError:
+                            pass  # not integer


Could query is_integer to avoid the try/except

jansel: It might be a regular int (not sympy)
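
A minimal sketch of the pattern in the hunk above, written so that it runs without sympy. `FakeSymbol` is a hypothetical stand-in for a symbolic size: like a sympy symbol, calling `int()` on it raises `TypeError`. The helpers `try_int` and `dim_constraint_lines` are illustrative names, not functions from `halide.py`. Because `int()` accepts both plain Python integers and integer-valued expression objects, the try/except handles both cases uniformly, which is the point of the reply.

```python
class FakeSymbol:
    """Stand-in for a symbolic size; int() raises TypeError, as for a sympy symbol."""

    def __init__(self, name):
        self.name = name

    def __int__(self):
        raise TypeError(f"cannot convert symbol {self.name} to int")


def try_int(value):
    """Return value as a Python int, or None if it is not a concrete integer."""
    try:
        return int(value)
    except TypeError:
        return None


def dim_constraint_lines(arg_name, i, stride, size):
    """Emit set_stride/set_extent lines only for statically known integers."""
    lines = []
    for setter, value in (("set_stride", stride), ("set_extent", size)):
        concrete = try_int(value)
        if concrete is not None:
            lines.append(f"{arg_name}.dim({i}).{setter}({concrete})")
    return lines
```

For example, `dim_constraint_lines("in_ptr0", 0, 1, FakeSymbol("s0"))` emits only the `set_stride` line and silently skips the symbolic extent.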

eellison · 2024-06-24T19:38:21Z

torch/_inductor/codegen/halide.py

+        )
+        eq = V.graph.sizevars.statically_known_equals
+        lt = V.graph.sizevars.statically_known_lt
+        size_hint = functools.partial(V.graph.sizevars.size_hint, fallback=inf)


For our purposes, int64 max would work, right?

eellison · 2024-06-24T20:02:15Z

torch/_inductor/codegen/halide.py

+        line = f"{var}[{index_str},]"  # trailing comma workaround for https://github.com/halide/Halide/issues/8299
        dtype = V.graph.get_dtype(name)
        if dtype in (torch.float16, torch.bfloat16):
+            dtype = torch.float32


nit: factor out to dtype_to_compute_dtype similar to triton codegen ?
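
The trailing comma in the hunk above has a well-defined meaning at the Python level, shown in the sketch below. The details of the linked Halide issue are not described in this thread, so this only demonstrates what the generated subscript passes to `__getitem__`; `RecordingBuffer` is a hypothetical toy class, not a Halide type.

```python
class RecordingBuffer:
    """Toy object that returns the key it was indexed with."""

    def __getitem__(self, key):
        return key


buf = RecordingBuffer()
single = buf[5]            # key is the int 5
single_comma = buf[5,]     # key is the 1-tuple (5,)
double = buf[5, 7]         # key is the tuple (5, 7)
double_comma = buf[5, 7,]  # trailing comma changes nothing for 2+ indices
```

With the trailing comma, the subscript is always a tuple, so one-dimensional and multi-dimensional accesses take the same form in the generated code.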

eellison · 2024-06-24T20:13:30Z

torch/_inductor/codegen/halide.py

+            all_used_symbols.update(super().prepare_indexing(index).free_symbols)
+
+        had_fallback = False
+        for tree in reversed(self.range_trees):


nit: maybe factor out to helper function

[halide-backend] Add GPU support (#127506)
Pull Request resolved: #127506
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #126417, #129025, #129026

[halide-backend] Enable bfloat16 support (#129036)
Requires halide/Halide#8255
Pull Request resolved: #129036
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #126417, #129025, #129026, #127506

[halide-backend] Disable split reductions for Halide (#129320)
In theory Halide doesn't need the split reduction stuff we do for Triton since it can generate multiple kernels.
Pull Request resolved: #129320
Approved by: https://github.com/shunting314, https://github.com/eellison
ghstack dependencies: #126417, #129025, #129026, #127506, #129036

[halide-backend] Support manual schedules (#129321)
Currently using this for some by-hand hacking, but might need to implement our own scheduler later.
Pull Request resolved: #129321
Approved by: https://github.com/shunting314
ghstack dependencies: #126417, #129025, #129026, #127506, #129036, #129320

Update

d0d9583

[ghstack-poisoned]

This was referenced Jun 19, 2024

[inductor] Refactors for Halide backend #129024

Closed

[halide-backend] Initial implementation of HalideKernel and HalideScheduling #126417

Closed

jansel mentioned this pull request Jun 19, 2024

[halide-backend] Generate standalone runtime #129025

Closed

pytorch-bot bot added the ciflow/inductor, module: cpu, and module: inductor labels Jun 19, 2024

jansel mentioned this pull request Jun 19, 2024

[halide-backend] Add GPU support #127506

Closed

Update

9fd99c6

[ghstack-poisoned]

This was referenced Jun 19, 2024

[inductor] Run more test on correct device #129033

Closed

[inductor] Add --inductor-config benchmark flag #129034

Closed

[halide-backend] Support scan kernels #129035

Closed

[halide-backend] Enable bfloat16 support #129036

Closed

jansel added the release notes: inductor label Jun 19, 2024

jansel added 4 commits June 18, 2024 21:22

Update

e83a009

[ghstack-poisoned]

Update

0e12c81

[ghstack-poisoned]

Update

5496a59

[ghstack-poisoned]

Update

6914e42

[ghstack-poisoned]

jansel mentioned this pull request Jun 21, 2024

[benchmarking] Add join_results.py #129202

Closed

jansel requested review from FindHao, eellison and shunting314 June 21, 2024 17:33

Update

dfc9ff6

[ghstack-poisoned]

shunting314 approved these changes Jun 21, 2024

jansel added 4 commits June 21, 2024 20:19

Update

8464435

[ghstack-poisoned]

Update

a9ba5cc

[ghstack-poisoned]

Update

08be113

[ghstack-poisoned]

Update

e14f414

[ghstack-poisoned]

jansel mentioned this pull request Jun 22, 2024

[halide-backend] Random number generation #129314

Closed

Update

95d5c3c

[ghstack-poisoned]

This was referenced Jun 23, 2024

[halide-backend] Disable split reductions for Halide #129320

Closed

[halide-backend] Support manual schedules #129321

Closed

eellison approved these changes Jun 24, 2024

jansel added 4 commits June 25, 2024 13:56

Update

72d62a5

[ghstack-poisoned]

Update

29c97b4

[ghstack-poisoned]

Update

5bc6039

[ghstack-poisoned]

Update

ddfa273

[ghstack-poisoned]

pytorchmergebot added the Merged label Jun 29, 2024

pytorchmergebot closed this in 86cadc6 Jun 29, 2024

pytorchmergebot pushed a commit that referenced this pull request Jun 29, 2024

[halide-backend] Add GPU support (#127506)

b93bf55

Pull Request resolved: #127506 Approved by: https://github.com/shunting314, https://github.com/eellison ghstack dependencies: #126417, #129025, #129026

github-actions bot deleted the gh/jansel/354/head branch July 31, 2024 01:51

jansel wants to merge 16 commits into gh/jansel/354/base from gh/jansel/354/head
