Kernel IR: Splitting CUDA codegen from IrPrinter (#379)
Conversation
```cpp
if ((((((blockIdx.x * 1) + (1 - 1)) * 128) + threadIdx.x) < T0.size[0])) {
  for(size_t i6 = 0; i6 < 1; ++i6) {
    T2[i6]
      = T0[((((blockIdx.x * 1) + i6) * 128) + threadIdx.x)] * T1[((((blockIdx.x * 1) + i6) * 128) + threadIdx.x)];
    T3[((((blockIdx.x * 1) + i6) * 128) + threadIdx.x)]
      = T2[i6] * T0[((((blockIdx.x * 1) + i6) * 128) + threadIdx.x)];
```
I like collapsing the parentheses, but I'd prefer to have the operators on separate lines for readability:

```cpp
T2[] = T0[]
     * T1[]
```

For reads and writes, I'm fine with this: `T2[] = T0[]`
I agree. The formatting is not final; I was planning to revisit it in a follow-up PR to keep the changes a bit smaller (and also because we have opportunities to improve the formatting while simplifying the codegen code itself).
But if this is something we don't want to wait for, I'd be happy to update this PR.
I'm just adding my 2 cents on the kernel formatting. 😄
I also noticed that the for-loop is redundant, since it is only run once.
> I also noticed that the for-loop is redundant, since it is only run once.
Yep. That's a completely different beast altogether though. We're not doing any low-level optimizations today (but we could, and probably should - another reason to have a standalone kernel IR)
```cpp
// Predicate map
// TODO(kir): consider a simpler, kernel IR based version
ThreadPredicateMap predicate_map_;
```
IIRC, the only reason we need to keep this mapping is for code generation of broadcastOp. A device function, blockBroadcast, must be used when broadcasting thread-parallelized dimensions. Whether we should call that function is currently determined only at codegen time, but I really think this should be captured when lowering to KIR. One idea may be to have a BlockBroadcast KIR node and generate an instance of that node when FIR is lowered to KIR.
> One idea may be to have BlockBroadcast KIR node and generate that KIR node instance when FIR is lowered to KIR.
I really like this idea. In general, I think this is the right pattern for conditional code generation: generate the intended operations during lowering rather than deciding what to print at the last minute.
naoyam left a comment
Looks good. Left a comment on ThreadPredicateMap.
One of the main goals of having a dedicated kernel IR was separation of concerns: simpler, smaller components that each do one thing, instead of monolithic implementations.
This PR is a significant step in that direction: the CUDA code generation is now separate from the IrPrinter.