New CUDA Fuser: Unrolling support, interface refactor #36435
Closed
csarofeen wants to merge 59 commits into pytorch:master
Conversation
Commits (truncated titles):
- …hecking (still not recursive). Add start index to For Loops.
- …ditionals back to Int.
- …omains as they are being transformed in these operations.
- …iew APIs are pass through.
jjsjann123 (Collaborator) approved these changes on Apr 14, 2020 and left a comment:
LGTM.
The failing CI gives very good hints on the minor code changes needed. We should fix those.
soumith approved these changes on Apr 14, 2020.
facebook-github-bot (Contributor) left a comment:
@soumith has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request on Oct 29, 2022:
Summary: same as the pull request description below.
Pull Request resolved: pytorch/pytorch#36435
Reviewed By: ZolotukhinM
Differential Revision: D21024011
Pulled By: soumith
fbshipit-source-id: e852e282fa7a304aba962e1926f756098c011fe0
jjsjann123 pushed a commit to jjsjann123/nvfuser that referenced this pull request on Nov 10, 2022 (same commit message as above).
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request on Apr 24, 2026 (same commit message as above).
Pull request description:
Unrolling support has been added in a way that produces well-performing code on GPUs. Not sure how long this link will last, but an example of a generated unrolled kernel is:
https://godbolt.org/z/i0uAv3
What can be seen there are multiple ld.global.f32 instructions with no st.global.f32 in between them (and vice versa). This means we are launching multiple loads that can run in parallel, as well as multiple stores that can run in parallel, which can be a crucial optimization for memory-bound kernels. This was generally a point of concern in TVM, as an attempt at a similar kernel from TVM produces https://godbolt.org/z/Vu97vG, which wraps load-store pairs in conditional branches, preventing the benefits of unrolling.
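To make the pattern concrete, here is a minimal hand-written CUDA sketch of the same idea (the kernel name, the scaling operation, and the unroll factor of 4 are illustrative assumptions, not the fuser's generated code). With the inner loops fully unrolled and no branch between the loads and the stores, the compiler is free to emit all the ld.global.f32 instructions back to back before any st.global.f32, so the loads overlap in flight:

```cuda
// Minimal illustrative sketch (NOT the fuser's codegen): each thread handles
// UNROLL consecutive elements. Staging all loads into registers first, then
// issuing all stores, lets the compiler batch the ld.global.f32 instructions
// with no st.global.f32 or branch interleaved between them.
constexpr int UNROLL = 4;

__global__ void scale_unrolled(const float* __restrict__ in,
                               float* __restrict__ out,
                               float s, int n) {
  int base = (blockIdx.x * blockDim.x + threadIdx.x) * UNROLL;
  if (base + UNROLL <= n) {            // fast path: full tile, no per-element branches
    float r[UNROLL];
#pragma unroll
    for (int i = 0; i < UNROLL; ++i)   // all loads issued first
      r[i] = in[base + i];
#pragma unroll
    for (int i = 0; i < UNROLL; ++i)   // then all stores
      out[base + i] = r[i] * s;
  } else {                             // tail: leftover elements, handled serially
    for (int i = base; i < n; ++i)
      out[i] = in[i] * s;
  }
}
```

The contrast with the TVM output linked above is the per-element conditional: wrapping each load-store pair in its own branch forces the pairs to serialize, which is exactly what the branch-free fast path here avoids.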