Use explicit templates in CUDALoops kernels#41059

Closed
malfet wants to merge 7 commits into pytorch:master from malfet:malfet/CUDALoops-more-explicit-templates

Conversation

@malfet
Contributor

@malfet malfet commented Jul 7, 2020

Follow up after #40992
Use explicit templates instead of lambdas to reduce binary size by 100-200Kb per arch per compilation unit without affecting performance, namely:
BinaryMulDivKernel.cu 3.8Mb -> 3.5Mb
CompareEQKernel.cu 1.8Mb -> 1.7Mb
BinaryAddSubKernel.cu 2.0Mb -> 1.8Mb
BinaryBitwiseOpsKernels.cu 2.6Mb -> 2.3Mb

@malfet malfet requested review from gchanan, ngimel and zasdfgbnm July 7, 2020 02:27
@dr-ci

dr-ci Bot commented Jul 7, 2020

💊 CI failures summary and remediations

As of commit 92b99f7 (more details on the Dr. CI page):



🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch:

If your commit is newer than viable/strict, you can try basing on an older, stable commit:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase --onto FETCH_HEAD $(git merge-base origin/master HEAD)

If your commit is older than viable/strict:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.


ci.pytorch.org: 2 failed


This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 14 times.

@malfet malfet force-pushed the malfet/CUDALoops-more-explicit-templates branch 2 times, most recently from 5f31caa to 305c44a Compare July 7, 2020 19:48
Collaborator

@ngimel ngimel left a comment

lgtm

Comment thread on aten/src/ATen/native/cuda/BinaryMulDivKernel.cu (outdated)
malfet added 7 commits July 7, 2020 19:57:

- This reduces binary size from 3.8 to 3.5Mb
- …Kernel
- Reduces sizeof(CompareEQKernel.cu.o) from 1.8Mb to 1.7Mb by eliminating 11 duplicated symbols.
- …l.cu
- This reduces object file size from 2.0 to 1.8Mb
- Reduces binary size from 2.6 to 2.3Mb
- Reduces binary size with no perf side effects
@malfet malfet force-pushed the malfet/CUDALoops-more-explicit-templates branch from 305c44a to 92b99f7 Compare July 8, 2020 03:10
Contributor

@facebook-github-bot facebook-github-bot left a comment

@malfet is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@malfet merged this pull request in e374280.

@jeffdaily
Collaborator

@malfet This PR is breaking the ROCm build.

@malfet malfet deleted the malfet/CUDALoops-more-explicit-templates branch September 8, 2020 00:55
facebook-github-bot pushed a commit that referenced this pull request Sep 25, 2020
Summary:
Reland attempt of #41059
Use explicit templates instead of lambdas to reduce binary size by 100-200Kb per arch per compilation unit without affecting performance, namely:
BinaryMulDivKernel.cu 3.8Mb -> 3.5Mb
CompareEQKernel.cu 1.8Mb -> 1.7Mb
BinaryAddSubKernel.cu 2.0Mb -> 1.8Mb
BinaryBitwiseOpsKernels.cu 2.6Mb -> 2.3Mb

Pull Request resolved: #44286

Reviewed By: ngimel

Differential Revision: D23859691

Pulled By: malfet

fbshipit-source-id: 2c4e86f35e0f94a62294dc5d52a3ba364db23e2d
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Follow up after pytorch#40992
Use explicit templates instead of lambdas to reduce binary size by 100-200Kb per arch per compilation unit without affecting performance, namely:
BinaryMulDivKernel.cu 3.8Mb -> 3.5Mb
CompareEQKernel.cu 1.8Mb -> 1.7Mb
BinaryAddSubKernel.cu 2.0Mb -> 1.8Mb
BinaryBitwiseOpsKernels.cu 2.6Mb -> 2.3Mb

Pull Request resolved: pytorch#41059

Differential Revision: D22458928

Pulled By: malfet

fbshipit-source-id: cca623bb6e769cfe372977b08463d98b1a02dd14
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Reland attempt of pytorch#41059
Use explicit templates instead of lambdas to reduce binary size by 100-200Kb per arch per compilation unit without affecting performance, namely:
BinaryMulDivKernel.cu 3.8Mb -> 3.5Mb
CompareEQKernel.cu 1.8Mb -> 1.7Mb
BinaryAddSubKernel.cu 2.0Mb -> 1.8Mb
BinaryBitwiseOpsKernels.cu 2.6Mb -> 2.3Mb

Pull Request resolved: pytorch#44286

Reviewed By: ngimel

Differential Revision: D23859691

Pulled By: malfet

fbshipit-source-id: 2c4e86f35e0f94a62294dc5d52a3ba364db23e2d