Implement gpu_kernel_multiple_outputs #37969

Closed

crcrpar wants to merge 12 commits into pytorch:master from crcrpar:multiple-outputs-gpu-kernel

Conversation

@crcrpar
Collaborator

@crcrpar crcrpar commented May 6, 2020

This PR introduces a variant of gpu_kernel for functions that return multiple values via thrust::tuple.
With it, I simplified prelu_cuda_backward_share_weights_kernel.

Why use thrust::tuple?

Because std::tuple does not support operator= in device code, which makes the implementation complicated.
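To illustrate the shape of the API (a hypothetical host-side analogue, not code from the PR — the function and parameter names here are made up), the idea is a single functor that returns a tuple, with each tuple element scattered into its own output array:

```cpp
#include <array>
#include <cstddef>
#include <tuple>

// Hypothetical host-side analogue of a multiple-output elementwise kernel:
// the functor returns a tuple, and each tuple element is scattered into its
// own output array. On the GPU, the PR uses thrust::tuple for the return
// type because std::tuple's operator= is not usable in device code.
template <typename F, std::size_t N>
void kernel_multiple_outputs(const std::array<float, N>& a,
                             const std::array<float, N>& b,
                             std::array<float, N>& out0,
                             std::array<float, N>& out1,
                             F f) {
  for (std::size_t i = 0; i < N; ++i) {
    std::tuple<float, float> r = f(a[i], b[i]);  // one call, two results
    out0[i] = std::get<0>(r);
    out1[i] = std::get<1>(r);
  }
}
```

For example, passing `[](float x, float y) { return std::make_tuple(x + y, x * y); }` fills both a sum array and a product array in a single pass over the inputs.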

crcrpar added 6 commits May 6, 2020 15:13
make return value of lambda argument of Loader struct

rewrite `prelu_cuda_backward_kernel_share_weight`

remove `legacy` methods

fix result store function

reduce copy-paste

remove unused array

fix except for tuple

revert changes in CUDALoops

move `is_tuple` to Loops.cuh
- changes some asserts
- typename out_t -> typename ...Args and thrust::tuple<Args...>
@dr-ci

dr-ci Bot commented May 6, 2020

💊 CI failures summary and remediations

As of commit 43289f8 (more details on the Dr. CI page):


  • 3/3 failures possibly* introduced in this PR
    • 2/3 non-CircleCI failure(s)

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_windows_vs2019_py36_cuda11.0_build (1/1)

Step: "Build" (full log | diagnosis details | 🔁 rerun)

Error generating file
Retry attempt 3: 
C:/Users/circleci/project/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu(236): error: identifier "cusparseScsrmm2" is undefined

C:/Users/circleci/project/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu(259): error: identifier "cusparseDcsrmm2" is undefined

2 errors detected in the compilation of "C:/Users/circleci/project/aten/src/ATen/native/sparse/cuda/SparseCUDABlas.cu".
SparseCUDABlas.cu
-- Removing C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/./torch_cuda_generated_SparseCUDABlas.cu.obj
C:/Jenkins/Miniconda3/Library/bin/cmake.exe -E remove C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/./torch_cuda_generated_SparseCUDABlas.cu.obj
CMake Error at torch_cuda_generated_SparseCUDABlas.cu.obj.Release.cmake:281 (message):
  Error generating file
  C:/Users/circleci/project/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/sparse/cuda/./torch_cuda_generated_SparseCUDABlas.cu.obj



ci.pytorch.org: 2 failed


This comment was automatically generated by Dr. CI (expand for details). Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group.

See how this bot performed.

This comment has been revised 20 times.

Collaborator

@zasdfgbnm zasdfgbnm left a comment


@ngimel This refactors the logic of TensorIterator offset calculation, 32-bit indexing splitting, etc. of the prelu kernel into gpu_kernel_multiple_outputs, so that the code in Activation.cu becomes cleaner. It is a GPU kernel that supports multiple outputs but, for simplicity, does not support dynamic casting. It is intended to cover cases where gpu_kernel is not usable.
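To make the refactor concrete (a sketch only — `prelu_backward_ref` is a hypothetical name, and this is plain host C++ with std::tuple standing in for the thrust::tuple the CUDA version returns), the per-element PReLU backward math can be expressed as one functor producing two values:

```cpp
#include <cstddef>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical sketch of the per-element math a multiple-output kernel can
// run for the shared-weight PReLU backward. For f(x) = x > 0 ? x : w * x:
//   d(out)/d(input)  = x > 0 ? 1 : w
//   d(out)/d(weight) = x > 0 ? 0 : x
// One functor returns both the grad_input value and the per-element
// grad_weight contribution.
std::pair<std::vector<float>, std::vector<float>>
prelu_backward_ref(const std::vector<float>& input,
                   const std::vector<float>& grad_out,
                   float weight) {
  auto f = [weight](float x, float go) {
    float dx = go * (x > 0.f ? 1.f : weight);  // chain rule w.r.t. input
    float dw = x > 0.f ? 0.f : go * x;         // chain rule w.r.t. weight
    return std::make_tuple(dx, dw);
  };
  std::vector<float> grad_in(input.size()), grad_w(input.size());
  for (std::size_t i = 0; i < input.size(); ++i) {
    std::tie(grad_in[i], grad_w[i]) = f(input[i], grad_out[i]);
  }
  return {grad_in, grad_w};  // grad_w is reduced (summed) separately
}
```

The per-element grad_weight values still need a reduction to produce the final scalar gradient for the shared weight; only the elementwise part maps onto the multiple-output kernel.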

@zasdfgbnm zasdfgbnm requested a review from ngimel May 7, 2020 01:23
@z-a-f

z-a-f commented Jul 31, 2020

@zasdfgbnm, @ngimel I will import it to Phabricator (I want to try it out internally). Please commandeer; the diff is ready to land.

@crcrpar Please resolve the merge conflicts whenever you have a chance.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@z-a-f has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Comment thread aten/src/ATen/native/cuda/MemoryAccess.cuh
Contributor

@facebook-github-bot facebook-github-bot left a comment


@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@crcrpar
Collaborator Author

crcrpar commented Aug 1, 2020

@z-a-f Thanks. It looks like @zasdfgbnm resolved the conflicts on my behalf, including applying the fix suggested by @ngimel.

Thank you @zasdfgbnm, @ngimel

@z-a-f

z-a-f commented Aug 5, 2020

@ngimel, should I land this? Is it ready to be landed?

@zasdfgbnm
Collaborator

zasdfgbnm commented Aug 5, 2020

@z-a-f This is not ready. Some copy-pasting is needed to make it build on ROCm. Working on it.

Contributor

@facebook-github-bot facebook-github-bot left a comment


@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ngimel merged this pull request in eb9ae7c.

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
This PR introduces a variant of `gpu_kernel` for functions that return multiple values via `thrust::tuple`.
With it, I simplified `prelu_cuda_backward_share_weights_kernel`.

### Why use `thrust::tuple`?
Because `std::tuple` does not support `operator=` in device code, which makes the implementation complicated.

Pull Request resolved: pytorch#37969

Reviewed By: paulshaoyuqiao

Differential Revision: D22868670

Pulled By: ngimel

fbshipit-source-id: eda0a29ac0347ad544b24bf60e3d809a7db1a929