Implement gpu_kernel_multiple_outputs #37969
Conversation
Commits:
- make return value of lambda argument of Loader struct
- rewrite `prelu_cuda_backward_kernel_share_weight`
- remove `legacy` methods
- fix result store function
- reduce copy-paste
- remove unused array
- fix except for tuple
- revert changes in CUDALoops
- move `is_tuple` to Loops.cuh
💊 CI failures summary and remediations
As of commit 43289f8 (more details on the Dr. CI page):
🕵️ 1 new failure recognized by patterns
The following CI failures do not appear to be due to upstream breakages:
zasdfgbnm left a comment
@ngimel This refactors the TensorIterator offset calculation, 32-bit indexing splitting, etc. of the prelu kernel into gpu_kernel_multiple_outputs, so that the code in Activation.cu becomes cleaner without that logic. It is a GPU kernel helper that supports multiple outputs, but does not support dynamic casting, for simplicity. It is meant to cover some cases where gpu_kernel is not usable.
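For reference, a rough sketch of how a caller such as the prelu backward kernel might use the new helper. The function name, iterator setup, and lambda body below are illustrative guesses based on this description, not the code that landed:

```cuda
#include <thrust/tuple.h>
#include <ATen/ATen.h>
#include <ATen/native/TensorIterator.h>
#include <ATen/native/cuda/Loops.cuh>

// Sketch: an elementwise op with two outputs. gpu_kernel_multiple_outputs
// handles the offset calculation and 32-bit index splitting internally, so
// the caller only builds the iterator and supplies a lambda that returns a
// thrust::tuple with one element per output.
template <typename scalar_t>
void prelu_backward_two_outputs_sketch(   // hypothetical name, for illustration
    at::Tensor& input_grad,
    at::Tensor& weight_grad_collector,
    const at::Tensor& input,
    const at::Tensor& weight,
    const at::Tensor& grad_out) {
  auto iter = at::TensorIteratorConfig()
      .add_output(input_grad)
      .add_output(weight_grad_collector)
      .add_input(input)
      .add_input(weight)
      .add_input(grad_out)
      .build();

  at::native::gpu_kernel_multiple_outputs(
      iter,
      [] GPU_LAMBDA(scalar_t in, scalar_t w, scalar_t go)
          -> thrust::tuple<scalar_t, scalar_t> {
        scalar_t in_grad = in > 0 ? go : w * go;
        scalar_t w_grad  = in > 0 ? scalar_t(0) : in * go;
        return thrust::tuple<scalar_t, scalar_t>(in_grad, w_grad);
      });
}
```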
@zasdfgbnm, @ngimel I will import it to Phabricator (want to try it out internally). Please commandeer; the diff is ready to land. @crcrpar Please resolve the merge conflicts whenever you have a chance.
facebook-github-bot left a comment
@z-a-f has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
facebook-github-bot left a comment
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@z-a-f Thanks. Looks like @zasdfgbnm fixed the conflicts on my behalf, including applying the fix suggested by @ngimel. Thank you @zasdfgbnm, @ngimel.
@ngimel,
@z-a-f This is not ready yet. Some copy-pasting is needed to make it build on ROCm. Working on it.
facebook-github-bot left a comment
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: This PR introduces a variant of `gpu_kernel` for functions that return multiple values with `thrust::tuple`. With this I simplified `prelu_cuda_backward_share_weights_kernel`.

### Why use `thrust::tuple`?

Because `std::tuple` does not support `operator=` in device code, which makes the implementation complicated.

Pull Request resolved: pytorch#37969
Reviewed By: paulshaoyuqiao
Differential Revision: D22868670
Pulled By: ngimel
fbshipit-source-id: eda0a29ac0347ad544b24bf60e3d809a7db1a929
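To illustrate the `thrust::tuple` point: the store path of a multi-output elementwise kernel has to construct and assign the returned tuple inside device code, which `thrust::tuple` allows. A minimal standalone sketch, not the actual Loops.cuh code (the function and pointer names are made up for illustration):

```cuda
#include <thrust/tuple.h>

// Minimal illustration: thrust::tuple can be default-constructed, assigned,
// and unpacked with thrust::get<> inside __device__ code, so a functor can
// return one tuple per element and the kernel can scatter its fields into
// the separate output buffers.
template <typename T>
__device__ void store_two_outputs(T* out0, T* out1,
                                  thrust::tuple<T, T> results) {
  thrust::tuple<T, T> tmp;
  tmp = results;                 // device-side operator= is available
  *out0 = thrust::get<0>(tmp);   // first output
  *out1 = thrust::get<1>(tmp);   // second output
}
```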