Move where cuda implementation to TensorIterator #32984
zasdfgbnm wants to merge 18 commits into pytorch:master from
Conversation

| } // namespace modern |
| template<typename func_t, int nargs=function_traits<func_t>::arity> |

moved to Loops.cuh
| #include <ATen/native/cuda/ROCmLoops.cuh> |
| #endif |
| namespace at { namespace native { |

Moved from CUDALoops.cuh and ROCmLoops.cuh; this part of the code is identical for CUDA and ROCm.
| namespace at { namespace native { namespace modern { namespace detail { |
| template<typename func_t, int remaining=function_traits<func_t>::arity-1> |

this part is newly added
| arg0_t result = legacy::invoke(f, &data.data[1], &strides.data[1], &dtypes.data[1], idx); |
| c10::cast_and_store<arg0_t>(dtypes[0], out, result); |
| }); |
| } else if (iter.has_contiguous_first_dim() && modern::detail::has_same_arg_types<func_t>::value) { |

This is the only line changed in this copy-pasted chunk of code.
I hope this enables #9190
@vadimkantorov Unfortunately this doesn't...
💊 CircleCI build failures summary (Dr. CI, as of commit f5e1114): none of the build failures appear to be your fault. 1 upstream failure recognized by patterns: these builds matched patterns, but were probably caused by upstream breakages.
aten/src/ATen/native/cuda/Loops.cuh (Outdated)
| }; |
| // simple compile time test for has_same_arg_types: |
| using func1_t = int (*)(float, float); |

This belongs in tests, not in actual source?

Yes, this is a compile-time unit test for has_same_arg_types. Maybe I should remove it from Loops.cuh and move it somewhere else?
| using traits = function_traits<func_t>; |
| static constexpr bool value = std::is_same< |
|   typename traits::template arg<remaining>::type, |
|   typename traits::template arg<remaining-1>::type |

Out of curiosity, how does this work with -1?

It is specialized as true, as in the code below. For a nullary function, arity == 0, so has_same_arg_types<func_t> becomes has_same_arg_types<func_t, function_traits<func_t>::arity-1>, which is has_same_arg_types<func_t, -1>.
| namespace at { namespace native { |
| // `needs_dynamic_casting` compares the types expected by iterator |

Cool, so it'll be good to go once you move the tests out of Loops.cuh.

Test moved to cuda_vectorized_test.cu
facebook-github-bot left a comment

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: `where` is special because its arguments do not have the same type, which violates the assumption made by the modern code path in pytorch#32383. I migrate it to TensorIterator so that there is something that tests that this case is not broken. Currently, this case falls back to the legacy (not vectorized, not unrolled) code. It should be supported in the future when I clean up `Loops.cuh`. I also move the shared parts of `CUDALoops.cuh` and `ROCmLoops.cuh` into `Loops.cuh` so that the logic for checking whether `func_t` has the same arg types can be shared. Pull Request resolved: pytorch#32984 Differential Revision: D19825127 Pulled By: ngimel fbshipit-source-id: bbf4682349d96b4480c4d657f3c18a3a67a9bf17
Summary: Reopen of pytorch#32984 Pull Request resolved: pytorch#33228 Differential Revision: D19850862 Pulled By: ngimel fbshipit-source-id: b92446a49b4980188fa4788220a2164650e905c2