[SYCL] Reverse max work-group size order#1177
Conversation
romanovvlad
left a comment
There was a problem hiding this comment.
Shouldn't we do the same for the OpenCL BE? If so, suggest implementing on the SYCL level.
Yes, this should also be done for the OpenCL BE, but if we want to do it at SYCL level we will be making a special case in |
Why would we want to avoid it? |
Due to the user defined work-group size being flipped before launching kernels, the reported max work-group size for a device must be reversed correspondingly. Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
5e930ae to
c66a644
Compare
That makes perfect sense to me. On second look I can see that there are already plenty specializations of device info queries, so I've changed the approach to be for both CUDA and OpenCL. |
…ctor_tests * origin/sycl: (32 commits) [SYCL] Fix circular reference between events and queues (intel#1226) [CI][Doc] Use SSH to deploy GitHub Pages (intel#1232) [SYCL][CUDA][Test] Testing for use of CUDA primary context (intel#1174) [SYCL] allow underscore symbol in temporary directory name [SYCL] Reject zero length arrays (intel#1153) [SYCL] Fix static code analyzis concerns (intel#1189) [SYCL] Add more details about the -fintelfpga option (intel#1218) [SYCL][CUDA] Select only NVPTX64 device binaries (intel#1223) [SYCL] Reverse max work-group size order (intel#1177) [SYCL][Doc] Add GroupAlgorithms extension (intel#1079) [SYCL] Fix SYCL internal enumerators conflict with user defined macro (intel#1188) [SYCL][CUDA] Fixes context release and unnamed context scope (intel#1207) [SYCL][CUDA] Fix context creation property parsing [CUDA][PI] clang-format pi.h [SYCL][CUDA] Handle the case of not having any CUDA device (intel#1212) [SYCL] Fix check-sycl-deploy target problems (intel#1165) [SYCL] Disable tests which take more than 5 minutes (intel#1220) [SYCL] Make context constructors explicit to avoid unintended conversions (intel#1219) [SYCL][NFC] Add clang-format configuration file for SYCL LIT tests (intel#1224) [SYCL] Fix command cleanup invoked from multiple threads (intel#1214) ...
…_accessor_refactor * origin/sycl: (38 commits) [SYCL] Fix device::get_devices() with a non-host device type (intel#1235) [SYCL][PI][CUDA] Implement kernel and kernel-group information queries (intel#1180) [SYCL] Remove default error code value in exception (intel#1150) [SYCL] Fix devicelib assert LIT test (intel#1245) [SYCL] Set aux-target-cpu for SYCL offload device compilation (intel#1225) [SYCL] Remove fabs and ceil from the list of unsupported math functions (intel#1217) [SYCL] Fix circular reference between events and queues (intel#1226) [CI][Doc] Use SSH to deploy GitHub Pages (intel#1232) [SYCL][CUDA][Test] Testing for use of CUDA primary context (intel#1174) [SYCL] allow underscore symbol in temporary directory name [SYCL] Reject zero length arrays (intel#1153) [SYCL] Fix static code analyzis concerns (intel#1189) [SYCL] Add more details about the -fintelfpga option (intel#1218) [SYCL][CUDA] Select only NVPTX64 device binaries (intel#1223) [SYCL] Reverse max work-group size order (intel#1177) [SYCL][Doc] Add GroupAlgorithms extension (intel#1079) [SYCL] Fix SYCL internal enumerators conflict with user defined macro (intel#1188) [SYCL][CUDA] Fixes context release and unnamed context scope (intel#1207) [SYCL][CUDA] Fix context creation property parsing [CUDA][PI] clang-format pi.h ...
Due to the user defined work-group size being flipped before launching
kernels, the reported max work-group size for a device must be reversed
correspondingly.
Signed-off-by: Steffen Larsen steffen.larsen@codeplay.com