Uses __nvvm_bar_warp_sync, which is equivalent to CUDA __syncwarp(). Because sub-group functions must always be called in converged control flow, the membermask is always set to represent all active work-items in the warp. Enabling this functionality requires that we switch to PTX 6.4, which is consistent with the existing requirement to use CUDA 10.1. Signed-off-by: John Pennycook <john.pennycook@intel.com>
Thanks to @Naghasan and @bader for their help in getting this working. Also, a note to reviewers: I had some trouble getting CMake to handle the additional PTX flags correctly. I'm not a CMake expert, and would welcome any suggestions regarding how to improve what I've committed here. The issue as I understand it is that the list of compilation options constructed in libclc/CMakeLists.txt is passed to two functions in AddLibclc.cmake, but each function consumes those options differently. One passes the options to
    TRIPLE ${t}
    TARGET_ENV libspirv
    COMPILE_OPT ${mcpu}
    COMPILE_OPT ${flags}
COMPILE_OPT is a multi-value option, so you should be able to add the extra flags directly.
A longer-term solution might be to define the flags per arch_suffix (so they can be accessed later), but that can wait for a later change, I guess.
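To illustrate what "multi-value option" means here, the following is a minimal sketch, assuming the callee parses its arguments with `cmake_parse_arguments`. The function name `demo_add_lib` and the triple value are hypothetical placeholders, not taken from AddLibclc.cmake; only the keyword names come from the call shown above.

```cmake
# Hypothetical helper for illustration only; the real functions in
# AddLibclc.cmake may parse their arguments differently.
function( demo_add_lib )
  cmake_parse_arguments( ARG
    ""                    # no boolean options
    "TRIPLE;TARGET_ENV"   # single-value keywords
    "COMPILE_OPT"         # multi-value keyword: collects every following value
    ${ARGN} )
  message( STATUS "TRIPLE      = ${ARG_TRIPLE}" )
  message( STATUS "COMPILE_OPT = ${ARG_COMPILE_OPT}" )
endfunction()

set( mcpu )  # empty, as in the diff

# All values after COMPILE_OPT land in one list, so extra flags can
# simply be appended after ${mcpu}.
demo_add_lib(
  TRIPLE nvptx64--nvidiacl   # placeholder triple (assumption)
  TARGET_ENV libspirv
  COMPILE_OPT ${mcpu} -Xclang -target-feature -Xclang +ptx64 )
```

The snippet can be run in script mode with `cmake -P`. It only shows how the values are collected; whether repeated flags such as `-Xclang` survive later depends on how the collected list is consumed.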
    set( mcpu )
    # FIXME: Ideally we would not be tied to a specific PTX ISA version
    if( ${ARCH} STREQUAL nvptx OR ${ARCH} STREQUAL nvptx64 )
      set( flags "SHELL:-Xclang -target-feature" "SHELL:-Xclang +ptx64")
Why is the "SHELL:" prefix used here, and why is the later string( REGEX REPLACE "SHELL:" needed?
add_target_options only works if the SHELL: is there, but add_custom_command only works if the SHELL: is not there.
This is definitely a bit of a hack, but it seemed less error-prone than defining the same set of flags twice. If there's a more standard way to do this, please let me know and I'll fix it.
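For readers unfamiliar with the prefix, here is a minimal sketch of the idea, under some assumptions. It assumes the first consumer ends up in `target_compile_options`, which accepts the `SHELL:` prefix from CMake 3.12 onward so that repeated options like `-Xclang` are not de-duplicated. The names `demo_target` and `cmd_flags` are hypothetical. The sketch uses `string( REPLACE ... )` plus `separate_arguments` rather than the `string( REGEX REPLACE ... )` from the patch.

```cmake
# Flags are defined once, with the SHELL: prefix so that each
# "-Xclang <arg>" pair stays together as a group.
set( flags "SHELL:-Xclang -target-feature" "SHELL:-Xclang +ptx64" )

# Consumer 1 (assumption): a target-based call that understands SHELL:.
# Commented out because demo_target is hypothetical and this sketch is
# meant to run in script mode.
# target_compile_options( demo_target PRIVATE ${flags} )

# Consumer 2: a raw command line (e.g. for add_custom_command), which
# needs plain arguments without the prefix.
set( cmd_flags )
foreach( f IN LISTS flags )
  string( REPLACE "SHELL:" "" f "${f}" )       # drop the prefix
  separate_arguments( f UNIX_COMMAND "${f}" )  # split "-Xclang +ptx64" into two args
  list( APPEND cmd_flags ${f} )
endforeach()

message( STATUS "command-line flags: ${cmd_flags}" )
```

Keeping a single definition and deriving the second form from it avoids the two lists drifting apart, at the cost of the string manipulation.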
Makes sense. I'm no CMake expert so I'm not quite sure how to make it better.
When there is a forward declaration of a SPIR-V entry, its decorations are not translated until its definition is seen. The forward id is reused for the entry, so ids in the entry's decorations should use the forward id as well. Original commit: KhronosGroup/SPIRV-LLVM-Translator@305f48884606abf
Rename urCommandBufferEnqueueExp to urEnqueueCommandBufferExp
This reverts commit cc60d08, from oneapi-src/unified-runtime#2606, due to CI failures in the DPC++ bump PR that need further investigation #16747
Revert "Merge pull request #2606 from Bensuo/cmd-buf_enqueue_refactor"