[SYCL][CUDA] Add sub-group barrier by Pennycook · Pull Request #2606 · intel/llvm

Pennycook · 2020-10-07T17:50:18Z

Uses __nvvm_bar_warp_sync, which is equivalent to CUDA __syncwarp().
Because sub-group functions must always be called in converged control flow,
the membermask is always set to represent all active work-items in the warp.

Enabling this functionality requires that we switch to PTX 6.4, which is
consistent with the existing requirement to use CUDA 10.1.

Signed-off-by: John Pennycook <john.pennycook@intel.com>

Pennycook · 2020-10-07T17:55:42Z

Thanks to @Naghasan and @bader for their help in getting this working.

Also, a note to reviewers: I had some trouble getting CMake to handle the additional PTX flags correctly. I'm not a CMake expert, and would welcome any suggestions regarding how to improve what I've committed here. The issue as I understand it is that the list of compilation options constructed in libclc/CMakeLists.txt is passed to two functions in AddLibclc.cmake, but each function consumes those options differently.

One passes the options to add_target_options, which unhelpfully strips the second -Xclang option unless it is prefixed with SHELL:. The other passes the options to add_custom_command unmodified, leaving SHELL: in the command line. The best solution I could find was to write the options as though SHELL: were always required, then strip the prefix where it isn't needed.
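The workaround described above can be sketched in CMake script mode. This is a minimal illustration under stated assumptions, not the code committed in this PR: the variable names `target_flags` and `command_flags` are hypothetical, and the sketch splits each group with `separate_arguments` instead of only stripping the prefix. It relies on documented CMake behaviour: `target_compile_options` de-duplicates repeated options, and the `SHELL:` prefix (CMake 3.12 and later) keeps a group such as `-Xclang +ptx64` together.

```cmake
# Run with: cmake -P strip_shell_sketch.cmake

# Flags written once, with the SHELL: prefix, as in this PR.
set( flags "SHELL:-Xclang -target-feature" "SHELL:-Xclang +ptx64" )

# Form for target_compile_options(): keep the SHELL: groups so that option
# de-duplication does not drop the second -Xclang.
set( target_flags ${flags} )   # hypothetical name

# Form for add_custom_command(): remove the prefix and split each group
# into separate command-line arguments.
set( command_flags )           # hypothetical name
foreach( group IN LISTS flags )
  string( REPLACE "SHELL:" "" stripped "${group}" )
  separate_arguments( parts UNIX_COMMAND "${stripped}" )
  list( APPEND command_flags ${parts} )
endforeach()

message( STATUS "target flags:  ${target_flags}" )
message( STATUS "command flags: ${command_flags}" )
```

Writing the flags once and deriving the second form keeps the two consumers in sync, which is the motivation given in the comment for not defining the same set of flags twice.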

Naghasan · 2020-10-08T09:30:24Z

libclc/CMakeLists.txt

 			TRIPLE ${t}
 			TARGET_ENV libspirv
-			COMPILE_OPT ${mcpu}
+			COMPILE_OPT ${flags}


COMPILE_OPT is a multi-value option, so you should be able to add the extra flags directly.

A longer-term solution might be to define the flags per arch_suffix (so they can be accessed later), but that can wait for a follow-up, I guess.
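Both suggestions can be sketched in script mode. Everything here is an assumption for illustration: `sketch_add_builtins` is a hypothetical stand-in for the helper in AddLibclc.cmake (only its keyword parsing is shown), and the `arch_suffix` value and the `<arch_suffix>_flags` naming scheme are made up rather than taken from libclc.

```cmake
# Run with: cmake -P compile_opt_sketch.cmake
include( CMakeParseArguments )  # harmless on CMake 3.5+, where the command is built in

# Hypothetical helper: parses COMPILE_OPT as a multi-value keyword.
function( sketch_add_builtins )
  cmake_parse_arguments( ARG "" "TRIPLE;TARGET_ENV" "COMPILE_OPT" ${ARGN} )
  message( STATUS "triple: ${ARG_TRIPLE}" )
  foreach( opt IN LISTS ARG_COMPILE_OPT )
    message( STATUS "compile option: ${opt}" )
  endforeach()
endfunction()

set( mcpu )                          # empty, as in the original hunk
set( arch_suffix nvptx64--nvidiacl ) # illustrative value

# Flags stored per arch suffix so they can be looked up again later.
set( ${arch_suffix}_flags "SHELL:-Xclang -target-feature" "SHELL:-Xclang +ptx64" )

# Because COMPILE_OPT is multi-valued, the extra flags can follow ${mcpu} directly.
sketch_add_builtins(
  TRIPLE ${arch_suffix}
  TARGET_ENV libspirv
  COMPILE_OPT ${mcpu} ${${arch_suffix}_flags} )
```

An empty `${mcpu}` expands to no arguments, so the helper receives only the two `SHELL:` groups.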

Naghasan · 2020-10-08T09:32:34Z

libclc/CMakeLists.txt

-			set( mcpu )
+			# FIXME: Ideally we would not be tied to a specific PTX ISA version
+			if( ${ARCH} STREQUAL nvptx OR ${ARCH} STREQUAL nvptx64 )
+				set( flags "SHELL:-Xclang -target-feature" "SHELL:-Xclang +ptx64")


Why is "SHELL:" needed here, along with the later string( REGEX REPLACE "SHELL:" ... ) call?

Pennycook · 2020-10-08

add_target_options only works if the SHELL: is there, but add_custom_command only works if the SHELL: is not there.

This is definitely a bit of a hack, but it seemed less error-prone than defining the same set of flags twice. If there's a more standard way to do this, please let me know and I'll fix it.

Naghasan · 2020-10-08

Makes sense. I'm no CMake expert so I'm not quite sure how to make it better.

Pennycook added 2 commits October 7, 2020 13:44

[SYCL][CUDA] Enable sub-group barrier test

3b8dba1

Signed-off-by: John Pennycook <john.pennycook@intel.com>

Pennycook added the enhancement (New feature or request), spec extension (All issues/PRs related to extensions specifications), and cuda (CUDA back-end) labels Oct 7, 2020

Pennycook requested review from a team and bader as code owners October 7, 2020 17:50

Pennycook requested a review from againull October 7, 2020 17:50

bader approved these changes Oct 7, 2020

againull approved these changes Oct 7, 2020

bader merged commit 551d706 into intel:sycl Oct 8, 2020

Naghasan reviewed Oct 8, 2020

Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025

Merge pull request #2606 from Bensuo/cmd-buf_enqueue_refactor

05dd502

Rename urCommandBufferEnqueueExp to urEnqueueCommandBufferExp

Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025

Merge pull request #2688 from Bensuo/revert_2606

ea8bbf6

Revert "Merge pull request #2606 from Bensuo/cmd-buf_enqueue_refactor"

bader merged 2 commits into intel:sycl from Pennycook:cuda-sub-groups
