Skip to content

[SYCL][CUDA] Add sub-group barrier#2606

Merged
bader merged 2 commits intointel:syclfrom
Pennycook:cuda-sub-groups
Oct 8, 2020
Merged

[SYCL][CUDA] Add sub-group barrier#2606
bader merged 2 commits intointel:syclfrom
Pennycook:cuda-sub-groups

Conversation

@Pennycook
Copy link
Contributor

Uses __nvvm_bar_warp_sync, which is equivalent to CUDA __syncwarp().
Because sub-group functions must always be called in converged control flow,
the membermask is always set to represent all active work-items in the warp.

Enabling this functionality requires that we switch to PTX 6.4, which is
consistent with the existing requirement to use CUDA 10.1.

Signed-off-by: John Pennycook john.pennycook@intel.com

Uses __nvvm_bar_warp_sync, which is equivalent to CUDA __syncwarp().
Because sub-group functions must always be called in converged control flow,
the membermask is always set to represent all active work-items in the warp.

Enabling this functionality requires that we switch to PTX 6.4, which is
consistent with the existing requirement to use CUDA 10.1.

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Signed-off-by: John Pennycook <john.pennycook@intel.com>
@Pennycook Pennycook added enhancement New feature or request spec extension All issues/PRs related to extensions specifications cuda CUDA back-end labels Oct 7, 2020
@Pennycook Pennycook requested review from a team and bader as code owners October 7, 2020 17:50
@Pennycook Pennycook requested a review from againull October 7, 2020 17:50
@Pennycook
Copy link
Contributor Author

Thanks to @Naghasan and @bader for their help in getting this working.

Also, a note to reviewers: I had some trouble getting CMake to handle the additional PTX flags correctly. I'm not a CMake expert, and would welcome any suggestions regarding how to improve what I've committed here. The issue as I understand it is that the list of compilation options constructed in libclc/CMakeLists.txt is passed to two functions in AddLibclc.cmake, but each function consumes those options differently.

One passes the options to add_target_options, which unhelpfully strips the second -Xclang option if it isn't prefixed with SHELL:. The other passes the options directly to add_custom_command unmodified, leaving SHELL: in the command line. The best solution I could find was to write the options assuming that SHELL: was required, then strip them when they weren't necessary.

@bader bader merged commit 551d706 into intel:sycl Oct 8, 2020
TRIPLE ${t}
TARGET_ENV libspirv
COMPILE_OPT ${mcpu}
COMPILE_OPT ${flags}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

COMPILE_OPT is a multi value option, so you should be able to add the extra flags directly.

A more long term solution would be perhaps to define flag per arch_sufix (they can then be accessed later), but should be for later I guess.

set( mcpu )
# FIXME: Ideally we would not be tied to a specific PTX ISA version
if( ${ARCH} STREQUAL nvptx OR ${ARCH} STREQUAL nvptx64 )
set( flags "SHELL:-Xclang -target-feature" "SHELL:-Xclang +ptx64")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why using "SHELL: and string( REGEX REPLACE "SHELL:" later is needed ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add_target_options only works if the SHELL: is there, but add_custom_command only works if the SHELL: is not there.

This is definitely a bit of a hack, but it seemed less error-prone than defining the same set of flags twice. If there's a more standard way to do this, please let me know and I'll fix it.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I'm no CMake expert so I'm not quite sure how to make it better.

jsji pushed a commit that referenced this pull request Jun 27, 2024
When there is forward declaration of a spirv entry, its decorates are
not translated until its definition is seen. Forward id is re-used for
its entry. Id in entry decorates should use forward id as well.

Original commit:
KhronosGroup/SPIRV-LLVM-Translator@305f48884606abf
Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025
Rename urCommandBufferEnqueueExp to urEnqueueCommandBufferExp
Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025
This reverts commit cc60d08, from
oneapi-src/unified-runtime#2606 due to
CI fails in the DPC++ bump PR that need further investigation
#16747
Chenyang-L pushed a commit that referenced this pull request Feb 18, 2025
Revert "Merge pull request #2606 from Bensuo/cmd-buf_enqueue_refactor"
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cuda CUDA back-end enhancement New feature or request spec extension All issues/PRs related to extensions specifications

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants