[SYCL] Add faster reduction implementations using atomic or/and intel…#1615
…::reduce() Signed-off-by: Vyacheslav N Klochkov <vyacheslav.n.klochkov@intel.com>
@Pennycook could you review the patch?
/// Implements parallel_for() accepting nd_range and 1 reduction variable
/// having 'read_write' access mode.
[Minor] Suggest mentioning that the reduction must support "fast atomics".
Vlad, doesn't this comment already say that (see lines 3 and 4 of this comment section)?
/// Implements parallel_for() accepting nd_range and 1 reduction variable
/// having 'read_write' access mode.
/// This version uses fast sycl::atomic operations to update user's reduction
/// variable at the end of each work-group work.
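For readers unfamiliar with the pattern the doc-comment describes, here is a minimal host-only sketch, not the PR's implementation. It assumes a sum reduction over `int`; the function name `reduce_with_group_atomics` and the use of `std::atomic` in place of `sycl::atomic` are illustrative assumptions. Each simulated work-group reduces its slice privately and then issues exactly one atomic update to the shared result.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical host-side model of "one atomic update per work-group".
// Groups are processed one after another here for simplicity; on a device
// they would run concurrently, which is why the final update is atomic.
int reduce_with_group_atomics(const std::vector<int> &data,
                              std::size_t group_size) {
  assert(group_size > 0 && "a work-group must contain at least one item");
  std::atomic<int> result{0}; // stands in for the user's reduction variable

  for (std::size_t begin = 0; begin < data.size(); begin += group_size) {
    const std::size_t end = std::min(begin + group_size, data.size());

    // Step 1: reduce this group's slice into a private partial sum.
    const int partial = std::accumulate(
        data.begin() + static_cast<std::ptrdiff_t>(begin),
        data.begin() + static_cast<std::ptrdiff_t>(end), 0);

    // Step 2: a single atomic update at the end of the group's work.
    result.fetch_add(partial, std::memory_order_relaxed);
  }
  return result.load();
}
```

The point of the design is that the number of atomic operations scales with the number of work-groups rather than the number of work-items.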
Pennycook left a comment:
I'm not sure about the names fast_atomic and fast_reduce going forward -- what we're really testing here is whether SYCL provides native atomics or reductions for those types (since a device is not required to guarantee that their implementation is fast).
I don't feel strongly enough about this to block merging this PR, but we might want to revisit this naming convention when tuning for additional platforms.
If those 'native' atomics turn out to be slow on some device, then the right move is to exclude them from the list of 'fast' atomics and use a different algorithm that does not rely on them. Such an exclusion would require some additional changes and possibly dynamic/runtime checks on the host (I have not thought much about it yet).
That's a good point. I agree -- as long as we continue to update the logic and use these specializations only if the features are expected to improve performance, "fast" is a good name.
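The allow-list idea discussed above could be sketched at compile time roughly as follows. This is an illustration under stated assumptions, not the trait defined in the PR: `uses_fast_atomics`, `Strategy`, and `pick_strategy` are hypothetical names, and the opt-out for `long long` is a made-up example of excluding a natively supported combination that is presumed slow on some device.

```cpp
#include <cassert>
#include <functional>
#include <type_traits>

// Hypothetical trait: true only for (type, operation) pairs on the
// "fast" allow-list. Everything is off the list by default.
template <typename T, typename BinaryOp>
struct uses_fast_atomics : std::false_type {};

// Assumption for this sketch: integer addition has a fast native atomic.
template <typename T>
struct uses_fast_atomics<T, std::plus<T>>
    : std::integral_constant<bool, std::is_integral<T>::value> {};

// Made-up exclusion: pretend 64-bit atomic add is native but slow,
// so it is taken off the list and falls back to the other algorithm.
template <>
struct uses_fast_atomics<long long, std::plus<long long>> : std::false_type {};

enum class Strategy { AtomicPerGroup, TreeFallback };

// Select the algorithm at compile time from the trait.
template <typename T, typename BinaryOp>
constexpr Strategy pick_strategy() {
  return uses_fast_atomics<T, BinaryOp>::value ? Strategy::AtomicPerGroup
                                               : Strategy::TreeFallback;
}
```

With this shape, "fast" stays an accurate name as long as the list is curated for performance. A device-specific exclusion that is only known at run time would need a host-side check on top of this, as noted in the discussion.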
…_docs
* origin/sycl: (6482 commits)
  [SYCL][NFC] Clean formatting in Markdown documents (intel#1635)
  [SYCL][Doc] Remove obsolete parens from README (intel#1637)
  [SYCL] Fix failing ABI tests when LLVM_LIBDIR_SUFFIX is set (intel#1605)
  [SYCL] Fix warnings in libdevice (intel#1630)
  [SYCL][CUDA] Triage and clean LIT (intel#1620)
  [SYCL][NFC] Fix GCC 8 compilation warnings (intel#1631)
  [SYCL] Minor fixes in LowerWGScope
  [SYCL] PI: correct default interoperability plugin selection
  [SYCL] Add faster reduction implementations using atomic or/and intel::reduce() (intel#1615)
  [SYCL] Add sycl-ls utility for listing devices discovered/selected by SYCL RT (intel#1575)
  [SYCL] Fix getDeviceFromHandler declarations (intel#1626)
  [SPIR-V] Correct/improve declaration of SPIR-V builtins (intel#1519)
  [SYCL][USM] Improve USM allocator test and fix improper behavior. (intel#1538)
  [SYCL] Fix failing ABI LITs (intel#1622)
  [SYCL] Add support for MSVC internal math functions in device library (intel#1441)
  [SYCL] Add runtime library versioning (intel#1604)
  [SYCL] Check weak symbols in ABI dumps (intel#1609)
  [NFC][SYCL] Improve kernel metadata test (intel#1610)
  Revert "[SYCL] XFAIL LIT test due to duplicate diagnostic" (intel#1460)
  [SYCL] Move the reduction command group funcs out of handler.hpp (intel#1602)
  ...