[Driver][SYCL]Emit an error if c compilation is forced using -x c or -x c-header when -fsycl mode is used#1416
Closed
hchilama wants to merge 4290 commits intointel:masterfrom
CONFLICT (content): Merge conflict in clang/lib/Sema/Sema.cpp
This patch improves the tool's diagnostic upon finding a SPIR kernel within an LLVM module. Although the tool's only current use is within the SYCL FPGA flow, it's important to make the message target-agnostic, so that the tool is not tied to a particular device BE. A related commit to the Clang driver has extended these diagnostics with SYCL FPGA specifics without affecting the tool itself. This patch also introduces testing for the return code value. For example, this should allow Clang driver users/developers to differentiate between the two possible causes of llvm-no-spir-kernel failure. Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
Signed-off-by: Alexey Bader <alexey.bader@intel.com>
intel#1141) Signed-off-by: Aleksander Fadeev <aleksander.fadeev@intel.com>
Signed-off-by: Dmitry Vodopyanov <dmitry.vodopyanov@intel.com>
Move internal headers from include/CL/sycl to the source directory to prevent implementation details from leaking into user applications and to enforce a stable ABI. A few more changes were applied to make the move possible:
- addHostAccessorAndWait functions in accessor to avoid calls to RT internals from header file
- Removed getImageInfo
- Move buffer size acquisition from buffer constructor to SYCLMemObjT cpp to avoid calls to PI
- getPluginFromContext function in context
- Standard containers replaced with SYCL variants in sycl_mem_obj_i.hpp. Unique ptr replaced with shared
- A few implementations moved from queue.hpp to queue.cpp
- Some LIT tests temporarily include implementation-specific headers. They will be converted to unit tests later.
Signed-off-by: Alexander Batashev <alexander.batashev@intel.com>
intel#1144) Since we really just want to be able to memcpy the type to the device, 'is-trivially-copyable' is not the correct trait. Since CWG1734, if we wanted to support trivially copyable types, we would be required to create 1 of 4 different mechanisms for having a type on the device (depending on the way the type is structured). Additionally, 2 of these ways require us to ALSO have the type be default constructible. This patch transitions to trivially-copy-constructible, so that we can simply memcpy from the existing one into new memory. Signed-off-by: Erich Keane <erich.keane@intel.com>
intel#1118) Signed-off-by: James Brodman <james.brodman@intel.com>
LowerWGScope pass performs required transformations to enable hierarchical parallelism semantics. This pass should not be skipped even if optimizations are disabled. Also some typos in the comments are fixed. Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
…el#1156) After intel#1068 has included the Demangle header, this fix to CMakeLists should guarantee successful builds in all configurations Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
SPIR-V OpGroupBroadcast accepts three forms of local ID: - scalar integer - vector integer with 2 components - vector integer with 3 components Signed-off-by: John Pennycook <john.pennycook@intel.com>
Also remove idle semicolon. Signed-off-by: Alexey Bader <alexey.bader@intel.com>
…#1162) Fix the cl_device_unified_shared_memory_capabilities_intel bitfield type name. Signed-off-by: Alexey Bader <alexey.bader@intel.com>
* [SYCL][LIBCLC] Additional libclc builtins to support SYCL work
  Adds builtins to libclc to support the CUDA backend for SYCL.
  Contributors: Alexander Johnston <alexander@codeplay.com>, David Wood <david.wood@codeplay.com>, Victor Lomuller <victor@codeplay.com>
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL] CMake and lit support for SYCL CUDA backend
  Adds the defines and the CMake and lit variables used for SYCL CUDA backend development and testing.
  Contributors: Alexander Johnston <alexander@codeplay.com>, Bjoern Knafla <bjoern@codeplay.com>, Ruyman Reyes <ruyman@codeplay.com>
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL] Local Accessor Support for CUDA
  Provides the LocalAccessorToSharedMemory compiler pass required for supporting SYCL local accessors in CUDA.
  Contributors: Alexander Johnston <alexander@codeplay.com>, David Wood <david.wood@codeplay.com>
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL][CUDA] Change __spirv_BuiltIn.. to functions
  Changes the following builtins to functions:
  __spirv_BuiltInGlobalSize
  __spirv_BuiltInWorkgroupSize
  __spirv_BuiltInNumWorkgroups
  __spirv_BuiltInLocalInvocationId
  __spirv_BuiltInWorkgroupId
  __spirv_BuiltInGlobalOffset
  Contributors: David Wood <david.wood@codeplay.com>
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL][CUDA] Add SYCL CUDA support to clang driver
  Adds CUDA support for sycl compilation in the clang driver.
  Contributors: Alexander Johnston <alexander@codeplay.com>, David Wood <david.wood@codeplay.com>, Victor Lomuller <victor@codeplay.com>
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL][CUDA] Initial Implementation of the CUDA backend
  Contributors: Alan Forbes <alan.forbes@codeplay.com>, Alexander Johnston <alexander@codeplay.com>, Bjoern Knafla <bjoern@codeplay.com>, Daniel Soutar <daniel.soutar@codeplay.com>, David Wood <david.wood@codeplay.com>, Kumudha Narasimhan <kumudha.narasimhan@codeplay.com>, Mehdi Goli <mehdi.goli@codeplay.com>, Przemek Malon <przemek.malon@codeplay.com>, Ruyman Reyes <ruyman@codeplay.com>, Stuart Adams <stuart.adams@codeplay.com>, Svetlozar Georgiev <svetlozar.georgiev@codeplay.com>, Steffen Larsen <steffen.larsen@codeplay.com>, Victor Lomuller <victor@codeplay.com>
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL] Update libclc install rules
  Have libclc install clc-* and libspirv-* to lib and share.
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL][CUDA] Inline cl namespace to simplify SYCL API usage
  Synchronise the CUDA backend with the general SYCL changes from intel#974.
  Signed-off-by: Andrea Bocci <andrea.bocci@cern.ch>
* Added missing flags for device-side builtins
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL][CUDA] Removing unnecessary tool from the tree
  Acked-by: Victor Lomuller <victor@codeplay.com>
  Signed-off-by: Ruyman <ruyman@codeplay.com>
* [SYCL][PI] Fix kernel group info parameter conversion
  Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
* [SYCL][CUDA] Refactor __SYCL_INLINE macro
  Synchronise the CUDA backend with the general SYCL changes from intel#1121.
  Signed-off-by: Andrea Bocci <andrea.bocci@cern.ch>
* [SYCL] Have default_selector consider SYCL_BE
  Have the default_selector consider the env var SYCL_BE when rating device scores to make choosing a backend easier.
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL] Select GlobalPlugin based on SYCL_BE
  Rather than choose the last found plugin as GlobalPlugin, select it depending on the SYCL_BE env var.
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL] Improve default device selection checks
  Better checks for CUDA and OpenCL devices to match with SYCL_BE in the default device selection, based on the platform version info.
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL] Formatting update for device_selector.cpp
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL] Changed CUDA unit tests to call through plugin
  Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
* [SYCL] Pass SYCL_BE=PI_OPENCL in check-sycl
  To ensure that the check-sycl targets test OpenCL devices, pass SYCL_BE=PI_OPENCL. This mirrors the check-sycl-cuda target which passes SYCL_BE=PI_CUDA. Without this it is nondeterministic which device is tested by check-sycl.
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL][CUDA] Remove PI_CUDA specific details from clang
  Removes PI_CUDA specific code paths and tests from clang, opting to always enable them.
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL][CUDA] Disable linear_id/opencl-interop.cpp for cuda
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL][CUDA] Further fixes to CUDA device selection
  Fix platform string comparison for CUDA platform detection. Fix device info platform query so that it uses the device's plugin, rather than the GlobalPlugin.
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL][CUDA] Code style and cleanup to CUDA support
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL] Enable asserts in all buildbot builds
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
* [SYCL][CUDA] Minor test and build configuration
  Fix minor test and build configuration issues introduced in the development of the CUDA backend.
  Signed-off-by: Alexander Johnston <alexander@codeplay.com>
Co-authored-by: Andrea Bocci <andrea.bocci@cern.ch>
Co-authored-by: Ruyman <ruyman@codeplay.com>
Co-authored-by: Steffen Larsen <56076654+steffenlarsen@users.noreply.github.com>
Signed-off-by: Alexey Bader <alexey.bader@intel.com> Co-Authored-By: Alexander Batashev <alexbatashev@outlook.com>
CONFLICT (content): Merge conflict in clang/lib/Sema/SemaChecking.cpp
CONFLICT (content): Merge conflict in clang/lib/Sema/SemaChecking.cpp
The error was reproducible in two cases:
- using something like `numeric_limits<half>::min()` within another `constexpr`
- not treating SYCL headers as system ones, with `-Winvalid-constexpr` treated as an error
Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Event type triggers are misspelled "open"->"opened", etc. Default event type triggers should work fine. Signed-off-by: Alexey Bader <alexey.bader@intel.com>
…1053) We had an issue with wrong mangling of s_upsample. I fixed it a long time ago, so we can delete the workaround now. Signed-off-by: Ilya Mashkov <ilya.mashkov@intel.com>
Signed-off-by: Igor Dubinov <igor.dubinov@intel.com>
While building the x64 Debug configuration on Windows using scripts from the buildbot folder, there were two issues:
1. OpenCL ICD Loader failed to build because of the missing OpenCL headers
2. Fatal error C1128: clang\lib\Sema\SemaTemplateDeduction.cpp : number of sections exceeded object file format limit: compile with /bigobj
Signed-off-by: Dmitry Vodopyanov <dmitry.vodopyanov@intel.com>
Signed-off-by: Dmitry Vodopyanov <dmitry.vodopyanov@intel.com>
It turns out that my original implementation was correct and I just misunderstood the double-dot commit range description from ProGit https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection. Signed-off-by: Alexey Bader <alexey.bader@intel.com>
CONFLICT (content): Merge conflict in clang/lib/Sema/SemaChecking.cpp
Signed-off-by: Alexey Sotkin <alexey.sotkin@intel.com>
Define __SPIRV_BUILTIN_DECLARATIONS__ when passing -fdeclare-spirv-builtins to clang. Signed-off-by: Victor Lomuller <victor@codeplay.com>
Added OpenCL SPIR-V extended set builtins bindings and part of the core SPIR-V (mostly missing Images and Pipes) Known vendor extensions are not implemented yet. Signed-off-by: Victor Lomuller <victor@codeplay.com> Co-Authored-By: Alexey Bader <alexey.bader@intel.com>
…l#1252) Implementation of piEventSetCallback with tests. GlueEvent now uses the correct plugins: the SYCL RT code for GlueEvent now calls the right plugin to create the event that triggers the dependency chain. Renamed variables to clarify the source code and avoid confusion between Context and Plugin. Signed-off-by: Ruyman Reyes <ruyman@codeplay.com> Signed-off-by: Stuart Adams <stuart.adams@codeplay.com> Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
Signed-off-by: Stuart Adams <stuart.adams@codeplay.com>
…ntel#1381) Signed-off-by: gejin <ge.jin@intel.com>
…#1376) NOTE: This flag is not exposed to the driver and not intended for users. It's added to make experiments and identify issues with optimizations. Signed-off-by: Alexey Bader <alexey.bader@intel.com>
…#1383) By emitting the legacy variant of the LLVM IR alongside the newer representation of the attribute, backwards compatibility with any existing BE implementation is restored. A smooth transition period is thus achieved for the aforementioned BE - until it's able to consume the new LLVM IR, it has an option to simply ignore the unknown metadata. Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
If the found alloca command is not a sub-buffer alloca, then it is the parent alloca, which has the same context. Signed-off-by: Ivan Karachun <ivan.karachun@intel.com>
…ntel#1344) Signed-off-by: Michael Kinsner <michael.kinsner@intel.com>
Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
Enable -fdeclare-spirv-builtins for SYCL device compilation mode. For device compilation, SPIR-V builtins are now looked up by the device compiler. They no longer need to be forward declared. [SYCL-PTX] Revert manual mangling of some SPIR-V builtins [SYCL-PTX] Add fmod builtin [SYCL-PTX] Update Atomic mangling Signed-off-by: Victor Lomuller <victor@codeplay.com>
…<dir> (intel#1346) When using /Fo<dir> the improper dependency file name was generated, causing the bundle step to not be able to locate the dependency file when compiling to object Signed-off-by: Michael D Toguchi <michael.d.toguchi@intel.com>
This patch introduces the following loop attributes: - loop_coalesce: Indicates that the loop nest should be coalesced into a single loop without affecting functionality - speculated_iterations: Specifies the number of concurrent speculated iterations that will be in flight for a loop invocation - disable_loop_pipelining: Disables pipelining of the loop data path, causing the loop to be executed serially - max_interleaving: Places a maximum limit N on the number of interleaved invocations of an inner loop by an outer loop Signed-off-by: Viktoria Maksimova <viktoria.maksimova@intel.com>
Fixed the buffer constructor called with a pair of iterators. The current implementation has a problem due to ambiguous spec. The buffer should never write back data unless there is a call to set_final_data(), but the current implementation does it. I corrected the spec in KhronosGroup/SYCL-Docs#76. So, now we can change the buffer implementation according to the clarified spec. The test case buffer.cpp also needed change because of this change. The user should not expect the automatic write-back of data upon destruction of buffer. Signed-off-by: Byoungro So <byoungro.so@intel.com> Co-authored-by: Ronan Keryell <ronan@keryell.fr>
A simple library for constructing and serializing/deserializing a sequence of typed property sets, where each property is a <name,typed value> pair. To be used in offload tools. Signed-off-by: Konstantin S Bobrovsky <konstantin.s.bobrovsky@intel.com>
This reverts commit d357add. Signed-off-by: Vladimir Lazarev <vladimir.lazarev@intel.com>
Signed-off-by: Alexander Batashev <alexander.batashev@intel.com>
…ntel#1359) Signed-off-by: Konstantin S Bobrovsky <konstantin.s.bobrovsky@intel.com>
…for (intel#1348) The kernel callable being invoked from an nd_range parallel_for is accepting an id argument, while it should be nd_item. After my analysis, I found we check arguments' type for kernel_parallel_for instead of parallel_for. But that check is useless, because the compiler can still find a candidate for kernel_parallel_for with nd_range and id which is a wrong combination. In my solution, parallel_for with nd_range calls kernel_parallel_for_nd_range(...) which is only available for nd_item. Signed-off-by: Bing1 Yu <bing1.yu@intel.com>
Implements a few code simplification/unification for LowerWGScope. Signed-off-by: Victor Lomuller <victor@codeplay.com>
…tel#1405) For NVPTX target address space inference for kernel arguments and allocas is happening in the backend (NVPTXLowerArgs and NVPTXLowerAlloca passes). After frontend these pointers are in LLVM default address space 0 which is the generic address space for NVPTX target. Perform address space cast of a pointer to the shadow global variable from the local to the generic address space before replacing all usages of a byval argument. Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
- Adds static members to sub_group class. - sub_group member functions marked deprecated, to be removed later. - SPIR-V helpers expanded to convert SYCL group to SPIR-V scope. - Add workaround for half types Signed-off-by: John Pennycook <john.pennycook@intel.com>
Since it is not possible to generate a vector of bools in the FE, we have to change the return type of the corresponding instructions in the SPIRV translator to a vector of bools. The SPIRV translator already did this for some instructions; this patch extends this behaviour to handle more instructions.
Adding doxygen documentation to PI CUDA backend. Some code is re-ordered in the file to help sorting the doxygen. Co-Authored-By: Alexey Bader <alexey.bader@intel.com> Co-Authored-By: Alexander Batashev <alexbatashev@outlook.com> Co-Authored-By: Romanov Vlad <17316488+romanovvlad@users.noreply.github.com> Signed-off-by: Ruyman Reyes <ruyman@codeplay.com>
Based on https://github.com/codeplaysoftware/standards-proposals/blob/master/spec-constant/index.md

* [SYCL] PI changes:
  1. Add specialization constant API to the SYCL RT Plugin Interface. New PI API added:
     pi_result piProgramSetSpecializationConstant(pi_program prog, pi_uint32 spec_id, size_t spec_size, const void *spec_value);
  2. Add property set fields to the binary image descriptor, bump PI version. This change breaks backward binary compatibility of device binary image descriptors.
  3. Add convenience C++ wrappers for PI binary image hierarchy objects.

* [SYCL] Support device binary properties and file tables in the offload wrapper.
  1. New option - "-properties=<file>". <file> must be a property set registry file, as defined by llvm/Support/PropertySetIO.h. The wrapper will add the property sets to the binary image descriptor and make them available to the runtime.
  2. New option - "-batch". With this option the only input can be a file table, as defined by llvm/Support/SimpleTable.h. Column names are a part of the interface between this tool and sycl-post-link, which produces the file table.
  3. Binary image descriptor LLVM type updated to match the changes in Plugin Interface v1.2.

* [SYCL] Specialization constants support in the Front End.
  1. Detect kernel lambda object captures corresponding to specialization constants and
     (a) don't create kernel arguments for them
     (b) generate specializations of the SpecConstantInfo structure into the integration header.
  2. Recognize the __unique_stable_name intrinsic and replace it with a string literal uniquely identifying the type of the typename template parameter to this intrinsic.
  3. FE-related changes in the runtime:
     - new SpecConstantInfo templated struct for type->name translation for specialization constants used by integration header
     - define the __sycl_fe_getStableUniqueTypeName intrinsic

* [SYCL] Add specialization constant support in SYCL runtime.
  1. Define SYCL API (sycl/include/CL/sycl/experimental/spec_constant.hpp)
  2. Add convenience C++ wrappers for PI device binary structures and refactor runtime to use the wrappers. Get rid of custom deleters for binary images.
  3. Implement SYCL spec constant APIs in program and program manager.

* [SYCL] Use file-table-tform in SYCL offload processing in clang driver.
  Clang driver's design can't handily model
  (1) multiple inputs/outputs in the action graph. Because of that, for example, the sycl-post-link tool is invoked twice - once to split the code and produce multiple bitcode files, and a second time to generate symbol files for the split modules.
  (2) "Clusters" of inputs/outputs, when subsets of inputs/outputs are associated and describe different aspects of the same data. An example of such clustering is the split module + its symbol file above. Clustering would require support both in the driver and the tools invoked in response to actions.

  This commit moves SYCL offload processing to the "file table concept." sycl-post-link, instead of
  (1) being invoked n times, once per each output type requested (once for device split and once for symbol file generation)
  (2) outputting multiple file lists, each listing outputs from the corresponding invocation above
  is now invoked once and produces a single file table output. E.g.

    [Code|Symbols|Properties]
    a_0.bc|a_0.sym|a_0.props
    a_1.bc|a_1.sym|a_1.props

  This solves both problems - multiple input/output and clustering. Combined with the file-table-tform tool, this allows for efficient handling of multiple clusters of files (each represented as a row in the table file) in the clang driver infrastructure. For example, there is a real offload processing problem:
  step1. sycl-post-link outputs N clusters of files
  step2. The "Code" file of each cluster resulting from step1 ({a_0.bc, a_1.bc} in the example above) must undergo further transformations - translation to SPIRV and optional ahead-of-time compilation.
  step3. In each cluster resulting from step1 the "Code" file needs to be replaced with the result of step2
  step4. All the clusters are processed by the ClangOffloadWrapper tool, which needs to know how files are distributed into clusters and what the role of each file in a cluster is - whether it is "Code", "Symbol" or "Properties".

  To solve this, the following action graph is constructed in the clang driver:

                  column:"Code"
    t1 -> [file-table-tform:extract column] -> t1a -> [for-each:] -> t1b
                                                      llvm-spirv
                                                      aot-comp
    t1  \           column:"Code"
         [file-table-tform:replace column] -> t2 -> [ClangOffloadWrapper]
    t1b /

  where t1b is

    ["Code"]
    a_0.bin
    a_1.bin

  and t2 is

    [Code|Symbols|Properties]
    a_0.bin|a_0.sym|a_0.props
    a_1.bin|a_1.sym|a_1.props

  Note that the graph does not change with a growing number of clusters, nor does it change when more files are added to each cluster (e.g. a "Manifest" file).

* [SYCL] Process specialization constants in sycl-post-link tool.
  Add a spec constant lowering pass to sycl-post-link tool. Support file table output format.

* [SYCL] Temporarily disable spec_const_hw.cpp on CPU.
  CPU OpenCL Runtime on build machines is not updated yet.

Signed-off-by: Konstantin S Bobrovsky <konstantin.s.bobrovsky@intel.com>
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Feb 23, 2023
Test integration of kernel fusion into the SYCL runtime scheduler.
Check that cancellation of the fusion happens if required by synchronization rules, as described in the [extension proposal](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_codeplay_kernel_fusion.asciidoc#synchronization-in-the-sycl-application).
Spec: intel#7098
Implementation: intel#7531
Signed-off-by: Lukas Sommer <lukas.sommer@codeplay.com>
No description provided.