[Driver][SYCL] Emit an error if C compilation is forced using -x c or -x c-header when -fsycl mode is used #1416

Closed
hchilama wants to merge 4290 commits into intel:master from hchilama:intel_llvm

@hchilama
Contributor

No description provided.

Fznamznon and others added 30 commits February 19, 2020 11:53
  CONFLICT (content): Merge conflict in clang/lib/Sema/Sema.cpp
This patch improves the tool's diagnostic upon finding a
SPIR kernel within an LLVM module. Although the tool's
only current use is within the SYCL FPGA flow, it's important
to make the message target-agnostic, so that the tool is not
tied to a particular device BE.
A related commit to the Clang driver has extended these diagnostics
with SYCL FPGA specifics without affecting the tool itself.

This patch also introduces testing for the return code value. For
example, this should allow the Clang driver users/developers to
differentiate between the two possible causes of llvm-no-spir-kernel
failure.

Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
Signed-off-by: Alexey Bader <alexey.bader@intel.com>
intel#1141)

Signed-off-by: Aleksander Fadeev <aleksander.fadeev@intel.com>
Signed-off-by: Dmitry Vodopyanov <dmitry.vodopyanov@intel.com>
Move internal headers from include/CL/sycl to the source directory to
prevent implementation details from leaking into user applications and
to enforce a stable ABI.

A few more changes were applied to make the movement possible:

- addHostAccessorAndWait functions in accessor to avoid calls to RT
  internals from header file
- Removed getImageInfo
- Move buffer size acquisition from buffer constructor to SYCLMemObjT
  cpp to avoid calls to PI
- getPluginFromContext function in context
- Standard containers replaced with SYCL variants in sycl_mem_obj_i.hpp.
  Unique ptr replaced with shared
- A few implementations moved from queue.hpp to queue.cpp
- Some LIT tests temporarily include implementation-specific headers.
  They will be converted to unit tests later.

Signed-off-by: Alexander Batashev <alexander.batashev@intel.com>
intel#1144)

Since we really just want to be able to memcpy the type to the device,
'is-trivially-copyable' is not the correct trait. Since CWG1734, if we wanted
to support trivially copyable types, we would be required to create one of four
different mechanisms for placing a type on the device (depending on the
way the type is structured). Additionally, two of these mechanisms would ALSO
require the type to be default constructible.

This patch transitions to trivially-copy-constructible, so that we can
simply memcpy from the existing object into new memory.

Signed-off-by: Erich Keane <erich.keane@intel.com>
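
To make the trait distinction in the commit above concrete, here is a minimal standard C++ sketch. The types `Payload` and `Tracked` and the helpers `passesKernelArgCheck` and `stageAndReadValue` are hypothetical names invented for illustration; they are not part of the patch or of the SYCL headers.

```cpp
#include <cstring>
#include <type_traits>

// Hypothetical type: trivial copy constructor, but a user-provided
// (hence non-trivial) copy assignment operator.
struct Payload {
  int value;
  explicit Payload(int v) : value(v) {}
  Payload(const Payload &) = default;
  Payload &operator=(const Payload &other) {
    value = other.value;
    return *this;
  }
};

// Hypothetical type: user-provided copy constructor, so copying is not trivial.
struct Tracked {
  int value;
  explicit Tracked(int v) : value(v) {}
  Tracked(const Tracked &other) : value(other.value + 1) {}
};

// The non-trivial assignment makes Payload fail the stricter trait...
static_assert(!std::is_trivially_copyable<Payload>::value,
              "Payload is not trivially copyable");
// ...while it still satisfies the trait the commit message switches to.
static_assert(std::is_trivially_copy_constructible<Payload>::value,
              "Payload is trivially copy constructible");

// Assumed stand-in for the kind of check the commit message describes.
template <typename T>
constexpr bool passesKernelArgCheck() {
  return std::is_trivially_copy_constructible<T>::value;
}

// Stage the bytes of `src` in raw storage, then read the leading int back out
// of the staged bytes (valid here because T is standard-layout and starts
// with an int member).
template <typename T>
int stageAndReadValue(const T &src) {
  static_assert(passesKernelArgCheck<T>(),
                "type must be trivially copy constructible");
  static_assert(std::is_standard_layout<T>::value,
                "sketch assumes a standard-layout type");
  unsigned char storage[sizeof(T)];
  std::memcpy(storage, &src, sizeof(T));
  int value;
  std::memcpy(&value, storage, sizeof(int));
  return value;
}
```

With these definitions, `passesKernelArgCheck<Tracked>()` is false, and `stageAndReadValue(Payload(42))` returns the staged value without ever invoking `Payload`'s assignment operator.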
intel#1118)

Signed-off-by: James Brodman <james.brodman@intel.com>
LowerWGScope pass performs required transformations to enable
hierarchical parallelism semantics. This pass should not be skipped even
if optimizations are disabled.

Also some typos in the comments are fixed.

Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
…el#1156)

After intel#1068 has included the Demangle header, this fix to CMakeLists
should guarantee successful builds in all configurations

Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
SPIR-V OpGroupBroadcast accepts three forms of local ID:
- scalar integer
- vector integer with 2 components
- vector integer with 3 components

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Also remove idle semicolon.

Signed-off-by: Alexey Bader <alexey.bader@intel.com>
…#1162)

Fix the cl_device_unified_shared_memory_capabilities_intel bitfield type
name.

Signed-off-by: Alexey Bader <alexey.bader@intel.com>
* [SYCL][LIBCLC] Additional libclc builtins to support SYCL work

Adds builtins to libclc to support the CUDA backend for SYCL.

Contributors
Alexander Johnston <alexander@codeplay.com>
David Wood <david.wood@codeplay.com>
Victor Lomuller <victor@codeplay.com>

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL] CMake and lit support for SYCL CUDA backend

Defines the CMake and lit variables used for SYCL CUDA backend
development and testing

Contributors
Alexander Johnston <alexander@codeplay.com>
Bjoern Knafla <bjoern@codeplay.com>
Ruyman Reyes <ruyman@codeplay.com>

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL] Local Accessor Support for CUDA

Provides the LocalAccessorToSharedMemory compiler pass required
for supporting SYCL local accessors in CUDA.

Contributors
Alexander Johnston <alexander@codeplay.com>
David Wood <david.wood@codeplay.com>

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL][CUDA] Change __spirv_BuiltIn.. to functions

Changes the following builtins to functions

__spirv_BuiltInGlobalSize
__spirv_BuiltInWorkgroupSize
__spirv_BuiltInNumWorkgroups
__spirv_BuiltInLocalInvocationId
__spirv_BuiltInWorkgroupId
__spirv_BuiltInGlobalOffset

Contributors
David Wood <david.wood@codeplay.com>

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL][CUDA] Add SYCL CUDA support to clang driver

Adds CUDA support for sycl compilation in the clang driver

Contributors
Alexander Johnston <alexander@codeplay.com>
David Wood <david.wood@codeplay.com>
Victor Lomuller <victor@codeplay.com>

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL][CUDA] Initial Implementation of the CUDA backend

Contributors
Alan Forbes <alan.forbes@codeplay.com>
Alexander Johnston <alexander@codeplay.com>
Bjoern Knafla <bjoern@codeplay.com>
Daniel Soutar <daniel.soutar@codeplay.com>
David Wood <david.wood@codeplay.com>
Kumudha Narasimhan <kumudha.narasimhan@codeplay.com>
Mehdi Goli <mehdi.goli@codeplay.com>
Przemek Malon <przemek.malon@codeplay.com>
Ruyman Reyes <ruyman@codeplay.com>
Stuart Adams <stuart.adams@codeplay.com>
Svetlozar Georgiev <svetlozar.georgiev@codeplay.com>
Steffen Larsen <steffen.larsen@codeplay.com>
Victor Lomuller <victor@codeplay.com>

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL] Update libclc install rules

Have libclc install clc-* and libspirv-* to lib and share

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL][CUDA] Inline cl namespace to simplify SYCL API usage

Synchronise the CUDA backend with the general SYCL changes from intel#974.

Signed-off-by: Andrea Bocci <andrea.bocci@cern.ch>

* Added missing flags for device-side builtins

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL][CUDA] Removing unnecessary tool from the tree

Acked-by: Victor Lomuller <victor@codeplay.com>
Signed-off-by: Ruyman <ruyman@codeplay.com>

* [SYCL][PI] Fix kernel group info parameter conversion

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>

* [SYCL][CUDA] Refactor __SYCL_INLINE macro

Synchronise the CUDA backend with the general SYCL changes from intel#1121.

Signed-off-by: Andrea Bocci <andrea.bocci@cern.ch>

* [SYCL] Have default_selector consider SYCL_BE

Have the default_selector consider the env var SYCL_BE when rating
device scores to make choosing a backend easier.

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL] Select GlobalPlugin based on SYCL_BE

Rather than choose the last found plugin as GlobalPlugin, select
it depending on the SYCL_BE env var.

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL] Improve default device selection checks

Better checks for CUDA and OpenCL devices to match with SYCL_BE in the
default device selection, based on the platform version info.

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL] Formatting update for device_selector.cpp

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL] Changed CUDA unit tests to call through plugin

Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>

* [SYCL] Pass SYCL_BE=PI_OPENCL in check-sycl

To ensure that the check-sycl targets test OpenCL devices, pass
SYCL_BE=PI_OPENCL. This mirrors the check-sycl-cuda target which
passes SYCL_BE=PI_CUDA. Without this it is nondeterministic which
device is tested by check-sycl.

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL][CUDA] Remove PI_CUDA specific details from clang

Removes PI_CUDA specific code paths and tests from clang, opting to
always enable them.

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL][CUDA] Disable linear_id/opencl-interop.cpp for cuda

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL][CUDA] Further fixes to CUDA device selection

Fix platform string comparison for CUDA platform detection.
Fix device info platform query so that it uses the device's plugin,
rather than the GlobalPlugin.

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL][CUDA] Code style and cleanup to CUDA support

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL] Enable asserts in all buildbot builds

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

* [SYCL][CUDA] Minor test and build configuration

Fix minor test and build configuration issues introduced in the
development of the CUDA backend.

Signed-off-by: Alexander Johnston <alexander@codeplay.com>

Co-authored-by: Andrea Bocci <andrea.bocci@cern.ch>
Co-authored-by: Ruyman <ruyman@codeplay.com>
Co-authored-by: Steffen Larsen <56076654+steffenlarsen@users.noreply.github.com>
Signed-off-by: Alexey Bader <alexey.bader@intel.com>

Co-Authored-By: Alexander Batashev <alexbatashev@outlook.com>
  CONFLICT (content): Merge conflict in clang/lib/Sema/SemaChecking.cpp
  CONFLICT (content): Merge conflict in clang/lib/Sema/SemaChecking.cpp
Error was reproducible in two cases:
- using something like `numeric_limits<half>::min()` within another
  `constexpr`
- not treating SYCL headers as system ones with `-Winvalid-constexpr`
  treated as error

Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
Signed-off-by: Sergey Kanaev <sergey.kanaev@intel.com>
Event type triggers are misspelled "open"->"opened", etc.
Default event type triggers should work fine.

Signed-off-by: Alexey Bader <alexey.bader@intel.com>
…1053)

We had an issue with incorrect mangling of s_upsample. I fixed it a long time ago, so we can delete the workaround now.

Signed-off-by: Ilya Mashkov <ilya.mashkov@intel.com>
Signed-off-by: Igor Dubinov <igor.dubinov@intel.com>
While building the x64 Debug configuration on Windows using scripts from the buildbot folder, there were two issues:
1. OpenCL ICD Loader failed to build because of the missing OpenCL headers
2. Fatal error C1128: clang\lib\Sema\SemaTemplateDeduction.cpp : number of sections exceeded object file format limit: compile with /bigobj

Signed-off-by: Dmitry Vodopyanov <dmitry.vodopyanov@intel.com>
Signed-off-by: Dmitry Vodopyanov <dmitry.vodopyanov@intel.com>
It turns out that my original implementation was correct and I just
misunderstood the double-dot commit range description from Pro Git
https://git-scm.com/book/en/v2/Git-Tools-Revision-Selection.

Signed-off-by: Alexey Bader <alexey.bader@intel.com>
  CONFLICT (content): Merge conflict in clang/lib/Sema/SemaChecking.cpp
Signed-off-by: Alexey Sotkin <alexey.sotkin@intel.com>
Victor Lomuller and others added 26 commits March 24, 2020 20:35
Define __SPIRV_BUILTIN_DECLARATIONS__ when passing
-fdeclare-spirv-builtins to clang.

Signed-off-by: Victor Lomuller <victor@codeplay.com>
Added OpenCL SPIR-V extended set builtins bindings and
part of the core SPIR-V (mostly missing Images and Pipes)

Known vendor extensions are not implemented yet.

Signed-off-by: Victor Lomuller <victor@codeplay.com>
Co-Authored-By: Alexey Bader <alexey.bader@intel.com>
…l#1252)

Implementation of piEventSetCallback with tests

GlueEvent now uses the correct plugins

The SYCL RT code for GlueEvent now calls
the right plugin to create the event that triggers the
dependency chain.
Renamed variables to clarify the source code and avoid
confusion between Context and Plugin

Signed-off-by: Ruyman Reyes <ruyman@codeplay.com>
Signed-off-by: Stuart Adams <stuart.adams@codeplay.com>
Signed-off-by: Steffen Larsen <steffen.larsen@codeplay.com>
Signed-off-by: Stuart Adams <stuart.adams@codeplay.com>
…#1376)

NOTE: This flag is not exposed to the driver and not intended for users.
It's added to make experiments and identify issues with optimizations.

Signed-off-by: Alexey Bader <alexey.bader@intel.com>
…#1383)

By emitting the legacy variant of the LLVM IR alongside the newer
representation of the attribute, backwards compatibility with any
existing BE implementation is restored. A smooth transition period
is thus achieved for the aforementioned BE - until it's able to consume
the new LLVM IR, it has an option to simply ignore the unknown metadata.

Signed-off-by: Artem Gindinson <artem.gindinson@intel.com>
If the alloca command found is not a sub-buffer alloca, then
it is the parent alloca, which has the same context

Signed-off-by: Ivan Karachun <ivan.karachun@intel.com>
…ntel#1344)

Signed-off-by: Michael Kinsner <michael.kinsner@intel.com>
Signed-off-by: Alexey Sachkov <alexey.sachkov@intel.com>
Enable -fdeclare-spirv-builtins for SYCL device compilation mode

For device compilation, SPIR-V builtins are now looked up by
the device compiler. They no longer need to be forward declared.

[SYCL-PTX] Revert manual mangling of some SPIR-V builtins
[SYCL-PTX] Add fmod builtin
[SYCL-PTX] Update Atomic mangling

Signed-off-by: Victor Lomuller <victor@codeplay.com>
…<dir> (intel#1346)

When using /Fo<dir>, an improper dependency file name was generated, so
the bundle step could not locate the dependency file when compiling
to object

Signed-off-by: Michael D Toguchi <michael.d.toguchi@intel.com>
This patch introduces the following loop attributes:
- loop_coalesce:
  Indicates that the loop nest should be coalesced into a single loop without
  affecting functionality
- speculated_iterations:
  Specifies the number of concurrent speculated iterations that will be in
  flight for a loop invocation
- disable_loop_pipelining:
  Disables pipelining of the loop data path, causing the loop to be executed
  serially
- max_interleaving:
  Places a maximum limit N on the number of interleaved invocations of an inner
  loop by an outer loop

Signed-off-by: Viktoria Maksimova <viktoria.maksimova@intel.com>
Fixed the buffer constructor called with a pair of iterators.
The current implementation has a problem caused by an ambiguous spec.
The buffer should never write back data unless there is a call to set_final_data(), but the current implementation does.
I corrected the spec in KhronosGroup/SYCL-Docs#76,
so the buffer implementation can now be changed to follow the clarified spec.

The test case buffer.cpp also had to be updated because of this change.
Users should not expect automatic write-back of data upon destruction of the buffer.

Signed-off-by: Byoungro So <byoungro.so@intel.com>
Co-authored-by: Ronan Keryell <ronan@keryell.fr>
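
The write-back rule described in the commit above can be modelled in plain C++. This is only a toy sketch: `ToyBuffer` and `firstElementAfterDestruction` are hypothetical names, not the SYCL `buffer` class or its implementation, and the model assumes nothing beyond what the commit message states (no copy back to the host unless `set_final_data()` was called).

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy stand-in for a buffer built from an iterator pair; NOT the SYCL class.
template <typename T>
class ToyBuffer {
public:
  // Copies the input range into internal storage; keeps no link to the source.
  template <typename InputIt>
  ToyBuffer(InputIt first, InputIt last) : data_(first, last) {}

  ToyBuffer(const ToyBuffer &) = delete;
  ToyBuffer &operator=(const ToyBuffer &) = delete;

  // Opt in to write-back: `dest` must have room for every element.
  void set_final_data(T *dest) { final_ = dest; }

  T &operator[](std::size_t i) { return data_[i]; }

  // Write back on destruction only if a destination was requested.
  ~ToyBuffer() {
    if (final_ != nullptr)
      std::copy(data_.begin(), data_.end(), final_);
  }

private:
  std::vector<T> data_;
  T *final_ = nullptr;
};

// Modifies element 0 through the buffer and reports what the host
// container holds after the buffer has been destroyed.
int firstElementAfterDestruction(bool requestWriteBack) {
  std::vector<int> host{1, 2, 3};
  {
    ToyBuffer<int> buf(host.begin(), host.end());
    buf[0] = 99;
    if (requestWriteBack)
      buf.set_final_data(host.data());
  }
  return host[0];
}
```

In this model the host vector keeps its original contents when `requestWriteBack` is false and sees the modified value only when write-back was requested explicitly.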
A simple library for constructing and serializing/deserializing
a sequence of typed property sets, where each property is a <name,typed value>
pair. To be used in offload tools.

Signed-off-by: Konstantin S Bobrovsky <konstantin.s.bobrovsky@intel.com>
)

The library supports creating and serializing/deserializing tables of strings,
inserting/deleting/replacing/renaming columns, and adding rows. To be used in
offload tools.

Signed-off-by: Konstantin S Bobrovsky <konstantin.s.bobrovsky@intel.com>
This reverts commit d357add.

Signed-off-by: Vladimir Lazarev <vladimir.lazarev@intel.com>
Signed-off-by: Alexander Batashev <alexander.batashev@intel.com>
…ntel#1359)

Signed-off-by: Konstantin S Bobrovsky <konstantin.s.bobrovsky@intel.com>
…for (intel#1348)

The kernel callable being invoked from an nd_range parallel_for is accepting an id argument, while it should be nd_item.

After my analysis, I found we check arguments' type for kernel_parallel_for instead of parallel_for. But that check is useless, because the compiler can still find a candidate for kernel_parallel_for with nd_range and id which is a wrong combination.

In my solution, parallel_for with nd_range calls kernel_parallel_for_nd_range(...) which is only available for nd_item.

Signed-off-by: Bing1 Yu <bing1.yu@intel.com>
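
The dispatch idea in the commit above (route the nd_range overload through a helper that only accepts kernels callable with nd_item) can be sketched with toy types. Everything in namespace `toy`, along with the functors `SumItems` and `TakesId`, is a hypothetical stand-in; the real SYCL headers differ in detail.

```cpp
#include <type_traits>
#include <utility>

namespace toy {
// Minimal stand-ins for the SYCL index types (illustration only).
struct id { int value; };
struct nd_item { int global; };
struct nd_range { int size; };

// Detects whether a callable F can be invoked with an argument of type Arg.
template <typename F, typename Arg, typename = void>
struct accepts : std::false_type {};

template <typename F, typename Arg>
struct accepts<F, Arg,
               decltype(void(std::declval<F &>()(std::declval<Arg>())))>
    : std::true_type {};

// Only available for kernels taking nd_item; anything else is rejected
// at compile time with a readable message.
template <typename KernelT>
void kernel_parallel_for_nd_range(const nd_range &r, KernelT kernel) {
  static_assert(accepts<KernelT, nd_item>::value,
                "nd_range parallel_for requires a kernel taking nd_item");
  for (int i = 0; i < r.size; ++i)
    kernel(nd_item{i});
}

// The nd_range overload forwards to the nd_item-only helper.
template <typename KernelT>
void parallel_for(const nd_range &r, KernelT kernel) {
  kernel_parallel_for_nd_range(r, kernel);
}
} // namespace toy

// Accepted: takes nd_item and accumulates the global index.
struct SumItems {
  int *total;
  void operator()(toy::nd_item it) const { *total += it.global; }
};

// Rejected by the check: takes id, the wrong argument for an nd_range launch.
struct TakesId {
  void operator()(toy::id) const {}
};
```

Calling `toy::parallel_for(toy::nd_range{4}, SumItems{&total})` compiles and runs, whereas passing `TakesId{}` would trip the `static_assert` instead of silently selecting a mismatched candidate.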
Implements a few code simplifications/unifications for LowerWGScope.

Signed-off-by: Victor Lomuller <victor@codeplay.com>
…tel#1405)

For the NVPTX target, address space inference for kernel arguments and
allocas happens in the backend (NVPTXLowerArgs and
NVPTXLowerAlloca passes). After the frontend, these pointers are in LLVM
default address space 0, which is the generic address space for the NVPTX
target. Perform an address space cast of the pointer to the shadow global
variable from the local to the generic address space before replacing
all uses of a byval argument.

Signed-off-by: Artur Gainullin <artur.gainullin@intel.com>
- Adds static members to sub_group class.
- sub_group member functions marked deprecated, to be removed later.
- SPIR-V helpers expanded to convert SYCL group to SPIR-V scope.
- Add workaround for half types

Signed-off-by: John Pennycook <john.pennycook@intel.com>
Since it is not possible to generate a vector of bools in the FE,
we have to change the return type of the corresponding instructions in the SPIRV
translator to a vector of bools. The SPIRV translator already did this for
some instructions; this patch extends the behaviour to handle more
instructions.
Adding doxygen documentation to PI CUDA backend.
Some code is re-ordered in the file to help sorting the
doxygen.

Co-Authored-By: Alexey Bader <alexey.bader@intel.com>
Co-Authored-By: Alexander Batashev <alexbatashev@outlook.com>
Co-Authored-By: Romanov Vlad <17316488+romanovvlad@users.noreply.github.com>

Signed-off-by: Ruyman Reyes <ruyman@codeplay.com>
Based on
https://github.com/codeplaysoftware/standards-proposals/blob/master/spec-constant/index.md

* [SYCL] PI changes:

1. Add specialization constant API to the SYCL RT Plugin Interface.
New PI API added:
pi_result piProgramSetSpecializationConstant(pi_program prog, pi_uint32 spec_id,
                                             size_t spec_size,
                                             const void *spec_value);
2. Add property set fields to the binary image descriptor, bump PI version.
This change breaks backward binary compatibility of device binary image descriptors.
3. Add convenience C++ wrappers for PI binary image hierarchy objects.

* [SYCL] Support device binary properties and file tables in the offload wrapper.

1. New option - "-properties=<file>". <file> must be a property set registry
file, as defined by llvm/Support/PropertySetIO.h. The wrapper will add the
property sets to the binary image descriptor and make them available to the
runtime.

2. New option - "-batch". With this option the only input can be a file table,
as defined by llvm/Support/SimpleTable.h. Column names are a part of interface
between this tool and the sycl-post-link, which produces the file table.

3. Binary image descriptor LLVM type updated to resemble changes in Plugin
Interface v1.2.

* [SYCL] Specialization constants support in the Front End.

1. Detect kernel lambda object captures corresponding to specialization
constants and (a) don't create kernel arguments for them (b) generate
specializations of the SpecConstantInfo structure into the integration
header.

2. Recognize the __unique_stable_name intrinsic and replace
it with a string literal uniquely identifying the type of the typename
template parameter to this intrinsic.

3. FE-related changes in the runtime:
- new SpecConstantInfo templated struct for type->name translation for
  specialization constants used by integration header
- define the __sycl_fe_getStableUniqueTypeName intrinsic

* [SYCL] Add specialization constant support in SYCL runtime.

1. Define SYCL API (sycl/include/CL/sycl/experimental/spec_constant.hpp)
2. Add convenience C++ wrappers for PI device binary structures and refactor
   runtime to use the wrappers. Get rid of custom deleters for binary images.
3. Implement SYCL spec constant APIs in program and program manager.

* [SYCL] Use file-table-tform in SYCL offload processing in clang driver.

Clang driver's design can't handily model
(1) multiple inputs/outputs in the action graph. Because of that, for
example, the sycl-post-link tool is invoked twice - once to split the code
and produce multiple bitcode files, and a second time to generate symbol
files for the split modules.
(2) "Clusters" of inputs/outputs, when subsets of inputs/outputs are
associated and describe different aspects of the same data. Example of
such clustering is the split module + its symbol file above. Clustering
would require support both in the driver and the tools invoked in
response to actions.

This commit moves SYCL offload processing to the "file table concept."
sycl-post-link instead of
(1) being invoked n times, once per each output type requested (once for
    device split and once for symbol file generation)
(2) outputting multiple file lists each listing outputs from the
    corresponding invocation above
is now invoked once and produces single file table output. E.g.
  [Code|Symbols|Properties]
  a_0.bc|a_0.sym|a_0.props
  a_1.bc|a_1.sym|a_1.props
This solves both problems - multiple input/output and clustering.
Combined with the file-table-tform tool, this allows for efficient handling
of multiple clusters of files (each represented as a row in the table file)
in the clang driver infrastructure.
For example, there is a real offload processing problem:
step1. sycl-post-link outputs N clusters of files
step2. "Code" file of each cluster resulting from step1 ({a_0.bc, a_1.bc}
       in the example above) must undergo further transformations -
       translation to SPIRV and optional ahead-of-time compilation.
step3. In each cluster resulting from step1 the "Code" file needs to be
       replaced with the result of step2
step4. All the clusters are processed by the ClangOffloadWrapper tool, which
       needs to know how files are distributed into clusters and what
       the role of each file in a cluster is - whether it is "Code", "Symbol"
       or "Properties".
To solve this, the following action graph is constructed in the clang driver:

                        column:"Code"
t1 -> [file-table-tform:extract column] -> t1a -> [for-each:] -> t1b
                                                  llvm-spirv
                                                   aot-comp
t1
   \                  column:"Code"
    [file-table-tform:replace column] -> t2 -> [ClangOffloadWrapper]
   /
t1b

where t1b is ["Code"]  and t2 is  [Code|Symbols|Properties]
              a_0.bin             a_0.bin|a_0.sym|a_0.props
              a_1.bin             a_1.bin|a_1.sym|a_1.props

Note that the graph does not change as the number of clusters grows, nor
does it change when more files are added to each cluster (e.g. a "Manifest" file).

* [SYCL] Process specialization constants in sycl-post-link tool.

Add a spec constant lowering pass to sycl-post-link tool. Support
file table output format.

* [SYCL] Temporarily disable spec_const_hw.cpp on CPU.

CPU OpenCL Runtime on build machines is not updated yet.

Signed-off-by: Konstantin S Bobrovsky <konstantin.s.bobrovsky@intel.com>
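
The file-table flow described in the squashed commit above (extract the "Code" column, transform it, replace the column) can be sketched in standard C++. This is an illustration under assumptions, not the file-table-tform implementation: the `FileTable` struct and every function name are hypothetical, and the on-disk format is inferred only from the three-line `[Code|Symbols|Properties]` example in the commit message.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical in-memory form: a bracketed header row, then '|'-separated rows.
struct FileTable {
  std::vector<std::string> columns;
  std::vector<std::vector<std::string>> rows;
};

std::vector<std::string> splitOnBar(const std::string &line) {
  std::vector<std::string> cells;
  std::istringstream in(line);
  std::string cell;
  while (std::getline(in, cell, '|'))
    cells.push_back(cell);
  return cells;
}

std::string joinWithBar(const std::vector<std::string> &cells) {
  std::string out;
  for (std::size_t i = 0; i < cells.size(); ++i) {
    if (i != 0)
      out += '|';
    out += cells[i];
  }
  return out;
}

// The first "[...]" line is taken as the header; other non-empty lines are rows.
FileTable parseFileTable(const std::string &text) {
  FileTable table;
  std::istringstream in(text);
  std::string line;
  while (std::getline(in, line)) {
    if (line.empty())
      continue;
    if (table.columns.empty() && line.front() == '[' && line.back() == ']')
      table.columns = splitOnBar(line.substr(1, line.size() - 2));
    else
      table.rows.push_back(splitOnBar(line));
  }
  return table;
}

std::string serializeFileTable(const FileTable &table) {
  std::string out = "[" + joinWithBar(table.columns) + "]\n";
  for (const auto &row : table.rows)
    out += joinWithBar(row) + "\n";
  return out;
}

// Returns columns.size() when the name is not present.
std::size_t columnIndex(const FileTable &table, const std::string &name) {
  for (std::size_t i = 0; i < table.columns.size(); ++i)
    if (table.columns[i] == name)
      return i;
  return table.columns.size();
}

// "extract column": one entry per cluster (row).
std::vector<std::string> extractColumn(const FileTable &table,
                                       const std::string &name) {
  const std::size_t idx = columnIndex(table, name);
  assert(idx < table.columns.size() && "unknown column");
  std::vector<std::string> out;
  for (const auto &row : table.rows)
    out.push_back(row.at(idx));
  return out;
}

// "replace column": returns a copy with the named column overwritten.
FileTable replaceColumn(FileTable table, const std::string &name,
                        const std::vector<std::string> &values) {
  const std::size_t idx = columnIndex(table, name);
  assert(idx < table.columns.size() && "unknown column");
  assert(values.size() == table.rows.size() && "one value per row expected");
  for (std::size_t i = 0; i < table.rows.size(); ++i)
    table.rows[i].at(idx) = values[i];
  return table;
}
```

Extracting "Code" from the example table yields the `.bc` file names; after a per-file transformation step, `replaceColumn` puts the resulting `.bin` names back, leaving the "Symbols" and "Properties" columns of each cluster untouched.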
@hchilama hchilama closed this Mar 28, 2020
@hchilama hchilama deleted the intel_llvm branch November 19, 2021 19:37
aelovikov-intel pushed a commit to aelovikov-intel/llvm that referenced this pull request Feb 23, 2023
Test integration of kernel fusion into the SYCL runtime scheduler.
    
Check that cancellation of the fusion happens if required by synchronization rules, as described in the [extension proposal](https://github.com/intel/llvm/blob/sycl/sycl/doc/extensions/experimental/sycl_ext_codeplay_kernel_fusion.asciidoc#synchronization-in-the-sycl-application).

Spec: intel#7098
Implementation: intel#7531

Signed-off-by: Lukas Sommer <lukas.sommer@codeplay.com>