* This work "short-cuts" OpenCL support by leveraging DBCSR's OpenCL backend to implement CP2K's Offload interface.
  - Future work (possibly soon) can settle where the actual glue code making up the "OpenCL backend" is hosted.
  - OpenCL is very close to CUDA (streams/queues, events, etc.); however, an OpenCL backend has to account for the differences.
  - Pointer arithmetic on the host using device pointers is not supported with OpenCL without additional effort.
  - OpenCL abstracts device memory as a "cl_mem" structure and does not expose actual device pointers.
* Some functions of CP2K's Offload interface (once derived from DBCSR's ACC interface) remain "TODO".
* For the time being, offload_runtime.h "disables" support for Grid and PW/FFTs.
  - Defines for __NO_OFFLOAD_GRID and __NO_OFFLOAD_PW must still be given for the Fortran code.
* Some changes remove CUDA- and HIP-specific code paths by relying on the Offload runtime (e.g., offload_create_buffer).
* Some changes also address problematic code formatting (multi-line control flow without curly braces/block, etc.).
* Updating INSTALL.md further and white-listing/removing certain items are part of follow-up changes.
* Cleaned FLAG_EXCEPTIONS (tools/precommit/check_file_properties.py).
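To illustrate the pointer-arithmetic point, here is a minimal sketch in plain C of one common workaround: carrying an opaque handle together with a byte offset instead of a raw device pointer. All names (`fake_buffer_t`, `device_ref_t`, `device_ref_advance`, `fake_memcpy_d2h`) are hypothetical and are not taken from CP2K or DBCSR. The "device" memory is modeled by a host array so the sketch runs without an OpenCL installation. In real OpenCL, the offset would be handed to calls such as `clEnqueueReadBuffer`, which accepts a buffer, an offset, and a size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an opaque handle such as cl_mem. In real OpenCL
 * the host never sees the storage; it is modeled here by a host array only
 * so that the sketch can run without an OpenCL installation. */
typedef struct {
  unsigned char *storage; /* pretend this lives on the device */
  size_t size;            /* allocation size in bytes */
} fake_buffer_t;

/* Hypothetical replacement for a raw device pointer: handle plus offset. */
typedef struct {
  fake_buffer_t *buffer;
  size_t offset; /* byte offset into the allocation */
} device_ref_t;

/* Counterpart of "ptr + nbytes" on a raw (CUDA-style) device pointer. */
static device_ref_t device_ref_advance(device_ref_t ref, size_t nbytes) {
  assert(ref.buffer != NULL);
  ref.offset += nbytes;
  return ref;
}

/* Device-to-host copy modeled after APIs that take (buffer, offset, size).
 * Returns 0 on success and -1 if the range exceeds the allocation. */
static int fake_memcpy_d2h(void *host, device_ref_t src, size_t nbytes) {
  if (src.buffer == NULL || src.offset > src.buffer->size ||
      nbytes > src.buffer->size - src.offset) {
    return -1;
  }
  memcpy(host, src.buffer->storage + src.offset, nbytes);
  return 0;
}
```

With this layout, host code that previously computed `ptr + offset` would instead call `device_ref_advance(ref, offset)` and pass the resulting handle/offset pair to the copy or kernel-launch routine. Whether the actual backend uses this approach, sub-buffers, or something else is not stated here.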
This is very exciting! Should we set up a dashboard test for it?
This should be straightforward to change.
Thanks! There are more PRs to come, at least one covering the actual DBM changes/kernel. This works perfectly on NVIDIA too, so it should be simple to finally set up a Dashboard entry.
Also, I have to work on providing a clean/minimal ARCH file and support in our CMake infra ;-)
To share a little sneak peek (@JWilhelm), I ran
The columns denote the job title, number of nodes, number of ranks per node, number of threads (no SMT), and total time to solution (in seconds). My DBM kernel is not optimized at this point, e.g., it does not even use shared/local memory.
DBM is implemented by relying on device pointer arithmetic.
For the time being, offload_runtime.h "disables" support for Grid and PW/FFTs.