Offload: support OpenCL by hfp · Pull Request #3315 · cp2k/cp2k

hfp · 2024-03-12T15:16:02Z

This work "short-cuts" OpenCL support by leveraging DBCSR's OpenCL backend to implement CP2K's Offload interface.
- Future work (possibly soon) can settle where the actual glue code (the "OpenCL backend") is hosted.
- OpenCL is very close to CUDA (streams/queues, events, etc.); however, an OpenCL backend has to account for the differences.
- Pointer arithmetic on the host using device pointers is not supported with OpenCL (without additional effort), e.g., DBM is implemented by relying on device pointer arithmetic.
- OpenCL abstracts device memory with an opaque "cl_mem" structure (it does not expose actual device pointers).

Some functions of CP2K's Offload interface (once derived from DBCSR's ACC interface) remain "TODO", i.e., for the time being offload_runtime.h "disables" support for Grid and PW/FFTs.
- Defines for __NO_OFFLOAD_GRID and __NO_OFFLOAD_PW must still be given for the Fortran code.

Some changes remove CUDA- and HIP-specific code paths by relying on the Offload runtime (e.g., offload_create_buffer).
Some changes also address problematic code formatting (multi-line control flow without curly braces/blocks, etc.).
Further updates to INSTALL.md and white-listing/removing certain items are part of follow-up changes.

hfp · 2024-03-13T08:15:01Z

The test QS/regtest-tddfpt-force-2/h2o_f13.inp is flagged as wrong in the CUDA Pascal Regtest. However, this appears to be unrelated to this PR; the CUDA Pascal Regtest was triggered mainly to check that compilation succeeds.

* This work "short-cuts" supporting OpenCL by leveraging DBCSR's OpenCL backend to implement CP2K's Offload interface.
  - Future work (may be soon) can decide/settle where the actual glue code is hosted making the "OpenCL backend".
  - OpenCL is very close to CUDA (streams/queues, events, etc), however, an OpenCL backend accounts for differences.
  - Pointer arithmetic on the host using device pointers is not supported with OpenCL without additional effort.
  - OpenCL abstracts device memory being "cl_mem" structure and does not expose actual device pointers.
* Some functions on CP2K's Offload interface (once derived from DBCSR's ACC interface) remain "TODO".
* For the time being offload_runtime.h "disables" support for Grid and PW/FFTs
  - Defines for __NO_OFFLOAD_GRID and __NO_OFFLOAD_PW must be still given for the Fortran code.
* Some changes remove CUDA and HIP specific code-paths by relying on Offload runtime (e.g., offload_create_buffer).
* Some changes also address problematic code format (multi-line control-flow w/o curly braces/block, etc).
* Further updating INSTALL.md and white-listing/removing certain items are part of follow-up changes.
* Cleaned FLAG_EXCEPTIONS (tools/precommit/check_file_properties.py).

oschuett · 2024-03-13T09:55:08Z

This is very exciting!

Should we set up a dashboard test for it?

> DBM is implemented by relying on device pointer arithmetic.

This should be straightforward to change.

hfp · 2024-03-13T09:57:57Z

Thanks! There are more PRs to come, at least one covering the actual DBM changes/kernel. This works perfectly on NVIDIA too, so it should be simple to finally set up a Dashboard entry.

hfp · 2024-03-13T09:59:05Z

Also, I have to work on providing a clean/minimal ARCH file and support in our CMake infra ;-)

hfp · 2024-03-13T10:13:43Z

To share a little sneak peek (@JWilhelm), I ran benchmarks/QS_low_scaling_GW/GW.inp on systems with four Intel GPUs, and the results look like this:

| job | nn | nr | nt | tts |
|---|---|---|---|---|
| cp2k-gwls-192481 | 1 | 16 | 7 | 340.494 |
| cp2k-gwls-192547 | 2 | 32 | 3 | 162.532 |
| cp2k-gwls-192549 | 4 | 16 | 7 | 105.373 |
| cp2k-gwls-192483 | 16 | 16 | 7 | 56.934 |

The columns denote the job title, number of nodes (nn), number of ranks per node (nr), number of threads (nt, no SMT), and total time to solution (tts, in seconds). My DBM kernel is not optimized at this point, e.g., it does not even use shared/local memory.

hfp force-pushed the offload branch 2 times, most recently from 0c7207d to 79b4f0d · March 12, 2024 19:25

hfp marked this pull request as ready for review · March 12, 2024 20:12

hfp force-pushed the offload branch from 79b4f0d to a7ad68b · March 13, 2024 08:19

hfp merged commit 4b070db into cp2k:master · Mar 13, 2024

hfp deleted the offload branch · March 15, 2024 09:40

hfp merged 1 commit into cp2k:master from hfp:offload
