[Caffe2] Enabling AMD GPU Backend for Caffe2 by petrex · Pull Request #7566 · pytorch/pytorch

petrex · 2018-05-15T06:03:22Z

The goal of this PR is to enable the AMD GPU backend for Caffe2.

Major changes include:

Add the AMD GPU device to the protocol buffer
Makefile scaffolding for the AMD software stack (ROCm)
Implement Caffe2 core/test for the AMD GPU backend

…e2_core_hip * 'caffe2_core_hip' of github.com:petrex/pytorch: caffe2 PB update for AMD/ROCM HIP device

ezyang · 2018-05-15T19:45:36Z

CMakeLists.txt

    USE_GLOO "Use Gloo" ON
    "BUILD_CAFFE2" OFF)
 option(USE_GLOO_IBVERBS "Use Gloo IB verbs for distributed support" OFF)
+option(USE_HIP "Use HIP" ON)


cmake/public/LoadHIP.cmake

-FIND_PACKAGE(HIP 1.0 REQUIRED)
+FIND_PACKAGE(HIP 1.0)
+
+IF(HIP_FOUND)


cmake/public/LoadHIP.cmake

@@ -1,42 +1,86 @@
+set(PYTORCH_FOUND_HIP FALSE)


cmake/Dependencies.cmake

+    set(Caffe2_HIP_INCLUDES
+      ${hip_INCLUDE_DIRS} ${rocrand_INCLUDE_DIRS} ${hiprand_INCLUDE_DIRS} ${rocblas_INCLUDE_DIRS} ${miopen_INCLUDE_DIRS} ${Caffe2_HIP_INCLUDES} ${thrust_INCLUDE_DIRS})
+    set(Caffe2_HIP_DEPENDENCY_LIBS
+      ${rocrand_LIBRARIES} ${hiprand_LIBRARIES} ${PYTORCH_HIP_HCC_LIBRARIES} ${PYTORCH_MIOPEN_LIBRARIES})


aten/CMakeLists.txt

 # Find the HIP package, set the HIP paths, load the HIP CMake.
 IF(WITH_ROCM)
  include(LoadHIP)
+  if (NOT PYTORCH_FOUND_HIP)


…e2_core_hip * 'caffe2_core_hip' of github.com:petrex/pytorch: (40 commits) [auto] Update onnx to 52f7528 - add more shape inference tests (onnx/onnx#971) onnx/onnx@52f7528 JIT cleanup (pytorch#7631) fix to build sleef when using cmake 3.11.1 (pytorch#7679) Fix typo in document (pytorch#7725) [auto] Update onnx to 6f4b1b1 - Tests for Gemm operator (onnx/onnx#885) onnx/onnx@6f4b1b1 [auto] Update onnx to c6c6aad - Enhance the 1-element broadcast case (onnx/onnx#902) onnx/onnx@c6c6aad serialization for torch.device (pytorch#7713) Fix compile flags for MSVC (pytorch#7703) Fix exporting Sum to onnx (pytorch#7685) Renanme ZFNet to ZFNet512 (pytorch#7723) Implement __reduce__ for torch.dtype (pytorch#7699) Remove unnecessary include in vec256_float.h (pytorch#7711) Update from facebook (pytorch#7696) fix for cuda 9.2 builds (pytorch#7709) make BatchSampler subclass of Sampler, and expose (pytorch#7707) Dont emit warning for ABI incompatibility when PyTorch was built from source (pytorch#7681) remove index from python bindings (fixes: pytorch#7639) (pytorch#7690) Update _torch_docs.py (pytorch#7700) Fix the wrong usage of environment variables detection in cmake Changes from D7881937 and D7963936 plus an edit (pytorch#7605) ...

bddppq · 2018-05-22T17:57:19Z

@Jorghi12 Do my explanations make sense to you?

@soumith @ezyang Since I have changed two cmake files outside of the caffe2 subdirectories, I need your stamp.

ezyang · 2018-05-23T08:57:16Z

If we're working around a bug in the upstream HIP files, we should say so in the code that is implementing the workaround, so that when HIP fixes their cmake we know what to eliminate.

bddppq · 2018-05-23T16:41:39Z

@ezyang @Jorghi12 OK, let me explain here again. PYTORCH_HIP_HCC_LIBRARIES and PYTORCH_MIOPEN_LIBRARIES are workarounds for a bug in the upstream cmake files, and I have put two TODO comments with explanations at the bottom of cmake/public/LoadHIP.cmake. PYTORCH_FOUND_HIP is not a workaround; it exists because we have extra logic (and there will be more in the future) on top of the native find HIP, so it is worth giving it its own name.
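For readers following the thread, the pattern described above might look roughly like the sketch below. This is not the actual content of cmake/public/LoadHIP.cmake: the library name `hip_hcc`, the `/opt/rocm/lib` hint, and the status message are assumptions made for illustration, and it presumes a `FindHIP.cmake` module is reachable through `CMAKE_MODULE_PATH`.

```cmake
# Hypothetical sketch only; names other than PYTORCH_FOUND_HIP,
# PYTORCH_HIP_HCC_LIBRARIES and HIP_FOUND are assumptions.
set(PYTORCH_FOUND_HIP FALSE)

# Non-REQUIRED lookup: a missing HIP install leaves HIP_FOUND false
# instead of aborting the configure step.
find_package(HIP 1.0)

if(HIP_FOUND)
  # Project-level flag layered on top of the native HIP_FOUND result,
  # so extra checks can be added here later.
  set(PYTORCH_FOUND_HIP TRUE)

  # TODO: workaround for a bug in the upstream cmake files; remove once
  # upstream is fixed. Library name and path are illustrative assumptions.
  find_library(PYTORCH_HIP_HCC_LIBRARIES hip_hcc HINTS /opt/rocm/lib)
endif()

if(NOT PYTORCH_FOUND_HIP)
  message(STATUS "HIP not found; building without the AMD GPU backend")
endif()
```

Keeping the TODO next to the workaround variable, as requested in the review, makes it easy to find what to delete once the upstream fix lands.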

soumith

stamping approval

bddppq · 2018-05-23T19:21:31Z

@petrex Let's first get this initial version in so we can parallelize the work of polishing the core and adding hip ops.

@generated

…e2_core_hip * 'caffe2_core_hip' of github.com:petrex/pytorch: (24 commits) Allow empty storage for the 'Edge' class. (pytorch#7595) Process group base class and Gloo implementation (pytorch#7628) _LRSchedulers getstate include optimizer info (pytorch#7757) [PyTorch] [gradcheck] change backward() to grad() (pytorch#7710) Update test_nn.py (pytorch#7787) Define general default scheduler for TBB and fix ppc64le bug (pytorch#7761) Add support for accepting Tensor as input in clip_grad_* functions. (pytorch#7769) [Easy] Remove unused code (pytorch#7782) Update tbb (pytorch#7734) Add @generated annotation (pytorch#7780) fix legacy comment after variable tensor merge (pytorch#7771) Revert pytorch#7750 and pytorch#7762 to fix Windows CI on master (pytorch#7772) Temporarily disable build env check (pytorch#7768) Add missing brace (pytorch#7762) [C++ API] Add backward() to Tensor and Variable (pytorch#7750) [auto] Update onnx to d43b550 - Fix .gitignore and add missing files (onnx/onnx#1005) onnx/onnx@d43b550 [auto] Update onnx to ea1aa13 - add tests for reduce ops (onnx/onnx#675) onnx/onnx@ea1aa13 include cudnn_h (pytorch#7749) [C++ API] Using new registration mechanism (pytorch#7663) [auto] Update onnx to 5dd68e6 - Add a util function: polish_model (onnx/onnx#1000) onnx/onnx@5dd68e6 ...

This reverts commit 6e89ad4.

petrex · 2018-05-23T21:51:03Z

@bddppq Just reverted the change for the operators. Let's keep this PR for Caffe2 core and CI only.

This reverts commit 2ebcf4b.

* Revert "[auto] Update onnx to 4898c9e - Added TensorDenotation and metadata_props for images (onnx/onnx#879) onnx/onnx@4898c9e" This reverts commit 9c679da.
* Revert "Add BiasCHW fallback for GPU (#7738)" This reverts commit 14ad2e7.
* Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2 (#7566)" This reverts commit 2ebcf4b.

* origin: [Caffe2] Enabling AMD GPU Backend for Caffe2 (pytorch#7566) Call grad_mode.py context managers as decorators (pytorch#7737) catch CPU tensors in checkSameGPU (fixes pytorch#7689) (pytorch#7767) Mark stack as non-executable in NNPACK (pytorch#7752) small fixes in fusion_compiler (pytorch#7776) Run clang-format on c10d (pytorch#7791)

* Add hip support for caffe2 core
* Add MIOPEN header/wrapper to caffe2 core
* Add HIP device into caffe2 PB
* top level makefile change for rocm/hip
* makefile scaffolding for AMD/RocM/HIP
* Makefile scafodding for AMD/RocM/HIP; add makefile/utility for HIP files
* caffe2 PB update for AMD/ROCM HIP device
* Add AMD/RocM/Thrust dependency
* HIP threadpool update
* Fix makefile macro
* makefile fix: duplicate test/binary name
* makefile clean-up
* makefile clean-up
* add HIP operator registry
* add utilities for hip device
* Add USE_HIP to config summary
* makefile fix for BUILD_TEST
* merge latest
* Fix indentation
* code clean-up
* Guard builds without HIP and use the same cmake script as PyTorch to find HIP
* Setup rocm environment variables in build.sh (ideally should be done in the docker images)
* setup locale
* set HIP_PLATFORM
* Revert "set HIP_PLATFORM" This reverts commit 8ec58db.
* continue the build script environment variables mess
* HCC_AMDGPU_TARGET
* Cleanup the mess, has been fixed in the lastest docker images
* Assign protobuf field hip_gpu_id a new field number for backward compatibility
* change name to avoid conflict
* Fix duplicated thread pool flag
* Refactor cmake files to not add hip includes and libs globally
* Fix the wrong usage of environment variables detection in cmake
* Add MIOPEN CNN operators
* Revert "Add MIOPEN CNN operators" This reverts commit 6e89ad4.


Peter Yeh added 20 commits April 26, 2018 14:41

Add hip support for caffe2 core

ca6a441

Add MIOPEN header/wrapper to caffe2 core

cb41397

Add HIP device into caffe2 PB

18858cb

top level makefile change for rocm/hip

51a49c1

makefile scaffolding for AMD/RocM/HIP

3a48b1e

Makefile scafodding for AMD/RocM/HIP; add makefile/utility for HIP files

b45a633

caffe2 PB update for AMD/ROCM HIP device

cda1da6

Add AMD/RocM/Thrust dependency

8f9c63a

Merge branch 'caffe2_core_hip' of github.com:petrex/pytorch into caff…

a1171f0

…e2_core_hip * 'caffe2_core_hip' of github.com:petrex/pytorch: caffe2 PB update for AMD/ROCM HIP device

HIP threadpool update

d867613

Fix makefile macro

666173d

makefile fix: duplicate test/binary name

4609bc6

makefile clean-up

e9bdd6a

makefile clean-up

f5496be

add HIP operator registry

7653aff

add utilities for hip device

8c7da75

Add USE_HIP to config summary

3db165a

makefile fix for BUILD_TEST

6525317

merge latest and fix test build

7606aa1

merge latest

0c0aa1b

bddppq self-requested a review May 15, 2018 06:18

Peter Yeh and others added 3 commits May 15, 2018 08:43

Fix indentation

e19f33d

code clean-up

de710b4

Merge branch 'master' into caffe2_core_hip

8bfa47f

Jorghi12 self-requested a review May 15, 2018 20:40

Merge branch 'master' into caffe2_core_hip

00ebc31

Jorghi12 reviewed May 16, 2018


Merge branch 'master' into caffe2_core_hip

5b01dc0

dzhulgakov requested a review from ajtulloch May 16, 2018 07:02

bddppq added 2 commits May 19, 2018 01:00

Merge branch 'master' into caffe2_core_hip

f77b1ca

Fix the wrong usage of environment variables detection in cmake

bf7446a

Jorghi12 reviewed May 19, 2018


cmake/public/LoadHIP.cmake

@@ -1,42 +1,86 @@

set(PYTORCH_FOUND_HIP FALSE)


Jorghi12 reviewed May 19, 2018


aten/CMakeLists.txt

# Find the HIP package, set the HIP paths, load the HIP CMake.

IF(WITH_ROCM)

include(LoadHIP)

if (NOT PYTORCH_FOUND_HIP)


bddppq and others added 2 commits May 21, 2018 12:12

Merge branch 'master' into caffe2_core_hip

8fbb954

bddppq approved these changes May 22, 2018


bddppq added 2 commits May 21, 2018 17:25

Merge branch 'master' into caffe2_core_hip

758e5b2

Merge branch 'master' into caffe2_core_hip

50eadcf

Merge branch 'master' into caffe2_core_hip

e88cf91

Merge branch 'master' into caffe2_core_hip

f76bd00

soumith approved these changes May 23, 2018


Peter Yeh added 3 commits May 23, 2018 14:29

Add MIOPEN CNN operators

6e89ad4

Revert "Add MIOPEN CNN operators"

3eb604e

This reverts commit 6e89ad4.

bddppq merged commit 2ebcf4b into pytorch:master May 23, 2018

bddppq added a commit that referenced this pull request May 24, 2018

Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2 (#7566)"

95cfc01

This reverts commit 2ebcf4b.

bddppq mentioned this pull request May 24, 2018

Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2" #7802

Merged

ezyang added the open source label Jun 24, 2019
