[Caffe2] Update hip files by rohithkrn · Pull Request #9826 · pytorch/pytorch

rohithkrn · 2018-07-25T17:09:05Z

The goal of this PR is to update the hip files to reflect relevant changes in cuda source files.

rohithkrn · 2018-07-25T17:12:15Z

@bddppq This also adds THCCachingAllocator for HIP. However, THCCachingAllocator_hip.cc includes THCCachingAllocator.h, which has CUDA-specific code. Could we rename that header to THCCachingAllocator_gpu.h so that we can add a corresponding HIP file?

@petrex

petrex · 2018-07-25T17:20:38Z

@bddppq is there an update on the base docker img?

Error response from daemon: manifest for 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/conda3-cuda9.0-cudnn7-ubuntu16.04:126 not found

caffe2/core/hip/context_hip.h

  hipStream_t GetStream(int gpu, int stream_id) {
    vector<hipStream_t>& gpu_streams = hip_streams_[gpu];
-    if (gpu_streams.size() <= stream_id) {
+    if (gpu_streams.size() <= (unsigned)stream_id) {
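The change above casts `stream_id` because `vector::size()` returns an unsigned `size_t`, and comparing it directly with a signed `int` triggers a signed/unsigned comparison warning. A minimal sketch of the same guard in a lazily grown per-device stream table follows. `FakeStream` and `StreamPool` are hypothetical stand-ins so it builds without ROCm, and the growth logic is an assumption rather than a copy of `context_hip.h`. It uses `static_cast<std::size_t>` instead of `(unsigned)` so the cast matches the type of `size()` exactly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for hipStream_t so the sketch builds without ROCm.
struct FakeStream {
  int gpu;
  int id;
};

// Hypothetical per-device stream table: streams are "created" lazily the
// first time a (gpu, stream_id) pair is requested.
class StreamPool {
 public:
  explicit StreamPool(int num_gpus) : streams_(num_gpus) {}

  FakeStream GetStream(int gpu, int stream_id) {
    assert(gpu >= 0 && static_cast<std::size_t>(gpu) < streams_.size());
    assert(stream_id >= 0);  // the cast below is only safe for non-negative ids
    std::vector<FakeStream>& gpu_streams = streams_[gpu];
    // size() is unsigned; cast the signed id to avoid a mixed-sign comparison.
    if (gpu_streams.size() <= static_cast<std::size_t>(stream_id)) {
      gpu_streams.resize(static_cast<std::size_t>(stream_id) + 1,
                         FakeStream{-1, -1});
    }
    FakeStream& s = gpu_streams[stream_id];
    if (s.id < 0) {
      s = FakeStream{gpu, stream_id};  // first use: create the stream
    }
    return s;
  }

  std::size_t NumStreams(int gpu) const { return streams_[gpu].size(); }

 private:
  std::vector<std::vector<FakeStream>> streams_;
};
```

Note that the cast only silences the warning correctly when `stream_id` is non-negative; a negative value would convert to a very large unsigned number, which is why the sketch asserts on it first.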


caffe2/operators/hip/softmax_op_miopen.cc

    const int D = X.size_from_dim(canonical_axis);

    Y->ResizeLike(X);
+    auto* Y_data = Y->template mutable_data<T>();


ezyang · 2018-07-25T17:30:38Z

There is currently an outage for caffe2 rocm builds. It will be fixed in the next hour.

ezyang · 2018-07-25T17:37:22Z

@pytorchbot retest this please

caffe2/core/hip/THCCachingAllocator_hip.cc

@@ -0,0 +1,309 @@
+#include "caffe2/core/THCCachingAllocator.h"
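For context on the file being added: a caching allocator keeps freed device blocks in a pool and hands them back out for later requests instead of returning them to the driver every time. The following is a deliberately simplified sketch of that idea, not the contents of `THCCachingAllocator_hip.cc`. `ToyCachingAllocator` is hypothetical, `std::malloc`/`std::free` stand in for `hipMalloc`/`hipFree`, the 512-byte rounding is an assumed granularity, and it omits stream tracking, block splitting, and thread safety.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <unordered_map>

// Hypothetical, heavily simplified caching allocator. std::malloc/std::free
// stand in for the device allocator (e.g. hipMalloc/hipFree).
class ToyCachingAllocator {
 public:
  ToyCachingAllocator() = default;
  ToyCachingAllocator(const ToyCachingAllocator&) = delete;
  ToyCachingAllocator& operator=(const ToyCachingAllocator&) = delete;

  ~ToyCachingAllocator() {
    EmptyCache();
    for (auto& kv : live_) std::free(kv.first);  // also release in-use blocks
  }

  void* Allocate(std::size_t size) {
    const std::size_t rounded = RoundUp(size);
    // Reuse the smallest cached block that is large enough, if any.
    auto it = free_blocks_.lower_bound(rounded);
    if (it != free_blocks_.end()) {
      void* ptr = it->second;
      live_[ptr] = it->first;
      free_blocks_.erase(it);
      ++cache_hits_;
      return ptr;
    }
    // Cache miss: fall back to the underlying allocator.
    void* ptr = std::malloc(rounded);
    if (ptr != nullptr) live_[ptr] = rounded;
    return ptr;
  }

  // Return the block to the cache instead of freeing it.
  void Free(void* ptr) {
    auto it = live_.find(ptr);
    assert(it != live_.end() && "pointer not owned by this allocator");
    if (it == live_.end()) return;
    free_blocks_.emplace(it->second, ptr);
    live_.erase(it);
  }

  // Release every cached (not in-use) block to the underlying allocator.
  void EmptyCache() {
    for (auto& kv : free_blocks_) std::free(kv.second);
    free_blocks_.clear();
  }

  std::size_t cache_hits() const { return cache_hits_; }

 private:
  // Round up to an assumed 512-byte granularity (minimum one unit).
  static std::size_t RoundUp(std::size_t size) {
    const std::size_t kAlign = 512;
    if (size == 0) return kAlign;
    return ((size + kAlign - 1) / kAlign) * kAlign;
  }

  std::multimap<std::size_t, void*> free_blocks_;  // size -> cached block
  std::unordered_map<void*, std::size_t> live_;    // in-use block -> size
  std::size_t cache_hits_ = 0;
};
```

Rounding requests to a fixed granularity makes it more likely that a freed block fits a later request of similar size: in this sketch, freeing a 1000-byte allocation and then asking for 700 bytes reuses the same 1024-byte block.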


bddppq · 2018-07-26T03:28:01Z

@pytorchbot retest this please

bddppq · 2018-07-27T17:34:10Z

Better to hipify the THCCachingAllocator files.

bddppq · 2018-07-27T17:56:47Z

@rohithkrn I just checked: there are only two places in aten that use cudaError instead of cudaError_t. I think we can simply change those two places and remove the "cudaError" -> "hipError_t" mapping.

@ezyang @Jorghi12

rohithkrn · 2018-07-27T17:59:26Z

@bddppq I have changed the mapping order to make it work in the current state. But yes, changing aten should also work.
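The mapping-order problem discussed here can be reproduced with plain ordered substring replacement. In the sketch below (a hypothetical two-entry table and helper functions, not the actual hipify mapping file or script), applying `cudaError` before `cudaError_t` turns `cudaError_t` into `hipError_t_t`; listing the longer key first avoids that.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Mappings = std::vector<std::pair<std::string, std::string>>;

// Hypothetical excerpt of a CUDA -> HIP rename table. Rules are applied in
// order, so a key that is a prefix of another key must come after it.
const Mappings kMappings = {
    {"cudaError_t", "hipError_t"},
    {"cudaError", "hipError_t"},
};

// Replace every occurrence of `from` with `to`, scanning left to right.
std::string ReplaceAll(std::string text, const std::string& from,
                       const std::string& to) {
  if (from.empty()) return text;
  std::size_t pos = 0;
  while ((pos = text.find(from, pos)) != std::string::npos) {
    text.replace(pos, from.size(), to);
    pos += to.size();  // skip past the inserted text
  }
  return text;
}

// Apply each rule in table order to the whole text.
std::string Hipify(std::string text, const Mappings& mappings) {
  for (const auto& m : mappings) {
    text = ReplaceAll(std::move(text), m.first, m.second);
  }
  return text;
}
```

This naive version would also rewrite any longer identifier that merely starts with `cudaError`, so a broad prefix rule like that needs care whichever order it is listed in.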

bddppq

LG

facebook-github-bot

bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary: This was introduced in #9826, following the corresponding CUDA file context_gpu.cu. Tests passed in the PR, when master was at 94439d7. However, during the long landing process, a new master commit, aebf3b4, removed `CAFFE_KNOWN_TYPE(Tensor<HIPContext>)` from context_hip.cc, which broke the HIP BlobStatGetter. We did NOT rerun tests during the merge, so when #9826 later landed on master, the ROCm tests started breaking. Pull Request resolved: #9973 Differential Revision: D9040671 Pulled By: bddppq fbshipit-source-id: f3b16cabaf681fc0535ca733db0b48430868f922
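The failure mode described in this summary, where one registration stops working after a separate type declaration is removed, can be illustrated with a generic type-keyed registry. Everything below (`KnownTypes`, `StatRegistry`, string type names) is hypothetical and does not reproduce Caffe2's `TypeMeta`, `CAFFE_KNOWN_TYPE`, or `BlobStatRegistry`; it only shows why a getter stored by type id cannot be registered or found when the type was never assigned an id.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>

// Hypothetical type-id table: a type name only receives an id once it is
// explicitly declared (loosely analogous to a "known type" declaration).
using TypeId = int;
constexpr TypeId kUnknownType = -1;

class KnownTypes {
 public:
  TypeId Add(const std::string& name) {
    auto it = ids_.find(name);
    if (it != ids_.end()) return it->second;
    const TypeId id = static_cast<TypeId>(ids_.size());
    ids_.emplace(name, id);
    return id;
  }

  TypeId Lookup(const std::string& name) const {
    auto it = ids_.find(name);
    return it == ids_.end() ? kUnknownType : it->second;
  }

 private:
  std::map<std::string, TypeId> ids_;
};

// Hypothetical stat registry: getters are stored by type id, so a getter
// for an undeclared type has nothing to be keyed on.
class StatRegistry {
 public:
  using Getter = std::function<std::size_t()>;

  bool Register(const KnownTypes& types, const std::string& name, Getter g) {
    const TypeId id = types.Lookup(name);
    if (id == kUnknownType) return false;  // no id, registration fails
    getters_[id] = std::move(g);
    return true;
  }

  const Getter* Find(const KnownTypes& types, const std::string& name) const {
    auto it = getters_.find(types.Lookup(name));
    return it == getters_.end() ? nullptr : &it->second;
  }

 private:
  std::map<TypeId, Getter> getters_;
};
```

In this sketch, declaring only the CUDA tensor type means the HIP getter is rejected at registration and later lookups return `nullptr`, even though the getter code itself is unchanged.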

Summary: The goal of this PR is to update the hip files to reflect relevant changes in cuda source files. Pull Request resolved: pytorch#9826 Differential Revision: D9032840 Pulled By: bddppq fbshipit-source-id: 504e55c46308eebfee3c9a7beea1f294fe03470f


rohithkrn added 6 commits July 24, 2018 18:44

sync with cuda source

ad7afb8

sync with cuda source

4e65ffd

sync with cuda source

b1c7d46

sync with cuda source, fix empty batch

af46980

sync with cuda source, fix empty batch

b67801d

add THCCachingAllocator for hip

59932bb

petrex suggested changes Jul 25, 2018

weiyangfb added the caffe2 label Jul 25, 2018

bddppq reviewed Jul 26, 2018

Nallamaddi, Rohith and others added 9 commits July 25, 2018 21:14

rename to THCCachingAllocator.h -> THCCachingAllocator_gpu.h

9637cf6

update include path

2a88922

update include path

3f3a83e

add THCCachingAllocator_hip.h

f93549f

update include path

9bba378

update include path

e05fd8f

remove batch_size=0 filter

10828d1

Merge remote-tracking branch 'upstream/master' into update-hip-files

d38b168

Merge remote-tracking branch 'upstream/master' into update-hip-files

aab27ed

rohithkrn added 4 commits July 27, 2018 10:51

include THCCachingAllocator

9273654

modify mapping order to prevent cudaError substitution

f4f21c9

hipify THCCachingAllocator_hip.h

09f1541

hipify THCCachingAllocator_hip.cc

3cd7b0f

rohithkrn requested review from apaszke and colesbury as code owners July 27, 2018 17:54

rohithkrn requested review from ezyang, gchanan, soumith and zdevito as code owners July 27, 2018 17:54

bddppq approved these changes Jul 27, 2018

facebook-github-bot reviewed Jul 27, 2018

facebook-github-bot closed this in c3fe071 Jul 27, 2018

bddppq mentioned this pull request Jul 28, 2018

Fix BlobStatRegistry HIP BlobStatGetter registration issue #9973

Closed

ezyang added open source merged labels Jun 24, 2019

6 participants
