Skip to content

[Caffe2] Update hip files#9826

Closed
rohithkrn wants to merge 19 commits intopytorch:masterfrom
rohithkrn:update-hip-files
Closed

[Caffe2] Update hip files#9826
rohithkrn wants to merge 19 commits intopytorch:masterfrom
rohithkrn:update-hip-files

Conversation

@rohithkrn
Copy link
Contributor

The goal of this PR is to update the hip files to reflect relevant changes in cuda source files.

@rohithkrn
Copy link
Contributor Author

@bddppq This also adds THCCachingAllocator for hip. But, THCCachingAllocator_hip.cc includes THCCachingAllocator.h has code specific to cuda. Could we rename that to THCCachingAllocator_gpu.h, so that we can add corresponding hip file.

@petrex

@petrex
Copy link
Contributor

petrex commented Jul 25, 2018

@bddppq is there an update on the base docker img?

Error response from daemon: manifest for 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/conda3-cuda9.0-cudnn7-ubuntu16.04:126 not found

hipStream_t GetStream(int gpu, int stream_id) {
vector<hipStream_t>& gpu_streams = hip_streams_[gpu];
if (gpu_streams.size() <= stream_id) {
if (gpu_streams.size() <= (unsigned)stream_id) {

This comment was marked as off-topic.

This comment was marked as off-topic.

const int D = X.size_from_dim(canonical_axis);

Y->ResizeLike(X);
auto* Y_data = Y->template mutable_data<T>();

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

This comment was marked as off-topic.

@ezyang
Copy link
Contributor

ezyang commented Jul 25, 2018

There is currently an outage for caffe2 rocm builds. It will be fixed in the next hour.

@ezyang
Copy link
Contributor

ezyang commented Jul 25, 2018

@pytorchbot retest this please

@@ -0,0 +1,309 @@
#include "caffe2/core/THCCachingAllocator.h"

This comment was marked as off-topic.

This comment was marked as off-topic.

const int D = X.size_from_dim(canonical_axis);

Y->ResizeLike(X);
auto* Y_data = Y->template mutable_data<T>();

This comment was marked as off-topic.

@bddppq
Copy link
Contributor

bddppq commented Jul 26, 2018

@pytorchbot retest this please

@bddppq
Copy link
Contributor

bddppq commented Jul 27, 2018

better to hipify THCCachingAllocator files

@bddppq
Copy link
Contributor

bddppq commented Jul 27, 2018

@rohithkrn I just checked there are only two places in aten using cudaError instead of cudaErorr_t, I think we can simply change those two places and remove the "cudaError" -> "hipError_t" mapping

@ezyang @Jorghi12

@rohithkrn
Copy link
Contributor Author

@bddppq I have changed the mapping order to make it work in the current state. But yes, changing aten also should work.

Copy link
Contributor

@bddppq bddppq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LG

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bddppq has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot pushed a commit that referenced this pull request Jul 28, 2018
Summary:
This was introduced in #9826 following the corresponding cuda file context_gpu.cu file, tests have passed in the PR, at that point master was 94439d7. However during the long landing process, a new master commit aebf3b4 has come in that removed the `CAFFE_KNOWN_TYPE(Tensor<HIPContext>)` in context_hip.cc file, which then has broken the HIP BlobStatGetter, and we did NOT run tests again during merge and so when #9826 later landed to master the rocm tests start breaking.
Pull Request resolved: #9973

Differential Revision: D9040671

Pulled By: bddppq

fbshipit-source-id: f3b16cabaf681fc0535ca733db0b48430868f922
jramseyer pushed a commit to jramseyer/pytorch that referenced this pull request Jul 30, 2018
Summary:
The goal of this PR is to update the hip files to reflect relevant changes in cuda source files.
Pull Request resolved: pytorch#9826

Differential Revision: D9032840

Pulled By: bddppq

fbshipit-source-id: 504e55c46308eebfee3c9a7beea1f294fe03470f
jramseyer pushed a commit to jramseyer/pytorch that referenced this pull request Jul 30, 2018
)

Summary:
This was introduced in pytorch#9826 following the corresponding cuda file context_gpu.cu file, tests have passed in the PR, at that point master was 94439d7. However during the long landing process, a new master commit aebf3b4 has come in that removed the `CAFFE_KNOWN_TYPE(Tensor<HIPContext>)` in context_hip.cc file, which then has broken the HIP BlobStatGetter, and we did NOT run tests again during merge and so when pytorch#9826 later landed to master the rocm tests start breaking.
Pull Request resolved: pytorch#9973

Differential Revision: D9040671

Pulled By: bddppq

fbshipit-source-id: f3b16cabaf681fc0535ca733db0b48430868f922
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
Summary:
The goal of this PR is to update the hip files to reflect relevant changes in cuda source files.
Pull Request resolved: pytorch#9826

Differential Revision: D9032840

Pulled By: bddppq

fbshipit-source-id: 504e55c46308eebfee3c9a7beea1f294fe03470f
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
)

Summary:
This was introduced in pytorch#9826 following the corresponding cuda file context_gpu.cu file, tests have passed in the PR, at that point master was 94439d7. However during the long landing process, a new master commit aebf3b4 has come in that removed the `CAFFE_KNOWN_TYPE(Tensor<HIPContext>)` in context_hip.cc file, which then has broken the HIP BlobStatGetter, and we did NOT run tests again during merge and so when pytorch#9826 later landed to master the rocm tests start breaking.
Pull Request resolved: pytorch#9973

Differential Revision: D9040671

Pulled By: bddppq

fbshipit-source-id: f3b16cabaf681fc0535ca733db0b48430868f922
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants