[ROCm] improve use of ROCm libraries, enable more tests, small fixes#10406
iotamudelta wants to merge 335 commits into pytorch:master from
iotamudelta commented Aug 10, 2018
- some small leftovers from the last PR review
- enable more unit test sets for CI
- replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
- use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
- use strided_batched gemm interface also from the batched internal interface
- re-enable Dropout.cu, as we now have Philox w/ rocRAND
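For context on the Dropout.cu item: dropout depends on a Philox generator, which is a counter-based RNG. Below is a minimal portable C++ sketch of the Philox4x32-10 algorithm as published with Random123, included only to show why such a generator suits GPU dropout. It is not rocRAND's or cuRAND's code, and the names `philox4x32_10`, `Ctr` and `Key` are our own.

```cpp
#include <array>
#include <cstdint>

// Philox4x32-10 (Salmon et al., "Parallel Random Numbers: As Easy as 1, 2, 3").
// A 128-bit counter and a 64-bit key map to four 32-bit random words; the same
// (counter, key) pair always gives the same output, so no generator state has
// to be stored or shared between threads.
using Ctr = std::array<uint32_t, 4>;
using Key = std::array<uint32_t, 2>;

namespace {
constexpr uint32_t kM0 = 0xD2511F53u;  // round multipliers
constexpr uint32_t kM1 = 0xCD9E8D57u;
constexpr uint32_t kW0 = 0x9E3779B9u;  // Weyl-sequence key increments
constexpr uint32_t kW1 = 0xBB67AE85u;

// Full 32x32 -> 64-bit product, split into high and low words.
inline void mulhilo(uint32_t a, uint32_t b, uint32_t& hi, uint32_t& lo) {
  const uint64_t p = static_cast<uint64_t>(a) * b;
  hi = static_cast<uint32_t>(p >> 32);
  lo = static_cast<uint32_t>(p);
}

// One Philox round: two multiplies, then a permute-and-xor with the key.
inline Ctr philox_round(const Ctr& c, const Key& k) {
  uint32_t hi0, lo0, hi1, lo1;
  mulhilo(kM0, c[0], hi0, lo0);
  mulhilo(kM1, c[2], hi1, lo1);
  return {hi1 ^ c[1] ^ k[0], lo1, hi0 ^ c[3] ^ k[1], lo0};
}
}  // namespace

// Ten rounds; the key is bumped before every round except the first.
Ctr philox4x32_10(Ctr ctr, Key key) {
  for (int r = 0; r < 10; ++r) {
    if (r > 0) {
      key[0] += kW0;
      key[1] += kW1;
    }
    ctr = philox_round(ctr, key);
  }
  return ctr;
}
```

In a dropout kernel one would typically use the seed as the key and derive the counter from the element index plus an offset, so each element gets an independent, reproducible draw without any per-thread state.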
Add workarounds in pyHIPIFY for __forceinline__ and std:: math functions in HIP
This requires changes to the hipify script, essentially making it a little smarter: it now statically casts only arguments 5 and onwards (the actual kernel arguments, ignoring the launch arguments) while still doing the templating for the actual kernel. Co-produced w/ Jithun to get the logic right.
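The argument rule described above can be modelled as follows. This is a hypothetical C++ sketch of the idea only: pyHIPIFY itself is a Python script that rewrites source text, its templating of the kernel name is not reproduced here, and `cast_kernel_args`, `make_launch` and the caller-supplied `types` list are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// hipLaunchKernelGGL(kernel, gridDim, blockDim, sharedMemBytes, stream, args...):
// the first five arguments configure the launch, the rest go to the kernel.
constexpr std::size_t kLaunchArgs = 5;

// Hypothetical helper: wrap only the kernel arguments (index 5 onward) in a
// static_cast to the caller-supplied type; launch arguments pass through as-is.
std::vector<std::string> cast_kernel_args(const std::vector<std::string>& args,
                                          const std::vector<std::string>& types) {
  assert(args.size() >= kLaunchArgs);
  assert(types.size() == args.size() - kLaunchArgs);
  std::vector<std::string> out(args.begin(), args.begin() + kLaunchArgs);
  for (std::size_t i = kLaunchArgs; i < args.size(); ++i) {
    out.push_back("static_cast<" + types[i - kLaunchArgs] + ">(" + args[i] + ")");
  }
  return out;
}

// Re-join the argument list into a launch call.
std::string make_launch(const std::vector<std::string>& args) {
  std::string s = "hipLaunchKernelGGL(";
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (i > 0) s += ", ";
    s += args[i];
  }
  return s + ")";
}
```

Skipping the first five arguments presumably matters because `gridDim` and `blockDim` are `dim3` values and the shared-memory size and stream have their own types, so casting them like kernel arguments would break the launch.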
Merge from upstream
No more patches
As per review, change cast to just int.
Removed the change to cudaOccupancyMaxActiveBlocksPerMultiprocessor in SoftMax.cu inside disable_features.yaml.
…hin hipify-python
Merge from upstream
Refactoring & Fixing the pyhipify script.
… work for multi-GPU setup anyway, and gives a seg fault on call to getNumGPUs()
… baseline tests for ROCm CI; add override so user can run all tests if desired
Hardcode getNumGPUs() to 1 for ROCm builds …
Merge from upstream
hcRNG is no longer supported; rocRAND is the library replacing it. This requires us to disable one function, add some include directories to setup.py, and add an include to THCTensorRandom when building in the ROCm context.
Disabling KMTHINLTO.
Including HCC issue #.
Enable test_torch, test_dataloader, test_indexing and test_utils …
Merge from upstream
        THError("Cublas_SgemmBatched only supports m, n, k, lda, ldb, ldc, batchCount "
                "with the bound [val] <= %d", INT_MAX);
      }
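The guard above exists presumably because the vendor BLAS routines take 32-bit `int` sizes while THC passes `int64_t`. A minimal sketch of the same check as a reusable predicate, assuming a hypothetical helper name `fits_blas_int` (not part of THC):

```cpp
#include <climits>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: true when every 64-bit size, leading dimension and batch
// count can be narrowed to the 32-bit int parameters of the BLAS call, i.e. when
// each value satisfies val <= INT_MAX (the bound quoted in the error message).
bool fits_blas_int(std::initializer_list<int64_t> vals) {
  for (int64_t v : vals) {
    if (v > INT_MAX) return false;
  }
  return true;
}
```

A caller would test `fits_blas_int({m, n, k, lda, ldb, ldc, batchCount})` and raise an error otherwise. Note that the excerpt's own test, `val >= INT_MAX`, is slightly stricter: it also rejects INT_MAX itself, which the message text permits.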
This comment was marked as off-topic.
        double alpha, const double *a[], int64_t lda, const double *b[], int64_t ldb,
        double beta, double *c[], int64_t ldc, int64_t batchCount)
    {
      if( (m >= INT_MAX) || (n >= INT_MAX) || (k >= INT_MAX) || (lda >= INT_MAX) || (ldb >= INT_MAX) || (ldc >= INT_MAX) || (batchCount >= INT_MAX) )
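The excerpt above is the pointer-array ("batched") interface. To illustrate what routing it through the strided_batched interface involves, here is a CPU reference in C++ under stated assumptions: column-major storage, no transposes, and matrices packed at a fixed stride. `gemm_strided_batched` and `uniform_stride` are hypothetical names; this is not the rocBLAS API and not necessarily how the PR derives its strides.

```cpp
#include <cassert>
#include <cstdint>

// CPU reference for a strided-batched GEMM (column-major, no transposes):
// for every b in [0, batch): C_b = alpha * A_b * B_b + beta * C_b,
// where A_b = A + b * strideA, and likewise for B and C.
void gemm_strided_batched(int m, int n, int k, double alpha,
                          const double* A, int lda, int64_t strideA,
                          const double* B, int ldb, int64_t strideB,
                          double beta, double* C, int ldc, int64_t strideC,
                          int batch) {
  assert(m >= 0 && n >= 0 && k >= 0 && batch >= 0);
  for (int b = 0; b < batch; ++b) {
    const double* a = A + b * strideA;
    const double* bb = B + b * strideB;
    double* c = C + b * strideC;
    for (int j = 0; j < n; ++j) {
      for (int i = 0; i < m; ++i) {
        double acc = 0.0;
        for (int p = 0; p < k; ++p) acc += a[i + p * lda] * bb[p + j * ldb];
        // As in BLAS, C is not read when beta == 0.
        double& out = c[i + j * ldc];
        out = (beta == 0.0) ? alpha * acc : alpha * acc + beta * out;
      }
    }
  }
}

// Hypothetical check: a pointer-array ("batched") call maps onto the strided
// interface only if consecutive matrices sit a constant distance apart.
// All pointers must point into the same allocation for the subtraction to be valid.
bool uniform_stride(const double* const ptrs[], int batch, int64_t* stride) {
  *stride = (batch > 1) ? static_cast<int64_t>(ptrs[1] - ptrs[0]) : 0;
  for (int b = 2; b < batch; ++b) {
    if (ptrs[b] - ptrs[b - 1] != *stride) return false;
  }
  return true;
}
```

The design point is that a strided call needs only one base pointer plus a stride per operand, instead of an array of per-matrix pointers, which is what lets the batched internal interface reuse it.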
This comment was marked as off-topic.
ezyang left a comment:
Lint failed. This must be fixed before we can merge.
facebook-github-bot left a comment:
ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Remove duplicate definition of skipIfRocm
Scope ifdef down as per review.
facebook-github-bot left a comment:
ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
     FIND_LIBRARY(HIPRAND_LIBRARY hiprand HINTS ${HIPRAND_PATH}/lib)

    -list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${HIPBLAS_LIBRARY} ${HIPRNG_LIBRARY})
    +list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${ROCBLAS_LIBRARY} ${HIPRAND_LIBRARY})
This comment was marked as off-topic.
Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND

Pull Request resolved: pytorch/pytorch#10406
Reviewed By: Jorghi12
Differential Revision: D9277093
Pulled By: ezyang
fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2