[ROCm] improve use of ROCm libraries, enable more tests, small fixes#10406

Closed
iotamudelta wants to merge 335 commits into pytorch:master from ROCm:libraries_PR

@iotamudelta
Contributor

  • some small leftovers from the last PR review
  • enable more unit test sets for CI
  • replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
  • use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
  • use strided_batched gemm interface also from the batched internal interface
  • re-enable Dropout.cu as we now have philox w/ rocRAND
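
To illustrate the strided_batched item above: a batched GEMM interface receives an array of matrix pointers, while a strided-batched interface receives one base pointer plus a constant stride between consecutive matrices. The sketch below is only an illustration of that relationship, not the code from this PR. `uniform_stride` is a hypothetical helper that checks whether a pointer array is evenly spaced, which is the condition under which a pointer-array call could be expressed in strided form.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not from the PR): reports whether the `count` matrix
// pointers in `ptrs` are evenly spaced, i.e. ptrs[i] == ptrs[0] + i * stride.
// On success the stride, measured in elements of T, is written to *stride_out.
// Assumption: all pointers refer to the same allocation, otherwise the
// pointer subtraction below is not meaningful.
template <typename T>
bool uniform_stride(const T* const* ptrs, std::int64_t count,
                    std::int64_t* stride_out) {
    assert(ptrs != nullptr && stride_out != nullptr && count > 0);
    if (count == 1) {
        *stride_out = 0;  // a single matrix needs no stride
        return true;
    }
    const std::int64_t stride = ptrs[1] - ptrs[0];
    for (std::int64_t i = 2; i < count; ++i) {
        if (ptrs[i] - ptrs[i - 1] != stride) {
            return false;  // uneven spacing: no single stride describes it
        }
    }
    *stride_out = stride;
    return true;
}
```

With three 2x3 matrices stored back to back in one buffer, the helper would report a stride of 6 elements, and the base pointer `ptrs[0]` together with that stride carries the same information as the pointer array.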

iotamudelta and others added 30 commits July 13, 2018 10:47
Add workarounds in pyHIPIFY for __forceinline__ and std:: math functions in HIP
This requires changes to the hipify script, essentially making it a
little smarter: it statically casts only arguments 5 and onwards (the
actual kernel arguments, ignoring the launch arguments) while still
doing the templating for the actual kernel.

Co-production w/ Jithun to get the logic right.
As per review, change cast to just int.
Removed the change to cudaOccupancyMaxActiveBlocksPerMultiprocessor in SoftMax.cu inside disable_features.yaml.
Refactoring & Fixing the pyhipify script.
… work for multi-GPU setup anyway, and gives a seg fault on call to getNumGPUs()
… baseline tests for ROCm CI; add override so user can run all tests if desired
Hardcode getNumGPUs() to 1 for ROCm builds …
hcRNG is no longer supported; rocRAND is the library replacing it.

This requires us to disable one function, add some include directories
to setup.py and add an include to THCTensorRandom if we are in the ROCm
context.
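
The PR description ties re-enabling Dropout.cu to rocRAND providing Philox. Philox is a counter-based generator: its output depends only on a (counter, key) pair, so each element can draw its random value without shared generator state. The following is a plain C++ sketch of Philox4x32-10 as I understand the published algorithm, not the rocRAND API and not the code in Dropout.cu. `dropout_keep` and its mapping of seed and index onto key and counter are hypothetical choices made for illustration.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using Counter = std::array<std::uint32_t, 4>;
using Key = std::array<std::uint32_t, 2>;

// One Philox4x32 round: two 32x32 -> 64 bit multiplies, then mix with the key.
inline Counter philox_round(const Counter& c, const Key& k) {
    const std::uint64_t p0 = 0xD2511F53ull * c[0];
    const std::uint64_t p1 = 0xCD9E8D57ull * c[2];
    Counter out;
    out[0] = static_cast<std::uint32_t>(p1 >> 32) ^ c[1] ^ k[0];
    out[1] = static_cast<std::uint32_t>(p1);
    out[2] = static_cast<std::uint32_t>(p0 >> 32) ^ c[3] ^ k[1];
    out[3] = static_cast<std::uint32_t>(p0);
    return out;
}

// Ten rounds; the key is bumped by the Weyl constants between rounds.
inline Counter philox4x32_10(Counter c, Key k) {
    for (int round = 0; round < 10; ++round) {
        if (round > 0) {
            k[0] += 0x9E3779B9u;
            k[1] += 0xBB67AE85u;
        }
        c = philox_round(c, k);
    }
    return c;
}

// Hypothetical dropout decision: element `index` is kept when a uniform
// value in [0, 1) derived from (seed, index) is at least the drop rate p.
inline bool dropout_keep(std::uint64_t seed, std::uint64_t index, double p) {
    assert(p >= 0.0 && p <= 1.0);
    Counter ctr;
    ctr[0] = static_cast<std::uint32_t>(index);
    ctr[1] = static_cast<std::uint32_t>(index >> 32);
    ctr[2] = 0;
    ctr[3] = 0;
    Key key;
    key[0] = static_cast<std::uint32_t>(seed);
    key[1] = static_cast<std::uint32_t>(seed >> 32);
    const Counter r = philox4x32_10(ctr, key);
    const double u = r[0] / 4294967296.0;  // 2^32, so u lies in [0, 1)
    return u >= p;
}
```

Because the decision is a pure function of seed and index, calling `dropout_keep` twice with the same arguments gives the same answer, which is the property that lets independent GPU threads reproduce a mask without coordinating.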
Disabling KMTHINLTO.
Including HCC issue #.
THError("Cublas_SgemmBatched only supports m, n, k, lda, ldb, ldc, batchCount"
"with the bound [val] <= %d", INT_MAX);
}
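
The error above fires when a 64-bit dimension would not fit the int parameters of the underlying BLAS call. A minimal sketch of such a guard, under the assumption that it only needs to mirror the `>= INT_MAX` comparison shown in this review thread: `all_below_int_max` and `narrow_to_int` are hypothetical names, not functions from THC.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>
#include <initializer_list>

// Hypothetical helper: true only if every value is strictly below INT_MAX,
// mirroring the `>= INT_MAX` rejection in the diff. Like the original check,
// it does not reject negative values.
inline bool all_below_int_max(std::initializer_list<std::int64_t> values) {
    for (std::int64_t v : values) {
        if (v >= INT_MAX) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: narrows a value that has already been validated.
inline int narrow_to_int(std::int64_t v) {
    assert(v < INT_MAX);
    return static_cast<int>(v);
}
```

A caller would test `all_below_int_max({m, n, k, lda, ldb, ldc, batchCount})` first, report an error on failure, and only then pass `narrow_to_int(m)` and the other narrowed values on.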

double alpha, const double *a[], int64_t lda, const double *b[], int64_t ldb,
double beta, double *c[], int64_t ldc, int64_t batchCount)
{
if( (m >= INT_MAX) || (n >= INT_MAX) || (k >= INT_MAX) || (lda >= INT_MAX) || (ldb >= INT_MAX) || (ldc >= INT_MAX) || (batchCount >= INT_MAX) )

ezyang previously requested changes Aug 10, 2018
Contributor

@ezyang ezyang left a comment

Lint failed. This must be fixed before we can merge.

Contributor

@facebook-github-bot facebook-github-bot left a comment

ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

FIND_LIBRARY(HIPRAND_LIBRARY hiprand HINTS ${HIPRAND_PATH}/lib)

- list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${HIPBLAS_LIBRARY} ${HIPRNG_LIBRARY})
+ list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${ROCBLAS_LIBRARY} ${HIPRAND_LIBRARY})

zdevito pushed a commit to zdevito/ATen that referenced this pull request Aug 13, 2018
Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND
Pull Request resolved: pytorch/pytorch#10406

Reviewed By: Jorghi12

Differential Revision: D9277093

Pulled By: ezyang

fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
…h#10406)

Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND
Pull Request resolved: pytorch#10406

Reviewed By: Jorghi12

Differential Revision: D9277093

Pulled By: ezyang

fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2
@jithunnair-amd jithunnair-amd deleted the libraries_PR branch September 25, 2025 16:32
8 participants