Fix ROCm build in CUDACachingAllocator #176136
Conversation
- Add USE_ROCM guards for handle type properties which use HIP-specific enum values instead of CUDA driver API ones
- Fix pointer arithmetic on void* ptr_ by casting to char* for hipMemMap, hipMemUnmap, and hipMemSetAccess calls
- Refactor hipMemImportFromShareableHandle to clarify the ROCM_VERSION-dependent fd pointer cast

Signed-off-by: Jean Schmidt <contato@jschmidt.me>
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176136
Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 7 Unrelated Failures
As of commit 1cae14d with merge base 87f052c:
NEW FAILURES - The following jobs have failed:
FLAKY - The following jobs failed but were likely due to flakiness present on trunk:
BROKEN TRUNK - The following job failed but was present on the merge base: 👉 Rebase onto the `viable/strict` branch to avoid these failures
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a
if (enable_ipc_handles) {
  if (CUDAAllocatorConfig::expandable_segments_handle_type() !=
      Expandable_Segments_Handle_Type::FABRIC_HANDLE) {
#ifdef USE_ROCM
Why isn't hipify taking care of this type of change?
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following 1 check: rocm-mi300 / linux-noble-rocm-py3.12-mi300 / test (default, 2, 6, linux.rocm.gpu.gfx942.1). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Fixes the ROCm (HIP) build of CUDACachingAllocator.cpp which was broken by two categories of errors:

1. Embedded preprocessor directives inside macro arguments — The #if ROCM_VERSION >= 70100 / #endif block was nested inside a C10_CUDA_CHECK(hipMemImportFromShareableHandle(...)) macro call. Clang rejects this with -Werror,-Wembedded-directive. Fixed by hoisting the conditional into a separate variable (myfd_ptr) before the macro invocation.
2. Void pointer arithmetic — The HIP API uses void* for ptr_, unlike the CUDA driver API which uses CUdeviceptr (an integer type). Arithmetic on void* is undefined behavior and rejected by Clang. Fixed by casting ptr_ to char* via reinterpret_cast at the three call sites (hipMemSetAccess, hipMemMap, hipMemUnmap).

Also fixes requestedHandleTypes → requestedHandleType for the HIP CUmemAllocationProp struct, which uses a different field name than the CUDA equivalent.

This is a forward fix for issues introduced in pytorch#173330

Pull Request resolved: pytorch#176136
Approved by: https://github.com/malfet
Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
Use `ptr()` helper (returns `char*`) instead of raw `ptr_` for pointer arithmetic in `cuMemSetAccess`, `cuMemMap`, and `cuMemUnmap` calls. After hipify converts `CUdeviceptr` (unsigned long long) to `hipDeviceptr_t` (void*), arithmetic on `ptr_` becomes void pointer arithmetic, which Clang rejects with `-Werror`. Using `ptr()` + `reinterpret_cast` avoids this while keeping CUDA driver API compatibility.

Also improves the USE_ROCM blocks to use `ptr()` instead of `reinterpret_cast<char*>(ptr_)` for consistency.

This is a forward fix for void* arithmetic errors introduced by #173330 that #176136 attempted to fix via `#ifdef USE_ROCM` blocks.

Co-Authored-By: Claude <noreply@anthropic.com>
@pytorchmergebot revert -m "reverting to put back on top of the trunk and develop a full forward fix" -c ghfirst
@pytorchbot successfully started a revert job. Check the current status here.
This reverts commit 4bdfaaa. Reverted #176136 on behalf of https://github.com/wdvr due to reverting to put back on top of the trunk and develop a full forward fix ([comment](#176136 (comment)))
@jeanschmidt your PR has been successfully reverted.
This reverts commit d49571b. Reverted #173330 on behalf of https://github.com/wdvr due to sorry having issues with fixing build issues internally -- will revert and reland on top of master, and have a combined improved build fix instead of #176136 ([comment](#173330 (comment)))
Pull Request resolved: #173330 Approved by: https://github.com/jeffdaily Co-authored-by: Jeff Daily <jeff.daily@amd.com>
This will be incorporated into the original expandable segments PR because it was reverted. Closing.
cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang