
Fix ROCm build in CUDACachingAllocator#176136

Closed
jeanschmidt wants to merge 3 commits into main from jeanschmidt-fix-rocm-build

Conversation

@jeanschmidt
Contributor

@jeanschmidt jeanschmidt commented Mar 2, 2026

Fixes the ROCm (HIP) build of CUDACachingAllocator.cpp which was broken by two categories of errors:

  1. Embedded preprocessor directives inside macro arguments — The #if ROCM_VERSION >= 70100 / #endif block was nested inside a C10_CUDA_CHECK(hipMemImportFromShareableHandle(...)) macro call. Clang rejects this with -Werror,-Wembedded-directive. Fixed by hoisting the conditional into a separate variable (myfd_ptr) before the macro invocation.
  2. Void pointer arithmetic — The HIP API uses void* for ptr_, unlike the CUDA driver API which uses CUdeviceptr (an integer type). Arithmetic on void* is undefined behavior and rejected by Clang. Fixed by casting ptr_ to char* via reinterpret_cast in the three call sites (hipMemSetAccess, hipMemMap, hipMemUnmap).

Also fixes requestedHandleTypes → requestedHandleType for the HIP CUmemAllocationProp struct, which uses a different field name than the CUDA equivalent.

This is a forward fix (FF) for issues introduced in #173330

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang

- Add USE_ROCM guards for handle type properties which use
HIP-specific enum values instead of CUDA driver API ones
- Fix pointer arithmetic on void* ptr_ by casting to char* for
hipMemMap, hipMemUnmap, and hipMemSetAccess calls
- Refactor hipMemImportFromShareableHandle to clarify the
ROCM_VERSION-dependent fd pointer cast

Signed-off-by: Jean Schmidt <contato@jschmidt.me>
@pytorch-bot

pytorch-bot bot commented Mar 2, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176136

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 7 Unrelated Failures

As of commit 1cae14d with merge base 87f052c:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 module: rocm AMD GPU support for Pytorch labels Mar 2, 2026
@pytorch-bot

pytorch-bot bot commented Mar 2, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@jeanschmidt jeanschmidt self-assigned this Mar 2, 2026
@jeanschmidt jeanschmidt added the topic: not user facing topic category label Mar 2, 2026
if (enable_ipc_handles) {
  if (CUDAAllocatorConfig::expandable_segments_handle_type() !=
      Expandable_Segments_Handle_Type::FABRIC_HANDLE) {
#ifdef USE_ROCM
Contributor


Why isn't hipify taking care of this type of change?

Contributor Author


🤷🏻‍♂️

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
@jeanschmidt
Contributor Author

@pytorchbot merge -i

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 2, 2026
@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: rocm-mi300 / linux-noble-rocm-py3.12-mi300 / test (default, 2, 6, linux.rocm.gpu.gfx942.1)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

postmath pushed a commit to postmath/pytorch that referenced this pull request Mar 3, 2026
Fixes the ROCm (HIP) build of CUDACachingAllocator.cpp which was broken by two categories of errors:

1. Embedded preprocessor directives inside macro arguments — The #if ROCM_VERSION >= 70100 / #endif block was nested inside a C10_CUDA_CHECK(hipMemImportFromShareableHandle(...)) macro call. Clang rejects this with -Werror,-Wembedded-directive. Fixed by hoisting the conditional into a separate variable (myfd_ptr) before the macro invocation.
2. Void pointer arithmetic — The HIP API uses void* for ptr_, unlike the CUDA driver API which uses CUdeviceptr (an integer type). Arithmetic on void* is undefined behavior and rejected by Clang. Fixed by casting ptr_ to char* via reinterpret_cast in the three call sites (hipMemSetAccess, hipMemMap, hipMemUnmap).

Also fixes requestedHandleTypes → requestedHandleType for the HIP CUmemAllocationProp struct, which uses a different field name than the CUDA equivalent.

This is a FF for issues introduced in pytorch#173330

Pull Request resolved: pytorch#176136
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
wdvr added a commit that referenced this pull request Mar 6, 2026
Use `ptr()` helper (returns `char*`) instead of raw `ptr_` for pointer
arithmetic in `cuMemSetAccess`, `cuMemMap`, and `cuMemUnmap` calls.

After hipify converts `CUdeviceptr` (unsigned long long) to
`hipDeviceptr_t` (void*), arithmetic on `ptr_` becomes void pointer
arithmetic which Clang rejects with `-Werror`. Using `ptr()` +
`reinterpret_cast` avoids this while keeping CUDA driver API
compatibility.

Also improves the USE_ROCM blocks to use `ptr()` instead of
`reinterpret_cast<char*>(ptr_)` for consistency.

This is a forward fix for void* arithmetic errors introduced by #173330
that #176136 attempted to fix via `#ifdef USE_ROCM` blocks.

Co-Authored-By: Claude <noreply@anthropic.com>
wdvr added a commit that referenced this pull request Mar 6, 2026
Use `ptr()` helper (returns `char*`) instead of raw `ptr_` for pointer
arithmetic in `cuMemSetAccess`, `cuMemMap`, and `cuMemUnmap` calls.

After hipify converts `CUdeviceptr` (unsigned long long) to
`hipDeviceptr_t` (void*), arithmetic on `ptr_` becomes void pointer
arithmetic which Clang rejects with `-Werror`. Using `ptr()` +
`reinterpret_cast` avoids this while keeping CUDA driver API
compatibility.

Also improves the USE_ROCM blocks to use `ptr()` instead of
`reinterpret_cast<char*>(ptr_)` for consistency.

This is a forward fix for void* arithmetic errors introduced by #173330
that #176136 attempted to fix via `#ifdef USE_ROCM` blocks.

Co-Authored-By: Claude <noreply@anthropic.com>
@wdvr
Contributor

wdvr commented Mar 9, 2026

@pytorchmergebot revert -m "reverting to put back on top of the trunk and develop a full forward fix" -c ghfirst

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Mar 9, 2026
This reverts commit 4bdfaaa.

Reverted #176136 on behalf of https://github.com/wdvr due to reverting to put back on top of the trunk and develop a full forward fix ([comment](#176136 (comment)))
@pytorchmergebot
Collaborator

@jeanschmidt your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added the ci-no-td Do not run TD on this PR label Mar 9, 2026
pytorchmergebot added a commit that referenced this pull request Mar 9, 2026
This reverts commit d49571b.

Reverted #173330 on behalf of https://github.com/wdvr due to sorry having issues with fixing build issues internally -- will revert and reland on top of master, and have a combined improved build fix instead of #176136 ([comment](#173330 (comment)))
pytorchmergebot referenced this pull request Mar 9, 2026
Pull Request resolved: #173330
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
@jeffdaily
Collaborator

This will be incorporated into the original expandable segments PR because it was reverted. Closing.

@jeffdaily jeffdaily closed this Mar 11, 2026
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Fixes the ROCm (HIP) build of CUDACachingAllocator.cpp which was broken by two categories of errors:

1. Embedded preprocessor directives inside macro arguments — The #if ROCM_VERSION >= 70100 / #endif block was nested inside a C10_CUDA_CHECK(hipMemImportFromShareableHandle(...)) macro call. Clang rejects this with -Werror,-Wembedded-directive. Fixed by hoisting the conditional into a separate variable (myfd_ptr) before the macro invocation.
2. Void pointer arithmetic — The HIP API uses void* for ptr_, unlike the CUDA driver API which uses CUdeviceptr (an integer type). Arithmetic on void* is undefined behavior and rejected by Clang. Fixed by casting ptr_ to char* via reinterpret_cast in the three call sites (hipMemSetAccess, hipMemMap, hipMemUnmap).

Also fixes requestedHandleTypes → requestedHandleType for the HIP CUmemAllocationProp struct, which uses a different field name than the CUDA equivalent.

This is a FF for issues introduced in pytorch#173330

Pull Request resolved: pytorch#176136
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Use `ptr()` helper (returns `char*`) instead of raw `ptr_` for pointer
arithmetic in `cuMemSetAccess`, `cuMemMap`, and `cuMemUnmap` calls.

After hipify converts `CUdeviceptr` (unsigned long long) to
`hipDeviceptr_t` (void*), arithmetic on `ptr_` becomes void pointer
arithmetic which Clang rejects with `-Werror`. Using `ptr()` +
`reinterpret_cast` avoids this while keeping CUDA driver API
compatibility.

Also improves the USE_ROCM blocks to use `ptr()` instead of
`reinterpret_cast<char*>(ptr_)` for consistency.

This is a forward fix for void* arithmetic errors introduced by pytorch#173330
that pytorch#176136 attempted to fix via `#ifdef USE_ROCM` blocks.

Co-Authored-By: Claude <noreply@anthropic.com>
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
This reverts commit 4bdfaaa.

Reverted pytorch#176136 on behalf of https://github.com/wdvr due to reverting to put back on top of the trunk and develop a full forward fix ([comment](pytorch#176136 (comment)))
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
This reverts commit d49571b.

Reverted pytorch#173330 on behalf of https://github.com/wdvr due to sorry having issues with fixing build issues internally -- will revert and reland on top of master, and have a combined improved build fix instead of pytorch#176136 ([comment](pytorch#173330 (comment)))
@github-actions github-actions bot deleted the jeanschmidt-fix-rocm-build branch April 11, 2026 02:23

Labels

ci-no-td Do not run TD on this PR ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 ciflow/trunk Trigger trunk jobs on your pull request Merged module: rocm AMD GPU support for Pytorch Reverted topic: not user facing topic category
