
Fix ROCm build in CUDACachingAllocator#176136

Closed
jeanschmidt wants to merge 3 commits into main from jeanschmidt-fix-rocm-build

Conversation

@jeanschmidt
Contributor

@jeanschmidt jeanschmidt commented Mar 2, 2026

Fixes the ROCm (HIP) build of CUDACachingAllocator.cpp which was broken by two categories of errors:

  1. Embedded preprocessor directives inside macro arguments — The #if ROCM_VERSION >= 70100 / #endif block was nested inside a C10_CUDA_CHECK(hipMemImportFromShareableHandle(...)) macro call. Clang rejects this with -Werror,-Wembedded-directive. Fixed by hoisting the conditional into a separate variable (myfd_ptr) before the macro invocation.
  2. Void pointer arithmetic — The HIP API uses void* for ptr_, unlike the CUDA driver API which uses CUdeviceptr (an integer type). Arithmetic on void* is undefined behavior and rejected by Clang. Fixed by casting ptr_ to char* via reinterpret_cast in the three call sites (hipMemSetAccess, hipMemMap, hipMemUnmap).

Also fixes requestedHandleTypes → requestedHandleType for the HIP CUmemAllocationProp struct, which uses a different field name than the CUDA equivalent.

This is a forward fix (FF) for issues introduced in #173330

cc @jeffdaily @sunway513 @jithunnair-amd @pruthvistony @ROCmSupport @jataylo @hongxiayang @naromero77amd @pragupta @jerrymannil @xinyazhang

- Add USE_ROCM guards for handle type properties which use
HIP-specific enum values instead of CUDA driver API ones
- Fix pointer arithmetic on void* ptr_ by casting to char* for
hipMemMap, hipMemUnmap, and hipMemSetAccess calls
- Refactor hipMemImportFromShareableHandle to clarify the
ROCM_VERSION-dependent fd pointer cast

Signed-off-by: Jean Schmidt <contato@jschmidt.me>
@pytorch-bot

pytorch-bot bot commented Mar 2, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176136

Note: Links to docs will display an error until the docs builds have been completed.

❌ 5 New Failures, 7 Unrelated Failures

As of commit 1cae14d with merge base 87f052c:

NEW FAILURES - The following jobs have failed:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

BROKEN TRUNK - The following jobs failed but were already failing on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 module: rocm AMD GPU support for Pytorch labels Mar 2, 2026
@pytorch-bot

pytorch-bot bot commented Mar 2, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@jeanschmidt jeanschmidt self-assigned this Mar 2, 2026
@jeanschmidt jeanschmidt added the topic: not user facing topic category label Mar 2, 2026
if (enable_ipc_handles) {
  if (CUDAAllocatorConfig::expandable_segments_handle_type() !=
      Expandable_Segments_Handle_Type::FABRIC_HANDLE) {
#ifdef USE_ROCM
Contributor


Why isn't hipify taking care of this type of change?

Contributor Author


🤷🏻‍♂️

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
@jeanschmidt
Contributor Author

@pytorchbot merge -i

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 2, 2026
@pytorchmergebot
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: rocm-mi300 / linux-noble-rocm-py3.12-mi300 / test (default, 2, 6, linux.rocm.gpu.gfx942.1)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here

postmath pushed a commit to postmath/pytorch that referenced this pull request Mar 3, 2026
Fixes the ROCm (HIP) build of CUDACachingAllocator.cpp which was broken by two categories of errors:

1. Embedded preprocessor directives inside macro arguments — The #if ROCM_VERSION >= 70100 / #endif block was nested inside a C10_CUDA_CHECK(hipMemImportFromShareableHandle(...)) macro call. Clang rejects this with -Werror,-Wembedded-directive. Fixed by hoisting the conditional into a separate variable (myfd_ptr) before the macro invocation.
2. Void pointer arithmetic — The HIP API uses void* for ptr_, unlike the CUDA driver API which uses CUdeviceptr (an integer type). Arithmetic on void* is undefined behavior and rejected by Clang. Fixed by casting ptr_ to char* via reinterpret_cast in the three call sites (hipMemSetAccess, hipMemMap, hipMemUnmap).

Also fixes requestedHandleTypes → requestedHandleType for the HIP CUmemAllocationProp struct, which uses a different field name than the CUDA equivalent.

This is a FF for issues introduced in pytorch#173330

Pull Request resolved: pytorch#176136
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
wdvr added a commit that referenced this pull request Mar 6, 2026
Use `ptr()` helper (returns `char*`) instead of raw `ptr_` for pointer
arithmetic in `cuMemSetAccess`, `cuMemMap`, and `cuMemUnmap` calls.

After hipify converts `CUdeviceptr` (unsigned long long) to
`hipDeviceptr_t` (void*), arithmetic on `ptr_` becomes void pointer
arithmetic which Clang rejects with `-Werror`. Using `ptr()` +
`reinterpret_cast` avoids this while keeping CUDA driver API
compatibility.

Also improves the USE_ROCM blocks to use `ptr()` instead of
`reinterpret_cast<char*>(ptr_)` for consistency.

This is a forward fix for void* arithmetic errors introduced by #173330
that #176136 attempted to fix via `#ifdef USE_ROCM` blocks.

Co-Authored-By: Claude <noreply@anthropic.com>
wdvr added a commit that referenced this pull request Mar 6, 2026
Use `ptr()` helper (returns `char*`) instead of raw `ptr_` for pointer
arithmetic in `cuMemSetAccess`, `cuMemMap`, and `cuMemUnmap` calls.

After hipify converts `CUdeviceptr` (unsigned long long) to
`hipDeviceptr_t` (void*), arithmetic on `ptr_` becomes void pointer
arithmetic which Clang rejects with `-Werror`. Using `ptr()` +
`reinterpret_cast` avoids this while keeping CUDA driver API
compatibility.

Also improves the USE_ROCM blocks to use `ptr()` instead of
`reinterpret_cast<char*>(ptr_)` for consistency.

This is a forward fix for void* arithmetic errors introduced by #173330
that #176136 attempted to fix via `#ifdef USE_ROCM` blocks.

Co-Authored-By: Claude <noreply@anthropic.com>
@wdvr
Contributor

wdvr commented Mar 9, 2026

@pytorchmergebot revert -m "reverting to put back on top of the trunk and develop a full forward fix" -c ghfirst

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Mar 9, 2026
This reverts commit 4bdfaaa.

Reverted #176136 on behalf of https://github.com/wdvr due to reverting to put back on top of the trunk and develop a full forward fix ([comment](#176136 (comment)))
@pytorchmergebot
Collaborator

@jeanschmidt your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added the ci-no-td Do not run TD on this PR label Mar 9, 2026
pytorchmergebot added a commit that referenced this pull request Mar 9, 2026
This reverts commit d49571b.

Reverted #173330 on behalf of https://github.com/wdvr due to sorry having issues with fixing build issues internally -- will revert and reland on top of master, and have a combined improved build fix instead of #176136 ([comment](#173330 (comment)))
pytorchmergebot referenced this pull request Mar 9, 2026
Pull Request resolved: #173330
Approved by: https://github.com/jeffdaily

Co-authored-by: Jeff Daily <jeff.daily@amd.com>
@jeffdaily
Collaborator

This will be incorporated into the original expandable segments PR because it was reverted. Closing.

@jeffdaily jeffdaily closed this Mar 11, 2026
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Fixes the ROCm (HIP) build of CUDACachingAllocator.cpp which was broken by two categories of errors:

1. Embedded preprocessor directives inside macro arguments — The #if ROCM_VERSION >= 70100 / #endif block was nested inside a C10_CUDA_CHECK(hipMemImportFromShareableHandle(...)) macro call. Clang rejects this with -Werror,-Wembedded-directive. Fixed by hoisting the conditional into a separate variable (myfd_ptr) before the macro invocation.
2. Void pointer arithmetic — The HIP API uses void* for ptr_, unlike the CUDA driver API which uses CUdeviceptr (an integer type). Arithmetic on void* is undefined behavior and rejected by Clang. Fixed by casting ptr_ to char* via reinterpret_cast in the three call sites (hipMemSetAccess, hipMemMap, hipMemUnmap).

Also fixes requestedHandleTypes → requestedHandleType for the HIP CUmemAllocationProp struct, which uses a different field name than the CUDA equivalent.

This is a FF for issues introduced in pytorch#173330

Pull Request resolved: pytorch#176136
Approved by: https://github.com/malfet

Co-authored-by: Nikita Shulga <2453524+malfet@users.noreply.github.com>
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
Use `ptr()` helper (returns `char*`) instead of raw `ptr_` for pointer
arithmetic in `cuMemSetAccess`, `cuMemMap`, and `cuMemUnmap` calls.

After hipify converts `CUdeviceptr` (unsigned long long) to
`hipDeviceptr_t` (void*), arithmetic on `ptr_` becomes void pointer
arithmetic which Clang rejects with `-Werror`. Using `ptr()` +
`reinterpret_cast` avoids this while keeping CUDA driver API
compatibility.

Also improves the USE_ROCM blocks to use `ptr()` instead of
`reinterpret_cast<char*>(ptr_)` for consistency.

This is a forward fix for void* arithmetic errors introduced by pytorch#173330
that pytorch#176136 attempted to fix via `#ifdef USE_ROCM` blocks.

Co-Authored-By: Claude <noreply@anthropic.com>
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
This reverts commit 4bdfaaa.

Reverted pytorch#176136 on behalf of https://github.com/wdvr due to reverting to put back on top of the trunk and develop a full forward fix ([comment](pytorch#176136 (comment)))
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
This reverts commit d49571b.

Reverted pytorch#173330 on behalf of https://github.com/wdvr due to sorry having issues with fixing build issues internally -- will revert and reland on top of master, and have a combined improved build fix instead of pytorch#176136 ([comment](pytorch#173330 (comment)))
@github-actions github-actions bot deleted the jeanschmidt-fix-rocm-build branch April 11, 2026 02:23

Labels

ci-no-td Do not run TD on this PR ciflow/rocm-mi300 Trigger "default" config CI on ROCm MI300 ciflow/trunk Trigger trunk jobs on your pull request Merged module: rocm AMD GPU support for Pytorch Reverted topic: not user facing topic category
