[release/2.11][ROCm] Reland: Enable expandable segments (#173330) (#177974)#3106

Merged
pragupta merged 1 commit into ROCm:release/2.11 from pragupta:pg-expandable-segments-2.11 on Mar 27, 2026

@pragupta
Collaborator

Summary:
Original pull request: pytorch#173330
Fixes pytorch#168737.
Fixes pytorch#168736.

The original diff enabled expandable segments for ROCm by adding `#ifdef USE_ROCM` guards throughout `CUDACachingAllocator.cpp` to use HIP APIs (`hipMemAddressReserve`, `hipMemCreate`, `hipMemMap`, etc.) instead of CUDA driver APIs when building for ROCm.
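As a rough illustration of that guard pattern (not the code from `CUDACachingAllocator.cpp`), the sketch below selects a set of entry-point names at compile time. `VmmApiNames` and `vmm_api_names` are hypothetical helpers introduced only for this example. The `cu*` names are the CUDA driver counterparts of the HIP calls listed above; they come from the CUDA driver API, not from this PR's text.

```cpp
#include <string>

// Hypothetical helper type for this sketch: the names of the
// virtual-memory entry points a given backend would call.
struct VmmApiNames {
  std::string reserve;  // reserve a virtual address range
  std::string create;   // create a physical memory handle
  std::string map;      // map the handle into the reserved range
};

// Compile-time selection in the style of the #ifdef USE_ROCM guards:
// the HIP names when USE_ROCM is defined, the CUDA driver names otherwise.
inline VmmApiNames vmm_api_names() {
#ifdef USE_ROCM
  return {"hipMemAddressReserve", "hipMemCreate", "hipMemMap"};
#else
  return {"cuMemAddressReserve", "cuMemCreate", "cuMemMap"};
#endif
}
```

Building the same file with and without `-DUSE_ROCM` yields the two name sets; the real allocator guards the calls themselves rather than strings.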

Root cause: In HIP/ROCm 6.2.1, the field name for memory allocation properties is `requestedHandleType` (singular), not `requestedHandleTypes` (plural) as in CUDA. Additionally, `hipMemHandleTypeFabric` does not exist in HIP, so the `CU_MEM_HANDLE_TYPE_FABRIC` assignment must be skipped on ROCm.

Fix applied on top of the original diff (from D96652342):

- Use `prop.requestedHandleType = hipMemHandleTypePosixFileDescriptor` under `#ifdef USE_ROCM` (singular field name, HIP constant)
- Use `prop.requestedHandleTypes = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR` for CUDA (plural field name, CUDA constant)
- Skip the `CU_MEM_HANDLE_TYPE_FABRIC` assignment entirely on ROCm under `#ifndef USE_ROCM`, as `hipMemHandleTypeFabric` does not exist in HIP
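The three points above can be sketched in a form that compiles without either SDK. Everything prefixed `StandIn`, the `kPosixFd`/`kFabric` constants and their values, and the `make_prop`/`requested_handle` helpers are hypothetical stand-ins for this example; only the two field names are taken from the description. Falling back to the POSIX file descriptor handle when fabric is requested on ROCm is an assumption of this sketch, not something the PR states.

```cpp
// Stand-in types ONLY for illustration. Real code gets the property struct
// and the handle-type constants from the HIP or CUDA driver headers.
#ifdef USE_ROCM
enum StandInHandleType { kPosixFd = 1 };  // stands in for hipMemHandleTypePosixFileDescriptor
struct StandInAllocProp {
  int requestedHandleType = 0;  // singular field name, as on HIP
};
#else
enum StandInHandleType {
  kPosixFd = 1,  // stands in for CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR
  kFabric = 2    // stands in for CU_MEM_HANDLE_TYPE_FABRIC
};
struct StandInAllocProp {
  int requestedHandleTypes = 0;  // plural field name, as on CUDA
};
#endif

// Fill in the handle type using the field name that exists on this backend.
inline StandInAllocProp make_prop(bool want_fabric) {
  StandInAllocProp prop;
#ifdef USE_ROCM
  (void)want_fabric;  // assumption: no fabric handle type on ROCm, so ignore it
  prop.requestedHandleType = kPosixFd;
#else
  prop.requestedHandleTypes = kPosixFd;
#endif
#ifndef USE_ROCM
  // The fabric assignment is compiled only for CUDA builds.
  if (want_fabric) {
    prop.requestedHandleTypes = kFabric;
  }
#endif
  return prop;
}

// Read the handle type back through the backend-specific field name.
inline int requested_handle(const StandInAllocProp& prop) {
#ifdef USE_ROCM
  return prop.requestedHandleType;
#else
  return prop.requestedHandleTypes;
#endif
}
```

Because the fabric branch sits under `#ifndef USE_ROCM`, a ROCm build never references a fabric constant, which is the property the third bullet asks for.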

Co-authored-by: Prachi Gupta <prachi.gupta@amd.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: moonshadow-25 <moonshadow-25@users.noreply.github.com>
Co-authored-by: Vighanesh Sharma <vighaneshsharma@gmail.com>

Test Plan:

fbpkg build //aps_models/ads/ecosystem/eval/cogwheel_tests/amd:cogwheel_aps_ads_icvr_kd_eval_amd_test_harness --build-remote

https://www.internalfb.com/sandcastle/workflow/1049338713192153464

Differential Revision: D97211385

Pull Request resolved: pytorch#177974
Approved by: https://github.com/jeffdaily, https://github.com/echen4096

(cherry picked from commit 5792701)

@pragupta pragupta force-pushed the pg-expandable-segments-2.11 branch from f9cef24 to 34e107f Compare March 26, 2026 22:07
@pragupta pragupta requested a review from jeffdaily March 26, 2026 22:07
rocm-repo-management-api bot commented Mar 26, 2026

Jenkins build for 34e107f4157ae600bcb7e86132214e46aec35ff8 commit finished as FAILURE
Links: Pipeline Overview / Build artifacts / Test Results

@pragupta pragupta merged commit 2fea146 into ROCm:release/2.11 Mar 27, 2026
0 of 2 checks passed
@pragupta pragupta changed the title from "[ROCm] Reland: Enable expandable segments (#173330) (#177974)" to "[release/2.11][ROCm] Reland: Enable expandable segments (#173330) (#177974)" Mar 27, 2026