[release/2.11][ROCm] Reland: Enable expandable segments (#173330) (#177974) #3106
Merged
pragupta merged 1 commit into ROCm:release/2.11 on Mar 27, 2026
Branch updated from f9cef24 to 34e107f.
Jenkins build for commit 34e107f4157ae600bcb7e86132214e46aec35ff8 finished as FAILURE.
Summary:
Original pull request: pytorch#173330 Fixes pytorch#168737. Fixes pytorch#168736.
The original diff enabled expandable segments for ROCm by adding `#ifdef USE_ROCM` guards throughout CUDACachingAllocator.cpp so that ROCm builds call HIP APIs (`hipMemAddressReserve`, `hipMemCreate`, `hipMemMap`, etc.) instead of the CUDA driver APIs.
Root cause: In HIP/ROCm 6.2.1, the handle-type field of the memory allocation properties struct (`hipMemAllocationProp`) is named `requestedHandleType` (singular), not `requestedHandleTypes` (plural) as in CUDA's `CUmemAllocationProp`. Additionally, `hipMemHandleTypeFabric` does not exist in HIP, so the `CU_MEM_HANDLE_TYPE_FABRIC` assignment must be skipped on ROCm.
Fix applied on top of the original diff (from D96652342):
- Use `prop.requestedHandleType = hipMemHandleTypePosixFileDescriptor` under `#ifdef USE_ROCM` (singular field name, HIP constant).
- Use `prop.requestedHandleTypes = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR` for CUDA (plural field name, CUDA constant).
- Skip the `CU_MEM_HANDLE_TYPE_FABRIC` assignment entirely on ROCm by guarding it with `#ifndef USE_ROCM`, since `hipMemHandleTypeFabric` does not exist in HIP.

Co-authored-by: Prachi Gupta prachi.gupta@amd.com
Co-authored-by: Jeff Daily jeff.daily@amd.com
Co-authored-by: moonshadow-25 moonshadow-25@users.noreply.github.com
Co-authored-by: Vighanesh Sharma vighaneshsharma@gmail.com
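The handle-type fix described in the summary can be sketched as follows. This is a minimal illustration, not the code in CUDACachingAllocator.cpp: `StandInAllocProp`, `StandInHandleType`, and `requestShareableHandle` are hypothetical names, and the enums only mimic the SDK constants so the snippet compiles without the ROCm or CUDA toolkits installed. Only the field names (`requestedHandleType` vs. `requestedHandleTypes`) and constant names mirror the real headers.

```cpp
// Sketch of the USE_ROCM guard pattern for requesting a shareable handle type.
// All types below are hypothetical stand-ins for hipMemAllocationProp /
// CUmemAllocationProp; the real definitions come from the HIP and CUDA headers.
#include <cassert>  // used by the usage checks

#ifdef USE_ROCM
// HIP: singular field name, HIP constants, and no fabric handle type.
enum StandInHandleType {
  hipMemHandleTypeNone = 0x0,
  hipMemHandleTypePosixFileDescriptor = 0x1,
};
struct StandInAllocProp {
  StandInHandleType requestedHandleType = hipMemHandleTypeNone;
};
#else
// CUDA: plural field name, CUDA constants, fabric handle available.
enum StandInHandleType {
  CU_MEM_HANDLE_TYPE_NONE = 0x0,
  CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR = 0x1,
  CU_MEM_HANDLE_TYPE_FABRIC = 0x8,
};
struct StandInAllocProp {
  StandInHandleType requestedHandleTypes = CU_MEM_HANDLE_TYPE_NONE;
};
#endif

// Hypothetical helper: picks the handle type for a new physical allocation.
// `useFabric` stands in for whatever setting selects fabric handles on CUDA.
inline void requestShareableHandle(StandInAllocProp& prop, bool useFabric) {
#ifdef USE_ROCM
  // Singular field name with the HIP constant.
  prop.requestedHandleType = hipMemHandleTypePosixFileDescriptor;
  (void)useFabric;  // hipMemHandleTypeFabric does not exist, so never request it
#else
  // Plural field name with the CUDA constant.
  prop.requestedHandleTypes = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR;
#endif

#ifndef USE_ROCM
  // The fabric assignment is compiled only for CUDA builds.
  if (useFabric) {
    prop.requestedHandleTypes = CU_MEM_HANDLE_TYPE_FABRIC;
  }
#endif
}
```

Compiled without `USE_ROCM`, the helper writes the plural `requestedHandleTypes` field and may switch to the fabric handle. Compiled with `-DUSE_ROCM`, it writes only the singular `requestedHandleType` field and never names a fabric constant, which is what allows the ROCm build to compile.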
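Test Plan:
`fbpkg build //aps_models/ads/ecosystem/eval/cogwheel_tests/amd:cogwheel_aps_ads_icvr_kd_eval_amd_test_harness --build-remote`
https://www.internalfb.com/sandcastle/workflow/1049338713192153464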
Differential Revision: D97211385
Pull Request resolved: pytorch#177974
Approved by: https://github.com/jeffdaily, https://github.com/echen4096
(cherry picked from commit 5792701)