[ROCm] Reland: Enable expandable segments (#173330) #177974
haoyuz wants to merge 1 commit into pytorch:main
Conversation
Summary: Original pull request: pytorch#173330

Fixes pytorch#168737. Fixes pytorch#168736.

The original diff enabled expandable segments for ROCm by adding `#ifdef USE_ROCM` guards throughout CUDACachingAllocator.cpp to use HIP APIs (`hipMemAddressReserve`, `hipMemCreate`, `hipMemMap`, etc.) instead of CUDA driver APIs when building for ROCm.

Root cause: In HIP/ROCm 6.2.1, the field name for memory allocation properties is `requestedHandleType` (singular), not `requestedHandleTypes` (plural) as in CUDA. Additionally, `hipMemHandleTypeFabric` does not exist in HIP, so the `CU_MEM_HANDLE_TYPE_FABRIC` assignment must be skipped on ROCm.

Fix applied on top of the original diff (from D96652342):
- Use `prop.requestedHandleType = hipMemHandleTypePosixFileDescriptor` under `#ifdef USE_ROCM` (singular field name, HIP constant)
- Use `prop.requestedHandleTypes = CU_MEM_HANDLE_TYPE_POSIX_FILE_DESCRIPTOR` for CUDA (plural field name, CUDA constant)
- Skip the `CU_MEM_HANDLE_TYPE_FABRIC` assignment entirely on ROCm under `#ifndef USE_ROCM`, as `hipMemHandleTypeFabric` does not exist in HIP

Co-authored-by: Prachi Gupta <prachi.gupta@amd.com>
Co-authored-by: Jeff Daily <jeff.daily@amd.com>
Co-authored-by: moonshadow-25 <moonshadow-25@users.noreply.github.com>
Co-authored-by: Vighanesh Sharma <vighaneshsharma@gmail.com>

Test Plan:
```
fbpkg build //aps_models/ads/ecosystem/eval/cogwheel_tests/amd:cogwheel_aps_ads_icvr_kd_eval_amd_test_harness --build-remote
```
https://www.internalfb.com/sandcastle/workflow/1049338713192153464

Differential Revision: D97211385
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/177974
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (3 Unrelated Failures) As of commit a250f65 with merge base 47ae16a.
BROKEN TRUNK - The following jobs failed but were present on the merge base. 👉 Rebase onto the `viable/strict` branch to avoid these failures.
UNSTABLE - The following jobs are marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This PR needs a …

@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)

Merge failed. Reason: This PR needs a … If not, please add the … To add a label, you can comment to pytorchbot. For more information, see … Details for Dev Infra team: raised by workflow job.
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.