Add DeviceAllocator as the base device allocator #138222
guangyey wants to merge 95 commits into gh/guangyey/79/base from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/138222
Note: Links to docs will display an error until the docs builds have been completed.
⏳ 5 Pending, 2 Unrelated Failures as of commit 5521326 with merge base 178515d (FLAKY - the following jobs failed but were likely due to flakiness present on trunk):
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Looks like this PR hasn't been updated in a while so we're going to go ahead and mark this as `Stale`.

It's hundreds of tests, so it's not a trivial change. But the issue persists in tests that use this method of detecting ROCm, which is why I'm still confused about how the change impacted it.
Do you mean the internal test is already using the code at lines 485 to 495 in 3db8623? If so, I can't understand why this is the root cause.

@guangyey could the change have anything to do with this error? I wonder if the issue isn't checking for ROCm, but maybe specifically checking for CUDA.

I think the error is related to #158295, which introduces a CUDA version check (see pytorch/c10/cuda/driver_api.cpp, lines 1 to 2 in 67e68e0, for what CUDA 12.5+ supports).
@guangyey sorry for the delayed response. It looks like the segfault stems from this file: aten/src/ATen/hip/impl/HIPAllocatorMasqueradingAsCUDA.h, specifically line 19 |
@Camyll Thanks so much for the detailed debug info. Based on what you shared, I suspect this issue might be caused by a static initialization order fiasco. The following diff may fix it:

```diff
diff --git a/aten/src/ATen/hip/impl/HIPCachingAllocatorMasqueradingAsCUDA.cpp b/aten/src/ATen/hip/impl/HIPCachingAllocatorMasqueradingAsCUDA.cpp
index 19bc0a6b34e..68ddc09b53b 100644
--- a/aten/src/ATen/hip/impl/HIPCachingAllocatorMasqueradingAsCUDA.cpp
+++ b/aten/src/ATen/hip/impl/HIPCachingAllocatorMasqueradingAsCUDA.cpp
@@ -4,9 +4,10 @@
 namespace c10 { namespace hip {
 namespace HIPCachingAllocatorMasqueradingAsCUDA {
-static HIPAllocatorMasqueradingAsCUDA allocator(HIPCachingAllocator::get());
 Allocator* get() {
+  static HIPAllocatorMasqueradingAsCUDA allocator(HIPCachingAllocator::get());
   return &allocator;
 }
@@ -16,7 +17,7 @@ void recordStreamMasqueradingAsCUDA(const DataPtr& ptr, HIPStreamMasqueradingAsC
 // Register this HIP allocator as CUDA allocator to enable access through both
 // c10::GetAllocator(kCUDA) and c10::getDeviceAllocator(kCUDA) APIs
-REGISTER_ALLOCATOR(kCUDA, &allocator)
+// REGISTER_ALLOCATOR(kCUDA, &allocator)
 } // namespace HIPCachingAllocatorMasqueradingAsCUDA
 }} // namespace c10::hip
```
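To illustrate why moving the static into the accessor helps, here is a hypothetical, minimal sketch (not the actual PyTorch classes): a function-local static is constructed on first call, so no caller can observe it before it exists, whereas the initialization order of namespace-scope statics across translation units is unspecified.

```cpp
#include <cassert>

// Minimal stand-in for the allocator type; not the real
// HIPAllocatorMasqueradingAsCUDA class.
struct FakeAllocator {
  int id;
};

// "Construct on first use": the local static is initialized the first time
// get() runs (thread-safe since C++11), so no other static initializer can
// observe it in an unconstructed state.
FakeAllocator* get() {
  static FakeAllocator allocator{42};
  return &allocator;
}
```

Every call returns the same fully constructed instance, regardless of which translation unit's static initializers run first.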
Referenced snippet:

```cpp
#define HIP_MASQUERADING_AS_CUDA \
  "cud" \
  "a"

at::SetAllocator(c10::Device(HIP_MASQUERADING_AS_CUDA).type(), r, 0);
```
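The macro above relies on adjacent string-literal concatenation: the compiler joins `"cud" "a"` into the single literal `"cuda"`, which presumably keeps automated hipify-style source rewriters from matching the token. A minimal sketch of the mechanism (the function name here is illustrative):

```cpp
#include <cassert>
#include <cstring>

// Adjacent string literals are concatenated during translation, so the split
// spelling below yields exactly the string "cuda" at runtime.
const char* masqueraded_device_name() {
  return "cud" "a";  // compiles to the single literal "cuda"
}
```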
@albanD As @Camyll pointed out here, I think the root cause is a static initialization order fiasco (SIOF), something I've been struggling with recently (e.g., this PR).
To avoid dependencies between static initializers at static init time, I addressed the issue in this commit: 4134108.
Additionally, I performed the CUDA masquerading here to support both c10::GetAllocator(kCUDA) and c10::getDeviceAllocator(kCUDA) for the HIP backend. Do you think this is acceptable?
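To make the dual-registration idea concrete, here is a hypothetical sketch (illustrative names and signatures, not the real c10 APIs) of a single allocator instance being made visible through two separate lookup paths, mirroring `c10::GetAllocator(kCUDA)` and `c10::getDeviceAllocator(kCUDA)`:

```cpp
#include <cassert>

// Stand-in allocator type; not the real c10::Allocator.
struct FakeAllocator {};

// Two separate lookup slots, standing in for the legacy GetAllocator
// registry and the newer getDeviceAllocator registry.
static FakeAllocator* g_legacy_slot = nullptr;
static FakeAllocator* g_device_slot = nullptr;

// Registering once populates both tables, so either API resolves to the
// same underlying allocator instance.
void register_in_both(FakeAllocator* a) {
  g_legacy_slot = a;
  g_device_slot = a;
}

FakeAllocator* get_allocator() { return g_legacy_slot; }
FakeAllocator* get_device_allocator() { return g_device_slot; }
```

Under this scheme the HIP allocator masquerading as CUDA would answer both query paths, so generic frontends keyed on either API see the same object.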
@guangyey thanks for the changes! I tested this internally and the tests are no longer failing. We are good to reland this.
@Camyll Thanks, I will rebase and then try to reland it.
@pytorchbot merge

1 similar comment

@pytorchbot merge

The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 1 job has failed; the first few are: Check mergeability of ghstack PR / ghstack-mergeability-check. Details for Dev Infra team: raised by workflow job.
Starting merge as part of PR stack under #155200

@pytorchbot merge

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.

Merge failed. Reason: 2 jobs have failed; the first few are: rocm / linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.2), xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 5, 6, linux.idc.xpu). Details for Dev Infra team: raised by workflow job.

@pytorchbot merge -i

Merge started. Your change will be merged while ignoring the following 6 checks: Check Labels / Check labels, Check mergeability of ghstack PR / ghstack-mergeability-check, pull / linux-jammy-py3_9-clang9-xla / test (xla, 1, 1, linux.12xlarge, unstable), rocm / linux-jammy-rocm-py3.10 / test (default, 2, 6, linux.rocm.gpu.2), xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 2, 6, linux.idc.xpu), xpu / linux-jammy-xpu-2025.1-py3.9 / test (default, 5, 6, linux.idc.xpu). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchbot revert -c nosignal -m "Broke ROCm periodic runs on MI300 e.g. https://github.com/pytorch/pytorch/actions/runs/16764977800/job/47470050573" cc @guangyey If this revert doesn't go through because it's part of a stack, please forward fix the issue.

@pytorchbot successfully started a revert job. Check the current status here.

@guangyey your PR has been successfully reverted.

@jithunnair-amd I think the failure is not introduced by this PR; I see the same failure

Hi @jithunnair-amd, the

Starting merge as part of PR stack under #155200
Stack from ghstack (oldest at bottom):
Motivation
In line with [RFC] A device-agnostic Python device memory related API design for stream-based accelerators, some memory-related APIs are widely used in popular repositories such as HuggingFace, which currently requires a lot of if-else conditional code. We would like to introduce a generic API set under the torch.accelerator namespace to generalize these use cases.
Solution
This design follows a similar pattern to `HostAllocator`. We're introducing a base class `DeviceAllocator`, from which `CUDAAllocator` and `XPUAllocator` will inherit. This allows us to provide a unified call path like: `torch.accelerator.empty_cache()` -> `GetDeviceAllocator(allocator)` -> `empty_cache()`.

cc @albanD @EikanWang
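The described shape can be sketched as follows. This is a hypothetical illustration with made-up names and signatures, not the actual PyTorch implementation: a `DeviceAllocator` base class exposing `empty_cache()`, a backend allocator inheriting from it, and a per-device-type lookup that a generic frontend such as `torch.accelerator.empty_cache()` could dispatch through.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Illustrative device-type enum; the real code keys on c10::DeviceType.
enum class DeviceType { CUDA = 0, XPU = 1, COUNT = 2 };

// Base class every backend allocator would inherit from.
struct DeviceAllocator {
  virtual ~DeviceAllocator() = default;
  virtual void empty_cache() = 0;
};

// Simple per-device-type registry.
static std::array<DeviceAllocator*, static_cast<std::size_t>(DeviceType::COUNT)>
    g_registry{};

void register_device_allocator(DeviceType t, DeviceAllocator* a) {
  g_registry[static_cast<std::size_t>(t)] = a;
}

DeviceAllocator* get_device_allocator(DeviceType t) {
  return g_registry[static_cast<std::size_t>(t)];
}

// Backend-specific allocator, standing in for CUDAAllocator / XPUAllocator.
struct FakeCUDAAllocator : DeviceAllocator {
  bool cache_emptied = false;
  void empty_cache() override { cache_emptied = true; }
};
```

A device-generic frontend would then resolve the active backend's allocator and call `empty_cache()` on it, instead of branching per backend in Python.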