Refactor CUDAAllocatorConfig to reuse AcceleratorAllocatorConfig #150312

guangyey wants to merge 101 commits into gh/guangyey/133/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/150312
Note: Links to docs will display an error until the docs builds have been completed.
❗ 1 Active SEV: there is 1 currently active SEV; if your PR is affected, please view it below.
✅ You can merge normally! (3 Unrelated Failures) As of commit df9befb with merge base bb67660. UNSTABLE: the following jobs are marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Starting merge as part of PR stack under #156175
# Motivation

This PR moves the implementation of `torch.cuda.memory._set_allocator_settings` to `torch._C._accelerator_setAllocatorSettings`. Since the original API was intended as a temporary/internal utility, I am not exposing the new function as a public API.

Pull Request resolved: #156175
Approved by: https://github.com/albanD
ghstack dependencies: #149601, #157908, #150312, #156165
Hi, I'm noticing downstream failures that a bisect tracked back to this commit: ROCm/TheRock#1155. We're building PyTorch with ROCm on Windows, and at runtime we have new failures to load. This commit does not cleanly revert because there is a sequence of other commits stacked on top: https://github.com/pytorch/pytorch/commits/main/c10/cuda

I'm trying to debug what has changed and why the load is failing. Do you have any advice or ideas?
Hi @ScottTodd, I think the big code change in this PR is in `pytorch/c10/cuda/CUDACachingAllocator.cpp`, lines 4130 to 4134 (at c68ad1b). That line ensures that `CUDAAllocatorConfig` is initialized at static initialization time. I don't think this will trigger the static initialization order fiasco. However, I'm not entirely sure how this behaves on ROCm under Windows. You might try commenting out this line and rebuilding to see if the issue persists.
Same errors with this diff applied:

```diff
λ git diff
diff --git a/c10/cuda/CUDACachingAllocator.cpp b/c10/cuda/CUDACachingAllocator.cpp
index b0b1be8937a..052d6006034 100644
--- a/c10/cuda/CUDACachingAllocator.cpp
+++ b/c10/cuda/CUDACachingAllocator.cpp
@@ -4128,9 +4128,9 @@ CUDAAllocator* allocator();
 struct BackendStaticInitializer {
   CUDAAllocator* parseEnvForBackend() {
     // If the environment variable is set, we use the CudaMallocAsync allocator.
-    if (CUDAAllocatorConfig::use_async_allocator()) {
-      return CudaMallocAsync::allocator();
-    }
+    // if (CUDAAllocatorConfig::use_async_allocator()) {
+    //   return CudaMallocAsync::allocator();
+    // }
     return &Native::allocator;
   }
diff --git a/c10/hip/HIPCachingAllocator.cpp b/c10/hip/HIPCachingAllocator.cpp
index a2455ac42d5..f4c61fcbe1d 100644
--- a/c10/hip/HIPCachingAllocator.cpp
+++ b/c10/hip/HIPCachingAllocator.cpp
@@ -4129,9 +4129,9 @@ HIPAllocator* allocator();
 struct BackendStaticInitializer {
   HIPAllocator* parseEnvForBackend() {
     // If the environment variable is set, we use the HipMallocAsync allocator.
-    if (HIPAllocatorConfig::use_async_allocator()) {
-      return HipMallocAsync::allocator();
-    }
+    // if (HIPAllocatorConfig::use_async_allocator()) {
+    //   return HipMallocAsync::allocator();
+    // }
     return &Native::allocator;
   }
```

I'd like to get more logs, even printfs, or a debugger attached somehow. This is tricky to debug when all I'm getting is
Alright, not much luck debugging, but on closer inspection of the code there is an initialization order issue here that is fixed by moving the key set into a function-local static:

```diff
--- a/c10/hip/HIPAllocatorConfig.h
+++ b/c10/hip/HIPAllocatorConfig.h
@@ -112,7 +112,22 @@ class C10_HIP_API HIPAllocatorConfig {
   }

   static const std::unordered_set<std::string>& getKeys() {
-    return keys_;
+    static std::unordered_set<std::string> keys = {
+        "backend",
+        // keep BC for Rocm: `cuda` -> `cud` `a`, to avoid hipify issues
+        // NOLINTBEGIN(bugprone-suspicious-missing-comma,-warnings-as-errors)
+        "release_lock_on_cud"
+        "amalloc",
+        "pinned_use_cud"
+        "a_host_register",
+        // NOLINTEND(bugprone-suspicious-missing-comma,-warnings-as-errors)
+        "release_lock_on_hipmalloc",
+        "pinned_use_hip_host_register",
+        "pinned_num_register_threads",
+    };
+
+    return keys;
   }

   static HIPAllocatorConfig& instance() {
@@ -164,18 +179,6 @@ class C10_HIP_API HIPAllocatorConfig {
   std::atomic<bool> m_pinned_use_hip_host_register{false};
   std::atomic<bool> m_use_async_allocator{false};
   std::atomic<bool> m_is_allocator_loaded{false};
-  inline static std::unordered_set<std::string> keys_{
-      "backend",
-      // keep BC for Rocm: `cuda` -> `cud` `a`, to avoid hipify issues
-      // NOLINTBEGIN(bugprone-suspicious-missing-comma,-warnings-as-errors)
-      "release_lock_on_cud"
-      "amalloc",
-      "pinned_use_cud"
-      "a_host_register",
-      // NOLINTEND(bugprone-suspicious-missing-comma,-warnings-as-errors)
-      "release_lock_on_hipmalloc",
-      "pinned_use_hip_host_register",
-      "pinned_num_register_threads"};
 };
```

The C++ runtime must be fully initialized before STL containers like `std::unordered_set` can be safely constructed and used; with a function-local static, construction is deferred until the first call to `getKeys()`. I can send a PR with the fix.
Ah, though c8cf811 also mutates
@pytorchbot revert -m 'Static initialization order issue impact the downstream repo' -c ghfirst
@pytorchbot successfully started a revert job. Check the current status here.
…56175)" This reverts commit d3ce450. Reverted #156175 on behalf of https://github.com/guangyey due to "Static initialization order issue impact the downstream repo" (comment on #150312).
…leratorAllocatorConfig instead (#156165)" This reverts commit 1fc010a. Reverted #156165 on behalf of https://github.com/guangyey due to "Static initialization order issue impact the downstream repo" (comment on #150312).
@guangyey your PR has been successfully reverted.

Sorry for the disruption. Let's revert this PR first to unblock your progress.
Starting merge as part of PR stack under #156175

3 similar comments
Stack from ghstack (oldest at bottom):

# Motivation

Refactor `CUDAAllocatorConfig` to reuse `AcceleratorAllocatorConfig` and `ConfigTokenizer`. We will deprecate the options that overlap with `AcceleratorAllocatorConfig` in a follow-up PR and keep them only for BC.

cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k
CUDAAllocatorConfigto reuseAcceleratorAllocatorConfigandConfigTokenizer. We would deprecate those option that overleap withAcceleratorAllocatorConfigin the following PR and keep them only for BC.cc @H-Huang @awgu @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k