Add ctx manager: caching_allocator_disabled to temporarily disable CCA #177418
ColinPeppler wants to merge 8 commits into gh/ColinPeppler/6/base
Conversation
[ghstack-poisoned]
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/177418
Note: Links to docs will display an error until the docs builds have been completed. ✅ No Failures as of commit 47570c7 with merge base a345892. (This comment was automatically generated by Dr. CI and updates every 15 minutes.)
… disable CCA" [ghstack-poisoned]
Hi @eee4017, can I get your review whenever you get a chance? Thanks!
… disable CCA"
### Why
- An illegal memory access (IMA) debugging aid that disables CCA only for a targeted block of code.
- Another option is `PYTORCH_NO_CUDA_MEMORY_CACHING=1` but that is set globally.
Usually I'd do this.
```
torch.cuda.caching_allocator_enable(False)
try:
...
finally: # make sure to clean up even on exception
torch.cuda.caching_allocator_enable(True)
```
### What
Add a utility that
- Disables CUDA caching allocator (CCA) when entering the block.
- Restores the CCA state when exiting the block (even on exceptions).
```
with torch.cuda.caching_allocator_disabled():
...
```
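A minimal, torch-free sketch of how such a save/restore context manager can be built with `contextlib`; the module-level flag and the `caching_allocator_enable` / `caching_allocator_is_enabled` helpers here are stand-ins for PyTorch's real allocator toggle, not its internals:

```python
import contextlib

_allocator_enabled = True  # stand-in for the real caching-allocator flag


def caching_allocator_enable(value: bool) -> None:
    global _allocator_enabled
    _allocator_enabled = value


def caching_allocator_is_enabled() -> bool:
    return _allocator_enabled


@contextlib.contextmanager
def caching_allocator_disabled():
    # Save the current state, disable on entry, and restore the saved
    # state on exit -- the finally block runs even on exceptions.
    prev = caching_allocator_is_enabled()
    caching_allocator_enable(False)
    try:
        yield
    finally:
        caching_allocator_enable(prev)
```

Note that restoring the *saved* state (rather than unconditionally re-enabling) makes nested uses of the context manager behave correctly.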
cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx ipiszy kadeng muchulee8 amjames chauhang aakhundov coconutruben jataylo
[ghstack-poisoned]
eee4017
left a comment
Since caching_allocator_enable already exists and allocate() bakes the correct deleter (uncached_delete vs local_raw_delete) into each pointer at allocation time, tensors allocated inside the disabled region will always free correctly regardless of whether the allocator is re-enabled. This PR just adds a context manager that is a straightforward save/restore wrapper.
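The reviewer's point, that the deleter is chosen per pointer at allocation time, can be illustrated with a small pure-Python analogy: a closure captures the deleter when `allocate()` runs, so later toggling of the flag cannot change how an existing allocation is freed. This is an analogy only, not the real allocator code:

```python
freed_with = []


def allocate(caching_enabled: bool):
    # The deleter is chosen *now*, at allocation time, and captured in the
    # closure -- mirroring how allocate() bakes uncached_delete vs
    # local_raw_delete into each pointer.
    deleter = "local_raw_delete" if caching_enabled else "uncached_delete"

    def free():
        freed_with.append(deleter)

    return free


free_a = allocate(caching_enabled=False)  # allocated while the allocator is disabled
free_b = allocate(caching_enabled=True)   # allocated after re-enabling
free_a()  # still frees with the uncached path it was allocated under
free_b()
```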
@pytorchbot merge
Merge failed. Reason: Approvers from one of the following sets are needed:
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
… disable CCA" [ghstack-poisoned]
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 1 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
… disable CCA" [ghstack-poisoned]
… disable CCA" [ghstack-poisoned]
… disable CCA" [ghstack-poisoned]
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Pull Request resolved: pytorch#177418
Approved by: https://github.com/eee4017, https://github.com/laithsakka
ghstack dependencies: pytorch#177308
…agnostic
8 AOTInductor tests fail on XPU because `caching_allocator_disabled()` (introduced by #177418) from `torch.cuda.memory` calls `torch._C._cuda_cudaCachingAllocator_is_enabled()`, which doesn't exist in XPU-only builds. Replace the direct import of `torch.cuda.caching_allocator_disabled` with a device-aware wrapper that delegates to the CUDA implementation on CUDA builds and acts as a no-op on other GPU backends (XPU, etc.).
ghstack-source-id: 81004c9
Pull-Request: #179659
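A sketch of such a device-aware wrapper, under the assumption that delegating only when the CUDA API is actually present is acceptable (the merged fix may gate on build flags instead); this is a hypothetical helper, not the code from #179659:

```python
import contextlib


def caching_allocator_disabled():
    # Hypothetical device-aware wrapper: delegate to the CUDA implementation
    # when CUDA (and the API) is available; otherwise return a no-op context
    # manager so XPU-only and CPU-only builds never touch CUDA internals.
    try:
        import torch
        if torch.cuda.is_available() and hasattr(
            torch.cuda, "caching_allocator_disabled"
        ):
            return torch.cuda.caching_allocator_disabled()
    except ImportError:
        pass
    return contextlib.nullcontext()
```

Call sites can then use `with caching_allocator_disabled():` uniformly, regardless of which backend the build targets.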
Stack from ghstack (oldest at bottom):
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo