Fix CUDA allocator SIOF causing torch.accelerator API failures#176817

Closed
jmswen wants to merge 1 commit into pytorch:main from jmswen:export-D95703075

Conversation

@jmswen
Contributor

@jmswen jmswen commented Mar 8, 2026

Summary:
The `CUDACachingAllocator` (a `DeviceAllocator`) and Caffe2's legacy
`DefaultCUDAAllocator` (a plain `Allocator`) both registered for
`DeviceType::CUDA` at priority 0. Since `SetAllocator` uses a `>=` comparison,
whichever static initializer ran last would win. When the legacy
allocator won the race, the `dynamic_cast<DeviceAllocator*>` in
`getDeviceAllocator()` would fail, crashing `torch.accelerator.empty_cache()`
and other `torch.accelerator` APIs. To be clear, this is not an issue in
pure OSS PyTorch, where the Caffe2 legacy CUDA allocator does not exist.

Fix by bumping `CUDACachingAllocator`'s registration priority to 1 so it
always takes precedence over the legacy Caffe2 allocator regardless of
static initialization order.

This SIOF surfaced recently in vLLM after some code was generalized to use
torch.accelerator.empty_cache() instead of torch.cuda.empty_cache() in
vllm-project/vllm#30681.

Test Plan:

```
buck test fbcode//mode/opt fbcode//vllm/omni:test_kernels_rotary_embedding -- --exact 'fbcode//vllm/omni:test_kernels_rotary_embedding - test_rotary_embedding.py::test_rotary_embedding_opcheck[False-False-1024-108-32-True-11-cuda]'
```

Previously: 1 passed, 1 error (RuntimeError during teardown)
Now: 2 passed, 0 errors

Errors/stack traces like the following are resolved after this change:

    def empty_cache() -> None:
        r"""Release all unoccupied cached memory currently held by the caching
        allocator so that those can be used in other application.
    
        .. note:: This function is a no-op if the memory allocator for the current
            :ref:`accelerator <accelerators>` has not been initialized.
        """
>       if not torch._C._accelerator_isAllocatorInitialized():
E       RuntimeError: device_allocator INTERNAL ASSERT FAILED at "fbcode/caffe2/c10/core/CachingDeviceAllocator.h":253, please report a bug to PyTorch. Allocator for cuda is not a DeviceAllocator.

Differential Revision: D95703075

@pytorch-bot

pytorch-bot bot commented Mar 8, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176817

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 1cbe619 with merge base 2b8b4ff:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot

pytorch-bot bot commented Mar 8, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@meta-codesync

meta-codesync bot commented Mar 8, 2026

@jmswen has exported this pull request. If you are a Meta employee, you can view the originating Diff in D95703075.

@jmswen jmswen changed the title [PyTorch] Fix CUDA allocator SIOF causing torch.accelerator API failures [PyTorch] Fix CUDA allocator SIOF causing torch.accelerator API failures Mar 8, 2026
@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 8, 2026
@jmswen jmswen changed the title [PyTorch] Fix CUDA allocator SIOF causing torch.accelerator API failures Fix CUDA allocator SIOF causing torch.accelerator API failures Mar 8, 2026
@jmswen jmswen requested a review from zou3519 March 8, 2026 03:36
@jmswen
Contributor Author

jmswen commented Mar 8, 2026

@pytorchbot label "topic: not user facing"

@pytorch-bot pytorch-bot bot added the topic: not user facing topic category label Mar 8, 2026
@jmswen jmswen requested a review from houseroad March 8, 2026 04:25
@jmswen jmswen added the topic: bug fixes topic category label Mar 8, 2026
@zou3519 zou3519 requested a review from albanD March 8, 2026 16:42
@jmswen
Contributor Author

jmswen commented Mar 8, 2026

@albanD Another approach that I think is functionally quite similar is simply to not `REGISTER_ALLOCATOR` the legacy `DefaultCUDAAllocator`. Let me know which approach you think is better, or if there's a third better option.

@zou3519
Contributor

zou3519 commented Mar 9, 2026

@jmswen can you lower the priority of the caffe2 default allocator?

@jmswen
Contributor Author

jmswen commented Mar 9, 2026

@zou3519 Unfortunately not, since the priority is a `uint8_t` and the default allocator already uses the lowest possible priority (0).

Collaborator

@albanD albanD left a comment

Sounds ok!

@facebook-github-bot
Contributor

@pytorchbot merge

(Initiating merge automatically since Phabricator Diff has merged)

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here


Labels

ciflow/trunk, fb-exported, Merged, meta-exported, topic: bug fixes, topic: not user facing

5 participants