Fix CUDA allocator SIOF causing torch.accelerator API failures #176817

jmswen wants to merge 1 commit into pytorch:main

Conversation
Summary: The `CUDACachingAllocator` (a `DeviceAllocator`) and Caffe2's legacy `DefaultCUDAAllocator` (a plain `Allocator`) both registered for `DeviceType::CUDA` at priority 0. Since `SetAllocator` uses a `>=` comparison, whichever static initializer ran last would win. When the legacy allocator won the race, the `dynamic_cast<DeviceAllocator*>` in `getDeviceAllocator()` would fail, crashing `torch.accelerator.empty_cache()` and other `torch.accelerator` APIs. To be clear, this is not an issue in pure OSS PyTorch, where the Caffe2 legacy CUDA allocator does not exist.

Fix by bumping `CUDACachingAllocator`'s registration priority to 1 so it always takes precedence over the legacy Caffe2 allocator regardless of static initialization order.

This SIOF surfaced recently in vLLM after some code was generalized to use `torch.accelerator.empty_cache()` instead of `torch.cuda.empty_cache()` in vllm-project/vllm#30681.

Test Plan:
```
buck test fbcode//mode/opt fbcode//vllm/omni:test_kernels_rotary_embedding -- --exact 'fbcode//vllm/omni:test_kernels_rotary_embedding - test_rotary_embedding.py::test_rotary_embedding_opcheck[False-False-1024-108-32-True-11-cuda]'
```

Previously: 1 passed, 1 error (`RuntimeError` during teardown)
Now: 2 passed, 0 errors

Differential Revision: D95703075
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176817
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures as of commit 1cbe619 with merge base 2b8b4ff.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
@albanD Another approach that I think is functionally quite similar is simply to not
@jmswen can you lower the priority of the caffe2 default allocator?
@zou3519 Unfortunately not since the priority is a
@pytorchbot merge (Initiating merge automatically since Phabricator Diff has merged)

Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.