Add functions to setup PrivateUse1 as a python backend device. by qihqi · Pull Request #157859 · pytorch/pytorch

qihqi · 2025-07-08T21:30:17Z

This PR sets up the PrivateUse1 key in Python so that it can be used as a Python backend for PyTorch.
That is, after calling setup_privateuseone_for_python_backend('npy'), one can use a tensor subclass with that device to hold arbitrary Python data as "device data" and use torch.library to register ops that take that tensor.

Changes done in this PR:

1. Register a vanilla device guard: I extended NoOpDeviceGuard to allow a device index of 0 and to not raise errors when event-related functions are accessed. Without these changes, calling backward raises errors. (The CPU backend uses NoOpDeviceGuard just fine, although there seems to be special treatment of CPU in the autograd engine.)
2. Tensor subclasses are allowed to omit __torch_dispatch__ if the device is not CUDA or CPU. The comment on the check suggests it was there to avoid a segfault when calling into ops that expect a storage. Here we have a different device, so we will not call into those ops.
3. A Python function that invokes the other incantations to set up the PrivateUse1 backend.
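
To make the first change concrete, here is a toy pure-Python model of the guard behavior described above. It is only a sketch: `StrictNoOpGuard`, `LenientNoOpGuard`, and `run_backward_like` are hypothetical names, not PyTorch classes, and the assumption that the strict guard accepts only index -1 is made for illustration.

```python
class StrictNoOpGuard:
    """Toy model (assumption): accepts only index -1 and rejects event calls."""

    def set_device(self, index: int) -> None:
        if index != -1:
            raise ValueError(f"unsupported device index {index}")

    def record_event(self, stream: int) -> None:
        raise NotImplementedError("backend does not support events")


class LenientNoOpGuard(StrictNoOpGuard):
    """Toy model of the relaxed guard: index 0 is allowed, events are no-ops."""

    def set_device(self, index: int) -> None:
        if index not in (-1, 0):
            raise ValueError(f"unsupported device index {index}")

    def record_event(self, stream: int) -> None:
        # Deliberately do nothing instead of raising.
        return None


def run_backward_like(guard) -> str:
    """Mimic the two kinds of calls mentioned above: pick device 0, touch an event."""
    guard.set_device(0)
    guard.record_event(stream=0)
    return "ok"
```

With the lenient guard the call sequence completes; with the strict one it fails on the first call, which mirrors the kind of error the description reports when calling backward.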

This took inspiration from https://github.com/bdhirsh/pytorch_open_registration_example and https://github.com/tinygrad/tinygrad/blob/master/extra/torch_backend/wrapped_tensor.cpp; many thanks to @bdhirsh and @geohot.

pytorch-bot · 2025-07-08T21:30:21Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157859

✅ You can merge normally! (1 Unrelated Failure)

As of commit 7464a8e with merge base c58e096:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

trunk / linux-jammy-rocm-py3.10 / test (distributed, 1, 1, linux.rocm.gpu.gfx942.4) (gh) (similar failure)
distributed/test_c10d_nccl.py::TimeoutTest::test_default_store_timeout_nccl

This comment was automatically generated by Dr. CI and updates every 15 minutes.

albanD

Thanks for looking into this!

test/test_privateuseone_python_backend.py

c10/core/impl/DeviceGuardImplInterface.h

torch/csrc/autograd/python_variable.cpp

torch/utils/backend_registration.py

aten/src/ATen/detail/PrivateUse1HooksInterface.cpp

fffrog · 2025-07-09T10:15:45Z

Hmm... Specializing PrivateUse1HooksInterface and DeviceGuardImplInterface directly for PrivateUse1 is not a good idea. If you really want to implement this functionality in pure Python, why not expose these two classes to the Python side and let users inherit from them and implement the specific details there? In addition, can __torch_function__ and __torch_dispatch__ basically achieve what you want?

qihqi · 2025-07-09T14:18:27Z

> Hmm... Specializing PrivateUse1HooksInterface and DeviceGuardImplInterface directly for PrivateUse1 is not a good idea. If you really want to implement this functionality in pure Python, why not expose these two classes to the Python side and let users inherit from them and implement the specific details there?

This is indeed a better idea. I'll try to do that. Initially I picked the least intrusive approach (everything optional, doing as little as possible).

> In addition, can __torch_function__ and __torch_dispatch__ basically achieve what you want?

Basically, yes (https://dev-discuss.pytorch.org/t/embrace-tensor-subclass-as-a-python-device-registration-api/2771), and it is what I am using right now. However, there are places where it interacts poorly with torch, including the 2 issues that this PR fixes, as well as AMP integration.

c10/core/impl/DeviceGuardImplInterface.h

test/test_privateuseone_python_backend.py

torch/csrc/Module.cpp

fffrog · 2025-07-17T05:00:37Z

I have an idea. Apologies if there are any problems with it.

Can we directly export PrivateUse1HooksInterface and DeviceGuardImplInterface to Python through pybind11 and a trampoline class (the code can be put in torch/csrc/Module.cpp, because this is the binding logic between Python and C++), and then provide an API to register instances of those two classes from the Python side?

In this way, users can inherit from the classes on the Python side to implement the corresponding interfaces, and can also easily see which interfaces need to be implemented. What's more, there is no need to store PythonPrivateUse1DeviceGuard and PythonPrivateUse1HooksInterface in PyTorch.

Of course, it is also possible to forward all operations from C++ back to Python as the current PR implementation does, but we had better provide a base class consistent with C++ for users to inherit from, so that they can easily understand which interfaces may need to be implemented.

qihqi · 2025-07-27T03:43:33Z

@pytorchbot label "release notes: python_frontend"

qihqi · 2025-07-27T03:52:37Z

Hi @fffrog @albanD :

I made a few changes as per the discussion:

1. Made the Hook and Device classes C++ classes exposed to Python with a trampoline, as suggested. To make this work, I followed the instructions at https://pybind11.readthedocs.io/en/stable/advanced/classes.html and had to update the version of pybind11 in third_party/ because the old version did not have the header <pybind11/trampoline_self_life_support.h>. LMK if this is OK or not.
2. Moved the changes to a new directory, torch/csrc/python_custom_backend. Based on pytorch/torch/csrc/README.md (line 4 in f6c89c1: "with Python. This is in contrast to lib, which contains the Torch"), it seems more desirable to have the Python binding code there than in c10.
3. Changed build_variables.bzl so that the new files build.

Let me know how it looks. Also, please start CI to see if any tests fail; I don't have permission to do that.

Thanks!

fffrog · 2025-07-28T09:26:14Z

Thank you.

> Made the Hook and Device classes C++ classes exposed to Python with a trampoline, as suggested. To make this work, I followed the instructions at https://pybind11.readthedocs.io/en/stable/advanced/classes.html and had to update the version of pybind11 in third_party/ because the old version did not have the header <pybind11/trampoline_self_life_support.h>. LMK if this is OK or not.

The upgrade of pybind11 needs to be done very carefully, so we need to find other ways to solve this problem.

Back to the problem itself, we do need to pay attention to the variable life cycle in C++ and Python. In my opinion, using the holder_type of shared_ptr should be a feasible method.

torch/csrc/python_custom_backend/Module.cpp

qihqi · 2025-07-28T17:36:15Z

> The upgrade of pybind11 needs to be done very carefully,

Indeed.

> so we need to find other ways to solve this problem.

I can also make a separate PR that only updates pybind11, to be careful. Do you think that is a good idea?

> Back to the problem itself, we do need to pay attention to the variable life cycle in C++ and Python. In my opinion, using the holder_type of shared_ptr should be a feasible method.

Could you describe what the life cycle issue is? I thought we needed the header because that is how to get the definition of PYBIND11_OVERRIDE_PURE, which is needed for setting up the trampoline.

fffrog · 2025-07-29T01:54:14Z

> Could you describe what the life cycle issue is? I thought we needed the header because that is how to get the definition of PYBIND11_OVERRIDE_PURE, which is needed for setting up the trampoline.

The definition of PYBIND11_OVERRIDE_PURE and PYBIND11_OVERRIDE is in the pybind11/pybind11.h header; you can search for it in the PyTorch codebase.

Also, it seems to me that py::trampoline_self_life_support and py::smart_holder were introduced in v3.0 to solve the problem of variable lifetime. The user will inherit from torch._C.PrivateUse1Hooks and register the custom class instance with C++ through register_python_privateuseone_hook, and other C++ modules will use this instance for the rest of the process. We therefore need to keep it alive at all times and must not release it for any reason, including the Python variable going out of scope; otherwise a core dump may occur due to dangling pointers.
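
The lifetime concern can be mimicked in pure Python, as a loose analogy only. In the sketch below, `Hooks`, `WeakRegistry`, `StrongRegistry`, and `register_temporary` are hypothetical names; a registry that does not own the object plays the role of C++ holding a raw pointer, except that Python returns `None` where C++ would dereference freed memory. The behavior shown relies on CPython's reference counting.

```python
import gc
import weakref


class Hooks:
    """Hypothetical user-defined hooks object (not a PyTorch class)."""

    def is_available(self) -> bool:
        return True


class WeakRegistry:
    """Does not own the hooks: they die with the caller's last reference."""

    def __init__(self) -> None:
        self._ref = None

    def register(self, hooks: Hooks) -> None:
        self._ref = weakref.ref(hooks)

    def get(self):
        return self._ref() if self._ref is not None else None


class StrongRegistry:
    """Owns the hooks: they stay alive as long as the registry does."""

    def __init__(self) -> None:
        self._hooks = None

    def register(self, hooks: Hooks) -> None:
        self._hooks = hooks

    def get(self):
        return self._hooks


def register_temporary(registry) -> None:
    """Register an object whose only Python name is a local variable."""
    hooks = Hooks()
    registry.register(hooks)
    # `hooks` goes out of scope when this function returns.


weak, strong = WeakRegistry(), StrongRegistry()
register_temporary(weak)
register_temporary(strong)
gc.collect()  # make collection explicit rather than relying on timing
```

After this runs, `weak.get()` yields nothing usable while `strong.get()` still returns a live object. Whatever mechanism the binding uses, the registration call needs the second behavior.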

qihqi · 2025-09-30T03:19:51Z

@pytorchbot merge

pytorchmergebot · 2025-09-30T03:23:01Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

jeanschmidt · 2025-09-30T13:22:53Z

@pytorchbot revert -m "introduce linting errors https://github.com/pytorch/pytorch/actions/runs/18123993236/job/51574878473" -c nosignal

pytorchmergebot · 2025-09-30T13:24:26Z

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

#157859)" This reverts commit 1310d6a. Reverted #157859 on behalf of https://github.com/jeanschmidt due to introduce linting errors ([comment](#157859 (comment)))

pytorchmergebot · 2025-09-30T13:24:41Z

@qihqi your PR has been successfully reverted.

qihqi · 2025-09-30T16:37:55Z

Hi @jeanschmidt, could you point out which linting error you were seeing? I am looking at https://hud.pytorch.org/pr/157859 and couldn't find any.

jeanschmidt · 2025-09-30T18:13:50Z

Relevant to point out, the issue introduced seems to come from the job lintrunner-noclang / linux-job

>>> Lint for torch/_C/__init__.pyi:

  Warning (RUFF) W292
    No newline at end of file.
    See https://beta.ruff.rs/docs/rules/.
    
    To disable, use `  # noqa: W292`

        12933  |        arg_types: str,
        12934  |        args: tuple[Any, ...],
        12935  |        stream: _int,
    >>> 12936  |    ) -> None: ...
  Warning (RUFF) format
    Run `lintrunner -a` to apply this patch.

    You can run `lintrunner -a` to apply this patch.

    12933  12933 |         arg_types: str,
    12934  12934 |         args: tuple[Any, ...],
    12935  12935 |         stream: _int,
    12935        |-    ) -> None: ...
           12936 |+    ) -> None: ...

  Warning (PYFMT) format
    Run `lintrunner -a` to apply this patch.

    You can run `lintrunner -a` to apply this patch.

    12933  12933 |         arg_types: str,
    12934  12934 |         args: tuple[Any, ...],
    12935  12935 |         stream: _int,
    12935        |-    ) -> None: ...
           12936 |+    ) -> None: ...

+ echo ''
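
For reference, the W292 warning in the log above only means that the file's last line is not terminated by a newline. A minimal sketch of the check and the fix follows; the helper names `has_w292` and `ensure_trailing_newline` are made up for illustration, and in practice `lintrunner -a` applies the fix.

```python
def has_w292(text: str) -> bool:
    """Return True if non-empty text does not end with a newline."""
    return bool(text) and not text.endswith("\n")


def ensure_trailing_newline(text: str) -> str:
    """Append a newline when the text is non-empty and lacks one; otherwise return it unchanged."""
    if has_w292(text):
        return text + "\n"
    return text
```

Applied to the flagged last line, `ensure_trailing_newline("    ) -> None: ...")` returns the same text with `"\n"` appended, and calling it again changes nothing.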

qihqi · 2025-10-01T16:51:35Z

@pytorchbot merge

pytorchmergebot · 2025-10-01T16:53:38Z

Merge started

@bdhirsh

…ch#157859)

Fixes pytorch#156052 and pytorch#156444.

This PR sets up the PrivateUse1 key in Python so that it can be used as a Python backend for PyTorch. That is, after calling `setup_privateuseone_for_python_backend('npy')`, one can use a tensor subclass with that device to hold arbitrary Python data as "device data" and use `torch.library` to register ops that take that tensor.

Changes done in this PR:

1. Register a vanilla device guard: I extended NoOpDeviceGuard to allow a device index of 0 and to not raise errors when event-related functions are accessed. Without these changes, calling backward raises errors. (The CPU backend uses NoOpDeviceGuard just fine, although there seems to be special treatment of CPU in the autograd engine.)
2. Tensor subclasses are allowed to omit `__torch_dispatch__` if the device is not CUDA or CPU. The comment on the check suggests it was there to avoid a segfault when calling into ops that expect a storage. Here we have a different device, so we will not call into those ops.
3. A Python function that invokes the other incantations to set up the PrivateUse1 backend.

This took inspiration from https://github.com/bdhirsh/pytorch_open_registration_example and https://github.com/tinygrad/tinygrad/blob/master/extra/torch_backend/wrapped_tensor.cpp; many thanks to @bdhirsh and @geohot.

Pull Request resolved: pytorch#157859
Approved by: https://github.com/albanD

albanD · 2026-02-05T17:10:04Z

FYI, the test added here has been failing in periodic CI since it was added in October: https://hud.pytorch.org/failure?name=periodic%20%2F%20linux-jammy-cuda12.8-py3.10-gcc11-debug%20%2F%20test%20(default%2C%201%2C%207%2C%20linux.g6.4xlarge.experimental.nvidia.gpu%2C%20oncall%3Adebug-build)&jobName=undefined&failureCaptures=test%2Ftest_privateuseone_python_backend.py%3A%3APrivateUse1BackendTest%3A%3Atest_backend_simple

We should fix this! (not super urgent).

fffrog · 2026-02-06T17:44:59Z

> We should fix this! (not super urgent).

So sorry about this; I will fix it.

pytorchbot added the open source label Jul 8, 2025

qihqi marked this pull request as ready for review July 8, 2025 22:21

qihqi requested review from albanD and soulitzer as code owners July 8, 2025 22:21

jerryzh168 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jul 8, 2025

albanD reviewed Jul 9, 2025

qihqi force-pushed the privateuseone branch 2 times, most recently from 28225fa to 44ce110 Compare July 16, 2025 22:25

qihqi requested a review from a team as a code owner July 16, 2025 22:25

qihqi requested a review from albanD July 16, 2025 22:30

fffrog reviewed Jul 17, 2025

c10/core/impl/DeviceGuardImplInterface.h (outdated)

fffrog reviewed Jul 17, 2025

test/test_privateuseone_python_backend.py

fffrog reviewed Jul 17, 2025

torch/csrc/Module.cpp (outdated)

fffrog reviewed Jul 17, 2025

torch/csrc/Module.cpp (outdated)

qihqi force-pushed the privateuseone branch from 0deed7f to e1fe1b5 Compare July 27, 2025 03:39

pytorch-bot bot added the release notes: python_frontend python frontend release notes category label Jul 27, 2025

qihqi requested a review from fffrog July 27, 2025 03:43

fffrog reviewed Jul 28, 2025

torch/csrc/python_custom_backend/Module.cpp (outdated)

fffrog reviewed Jul 28, 2025

torch/csrc/python_custom_backend/Module.cpp (outdated)

qihqi requested review from fffrog July 28, 2025 17:27

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 30, 2025

pytorchmergebot added the merging label Sep 30, 2025

pytorchmergebot added the Merged label Sep 30, 2025

pytorchmergebot closed this in 1310d6a Sep 30, 2025

pytorchmergebot removed the merging label Sep 30, 2025

jeanschmidt mentioned this pull request Sep 30, 2025

[DO NOT CLOSE] Autorevert actions shadow mode stream #163650

Open

pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Sep 30, 2025

pytorchmergebot reopened this Sep 30, 2025

rechecout

7464a8e

pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label Sep 30, 2025

pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 1, 2025

pytorchmergebot added the merging label Oct 1, 2025

pytorchmergebot closed this in b5c4f46 Oct 1, 2025

pytorchmergebot removed the merging label Oct 1, 2025

8 participants