Skip to content

Add functions to setup PrivateUse1 as a python backend device.#157859

Closed
qihqi wants to merge 5 commits intopytorch:mainfrom
qihqi:privateuseone
Closed

Add functions to setup PrivateUse1 as a python backend device.#157859
qihqi wants to merge 5 commits intopytorch:mainfrom
qihqi:privateuseone

Conversation

@qihqi
Copy link
Contributor

@qihqi qihqi commented Jul 8, 2025

Fixes #156052 and #156444.

This PR setup the privateuseone key in Python to be used as a python backend for pytorch.
Meaning that, after calling setup_privateuseone_for_python_backend('npy'), one can use a subclass to with that device to hold arbitrary python data as "device data" and use torch.library to register ops that takes that Tensor.

Changes done in this PR:

  1. Register an vanilla Device Guard: I extended NoOpDeviceGuard to have allow device index of 0 and to not raise errors when event related functions are accessed. If I don't do those, when calling backward I would get errors. (CPU backend uses NoOpDeviceGuard just fine, although there seems to be special treatment of CPU in the autograd engine.
  2. Tensor subclass allows not having __torch_dispatch__ if the device is not CUDA or CPU. The comment of the check suggests it was to avoid segfault when calling into ops that expects a storage. Here we have a different device so will not call into those ops.
  3. python function that invokes the other incantations to setup the privateusekey backend.

This took inspiration of https://github.com/bdhirsh/pytorch_open_registration_example and https://github.com/tinygrad/tinygrad/blob/master/extra/torch_backend/wrapped_tensor.cpp; great thanks to @bdhirsh and @geohot.

@pytorch-bot
Copy link

pytorch-bot bot commented Jul 8, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/157859

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 7464a8e with merge base c58e096 (image):

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@qihqi qihqi marked this pull request as ready for review July 8, 2025 22:21
@qihqi qihqi requested review from albanD and soulitzer as code owners July 8, 2025 22:21
@jerryzh168 jerryzh168 added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label Jul 8, 2025
Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into this !

@fffrog
Copy link
Collaborator

fffrog commented Jul 9, 2025

Hmm... Specializing PrivateUse1HooksInterface and DeviceGuardImplInterface for PrivateUse1 is not a good idea directly . If you really want to implement this functionality based on pure Python, why not consider exposing these two classes to the Python side, and then let users inherit and implement the specific details of these two classes on the Python side? In addition, can __torch_function__ and __torch_dispatch__ basically achieve what you want?

@qihqi
Copy link
Contributor Author

qihqi commented Jul 9, 2025

Hmm... Specializing PrivateUse1HooksInterface and DeviceGuardImplInterface for PrivateUse1 is not a good idea directly . If you really want to implement this functionality based on pure Python, why not consider exposing these two classes to the Python side, and then let users inherit and implement the specific details of these two classes on the Python side?

This is indeed a better idea. I'll try to do that. Initially I picked the least intrusive thing (everything optional, and do as little as it can).

In addition, can __torch_function__ and __torch_dispatch__ basically achieve what you want?

Indeed it does basically (https://dev-discuss.pytorch.org/t/embrace-tensor-subclass-as-a-python-device-registration-api/2771); and it is what I am using right now. There are places where it interacts poorly with torch, including the 2 issues that this PR fixes; as well as AMP integration.

@qihqi qihqi force-pushed the privateuseone branch 2 times, most recently from 28225fa to 44ce110 Compare July 16, 2025 22:25
@qihqi qihqi requested a review from a team as a code owner July 16, 2025 22:25
@qihqi qihqi requested a review from albanD July 16, 2025 22:30
@fffrog
Copy link
Collaborator

fffrog commented Jul 17, 2025

I have an idea. Apology if there are any problems.

Can we directly export PrivateUse1HooksInterface and DeviceGuardImplInterface to Python through pybind11 and trampoline class (the code can be put in torch/csrc/Modules.cpp, because this is the logic of binding between Python and C++), and then provide an API to register instances of the above two classes from the Python side;

In this way, users can integrate the classes on the Python side to implement the corresponding interfaces, and can also easily know which interfaces need to be implemented; whats more, there is no need to store PythonPrivateUse1DeviceGuardandPythonPrivateUse1HooksInterface` in PyTorch.

Of course, it is also possible to forward all operations from C++ back to Python like the current PR implementation, but we'd better provide a base class consistent with C++ for users to inherit, so that users can easily understand which interfaces may need to be implemented.

@qihqi
Copy link
Contributor Author

qihqi commented Jul 27, 2025

@pytorchbot label "release notes: python_frontend"

@pytorch-bot pytorch-bot bot added the release notes: python_frontend python frontend release notes category label Jul 27, 2025
@qihqi qihqi requested a review from fffrog July 27, 2025 03:43
@qihqi
Copy link
Contributor Author

qihqi commented Jul 27, 2025

Hi @fffrog @albanD :

Made few changes as per discussion:

  1. Made the Hook and Device class C++ classes exposed to python with trampoline as suggested. To make this work, I followed instructions on https://pybind11.readthedocs.io/en/stable/advanced/classes.html; and, had to update the version of pybind11 in third_party/ because the old version did not have the header <pybind11/trampoline_self_life_support.h>. LMK if this is OK or not.
  2. Moved changes to a new directory torch/csrc/pytorch_custom_backend; as based on
    with Python. This is in contrast to lib, which contains the Torch
    it seems more desired to have python binding stuff there than C10.
  3. Changes in build_variables.bzl to make the file builds.

Let me know how it looks, also please start CI to see if any tests fails; I don't have the permission to do that.

Thanks!

@fffrog
Copy link
Collaborator

fffrog commented Jul 28, 2025

Thank you.

Made the Hook and Device class C++ classes exposed to python with trampoline as suggested. To make this work, I followed instructions on https://pybind11.readthedocs.io/en/stable/advanced/classes.html; and, had to update the version of pybind11 in third_party/ because the old version did not have the header <pybind11/trampoline_self_life_support.h>. LMK if this is OK or not.

The upgrade of pybind11 needs to be done very carefully, so we need to find other ways to solve this problem.

Back to the problem itself, we do need to pay attention to the variable life cycle in C++ and Python. In my opinion, using the holder_type of shared_ptr should be a feasible method.

@qihqi qihqi requested review from fffrog July 28, 2025 17:27
@qihqi
Copy link
Contributor Author

qihqi commented Jul 28, 2025

The upgrade of pybind11 needs to be done very carefully,

Indeed.

so we need to find other ways to solve this problem.

I can also make a separate PR to just update pybind11 just to be careful. would you think that is a good idea?

Back to the problem itself, we do need to pay attention to the variable life cycle in C++ and Python. In my opinion, using the holder_type of shared_ptr should be a feasible method.

would you describe what is the life cycle issue? I thought the reason of why we need the header is because that is how to get definitions of PYBIND11_OVERRIDE_PURE needed for setting the trampoline.

@fffrog
Copy link
Collaborator

fffrog commented Jul 29, 2025

would you describe what is the life cycle issue? I thought the reason of why we need the header is because that is how to get definitions of PYBIND11_OVERRIDE_PURE needed for setting the trampoline.

The definition of PYBIND11_OVERRIDE_PURE or PYBIND11_OVERRIDE is in the pybind11/pybind11 file, you can search for it in the PyTorch codebase.

image

Also, it seems to me that py::trampoline_self_life_support and py::smart_holder were introduced in v3.0 to solve the problem of variable lifetime. The user will inherit torch._C.PrivateUse1Hooks and register the custom class instance into C++ through register_python_privateuseone_hook, and other modules in C++ will use this C++ class instance throughout the entire process, so we need to always keep it active and cannot release it for any reason, including Python variables being released because they are out of scope, otherwise core dump may occur due to dangling pointers.

@qihqi
Copy link
Contributor Author

qihqi commented Sep 30, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Sep 30, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@jeanschmidt
Copy link
Contributor

jeanschmidt commented Sep 30, 2025

@pytorchbot revert -m "introduce linting errors https://github.com/pytorch/pytorch/actions/runs/18123993236/job/51574878473" -c nosignal

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

pytorchmergebot added a commit that referenced this pull request Sep 30, 2025
#157859)"

This reverts commit 1310d6a.

Reverted #157859 on behalf of https://github.com/jeanschmidt due to introduce linting errors ([comment](#157859 (comment)))
@pytorchmergebot
Copy link
Collaborator

@qihqi your PR has been successfully reverted.

@pytorchmergebot pytorchmergebot added Reverted ci-no-td Do not run TD on this PR labels Sep 30, 2025
@qihqi
Copy link
Contributor Author

qihqi commented Sep 30, 2025

Hi @jeanschmidt would you point out what linting error you were seeing? I am looking at https://hud.pytorch.org/pr/157859 and couldnt find any
image

@jeanschmidt
Copy link
Contributor

Relevant to point out, the issue introduced seems to come from the job lintrunner-noclang / linux-job

>>> Lint for torch/_C/__init__.pyi:

  Warning (RUFF) W292
    No newline at end of file.
    See [https://beta.ruff.rs/docs/rules/.](https://beta.ruff.rs/docs/rules/)
    
    To disable, use `  # noqa: W292`

        12933  |        arg_types: str,
        12934  |        args: tuple[Any, ...],
        12935  |        stream: _int,
    >>> 12936  |    ) -> None: ...
  Warning (RUFF) format
    Run `lintrunner -a` to apply this patch.

    You can run `lintrunner -a` to apply this patch.

    12933  12933 |         arg_types: str,
    12934  12934 |         args: tuple[Any, ...],
    12935  12935 |         stream: _int,
    12935        |-    ) -> None: ...
           12936 |+    ) -> None: ...

  Warning (PYFMT) format
    Run `lintrunner -a` to apply this patch.

    You can run `lintrunner -a` to apply this patch.

    12933  12933 |         arg_types: str,
    12934  12934 |         args: tuple[Any, ...],
    12935  12935 |         stream: _int,
    12935        |-    ) -> None: ...
           12936 |+    ) -> None: ...

+ echo ''

@pytorch-bot pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label Sep 30, 2025
@qihqi
Copy link
Contributor Author

qihqi commented Oct 1, 2025

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 1, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
…ch#157859)

Fixes pytorch#156052 and pytorch#156444.

This PR setup the privateuseone key in Python to be used as a python backend for pytorch.
Meaning that, after calling `setup_privateuseone_for_python_backend('npy')`, one can use a subclass to with that device to hold arbitrary python data as "device data" and use `torch.library` to register ops that takes that Tensor.

Changes done in this PR:

1. Register an vanilla Device Guard: I extended NoOpDeviceGuard to have allow device index of 0 and to not raise errors when event related functions are accessed. If I don't do those, when calling backward I would get errors. (CPU backend uses NoOpDeviceGuard just fine, although there seems to be special treatment of CPU in the autograd engine.
2. Tensor subclass allows not having `__torch_dispatch__` if the device is not CUDA or CPU. The comment of the check suggests it was to avoid segfault when calling into ops that expects a storage. Here we have a different device so will not call into those ops.
3. python function that invokes the other incantations to setup the privateusekey backend.

This took inspiration of https://github.com/bdhirsh/pytorch_open_registration_example and https://github.com/tinygrad/tinygrad/blob/master/extra/torch_backend/wrapped_tensor.cpp; great thanks to @bdhirsh and @geohot.
Pull Request resolved: pytorch#157859
Approved by: https://github.com/albanD
@albanD
Copy link
Collaborator

albanD commented Feb 5, 2026

@fffrog
Copy link
Collaborator

fffrog commented Feb 6, 2026

We should fix this! (not super urgent).

So sorry for this, i will fix it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ci-no-td Do not run TD on this PR ciflow/trunk Trigger trunk jobs on your pull request Merged open source release notes: python_frontend python frontend release notes category Reverted triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Ability to set device guard in Python

8 participants