[DLPack] C Functions for DLPack Speed Exchange and Stream Handling #165483
Kathryn-cat wants to merge 21 commits into pytorch:main from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165483
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 286f075 with merge base 93fef4b:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
@pytorchbot label "module: dlpack"
/gemini review
@eqy would you like to help me trigger CI?
@eqy I addressed some comments and would like to rerun the CI.
@eqy would you like to help me trigger CI again?
@Kathryn-cat your PR is trying to do too many things at once. Do you mind separating the functional change from the minor improvements? (i.e. if you submit a separate PR with those improvements, I'll be happy to review it right away)
@eqy would you like to re-run CI?
Force-pushed aff98b2 to 4f6fe45
Hi, I am trying to use this PR along with flashinfer and tvm-ffi. With this PR, I started to get an error by just doing … Things seem to run with …
@Kathryn-cat In the spirit of not having you spend a lot of time polishing this PR if it's not going to be landed, I think it would be best to take a step back here and discuss on the issue before talking about the specifics of the implementation. Sorry for not posting this here earlier, but as I'm sure you know, we have a lot of PRs and I'm a bit swamped these days.
Successfully rebased and force-pushed c10de5e to 286f075
albanD left a comment:
Small question on typing, but otherwise SGTM.
I'll let you decide if it's ok to merge the PyCapsule part even though it's technically still under discussion on the DLPack side. No blockers from my side to merge this one at least.
def _torchDeviceToDLDevice(
    device: torch.device,
) -> tuple[_int, _int]: ...  # THPModule_torchDeviceToDLDevice
def _dlpack_exchange_api() -> object: ...  # THPModule_DLPackExchangeAPI
I guess PyCapsule is not a type we can use here?

It seems there is no way to type PyCapsule on the Python side.
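The stub annotation falls back to `object` because CPython does not export the PyCapsule type under any public Python-level name, so there is nothing for the `.pyi` file to reference. A quick illustration (this uses `ctypes.pythonapi` only to mint a throwaway capsule; the pointer value and the `b"demo"` name are arbitrary, and real capsules would come from a C extension such as the one in this PR):

```python
import ctypes

# Set up PyCapsule_New from the CPython C API.
ctypes.pythonapi.PyCapsule_New.restype = ctypes.py_object
ctypes.pythonapi.PyCapsule_New.argtypes = [
    ctypes.c_void_p,  # pointer (must be non-NULL)
    ctypes.c_char_p,  # capsule name
    ctypes.c_void_p,  # destructor (NULL here)
]

cap = ctypes.pythonapi.PyCapsule_New(ctypes.c_void_p(0xDEAD), b"demo", None)

# The runtime type exists, but only under a C-internal name; it is not
# importable from builtins or any stdlib module, hence `object` in stubs.
print(type(cap).__name__)  # PyCapsule
```

This is why `_dlpack_exchange_api()` is annotated as returning `object` even though it always returns a capsule at runtime.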
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…ytorch#165483)

## Addressed Issue

Issue pytorch#162845

## Summary of Changes

This PR introduces a unified `DLPackExchangeAPI` struct as described in proposal [175](dmlc/dlpack#175). This new convention replaces the previous mechanism of separate function pointers and aligns with the latest DLPack standard as shown in PR [174](dmlc/dlpack#174).

Specifically, the new `DLPackExchangeAPI` struct is exposed as `torch.Tensor.__c_dlpack_exchange_api__`, which stores and exposes the following function pointers:

* `managed_tensor_allocator`
* `managed_tensor_from_py_object_no_sync`
* `managed_tensor_to_py_object_no_sync`
* `dltensor_from_py_object_no_sync`
* `current_work_stream`

Within the new `DLPackExchangeAPI` struct, the new `current_work_stream` function pointer allows more robust and integrated querying of the current device stream (e.g., CUDA stream) during DLPack tensor exchanges. All conversions from/to DLPack have been updated to `_no_sync`, meaning callers should use `current_work_stream` to handle stream synchronization explicitly. It also includes a non-owning DLTensor conversion, `dltensor_from_py_object_no_sync`, to avoid unnecessary reference counting.

Following this change, `dlpack.h` has been updated to the latest DLPack.

Unit tests are added using `torch.utils.cpp_extension.load_inline` to avoid GIL release issues when calling `THPVariable_Wrap`.

Pull Request resolved: pytorch#165483
Approved by: https://github.com/tqchen, https://github.com/albanD
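As a rough picture of the struct's shape, here is a ctypes mirror of the `DLPackExchangeAPI` layout. The field names come from the PR description above; the field order and the per-slot signatures are assumptions for illustration (the authoritative definitions live in pytorch's C headers and dmlc/dlpack#175), so each slot is modeled as an opaque pointer:

```python
import ctypes

# Stand-in for the real per-slot function-pointer types, which differ per
# entry in the actual C struct.
OpaqueFnPtr = ctypes.c_void_p

class DLPackExchangeAPI(ctypes.Structure):
    """Hypothetical ctypes sketch of the C struct; field order is assumed."""
    _fields_ = [
        ("managed_tensor_allocator", OpaqueFnPtr),
        ("managed_tensor_from_py_object_no_sync", OpaqueFnPtr),
        ("managed_tensor_to_py_object_no_sync", OpaqueFnPtr),
        ("dltensor_from_py_object_no_sync", OpaqueFnPtr),
        ("current_work_stream", OpaqueFnPtr),
    ]

# A consumer would obtain a pointer to this struct from the PyCapsule held in
# torch.Tensor.__c_dlpack_exchange_api__ and dispatch through the slots.
print([name for name, _ in DLPackExchangeAPI._fields_])
```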
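The `_no_sync` contract described above shifts synchronization to the caller: the producer no longer synchronizes during conversion, so the consumer is expected to query `current_work_stream` and order its own work after the producer's stream. The toy model below sketches that call order only; `DummyStream` and `DummyAPI` are hypothetical stand-ins, not pytorch APIs:

```python
class DummyStream:
    """Toy stream that records cross-stream ordering, like a stream-wait event."""
    def __init__(self, name):
        self.name = name
        self.waited_on = []

    def wait_stream(self, other):
        self.waited_on.append(other.name)


class DummyAPI:
    """Toy stand-in for the exchange API's relevant slots."""
    def __init__(self, producer_stream):
        self._stream = producer_stream

    def current_work_stream(self):
        # Mirrors the current_work_stream slot: report the producer's stream.
        return self._stream

    def managed_tensor_from_py_object_no_sync(self, obj):
        # No synchronization happens here, per the _no_sync contract.
        return ("managed", obj)


def import_no_sync(api, obj, consumer_stream):
    producer = api.current_work_stream()
    managed = api.managed_tensor_from_py_object_no_sync(obj)
    consumer_stream.wait_stream(producer)  # explicit ordering is the caller's job
    return managed


api = DummyAPI(DummyStream("producer"))
consumer = DummyStream("consumer")
out = import_no_sync(api, [1, 2, 3], consumer)
print(out, consumer.waited_on)  # ('managed', [1, 2, 3]) ['producer']
```

With real streams, the `wait_stream` step would be the CUDA stream-ordering call of your runtime; the point is only that the consumer, not the converter, issues it.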