
[DLPack] C Functions for DLPack Speed Exchange and Stream Handling#165483

Closed
Kathryn-cat wants to merge 21 commits into pytorch:main from Kathryn-cat:kathy/upstream-dlpack-exchange-api

Conversation


@Kathryn-cat Kathryn-cat commented Oct 14, 2025

Addressed Issue

Issue #162845

Summary of Changes

This PR introduces a unified `DLPackExchangeAPI` struct as described in proposal [175](dmlc/dlpack#175). This new convention replaces the previous mechanism of separate function pointers and aligns with the latest DLPack standard as shown in PR [174](dmlc/dlpack#174).

Specifically, the new `DLPackExchangeAPI` struct is exposed as `torch.Tensor.__c_dlpack_exchange_api__`, which stores and exposes the following function pointers:

  • `managed_tensor_allocator`
  • `managed_tensor_from_py_object_no_sync`
  • `managed_tensor_to_py_object_no_sync`
  • `dltensor_from_py_object_no_sync`
  • `current_work_stream`

Within the new `DLPackExchangeAPI` struct, the `current_work_stream` function pointer allows more robust, integrated querying of the current device stream (e.g., the CUDA stream) during DLPack tensor exchanges. All conversions from/to DLPack have been renamed with a `_no_sync` suffix, meaning callers should use `current_work_stream` to handle stream synchronization explicitly. The struct also includes a non-owning DLTensor conversion, `dltensor_from_py_object_no_sync`, to avoid unnecessary reference counting.
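For illustration only, here is a minimal ctypes sketch of how a struct with these five slots could be laid out. The authoritative definition is the C struct in `dlpack.h` (dmlc/dlpack proposal 175); the field order and the exact C signatures are assumptions here, so each slot is modeled as an opaque function pointer.

```python
import ctypes

# Hypothetical sketch: mirrors the five function-pointer slots named in the
# PR description. The real C definition (with typed signatures) lives in
# dlpack.h; opaque pointers stand in for the function types here.
class DLPackExchangeAPI(ctypes.Structure):
    _fields_ = [
        ("managed_tensor_allocator", ctypes.c_void_p),
        ("managed_tensor_from_py_object_no_sync", ctypes.c_void_p),
        ("managed_tensor_to_py_object_no_sync", ctypes.c_void_p),
        ("dltensor_from_py_object_no_sync", ctypes.c_void_p),
        ("current_work_stream", ctypes.c_void_p),
    ]

# The struct is one pointer-sized slot per exported C function.
print(ctypes.sizeof(DLPackExchangeAPI) == 5 * ctypes.sizeof(ctypes.c_void_p))
```

A consumer framework would obtain a pointer to the real struct once (via the `__c_dlpack_exchange_api__` attribute) and then call through the function pointers without touching the Python C API on the hot path.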

Following this change, `dlpack.h` has been updated to the latest DLPack version.

Unit tests are added using `torch.utils.cpp_extension.load_inline` to avoid GIL release issues when calling `THPVariable_Wrap`.


pytorch-bot bot commented Oct 14, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165483

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 286f075 with merge base 93fef4b:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@Kathryn-cat Kathryn-cat marked this pull request as ready for review October 15, 2025 00:57
@Kathryn-cat Kathryn-cat changed the title from "wip: dlpack exchange cpi" to "[DLPack] C Functions for DLPack Speed Exchange and Stream Handling" Oct 15, 2025
@Kathryn-cat

@pytorchbot label "topic: not user facing"

@Kathryn-cat

@pytorchbot label "module: dlpack"

@Kathryn-cat

/gemini review

@Kathryn-cat

@eqy would you like to help me trigger CI?

@Kathryn-cat

@eqy I addressed some comments, and would like to rerun the CI

@Kathryn-cat

@eqy would you like to help me trigger CI again?

@eqy eqy added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 15, 2025
@eqy eqy requested review from malfet and ngimel October 15, 2025 23:06
@Kathryn-cat

@malfet @ngimel Would you like to take a look at this PR?

@mikaylagawarecki mikaylagawarecki added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Oct 17, 2025

malfet commented Oct 17, 2025

@Kathryn-cat your PR is trying to do too many things at once. Do you mind separating the functional change from the minor improvements? (i.e., if you submit a separate PR with those improvements, I'll be happy to review it right away.)

@pytorch-bot pytorch-bot bot removed the ciflow/trunk Trigger trunk jobs on your pull request label Oct 27, 2025
@eqy eqy requested a review from albanD October 27, 2025 18:30
@Kathryn-cat

@eqy would you like to re-run CI?

@Kathryn-cat

@malfet I separated out a PR with minor changes only: #166325.

Would you like to review it and get it in first?

@Kathryn-cat Kathryn-cat force-pushed the kathy/upstream-dlpack-exchange-api branch from aff98b2 to 4f6fe45 Compare October 27, 2025 20:37

Kathryn-cat commented Oct 30, 2025

@malfet @ngimel @albanD @eqy Just wanted to add a gentle ping on this PR and see if it can be merged for downstream use. If needed, we can merge the slight changes first #166325.


mxz297 commented Oct 30, 2025

Hi, I am trying to use this PR along with flashinfer and tvm-ffi. With this PR, I started to get an error just by doing `import torch.distributed.tensor`:

  File "/data/users/mxz/fbsource/buck-out/v2/gen/fbcode/ad36f14394ff4d25/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/transformers/model_debugging_utils.py", line 35, in <module>
    import torch.distributed.tensor
  File "/data/users/mxz/fbsource/buck-out/v2/gen/fbcode/ad36f14394ff4d25/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/torch/distributed/tensor/__init__.py", line 4, in <module>
    import torch.distributed.tensor._ops  # force import all built-in dtensor ops
  File "/data/users/mxz/fbsource/buck-out/v2/gen/fbcode/ad36f14394ff4d25/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/torch/distributed/tensor/_ops/__init__.py", line 2, in <module>
    from ._conv_ops import *  # noqa: F403
  File "/data/users/mxz/fbsource/buck-out/v2/gen/fbcode/ad36f14394ff4d25/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/torch/distributed/tensor/_ops/_conv_ops.py", line 5, in <module>
    from torch.distributed.tensor._dtensor_spec import DTensorSpec, TensorMeta
  File "/data/users/mxz/fbsource/buck-out/v2/gen/fbcode/ad36f14394ff4d25/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/torch/distributed/tensor/_dtensor_spec.py", line 8, in <module>
    from torch.distributed.tensor.placement_types import (
  File "/data/users/mxz/fbsource/buck-out/v2/gen/fbcode/ad36f14394ff4d25/smart/inference_platform_sp/llm_predictor_gpu/__service__/service#link-tree/torch/distributed/tensor/placement_types.py", line 9, in <module>
    from torch._C._distributed import Placement
ModuleNotFoundError: No module named 'torch._C._distributed'; 'torch._C' is not a package

Things seem to work fine with `import torch`. Can you check whether you can repro this error on your side by just trying `import torch.distributed.tensor`?


albanD commented Oct 30, 2025

@Kathryn-cat In the spirit of not having you spend a lot of time polishing this PR if it's not going to be landed, I think it would be best to take a step back here and discuss on the issue before talking about the specifics of the implementation.
In particular, as you can see in https://github.com/pytorch/pytorch/wiki/The-Ultimate-Guide-to-PyTorch-Contributions, we recommend waiting for an issue to be discussed and marked actionable before working on a PR for it, to make sure we all agree on the design and that someone will be able to properly review the PR without it sitting idle for a long time.

Sorry for not posting this here earlier, but I'm sure you know that we have a lot of PRs and I'm a bit swamped these days.

@pytorchmergebot

Successfully rebased kathy/upstream-dlpack-exchange-api onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout kathy/upstream-dlpack-exchange-api && git pull --rebase)

@pytorchmergebot pytorchmergebot force-pushed the kathy/upstream-dlpack-exchange-api branch from c10de5e to 286f075 Compare December 1, 2025 15:21
@eqy eqy added the ciflow/trunk Trigger trunk jobs on your pull request label Dec 1, 2025

@albanD albanD left a comment


Small question on typing, but otherwise SGTM.
Will let you decide if it's ok to merge the PyCapsule part even though it's technically still under discussion on the dlpack side. No blockers from my side to merge this one at least.

def _torchDeviceToDLDevice(
    device: torch.device,
) -> tuple[_int, _int]: ...  # THPModule_torchDeviceToDLDevice
def _dlpack_exchange_api() -> object: ...  # THPModule_DLPackExchangeAPI
Collaborator

I guess PyCapsule is not a type we can use here?

Contributor

Seems there is no way to type `PyCapsule` on the Python side.
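For context: CPython does not expose the capsule type as importable from any Python module, which is why the stub above falls back to `object`. A small ctypes illustration (not PyTorch-specific) of why there is nothing to name in an annotation:

```python
import ctypes

# Capsules are created through the C API; their type exists at runtime but
# is not importable, so .pyi stubs annotate capsule-returning functions as
# -> object. PyCapsule_New takes (void *pointer, const char *name, destructor).
PyCapsule_New = ctypes.pythonapi.PyCapsule_New
PyCapsule_New.restype = ctypes.py_object
PyCapsule_New.argtypes = [ctypes.c_void_p, ctypes.c_char_p, ctypes.c_void_p]

cap = PyCapsule_New(0x1234, b"demo_capsule", None)  # dummy non-NULL pointer
print(type(cap).__name__)                            # PyCapsule
print(hasattr(__import__("builtins"), "PyCapsule"))  # False: nothing to import
```

Some stub sets work around this with a local `class PyCapsule: ...` placeholder, but `object` is the common choice when no such alias is defined.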

@albanD

albanD commented Dec 4, 2025

@pytorchbot merge

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


umechand-amd pushed a commit to ROCm/pytorch that referenced this pull request Dec 8, 2025
JacobSzwejbka pushed a commit that referenced this pull request Dec 8, 2025