[DLPack] C Functions for DLPack Speed Exchange and Stream Handling #165483
Kathryn-cat wants to merge 21 commits into pytorch:main from
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165483
Note: Links to docs will display an error until the docs builds have been completed.
✅ You can merge normally! (1 Unrelated Failure)
As of commit 286f075 with merge base 93fef4b:
FLAKY - The following job failed but was likely due to flakiness present on trunk:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot label "topic: not user facing"
@pytorchbot label "module: dlpack"
/gemini review
@eqy would you like to help me trigger CI?
@eqy I addressed some comments and would like to rerun the CI.
@eqy would you like to help me trigger CI again?
@Kathryn-cat your PR is trying to do too many things at once. Do you mind separating the functional change from the minor improvements? (i.e. if you submit a separate PR with those improvements, I'll be happy to review it right away)
@eqy would you like to re-run CI?
Force-pushed aff98b2 to 4f6fe45
Hi, I am trying to use this PR along with flashinfer and tvm-ffi. With this PR, I started to get an error by just doing … Things seem to run with …
@Kathryn-cat In the spirit of not having you spend a lot of time polishing this PR if it's not going to be landed, I think it would be best to take a step back here and discuss on the issue before talking about the specifics of the implementation. Sorry for not posting this here earlier, but as I'm sure you know, we have a lot of PRs and I'm a bit swamped these days.
Successfully rebased and force-pushed c10de5e to 286f075
albanD left a comment:
Small question on typing, but otherwise SGTM.
I'll let you decide if it's ok to merge the PyCapsule part even though it's technically still under discussion on the DLPack side. No blockers from my side to merge this one at least.
def _torchDeviceToDLDevice(
    device: torch.device,
) -> tuple[_int, _int]: ...  # THPModule_torchDeviceToDLDevice
def _dlpack_exchange_api() -> object: ...  # THPModule_DLPackExchangeAPI
I guess PyCapsule is not a type we can use here?

It seems there is no way to type PyCapsule on the Python side.
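The stub annotation falls back to `object` because CPython does not export the PyCapsule type under any public Python-level name, so there is nothing for the `.pyi` file to reference. A quick illustration (this uses `ctypes.pythonapi` only to mint a throwaway capsule; the pointer value and the `b"demo"` name are arbitrary, and real capsules would come from a C extension such as the one in this PR):

```python
import ctypes

# Set up PyCapsule_New from the CPython C API.
ctypes.pythonapi.PyCapsule_New.restype = ctypes.py_object
ctypes.pythonapi.PyCapsule_New.argtypes = [
    ctypes.c_void_p,  # pointer (must be non-NULL)
    ctypes.c_char_p,  # capsule name
    ctypes.c_void_p,  # destructor (NULL here)
]

cap = ctypes.pythonapi.PyCapsule_New(ctypes.c_void_p(0xDEAD), b"demo", None)

# The runtime type exists, but only under a C-internal name; it is not
# importable from builtins or any stdlib module, hence `object` in stubs.
print(type(cap).__name__)  # PyCapsule
```

This is why `_dlpack_exchange_api()` is annotated as returning `object` even though it always returns a capsule at runtime.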
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 Hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
…ytorch#165483)

## Addressed Issue

Issue pytorch#162845

## Summary of Changes

This PR introduces a unified `DLPackExchangeAPI` struct as described in proposal [175](dmlc/dlpack#175). This new convention replaces the previous mechanism of separate function pointers and aligns with the latest DLPack standard as shown in PR [174](dmlc/dlpack#174).

Specifically, the new `DLPackExchangeAPI` struct is exposed as `torch.Tensor.__c_dlpack_exchange_api__`, which stores and exposes the following function pointers:

* `managed_tensor_allocator`
* `managed_tensor_from_py_object_no_sync`
* `managed_tensor_to_py_object_no_sync`
* `dltensor_from_py_object_no_sync`
* `current_work_stream`

Within the new `DLPackExchangeAPI` struct, the new `current_work_stream` function pointer allows more robust and integrated querying of the current device stream (e.g., CUDA stream) during DLPack tensor exchanges. All conversions from/to DLPack have been updated to `_no_sync`, meaning callers should use `current_work_stream` to handle stream synchronization explicitly. It also includes a non-owning DLTensor conversion, `dltensor_from_py_object_no_sync`, to avoid unnecessary reference counting.

Following this change, `dlpack.h` has been updated to the latest DLPack.

Unit tests are added using `torch.utils.cpp_extension.load_inline` to avoid GIL release issues when calling `THPVariable_Wrap`.

Pull Request resolved: pytorch#165483
Approved by: https://github.com/tqchen, https://github.com/albanD
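As a rough picture of the struct's shape, here is a ctypes mirror of the `DLPackExchangeAPI` layout. The field names come from the PR description above; the field order and the per-slot signatures are assumptions for illustration (the authoritative definitions live in pytorch's C headers and dmlc/dlpack#175), so each slot is modeled as an opaque pointer:

```python
import ctypes

# Stand-in for the real per-slot function-pointer types, which differ per
# entry in the actual C struct.
OpaqueFnPtr = ctypes.c_void_p

class DLPackExchangeAPI(ctypes.Structure):
    """Hypothetical ctypes sketch of the C struct; field order is assumed."""
    _fields_ = [
        ("managed_tensor_allocator", OpaqueFnPtr),
        ("managed_tensor_from_py_object_no_sync", OpaqueFnPtr),
        ("managed_tensor_to_py_object_no_sync", OpaqueFnPtr),
        ("dltensor_from_py_object_no_sync", OpaqueFnPtr),
        ("current_work_stream", OpaqueFnPtr),
    ]

# A consumer would obtain a pointer to this struct from the PyCapsule held in
# torch.Tensor.__c_dlpack_exchange_api__ and dispatch through the slots.
print([name for name, _ in DLPackExchangeAPI._fields_])
```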
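The `_no_sync` contract described above shifts synchronization to the caller: the producer no longer synchronizes during conversion, so the consumer is expected to query `current_work_stream` and order its own work after the producer's stream. The toy model below sketches that call order only; `DummyStream` and `DummyAPI` are hypothetical stand-ins, not pytorch APIs:

```python
class DummyStream:
    """Toy stream that records cross-stream ordering, like a stream-wait event."""
    def __init__(self, name):
        self.name = name
        self.waited_on = []

    def wait_stream(self, other):
        self.waited_on.append(other.name)


class DummyAPI:
    """Toy stand-in for the exchange API's relevant slots."""
    def __init__(self, producer_stream):
        self._stream = producer_stream

    def current_work_stream(self):
        # Mirrors the current_work_stream slot: report the producer's stream.
        return self._stream

    def managed_tensor_from_py_object_no_sync(self, obj):
        # No synchronization happens here, per the _no_sync contract.
        return ("managed", obj)


def import_no_sync(api, obj, consumer_stream):
    producer = api.current_work_stream()
    managed = api.managed_tensor_from_py_object_no_sync(obj)
    consumer_stream.wait_stream(producer)  # explicit ordering is the caller's job
    return managed


api = DummyAPI(DummyStream("producer"))
consumer = DummyStream("consumer")
out = import_no_sync(api, [1, 2, 3], consumer)
print(out, consumer.waited_on)  # ('managed', [1, 2, 3]) ['producer']
```

With real streams, the `wait_stream` step would be the CUDA stream-ordering call of your runtime; the point is only that the consumer, not the converter, issues it.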