Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept #126376
wizzniu wants to merge 13 commits into pytorch:main from
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126376
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 Cancelled Job, 1 Unrelated Failure as of commit c73e0a4 with merge base c2425a3.
CANCELLED JOB - The following job was cancelled. Please retry.
FLAKY - The following job failed but was likely due to flakiness present on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
@albanD for you
FYI @jeffdaily this makes HIP an "accelerator". I'm still not sure if you use it, but that just enables more device-generic features for the HIP device, so I guess you're happy with it. We can remove it if you're not!
IIRC we don't use HIPHooksInterface but rather a hipified version of the CUDAHooksInterface. In any case, we're okay with being an Accelerator. Is there an RFC or something similar describing PyTorch's move to these generic interfaces?
The short version is described in https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/DeviceAccelerator.h
FYI @egienvalue you can implement this if/when you need to support pinned host-side memory used for faster transfers to the device!
That is nice. Right now we have a hacky way to pin CPU tensors.
@pytorchbot merge
Pull workflow has not been scheduled for the PR yet. It could be because the author doesn't have permissions to run those, or because skip-checks keywords were added to the PR/commits, aborting merge. Please get/give approval for the workflows and/or remove skip ci decorators before the next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra.
@albanD I have rebased. Can we merge it now?
@albanD Are there any other problems? Maybe we can start CI to see whether any remain.
@pytorchbot rebase
You don't have permissions to rebase this PR since you are a first time contributor. If you think this is a mistake, please contact PyTorch Dev Infra.
Hi @albanD, I have pushed a new commit to handle the failing test case. Can we trigger CI now? It seems that I don't have permissions.
@kulinseth It seems there is no need to add extra APIs for MPS currently, but I haven't tested MPS locally; it still needs a CI run on MPS to verify.
It's probably best if @albanD finishes off the review here.
Thanks! It seems he is busy.
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot merge -i
Merge started. Your change will be merged while ignoring the following check: trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
@pytorchbot merge
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
@pytorchmergebot merge -f "Unrelated failures"
The merge job was canceled or timed out. This most often happens if two merge requests were issued for the same PR, or if the merge job was waiting for more than 6 hours for tests to finish. In the latter case, please do not hesitate to reissue the merge command.
Merge started. Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Regression introduced by #126376. Before this change, compiling torch_cpu on my MacBook prints tons of warnings every time HooksInterface is included:

```
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/src/optim/adamw.cpp:1:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/optim/adamw.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/nn/module.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/nn/modules/container/any_module_holder.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/nn/modules/container/any_value.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/detail/static.h:4:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/types.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/aten/src/ATen/ATen.h:7:
In file included from /Users/nshulga/git/pytorch/pytorch/aten/src/ATen/Context.h:13:
/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/detail/HIPHooksInterface.h:27:11: warning: '~HIPHooksInterface' overrides a destructor but is not marked 'override' [-Winconsistent-missing-destructor-override]
  virtual ~HIPHooksInterface() = default;
          ^
/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/detail/AcceleratorHooksInterface.h:16:11: note: overridden virtual function is here
  virtual ~AcceleratorHooksInterface() = default;
          ^
1 warning generated.
```

Pull Request resolved: #131204
Approved by: https://github.com/albanD, https://github.com/seemethere
Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (pytorch#126376)

This PR re-implements pin memory aiming to get rid of the optional `device` argument and makes all related APIs device-agnostic. We add two new abstract APIs in [AcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/detail/AcceleratorHooksInterface.h#L12) and redefine pin memory as: "memory is always pinned for the current accelerator device". In detail, pin_memory/is_pinned use [getAcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.h#L61) to get an appropriate device and invoke the corresponding overridden interfaces, instead of using BackendSelect and then dispatching to CUDA or other specific backends' implementations.

Note: new backends that want to implement and use pin memory just need to inherit from AcceleratorHooksInterface and override the `isPinnedPtr` and `getPinnedMemoryAllocator` methods.

Additional context: to avoid BC-breaking, this PR preserves the `device` arg of the related APIs and throws a deprecation warning if `device` is passed. Another PR will be submitted to update all PT callers (`Tensor.is_pinned()`, `Tensor.pin_memory()`, ...) not to pass this arg, based on this PR. In the future, the `device` arg will actually be removed.

Relates pytorch#124908
Relates pytorch#14560

Pull Request resolved: pytorch#126376
Approved by: https://github.com/albanD
Revert "Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (pytorch#126376)"

This reverts commit c986aee. Reverted pytorch#126376 on behalf of https://github.com/atalman due to failing internal builds ([comment](pytorch#126376 (comment))).
Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept (pytorch#126376). Re-land of the reverted commit; the description is identical to the original commit message above.

Pull Request resolved: pytorch#126376
Approved by: https://github.com/albanD
This PR re-implements pin memory aiming to get rid of the optional `device` argument and makes all related APIs device-agnostic. We add two new abstract APIs in AcceleratorHooksInterface and redefine pin memory as: "memory is always pinned for the current accelerator device". In detail, it uses getAcceleratorHooksInterface in pin_memory/is_pinned to get an appropriate device and invoke the corresponding overridden interfaces, instead of using BackendSelect and then dispatching to CUDA or other specific backends' implementations.

Note: new backends that want to implement and use pin memory just need to inherit from AcceleratorHooksInterface and override the `isPinnedPtr` and `getPinnedMemoryAllocator` methods.

Additional context: to avoid BC-breaking, this PR preserves the `device` arg of the related APIs and throws a deprecation warning if `device` is passed. Another PR will be submitted to update all PT callers (`Tensor.is_pinned()`, `Tensor.pin_memory()`, ...) not to pass this arg, based on this PR. In the future, the `device` arg will actually be removed.

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @LucasLLC @MeetVadakkanchery @mhorowitz @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @tianyu-l @albanD @ezyang
Relates #124908
Relates #14560