Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept by wizzniu · Pull Request #126376 · pytorch/pytorch

wizzniu · 2024-05-16T03:49:10Z

This PR re-implements pin memory to get rid of the optional device argument and make all related APIs device-agnostic. We add two new abstract APIs to AcceleratorHooksInterface and redefine pin memory as: "Pin memory is always pinned for the current accelerator device". In detail, pin_memory/is_pinned now call getAcceleratorHooksInterface to get the appropriate device and invoke the corresponding overridden interfaces, instead of going through BackendSelect and then dispatching to the CUDA or other backend-specific implementations.
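To illustrate the dispatch idea described above, here is a minimal pure-Python sketch. It is not the PR's C++ code: `ToyHooks`, `register_hooks`, `set_current_accelerator`, and the integer "pointers" are all hypothetical stand-ins, and raising an error when no accelerator is present is an assumption of this toy model. The point is only that `pin_memory`/`is_pinned` look up the hooks of the current accelerator instead of dispatching per backend.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Set


class AcceleratorHooks(ABC):
    """Toy stand-in for AcceleratorHooksInterface (names are illustrative)."""

    @abstractmethod
    def is_pinned_ptr(self, ptr: int) -> bool:
        """Return True if `ptr` is host memory pinned for this accelerator."""

    @abstractmethod
    def pin(self, ptr: int) -> int:
        """Stand-in for copying into memory from a pinned allocator."""


class ToyHooks(AcceleratorHooks):
    """Hypothetical backend that hands out fake pinned addresses."""

    def __init__(self, base: int) -> None:
        self._next = base
        self._pinned: Set[int] = set()

    def is_pinned_ptr(self, ptr: int) -> bool:
        return ptr in self._pinned

    def pin(self, ptr: int) -> int:
        new_ptr = self._next
        self._next += 1
        self._pinned.add(new_ptr)
        return new_ptr


# Hypothetical global state: one hooks object per device type.
_HOOKS: Dict[str, AcceleratorHooks] = {}
_CURRENT_ACCELERATOR: Optional[str] = None


def register_hooks(device_type: str, hooks: AcceleratorHooks) -> None:
    _HOOKS[device_type] = hooks


def set_current_accelerator(device_type: Optional[str]) -> None:
    global _CURRENT_ACCELERATOR
    _CURRENT_ACCELERATOR = device_type


def get_accelerator_hooks() -> Optional[AcceleratorHooks]:
    """Return the hooks of the current accelerator, or None if there is none."""
    if _CURRENT_ACCELERATOR is None:
        return None
    return _HOOKS[_CURRENT_ACCELERATOR]


def is_pinned(ptr: int) -> bool:
    hooks = get_accelerator_hooks()
    return hooks is not None and hooks.is_pinned_ptr(ptr)


def pin_memory(ptr: int) -> int:
    hooks = get_accelerator_hooks()
    if hooks is None:
        # Assumption of this sketch, not a statement about PyTorch's behavior.
        raise RuntimeError("no accelerator available to pin memory for")
    if hooks.is_pinned_ptr(ptr):
        return ptr  # already pinned: nothing to do
    return hooks.pin(ptr)
```

Note that neither `is_pinned` nor `pin_memory` takes a device argument: the device is implied by whichever accelerator is current.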

Note: New backends that want to implement and use pin memory only need to inherit from AcceleratorHooksInterface and override the isPinnedPtr and getPinnedMemoryAllocator methods.
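As a rough picture of what that looks like for a backend author, here is a Python model of the inheritance pattern. The real interface is C++; only the two method names come from this PR. `HostPinnedAllocator`, `MyBackendHooks`, and `IncompleteHooks` are hypothetical, and the strictness of the Python ABC (refusing to instantiate a subclass that skips a method) is an assumption of the sketch, since this description does not say whether the C++ methods are pure virtual.

```python
from abc import ABC, abstractmethod


class AcceleratorHooksInterface(ABC):
    """Python model of the interface; only the method names mirror the PR."""

    @abstractmethod
    def isPinnedPtr(self, data: int) -> bool:
        ...

    @abstractmethod
    def getPinnedMemoryAllocator(self) -> "HostPinnedAllocator":
        ...


class HostPinnedAllocator:
    """Hypothetical allocator that tracks the fake addresses it hands out."""

    def __init__(self) -> None:
        self.live = set()
        self._next_addr = 0x1000

    def allocate(self, nbytes: int) -> int:
        addr = self._next_addr
        self._next_addr += max(nbytes, 1)
        self.live.add(addr)
        return addr


class MyBackendHooks(AcceleratorHooksInterface):
    """A new backend overrides exactly the two pin-memory methods."""

    def __init__(self) -> None:
        self._allocator = HostPinnedAllocator()

    def isPinnedPtr(self, data: int) -> bool:
        return data in self._allocator.live

    def getPinnedMemoryAllocator(self) -> HostPinnedAllocator:
        return self._allocator


class IncompleteHooks(AcceleratorHooksInterface):
    """Forgets getPinnedMemoryAllocator, so the ABC refuses to instantiate it."""

    def isPinnedPtr(self, data: int) -> bool:
        return False
```

With this in place, `MyBackendHooks().getPinnedMemoryAllocator().allocate(64)` returns an address that `isPinnedPtr` recognizes, while `IncompleteHooks()` raises `TypeError`.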

Additional context: To avoid breaking BC, this PR keeps the device arg of the related APIs and emits a deprecation warning if the device arg is passed. A follow-up PR, based on this one, will update all PT callers (Tensor.is_pinned(), Tensor.pin_memory()...) to stop passing this arg. In the future, the device arg will be removed entirely.
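The backward-compatibility approach can be sketched in plain Python as follows. This is an illustrative shim, not PyTorch's implementation: the function operates on a list rather than a tensor, the warning text and the use of `DeprecationWarning` are assumptions, and ignoring the passed `device` is a choice made for this toy only.

```python
import warnings
from typing import List, Optional


def pin_memory(data: List[int], device: Optional[str] = None) -> List[int]:
    """Toy pin_memory that still accepts, but deprecates, `device`."""
    if device is not None:
        # The argument is kept so that old callers do not break,
        # but passing it produces a warning.
        warnings.warn(
            "The argument 'device' of pin_memory() is deprecated; "
            "memory is pinned for the current accelerator.",
            DeprecationWarning,
            stacklevel=2,
        )
    # Stand-in for the real work: return a copy of the input.
    return list(data)
```

Callers that omit `device` see no warning, so updating call sites (as the follow-up PR plans to do) silences it without changing results.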

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @LucasLLC @MeetVadakkanchery @mhorowitz @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @tianyu-l @albanD @ezyang

Relates #124908
Relates #14560

pytorch-bot · 2024-05-16T03:49:13Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126376

📄 Preview Python docs built from this PR
📄 Preview C++ docs built from this PR
❓ Need help or want to give feedback on the CI? Visit the bot commands wiki or our office hours

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Cancelled Job, 1 Unrelated Failure

As of commit c73e0a4 with merge base c2425a3:

CANCELLED JOB - The following job was cancelled. Please retry:

Mac MPS / macos-py3-arm64-mps / test (mps, 1, 1, macos-m2-14) (gh)

FLAKY - The following job failed but was likely due to flakiness present on trunk:

trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable) (gh) (disabled by #131082, #131083)
export/test_export.py::TestExport::test_slice_with_floordiv

This comment was automatically generated by Dr. CI and updates every 15 minutes.

wizzniu · 2024-05-16T03:51:13Z

@albanD @ezyang Could you help review this? If the approach looks reasonable, I will move on to the next step: updating all related APIs and modifying the test cases.

aten/src/ATen/native/Memory.cpp

ezyang · 2024-05-17T12:43:34Z

@albanD for you

aten/src/ATen/detail/AcceleratorHooksInterface.h

aten/src/ATen/mps/MPSPinnedMemory.h

aten/src/ATen/native/Memory.cpp

aten/src/ATen/native/native_functions.yaml

albanD

Sounds good!
This needs rebasing on latest main so that we get CI signal btw!

albanD · 2024-05-30T21:03:59Z

aten/src/ATen/detail/HIPHooksInterface.h

FYI @jeffdaily this makes HIP an "accelerator". I'm still not sure whether you use it, but this just enables more device-generic features for the HIP device, so I guess you're happy with it. We can remove it if you are not!

jeffdaily · 2024-05-30

IIRC we don't use HIPHooksInterface but rather a hipified version of the CUDAHooksInterface. In any case, we're okay with being an Accelerator. Is there an RFC or something similar describing PyTorch's move to these generic interfaces?

albanD · 2024-05-30

The short version is described in https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/DeviceAccelerator.h

albanD · 2024-05-30T21:04:51Z

aten/src/ATen/detail/MTIAHooksInterface.h

FYI @egienvalue you can implement this if/when you need to support pinned host-side memory used for faster transfers to the device!

egienvalue · 2024-05-31

That is nice. Right now we have a hacky way to pin CPU tensors.

wizzniu · 2024-05-31T08:47:12Z

@pytorchbot merge

pytorch-bot · 2024-05-31T08:47:17Z

Pull workflow has not been scheduled for the PR yet. It could be because author doesn't have permissions to run those or skip-checks keywords were added to PR/commits, aborting merge. Please get/give approval for the workflows and/or remove skip ci decorators before next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra.

wizzniu · 2024-05-31T08:51:06Z

@albanD I have rebased. Can we merge it now?

aten/src/ATen/native/native_functions.yaml

wizzniu · 2024-06-04T02:56:27Z

@albanD Are there any other problems? Maybe we can start CI to see if any problem exists.

wizzniu · 2024-06-07T09:41:31Z

@pytorchbot rebase

pytorch-bot · 2024-06-07T09:41:35Z

You don't have permissions to rebase this PR since you are a first time contributor. If you think this is a mistake, please contact PyTorch Dev Infra.

wizzniu · 2024-06-07T09:48:56Z

Hi @albanD, I have pushed a new commit to handle the failed test case. Can we trigger CI now? It seems that I don't have permissions.

kulinseth · 2024-06-08T07:38:18Z

aten/src/ATen/detail/MPSHooksInterface.h

@wizzniu and @albanD
What is needed for MPS to adopt the accelerator interface? If there are APIs missing, we would like to extend it.

wizzniu · 2024-06-11

@kulinseth It seems there is no need to add extra APIs for MPS currently. However, I have not tested MPS locally, so it still needs a CI run for MPS to verify.

wizzniu · 2024-06-19T02:40:41Z

@albanD Do you have time to look at the newly pushed commits?
And @ezyang, could you help review the code, especially the op registration and dispatch parts? (I notice that you are the code owner of pin memory.)

ezyang · 2024-06-27T18:58:21Z

It's probably best if @albanD finishes off the review here

wizzniu · 2024-06-28T09:28:25Z

> It's probably best if @albanD finishes off the review here

Thanks! It seems that he is busy.

pytorchmergebot · 2024-07-22T08:56:06Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2024-07-22T14:54:53Z

The merge job was canceled or timed out. This most often happen if two merge requests were issued for the same PR, or if merge job was waiting for more than 6 hours for tests to finish. In later case, please do not hesitate to reissue the merge command
For more information see pytorch-bot wiki.

cyyever · 2024-07-22T14:59:25Z

@pytorchbot merge -i

pytorchmergebot · 2024-07-22T15:01:10Z

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable)


pytorchmergebot · 2024-07-22T20:59:48Z

The merge job was canceled or timed out. This most often happen if two merge requests were issued for the same PR, or if merge job was waiting for more than 6 hours for tests to finish. In later case, please do not hesitate to reissue the merge command
For more information see pytorch-bot wiki.

albanD · 2024-07-22T21:46:18Z

@pytorchbot merge

pytorchmergebot · 2024-07-22T21:48:05Z

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).


cyyever · 2024-07-23T01:42:08Z

@pytorchmergebot merge -f "Unrelated failures"

pytorchmergebot · 2024-07-23T01:42:26Z

The merge job was canceled or timed out. This most often happen if two merge requests were issued for the same PR, or if merge job was waiting for more than 6 hours for tests to finish. In later case, please do not hesitate to reissue the merge command
For more information see pytorch-bot wiki.

pytorchmergebot · 2024-07-23T01:44:04Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.



Regression introduced by #126376

Before this change, compiling torch_cpu on my MacBook prints tons of warnings every time HooksInterface is included

```
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/src/optim/adamw.cpp:1:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/optim/adamw.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/nn/module.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/nn/modules/container/any_module_holder.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/nn/modules/container/any_value.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/detail/static.h:4:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/types.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/aten/src/ATen/ATen.h:7:
In file included from /Users/nshulga/git/pytorch/pytorch/aten/src/ATen/Context.h:13:
/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/detail/HIPHooksInterface.h:27:11: warning: '~HIPHooksInterface' overrides a destructor but is not marked 'override' [-Winconsistent-missing-destructor-override]
  virtual ~HIPHooksInterface() = default;
          ^
/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/detail/AcceleratorHooksInterface.h:16:11: note: overridden virtual function is here
  virtual ~AcceleratorHooksInterface() = default;
          ^
1 warning generated.
```

Pull Request resolved: #131204
Approved by: https://github.com/albanD, https://github.com/seemethere

…erator concept (pytorch#126376)

This PR re-implements pin memory aiming to get rid of the optional `device` argument and makes all related APIs to be device-agnostic. We add two new abstract APIs in [AcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/detail/AcceleratorHooksInterface.h#L12) and redefine pin memory as: "Pin memory is always pinned for the current accelerator device". In detail, it uses [getAcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.h#L61) in pin_memory/is_pinned to get an appropriate device and invoke the corresponding overridden interfaces, instead of using BackendSelect and then dispatching to CUDA or other specific backends' implement methods.

Note: For new backends who want to implement and use pin memory, just inherit AcceleratorHooksInterface and overwrite the `isPinnedPtr` and `getPinnedMemoryAllocator` methods.

Additional context: To avoid BC-breaking, this PR just preserves the `device` arg of related APIs and would throw a deprecation warning if `device` arg is passed. Another PR will be submitted to update all PT callers (`Tensor.is_pinned()`, `Tensor.pin_memory()`...) not to pass this arg based on this PR. In future, `device` arg will be actually removed.

Relates pytorch#124908
Relates pytorch#14560

Pull Request resolved: pytorch#126376
Approved by: https://github.com/albanD

…he Accelerator concept (pytorch#126376)"

This reverts commit c986aee.

Reverted pytorch#126376 on behalf of https://github.com/atalman due to Failing internal builds ([comment](pytorch#126376 (comment)))


wizzniu requested review from albanD, egienvalue, eqy, kulinseth, malfet and soulitzer as code owners May 16, 2024 03:49

pytorch-bot bot added the release notes: mps label May 16, 2024

pytorchbot added the open source label May 16, 2024

wizzniu mentioned this pull request May 16, 2024

[DataLoader] Select available CUDA or 3rd devices automatically to pin memory #125016

Closed

legionGIT reviewed May 17, 2024

aten/src/ATen/native/Memory.cpp (outdated, resolved)

drisspg added the triaged label May 20, 2024

albanD reviewed May 22, 2024


albanD previously approved these changes May 30, 2024


wizzniu force-pushed the new_pin_memory branch from 13a84a9 to 50b1875 Compare May 31, 2024 05:05

albanD reviewed May 31, 2024

aten/src/ATen/native/native_functions.yaml (outdated, resolved)

kulinseth reviewed Jun 8, 2024


wizzniu force-pushed the new_pin_memory branch from 56d2b15 to dce2048 Compare June 28, 2024 09:15

pytorchmergebot added the merging label Jul 22, 2024

pytorchmergebot closed this in 8963623 Jul 23, 2024

pytorchmergebot removed the merging label Jul 23, 2024

PaliC mentioned this pull request Jul 23, 2024

[BE] Improve error message when there are internal changes #131547

Closed

wizzniu mentioned this pull request Jul 26, 2024

Update pin memory related APIs to not pass 'device' argument #131858

Closed

henrylhtsang mentioned this pull request Jul 31, 2024

[BE][typing] fix types in common pruning #132309

Closed

fffrog mentioned this pull request Aug 14, 2024

Keep up to date cosdt/torch_backend#81

Open

bsochack mentioned this pull request Oct 3, 2024

[regression] PT 2.5 does not support memory pinning for out-of-tree accelerators as Intel Gaudi #137262

Closed

janeyx99 mentioned this pull request Dec 19, 2024

pin_memory/is_pinned API is too CUDA-centric #14560

Closed

ur4t mentioned this pull request Mar 19, 2025

The argument 'device' of Tensor.pin_memory() is deprecated LaurentMazare/tch-rs#941

Open

weifengpy mentioned this pull request Jun 27, 2025

Fix FSDP offload pin_memory bug #157147

Closed

albertvillanova mentioned this pull request Feb 8, 2026

DataLoader pin_memory passes deprecated device arg to Tensor.pin_memory() #174546

Closed

aperson30 mentioned this pull request Feb 8, 2026

Fix #174546: Remove deprecated device arg from Tensor.pin_memory() call #174573

Closed




