Skip to content

Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept#126376

Closed
wizzniu wants to merge 13 commits intopytorch:mainfrom
wizzniu:new_pin_memory
Closed

Re-implement pin_memory to be device-agnostic by leveraging the Accelerator concept#126376
wizzniu wants to merge 13 commits intopytorch:mainfrom
wizzniu:new_pin_memory

Conversation

@wizzniu
Copy link
Contributor

@wizzniu wizzniu commented May 16, 2024

This PR re-implements pin memory aiming to get rid of the optional device argument and makes all related APIs to be device-agnostic. We add two new abstract APIs in AcceleratorHooksInterface and redefine pin memory as: "Pin memory is always pinned for the current accelerator device". In detail, it uses getAcceleratorHooksInterface in pin_memory/is_pinned to get an appropriate device and invoke the corresponding overridden interfaces, instead of using BackendSelect and then dispatching to CUDA or other specific backends' implement methods.

Note: For new backends who want to implement and use pin memory, just inherit AcceleratorHooksInterface and overwrite the isPinnedPtr and getPinnedMemoryAllocator methods.

Additional context: To avoid BC-breaking, this PR just preserves the device arg of related APIs and would throw a deprecation warning if device arg is passed. Another PR will be submitted to update all PT callers (Tensor.is_pinned(), Tensor.pin_memory()...) not to pass this arg based on this PR. In future, device arg will be actually removed.

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang @LucasLLC @MeetVadakkanchery @mhorowitz @mrshenli @pritamdamania87 @zhaojuanmao @satgera @gqchen @aazzolini @osalpekar @jiayisuse @tianyu-l @albanD @ezyang

Relates #124908
Relates #14560

@pytorch-bot
Copy link

pytorch-bot bot commented May 16, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126376

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 Cancelled Job, 1 Unrelated Failure

As of commit c73e0a4 with merge base c2425a3 (image):

CANCELLED JOB - The following job was cancelled. Please retry:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the release notes: mps Release notes category label May 16, 2024
@wizzniu
Copy link
Contributor Author

wizzniu commented May 16, 2024

@albanD @ezyang Could you help to review this? If it's reasonable, I will go on for the next step to refresh all related APIs and modify the test cases.

@ezyang
Copy link
Contributor

ezyang commented May 17, 2024

@albanD for you

@drisspg drisspg added the triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module label May 20, 2024
albanD
albanD previously approved these changes May 30, 2024
Copy link
Collaborator

@albanD albanD left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good!
This needs rebasing on latest main so that we get CI signal btw!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @jeffdaily this makes HIP an "accelerator". I'm still not sure if you use it but that's just enabling more device-generic feature for the HIP device so I guess you're happy with it. We can remove it if you are not!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC we don't use HIPHooksInterface but rather a hipified version of the CUDAHooksInterface. In any case, we're okay with being an Accelerator. Is there an RFC or something similar describing pytorch's move to these generic interfaces?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @egienvalue you can implement this if/when you need to support pinned host-side memory used for faster transfers to the device!

Copy link
Contributor

@egienvalue egienvalue May 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is nice. Right now we have a hacky way to pin CPU tensors.

@wizzniu
Copy link
Contributor Author

wizzniu commented May 31, 2024

@pytorchbot merge

@pytorch-bot
Copy link

pytorch-bot bot commented May 31, 2024

Pull workflow has not been scheduled for the PR yet. It could be because author doesn't have permissions to run those or skip-checks keywords were added to PR/commits, aborting merge. Please get/give approval for the workflows and/or remove skip ci decorators before next merge attempt. If you think this is a mistake, please contact PyTorch Dev Infra.

@wizzniu
Copy link
Contributor Author

wizzniu commented May 31, 2024

@albanD Have rebased. Can we merge it now?

@wizzniu
Copy link
Contributor Author

wizzniu commented Jun 4, 2024

@albanD Are there any other problems? Maybe we can start CI to see if any problem exists.

@wizzniu
Copy link
Contributor Author

wizzniu commented Jun 7, 2024

@pytorchbot rebase

@pytorch-bot
Copy link

pytorch-bot bot commented Jun 7, 2024

You don't have permissions to rebase this PR since you are a first time contributor. If you think this is a mistake, please contact PyTorch Dev Infra.

@wizzniu
Copy link
Contributor Author

wizzniu commented Jun 7, 2024

Hi, @albanD I have pushed new commit to handle the failed test case. Can we trigger ci now? It seems that I don't have permissions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wizzniu and @albanD
What is needed to adopt the MPS to the accelerator interface ? If there are APIs missing, we would like to extend it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kulinseth Seems that there is no need to add extra APIs for MPS currently. But I don't test mps locally. It still need ci test for mps to verify.

@wizzniu
Copy link
Contributor Author

wizzniu commented Jun 19, 2024

@albanD Do you have time to look at the newly pushed commits?
And @ezyang ,Could you help to review the code, especially for the part of op's registration and dispatch? ( I notice that you are the code owner of pin memory

@ezyang
Copy link
Contributor

ezyang commented Jun 27, 2024

It's probably best if @albanD finishes off the review here

@wizzniu
Copy link
Contributor Author

wizzniu commented Jun 28, 2024

It's probably best if @albanD finishes off the review here

Thanks! Seems that he is busy.

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

The merge job was canceled or timed out. This most often happen if two merge requests were issued for the same PR, or if merge job was waiting for more than 6 hours for tests to finish. In later case, please do not hesitate to reissue the merge command
For more information see pytorch-bot wiki.

@cyyever
Copy link
Collaborator

cyyever commented Jul 22, 2024

@pytorchbot merge -i

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged while ignoring the following 1 checks: trunk / macos-py3-arm64 / test (default, 1, 3, macos-m1-stable)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@pytorchmergebot
Copy link
Collaborator

The merge job was canceled or timed out. This most often happen if two merge requests were issued for the same PR, or if merge job was waiting for more than 6 hours for tests to finish. In later case, please do not hesitate to reissue the merge command
For more information see pytorch-bot wiki.

@albanD
Copy link
Collaborator

albanD commented Jul 22, 2024

@pytorchbot merge

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

@cyyever
Copy link
Collaborator

cyyever commented Jul 23, 2024

@pytorchmergebot merge -f "Unrelated failures"

@pytorchmergebot
Copy link
Collaborator

The merge job was canceled or timed out. This most often happen if two merge requests were issued for the same PR, or if merge job was waiting for more than 6 hours for tests to finish. In later case, please do not hesitate to reissue the merge command
For more information see pytorch-bot wiki.

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

pytorchmergebot pushed a commit that referenced this pull request Jul 23, 2024
Regression introduced by #126376

Before this change, compiling torch_cpu on my MacBook prints tons of warnings every time HooksInterface is included
```
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/src/optim/adamw.cpp:1:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/optim/adamw.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/nn/module.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/nn/modules/container/any_module_holder.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/nn/modules/container/any_value.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/detail/static.h:4:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/types.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/aten/src/ATen/ATen.h:7:
In file included from /Users/nshulga/git/pytorch/pytorch/aten/src/ATen/Context.h:13:
/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/detail/HIPHooksInterface.h:27:11: warning: '~HIPHooksInterface' overrides a destructor but is not marked 'override' [-Winconsistent-missing-destructor-override]
  virtual ~HIPHooksInterface() = default;
          ^
/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/detail/AcceleratorHooksInterface.h:16:11: note: overridden virtual function is here
  virtual ~AcceleratorHooksInterface() = default;
          ^
1 warning generated.
```
pytorchmergebot pushed a commit that referenced this pull request Jul 23, 2024
Regression introduced by #126376

Before this change, compiling torch_cpu on my MacBook prints tons of warnings every time HooksInterface is included
```
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/src/optim/adamw.cpp:1:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/optim/adamw.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/nn/module.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/nn/modules/container/any_module_holder.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/nn/modules/container/any_value.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/detail/static.h:4:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/types.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/aten/src/ATen/ATen.h:7:
In file included from /Users/nshulga/git/pytorch/pytorch/aten/src/ATen/Context.h:13:
/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/detail/HIPHooksInterface.h:27:11: warning: '~HIPHooksInterface' overrides a destructor but is not marked 'override' [-Winconsistent-missing-destructor-override]
  virtual ~HIPHooksInterface() = default;
          ^
/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/detail/AcceleratorHooksInterface.h:16:11: note: overridden virtual function is here
  virtual ~AcceleratorHooksInterface() = default;
          ^
1 warning generated.
```

Pull Request resolved: #131204
Approved by: https://github.com/albanD, https://github.com/seemethere
pytorchmergebot pushed a commit that referenced this pull request Jul 24, 2024
Regression introduced by #126376

Before this change, compiling torch_cpu on my MacBook prints tons of warnings every time HooksInterface is included
```
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/src/optim/adamw.cpp:1:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/optim/adamw.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/nn/module.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/nn/modules/container/any_module_holder.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/nn/modules/container/any_value.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/detail/static.h:4:
In file included from /Users/nshulga/git/pytorch/pytorch/torch/csrc/api/include/torch/types.h:3:
In file included from /Users/nshulga/git/pytorch/pytorch/aten/src/ATen/ATen.h:7:
In file included from /Users/nshulga/git/pytorch/pytorch/aten/src/ATen/Context.h:13:
/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/detail/HIPHooksInterface.h:27:11: warning: '~HIPHooksInterface' overrides a destructor but is not marked 'override' [-Winconsistent-missing-destructor-override]
  virtual ~HIPHooksInterface() = default;
          ^
/Users/nshulga/git/pytorch/pytorch/aten/src/ATen/detail/AcceleratorHooksInterface.h:16:11: note: overridden virtual function is here
  virtual ~AcceleratorHooksInterface() = default;
          ^
1 warning generated.
```

Pull Request resolved: #131204
Approved by: https://github.com/albanD, https://github.com/seemethere
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
…erator concept (pytorch#126376)

This PR re-implements pin memory aiming to get rid of the optional `device` argument and makes all related APIs to be device-agnostic. We add two new abstract APIs in [AcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/detail/AcceleratorHooksInterface.h#L12) and redefine pin memory as: "Pin memory is always pinned for the current accelerator device". In detail, it uses [getAcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.h#L61) in pin_memory/is_pinned to get an appropriate device and invoke the corresponding overridden interfaces, instead of using BackendSelect and then dispatching to CUDA or other specific backends' implement methods.

Note: For new backends who want to implement and use pin memory, just inherit AcceleratorHooksInterface and overwrite the `isPinnedPtr` and `getPinnedMemoryAllocator` methods.

Additional context: To avoid BC-breaking, this PR just preserves the `device` arg of related APIs and would throw a deprecation warning if `device` arg is passed. Another PR will be submitted to update all PT callers (`Tensor.is_pinned()`, `Tensor.pin_memory()`...) not to pass this arg based on this PR. In future, `device` arg will be actually removed.

Relates pytorch#124908
Relates pytorch#14560
Pull Request resolved: pytorch#126376
Approved by: https://github.com/albanD
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
…he Accelerator concept (pytorch#126376)"

This reverts commit c986aee.

Reverted pytorch#126376 on behalf of https://github.com/atalman due to Failing internal builds ([comment](pytorch#126376 (comment)))
xuhancn pushed a commit to xuhancn/pytorch that referenced this pull request Jul 25, 2024
…erator concept (pytorch#126376)

This PR re-implements pin memory aiming to get rid of the optional `device` argument and makes all related APIs to be device-agnostic. We add two new abstract APIs in [AcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/detail/AcceleratorHooksInterface.h#L12) and redefine pin memory as: "Pin memory is always pinned for the current accelerator device". In detail, it uses [getAcceleratorHooksInterface](https://github.com/pytorch/pytorch/blob/main/aten/src/ATen/Context.h#L61) in pin_memory/is_pinned to get an appropriate device and invoke the corresponding overridden interfaces, instead of using BackendSelect and then dispatching to CUDA or other specific backends' implement methods.

Note: For new backends who want to implement and use pin memory, just inherit AcceleratorHooksInterface and overwrite the `isPinnedPtr` and `getPinnedMemoryAllocator` methods.

Additional context: To avoid BC-breaking, this PR just preserves the `device` arg of related APIs and would throw a deprecation warning if `device` arg is passed. Another PR will be submitted to update all PT callers (`Tensor.is_pinned()`, `Tensor.pin_memory()`...) not to pass this arg based on this PR. In future, `device` arg will be actually removed.

Relates pytorch#124908
Relates pytorch#14560
Pull Request resolved: pytorch#126376
Approved by: https://github.com/albanD
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/mps Run MPS tests (subset of trunk) ciflow/rocm Trigger "default" config CI on ROCm ciflow/trunk Trigger trunk jobs on your pull request ciflow/xpu Run XPU CI tasks Merged module: inductor oncall: distributed Add this issue/PR to distributed oncall triage queue open source release notes: mps Release notes category Reverted triaged This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.