Update pin memory related APIs to not pass 'device' argument #131858
wizzniu wants to merge 9 commits into pytorch:main
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/131858
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit ee437e8 with merge base 8c2aa0c. The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Could we keep BC with the old behavior?
@guangyey We keep BC with the old behavior now, and it will not be broken until two or three releases from now. As alban said, we want to redefine pin memory as: "Pin memory is always pinned for the current accelerator device". So the final result we hope for is to drop the `device` argument entirely. WDYT, @albanD?

@albanD Could you help review this, or give some suggestions?

Just pinging to avoid the PR going stale.

@pytorchbot rebase
@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Successfully rebased b37dcbc to f3cc153
@pytorchbot rebase

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Successfully rebased f3cc153 to b861a45
albanD
left a comment
FYI @andrewkho for the dataloader changes
torch/_tensor_docs.py
Outdated
There was a problem hiding this comment.
This can say that the default device is the current accelerator and link to https://pytorch.org/docs/main/torch.html#accelerators
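A hedged sketch of what that docstring note might look like; the wording and link target are my guess at the suggestion, and the actual text in torch/_tensor_docs.py may differ:

```python
# Hypothetical docstring wording along the lines suggested above; the
# exact phrasing in torch/_tensor_docs.py may differ.
pin_memory_doc = """
pin_memory(device=None) -> Tensor

Copies the tensor into pinned memory, if it's not already pinned.
By default, the tensor is pinned for the current accelerator
(see https://pytorch.org/docs/main/torch.html#accelerators).
"""
```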
torch/storage.py
Outdated
I'm not sure we actually want to remove this argument altogether. I think saying in the doc that using this argument is discouraged is enough in the short term?
Adding these warnings here is going to be very very spammy as all existing users of these APIs will now start to see them and we don't want that.
Yes, too many warnings are annoying. But now the default device of storage.is_pinned/pin_memory is 'cuda', not the current accelerator. Is that acceptable?
torch/storage.py
Outdated
For all versions of this pattern, I would just keep the code shared and pass in `device` as-is. The C++ side should be able to properly handle the `device` argument being `None` and detect that as not provided.
That will significantly reduce the code duplication.
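A minimal sketch of the shared-code pattern being suggested, with `_resolve_device` as a hypothetical stand-in for the C++ side's handling of a missing device (the real resolution happens in C++, not Python):

```python
def _resolve_device(device):
    # Hypothetical stand-in for the backend: None means "not provided",
    # so fall back to the current accelerator.
    return "current_accelerator" if device is None else device

def pin_memory(storage, device=None):
    # One shared path: forward `device` as-is instead of duplicating
    # a with-device branch and a without-device branch in Python.
    return (storage, _resolve_device(device))
```

Keeping a single signature like this is what lets the Python wrappers stay free of `if device is None:` duplication.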
torch/utils/data/dataloader.py
Outdated
Same, let's say discouraged with accelerator being the default
Dumb question, but I'm assuming this call has less overhead than what it's replacing, @albanD? It calls in here: https://github.com/pytorch/pytorch/blob/main/torch/csrc/Module.cpp#L2130
@andrewkho Updated. Now it is only called when the pin_memory_device argument isn't passed.
Another question: is it necessary to set the device here? From my point of view, pinning happens in host memory, so it shouldn't depend on the device id? @albanD
We might have to update torchdata's StatefulDataLoader if it's calling this function directly (which is totally fine), but unsure if downstream users are calling this, in which case it'd be BC breaking
Very good catch. Yeah, we need to keep this function around, as it is actually used in a few places in the wild: https://github.com/search?q=%2Fdata%5C._utils.pin_memory%2F&type=code
b861a45 to 4d09bfd
4d09bfd to 102d5d2
# Check if the pin_memory is functioning properly on custom device
cpu_tensor = torch.empty(3)
self.assertFalse(cpu_tensor.is_foo)
self.assertFalse(cpu_tensor.is_pinned("foo"))
I guess we're missing some privateuse1 initialization here?
Actually no, it has already been done in the setUpClass() method. The error is first caused by L347 and fails again at L336 when the case is retried. We restored the default `device` argument of storage.is_pinned/pin_memory to 'cuda', so we have to add the "foo" argument back here.
As for the other failing case in TestDataLoader, it's because cuda (and other backends) doesn't support setCurrentDevice() yet when calling torch._C._accelerator_hooks_set_current_device(device_id). We must get #131854 merged first; it blocks this PR.
@albanD Fixed the failing case for MPS, please help review it.
@pytorchbot merge -r

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here.

Successfully rebased 2ff588d to 0c1c989
Merge started. Your change will be merged once all checks pass (ETA 0-4 hours). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Merge failed. Reason: 3 mandatory check(s) failed. Dig deeper by viewing the failures on hud.
test/test_dataloader.py
Outdated
    batch_size=16,
    num_workers=2,
-   pin_memory=True,
+   pin_memory=True if not torch.backends.mps.is_available() else False,
@albanD @kulinseth
Since MPS doesn't support pin_memory, do you think we should force pin_memory to False in DataLoader for MPS? Or is a small change in the testing code here enough?
Sorry for the delay here, yes we should preserve the current behavior to avoid doing any BC-breaking change in this PR.
If this flag used to be ignored on MPS, let's preserve that for now.
@albanD We're going to change the logic from `self._pin_memory = loader.pin_memory and torch.cuda.is_available()` to `self._pin_memory = loader.pin_memory and torch.accelerator.is_available()`, which would be BC-breaking for other non-cuda accelerators.
In my opinion, this is what we want, because we expect to align the DataLoader behavior of other non-cuda accelerators with cuda. And to tackle this issue, we have to make this BC-breaking change for other accelerators.
The existing problem is that MPS throws an error when calling .pin_memory(), even though it implements the pin-memory-related interfaces. So it's a special case for MPS...
Forcing BC for MPS is easy to ensure, but maybe a little inelegant. I just want to confirm whether that's exactly what you mean.
Yes that matches my expectation.
Fixing the pinned memory allocation for MPS would be the best thing to do, but I don't think it is on you to do that unless you want to. So doing the inelegant thing on MPS to unblock this PR (and make all other accelerator work properly) is an acceptable tradeoff in my mind.
We can fix MPS later once the pinned memory allocation is fixed there.
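The gating change being agreed on here can be sketched with plain booleans standing in for `torch.cuda.is_available()`, `torch.accelerator.is_available()`, and an MPS backend check (this is an illustration of the discussion, not the actual DataLoader source):

```python
def should_pin(pin_memory, cuda_available, accelerator_available, is_mps):
    # Old behavior: pinning only ever happened when CUDA was available.
    old = pin_memory and cuda_available
    # New behavior: any accelerator qualifies, except MPS, where pinned
    # memory allocation is not supported yet, so the flag stays a no-op.
    new = pin_memory and accelerator_available and not is_mps
    return old, new
```

The `not is_mps` carve-out is the "inelegant" part that preserves the old no-op behavior on MPS until pinned memory allocation is fixed there.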
@albanD Fine. Added the restriction for MPS for now, with a comment explaining the reason.
Co-authored-by: albanD <desmaison.alban@gmail.com>
0c1c989 to f3dc4fc
albanD
left a comment
Perfect!
FYI @kulinseth and @malfet we're keeping dataloader pinned memory a noop here, we should fix that when pinned memory is fixed on MPS.
@pytorchbot merge -i Thanks!
Merge started. Your change will be merged while ignoring the following 1 check: pull / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, lf.linux.2xlarge). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Following the [#131858 suggestion](#131858 (review)) to optimize DataLoader code. Pull Request resolved: #146821. Approved by: https://github.com/divyanshk. Co-authored-by: Divyansh Khanna <divyanshkhanna09@gmail.com>
Based on #126376, this PR tries to update all PT callers (e.g., `Tensor.is_pinned()`, `Tensor.pin_memory()`) to not pass the `device` argument. As for `storage/untyped_storage.is_pinned()/pin_memory()`, we keep the `device` argument, but passing `device` is discouraged. If not given, the default `device` is still 'cuda' for BC.
Additionally, based on device-agnostic pin_memory, the `pin_memory_device` argument of `torch.utils.data.DataLoader` is discouraged now. For BC, explicitly passing this argument is still effective. If not given, the default `device` will be the current accelerator.
Fixes #124908
Relates #126376
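The BC defaults described in this PR can be illustrated with a small sketch (not PyTorch source; the `current_accelerator` value below is a hypothetical stand-in for whatever accelerator is active):

```python
def storage_pin_device(device=None):
    # BC: storage.is_pinned()/pin_memory() still default to "cuda"
    # when no device is given, though passing one is discouraged.
    return "cuda" if device is None else device

def dataloader_pin_device(pin_memory_device="", current_accelerator="xpu"):
    # BC: an explicitly passed pin_memory_device still wins; otherwise
    # the current accelerator is used as the default.
    return pin_memory_device if pin_memory_device else current_accelerator
```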
cc: @albanD
cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o