Update pin memory related APIs to not pass 'device' argument#131858

Closed
wizzniu wants to merge 9 commits intopytorch:mainfrom
wizzniu:update_pin_memory_api

Conversation

Contributor

@wizzniu wizzniu commented Jul 26, 2024

Based on #126376, this PR updates all PyTorch callers (e.g., Tensor.is_pinned(), Tensor.pin_memory()) to stop passing the device argument.
As for storage/untyped_storage.is_pinned()/pin_memory(), we keep the device argument, but passing it is discouraged. If not given, the default device is still 'cuda' for BC.
Additionally, based on device-agnostic pin_memory, the pin_memory_device argument of torch.utils.data.DataLoader is now discouraged. For BC, explicitly passing this argument is still effective; if not given, the default device is the current accelerator.
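The back-compat rules above can be sketched in plain Python (a sketch with hypothetical helper names, not the actual PyTorch implementation):

```python
# Sketch of the BC rules described above; these helper names are
# hypothetical and only illustrate the resolution order.

def resolve_storage_pin_device(device=None):
    # storage/untyped_storage.is_pinned()/pin_memory(): passing a device
    # is discouraged, but the default stays 'cuda' for BC.
    return device if device is not None else "cuda"

def resolve_dataloader_pin_device(pin_memory_device="", current_accelerator="cuda"):
    # DataLoader: an explicitly passed pin_memory_device still wins;
    # if not given, the current accelerator is used.
    return pin_memory_device or current_accelerator
```

For example, with no pin_memory_device and an XPU as the current accelerator, the sketch resolves to "xpu", while an explicitly passed device is honored unchanged.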

Fixes #124908
Relates to #126376

cc: @albanD

cc @XilunWu @H-Huang @awgu @kwen2501 @wanchaol @fegin @fduwjj @wz337 @wconstab @d4l3k @c-p-i-o


pytorch-bot bot commented Jul 26, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/131858

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure

As of commit ee437e8 with merge base 8c2aa0c:

NEW FAILURE - The following job has failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added oncall: distributed Add this issue/PR to distributed oncall triage queue release notes: dataloader release notes category labels Jul 26, 2024
Collaborator

guangyey commented Jul 26, 2024

Could we keep BC with the old behavior?

  • if a device type is passed, explicitly use the passed device type;
  • if None (the default) is passed, the current accelerator type will be used.

@mikaylagawarecki mikaylagawarecki requested a review from albanD July 26, 2024 15:51
@janeyx99 janeyx99 added the triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module label Jul 29, 2024
Contributor Author

wizzniu commented Jul 31, 2024

@guangyey We keep BC with the old behavior now, and it will not be broken until two or three releases from now.

As Alban said, we want to redefine pin memory as: "Pin memory is always pinned for the current accelerator device". So the final result we hope for is to drop the device arg / pin_memory_device arg.

WDYT, @albanD ?

Contributor Author

wizzniu commented Aug 9, 2024

@albanD Could you help review this, or give some suggestions?

Contributor Author

wizzniu commented Aug 22, 2024

Just commenting to keep this PR from going stale.

Contributor Author

wizzniu commented Sep 10, 2024

@pytorchbot rebase

@pytorchmergebot

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot

Successfully rebased update_pin_memory_api onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout update_pin_memory_api && git pull --rebase)

Contributor Author

wizzniu commented Sep 30, 2024

@pytorchbot rebase

@pytorchmergebot

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot

Successfully rebased update_pin_memory_api onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout update_pin_memory_api && git pull --rebase)

@albanD albanD left a comment

FYI @andrewkho for the dataloader changes

Collaborator

This can say that the default device is the current accelerator and link to https://pytorch.org/docs/main/torch.html#accelerators

Contributor Author

Updated

Collaborator

Same as above

torch/storage.py Outdated
Collaborator

I'm not sure we actually want to remove this argument altogether. I think saying in the doc that using this argument is discouraged is enough in the short term?

Adding these warnings here is going to be very very spammy as all existing users of these APIs will now start to see them and we don't want that.

Contributor Author

Yes, too many warnings are annoying. But for now, the default device of storage.is_pinned/pin_memory is 'cuda', not the current accelerator; is that acceptable?

torch/storage.py Outdated
Comment on lines 385 to 401
Collaborator

For all versions of this pattern, I would just keep the code shared and pass in device as-is. The C++ side should be able to properly handle the device argument being "None" and detect that as not provided.

That will significantly reduce the code duplication
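The suggested de-duplication might look like the following sketch (plain Python with a stand-in for the C++ binding; all names here are hypothetical, not the actual torch/storage.py code):

```python
# Sketch: instead of branching in Python on whether `device` was given,
# forward it unchanged and let one backend entry point treat None as
# "not provided". Names are hypothetical.

def _backend_is_pinned(data, device):
    # Stand-in for the C++ binding, which detects device=None itself.
    effective = device if device is not None else "current-accelerator"
    return ("pinned-check", effective)

class Storage:
    def __init__(self, data):
        self.data = data

    def is_pinned(self, device=None):
        # No duplicated if/else per caller: pass device through as-is.
        return _backend_is_pinned(self.data, device)
```

All overloads share one code path, which is the duplication reduction being suggested.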

Collaborator

Same, let's say discouraged with accelerator being the default

@andrewkho andrewkho left a comment

Dataloader side LGTM, we'll need to cherry-pick this to pytorch/data as well for StatefulDataLoader but we can take care of that. Will let @albanD give final stamp

Contributor

Dumb question, but I'm assuming this call has less overhead than what it's replacing, @albanD? It calls into here: https://github.com/pytorch/pytorch/blob/main/torch/csrc/Module.cpp#L2130

Contributor Author

@andrewkho Updated. Now it is only called when the pin_memory_device argument isn't passed.
Another question: is it necessary to set the device here? From my point of view, pin memory operates on host memory, so it shouldn't be related to the device id? @albanD

Contributor

We might have to update torchdata's StatefulDataLoader if it's calling this function directly (which is totally fine), but unsure if downstream users are calling this, in which case it'd be BC breaking

Collaborator

Very good catch, yeah we need to keep this function around as it is actually used in a few places in the wild: https://github.com/search?q=%2Fdata%5C._utils.pin_memory%2F&type=code

Contributor Author

Restored it.

@wizzniu wizzniu force-pushed the update_pin_memory_api branch from b861a45 to 4d09bfd Compare October 9, 2024 11:03
@wizzniu wizzniu requested a review from divyanshk as a code owner October 9, 2024 11:03
@wizzniu wizzniu force-pushed the update_pin_memory_api branch from 4d09bfd to 102d5d2 Compare October 9, 2024 11:09
@wizzniu wizzniu requested a review from albanD October 9, 2024 11:11
@albanD albanD left a comment

Looks great!
Thanks for making all these edits!

CI needs fixing of course ;)

# Check if the pin_memory is functioning properly on custom device
cpu_tensor = torch.empty(3)
self.assertFalse(cpu_tensor.is_foo)
self.assertFalse(cpu_tensor.is_pinned("foo"))
Collaborator

I guess we're missing some privateuse1 initialization here?

Contributor Author

Actually no, that has been done in the setUpClass() method. The error is first caused by L347 and then fails again at L336 when the case is retried. We restored the default device argument of storage.is_pinned/pin_memory to 'cuda', so we have to add the "foo" argument back here.
As for the other failed case in TestDataLoader, it's because cuda (and other backends) doesn't currently support setCurrentDevice() when calling torch._C._accelerator_hooks_set_current_device(device_id). We need #131854 merged first, which blocks this PR.

Contributor Author

@albanD Have a look at PR #131854 first?

Contributor Author

wizzniu commented Dec 26, 2024

@albanD Fixed the failed case for MPS, please help review it.
The reason is that DataLoader previously disabled pin memory when pin_memory=True, pin_memory_device="" and cuda was unavailable, but now it enables pin memory when cuda is unavailable and mps is available. I guess pin_memory is not supported on MPS? So I just fixed this case by using pin_memory=False for MPS only.

Contributor Author

wizzniu commented Jan 2, 2025

@pytorchbot merge -r

@pytorchmergebot

@pytorchbot started a rebase job onto refs/remotes/origin/viable/strict. Check the current status here

@pytorchmergebot

Successfully rebased update_pin_memory_api onto refs/remotes/origin/viable/strict, please pull locally before adding more changes (for example, via git checkout update_pin_memory_api && git pull --rebase)

@pytorchmergebot

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


@pytorchmergebot

Merge failed

Reason: 3 mandatory check(s) failed. The first few are:

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Failing merge rule: Core Maintainers

batch_size=16,
num_workers=2,
pin_memory=True,
pin_memory=True if not torch.backends.mps.is_available() else False,
Contributor Author

@albanD @kulinseth
Since mps doesn't support pin_memory, do you think we should force pin_memory to False in DataLoader for mps? Or is a small change in the testing code here enough?

Collaborator

Sorry for the delay here, yes we should preserve the current behavior to avoid doing any BC-breaking change in this PR.
If this flag used to be ignored on MPS, let's preserve that for now.

Contributor Author

@albanD We're going to change the logic from "self._pin_memory = loader.pin_memory and torch.cuda.is_available()" to "self._pin_memory = loader.pin_memory and torch.accelerator.is_available()", which would be BC-breaking for other non-cuda accelerators.
In my opinion, this is what we want, because we expect to align the DataLoader behavior of other non-cuda accelerators with cuda. To tackle this issue, we have to make this BC-breaking change for those accelerators.
The existing problem is that MPS throws an error when calling .pin_memory(), even though it implements the pin-memory-related interfaces. So MPS is special...
Forcibly keeping BC for MPS is easy to ensure, but maybe a little inelegant. I just want to confirm whether that's exactly what you mean.
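The predicate change under discussion, including the temporary MPS carve-out, can be sketched like this (plain Python, with availability passed in as booleans instead of querying torch; function names are hypothetical):

```python
# Sketch of the DataLoader pinning predicate, old vs. new. Availability
# flags are parameters here so the sketch runs anywhere.

def pin_memory_enabled_old(pin_memory, cuda_available):
    # Old behavior: pinning only ever happened for CUDA.
    return pin_memory and cuda_available

def pin_memory_enabled_new(pin_memory, accelerator_available, is_mps):
    # New behavior: any current accelerator enables pinning, except MPS,
    # where .pin_memory() still raises, so the flag stays a no-op for now.
    return pin_memory and accelerator_available and not is_mps
```

Under this sketch, a non-cuda accelerator such as XPU now gets pinning when pin_memory=True, while MPS keeps the old effective behavior of pinning being disabled.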

Collaborator

Yes that matches my expectation.
Fixing the pinned memory allocation for MPS would be the best thing to do, but I don't think it is on you to do that unless you want to. So doing the inelegant thing on MPS to unblock this PR (and make all other accelerator work properly) is an acceptable tradeoff in my mind.

We can fix MPS later once the pinned memory allocation is fixed there.

Contributor Author

@albanD Fine. I've added the restriction for MPS for now, and commented the reason as well.

@wizzniu wizzniu force-pushed the update_pin_memory_api branch from 0c1c989 to f3dc4fc Compare January 15, 2025 07:17
@albanD albanD left a comment

Perfect!
FYI @kulinseth and @malfet, we're keeping dataloader pinned memory a no-op here; we should fix that when pinned memory is fixed on MPS.

Collaborator

albanD commented Jan 15, 2025

@pytorchbot merge -i

Thanks!

@pytorchmergebot

Merge started

Your change will be merged while ignoring the following 1 checks: pull / linux-jammy-py3-clang12-executorch / test (executorch, 1, 1, lf.linux.2xlarge)

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team


pytorchmergebot pushed a commit that referenced this pull request Jul 8, 2025
Following [ #131858 suggestion](#131858 (review)) to optimize DataLoader code

Pull Request resolved: #146821
Approved by: https://github.com/divyanshk

Co-authored-by: Divyansh Khanna <divyanshkhanna09@gmail.com>

Labels

ciflow/inductor ciflow/trunk Trigger trunk jobs on your pull request Merged oncall: distributed Add this issue/PR to distributed oncall triage queue open source release notes: dataloader release notes category triaged This issue has been looked at by a team member, and triaged and prioritized into an appropriate module

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DataLoader's pin_memory is default to CUDA if parameter pin_memory_device is not set

7 participants