bypass getDeviceFromPtr check when device is known #36714

emcastillo wants to merge 2 commits into pytorch:master

Conversation
💊 CI failures summary and remediations

As of commit f7bcd98 (more details on the Dr. CI page):

✅ None of the CI failures appear to be your fault 💚
❄️ 1 failure tentatively classified as flaky, but reruns have not yet been triggered to confirm:
THCudaInit

emcastillo force-pushed from 06c1d6d to ea61b6c
@emcastillo Can you please rebase? PRs that are too old can't be merged.
emcastillo force-pushed from ea61b6c to f7bcd98
Rebased! Thanks 😊
facebook-github-bot left a comment

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Fixes pytorch#36594 (see the PR description below).
Pull Request resolved: pytorch#36714
Differential Revision: D21504557
Pulled By: ngimel
fbshipit-source-id: 173ccdeb7c2a2b0ece53dd50be97f2df577a5634
Fixes #36594
In some cases, when memory allocated in another process is used before any memory-related operation has run in PyTorch, errors occur because the GPU CUDA context is not completely initialized.
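To make the failure mode concrete, here is a minimal, hedged repro sketch of that pattern (the exact trigger in #36594 may differ): a consumer process receives a CUDA tensor backed by memory the producer allocated, before the consumer has made any CUDA call of its own.

```python
# Hypothetical repro sketch, not the exact report from #36594: a consumer
# process receives CUDA memory allocated in the producer before it has run
# any CUDA operation, so its CUDA context is not yet initialized when the
# shared storage is rebuilt.
import torch
import torch.multiprocessing as mp

def consumer(q):
    # First CUDA-related call in this process: before this PR, rebuilding
    # the shared storage could fail inside getDeviceFromPtr because no
    # CUDA context existed yet in the consumer.
    t = q.get()
    print(t.sum())  # accessing the tensor initializes the context lazily

if __name__ == "__main__":
    mp.set_start_method("spawn")
    q = mp.Queue()
    p = mp.Process(target=consumer, args=(q,))
    p.start()
    shared = torch.ones(4, device="cuda:0")  # allocated in the producer
    q.put(shared)
    p.join()  # keep `shared` alive until the consumer is done with it
```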
I guess there is an explicit reason to leave the context uninitialized at first and not to set it up in `THCudaInit`, where other CUDA calls already happen. I'd like to discuss that in this PR.
Possible better solutions:

- Initialize the device context in `fromDLPack` or `from_blob`, probably by creating some dummy array with one element (see the sketch after this list). But this feels like a hack.
- Catch the exception in `getDeviceFromPtr`, check whether the context was initialized, and if not, repeat the operation. But we would need to check every device.
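As a point of reference for the first alternative, the "dummy array" trick is already available from user code; `torch.cuda.init()` is the documented way to force the same initialization:

```python
# Sketch of the "dummy one-element array" workaround mentioned above,
# applied from user code rather than inside fromDLPack/from_blob.
import torch

torch.cuda.init()                    # explicit: force lazy CUDA init to run
# or, equivalently, allocate a dummy one-element tensor on the device:
_ = torch.empty(1, device="cuda:0")
```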
This PR bypasses the `getDeviceFromPtr` call, which is the one causing the problem, when we already know the device. This allows us to create the Tensor from the shared-memory storage without initializing the context; the context will be initialized when the tensor is accessed later.
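For clarity, a conceptual sketch of the bypass in Python pseudocode (the real change is in the C++ code paths; `get_device_from_ptr` below is a stand-in for ATen's `getDeviceFromPtr`, which, as far as I know, probes the pointer via `cudaPointerGetAttributes`):

```python
# Conceptual sketch only, not the actual PyTorch source.
def get_device_from_ptr(ptr):
    # Stand-in for ATen's getDeviceFromPtr: probing the pointer needs an
    # initialized CUDA context and raises otherwise.
    raise RuntimeError("CUDA error: context not initialized")

def resolve_device(ptr, known_device=None):
    if known_device is not None:
        return known_device           # device known: skip the probe entirely
    return get_device_from_ptr(ptr)   # fallback: probe the pointer

# With the device already known (e.g., from DLPack metadata or the IPC
# handle), no CUDA call is needed, so no context is required yet:
print(resolve_device(0xDEAD, known_device="cuda:0"))
```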