
bypass getDeviceFromPtr check when device is known #36714

Closed
emcastillo wants to merge 2 commits into pytorch:master from emcastillo:dlpack-multiproc

Conversation

@emcastillo
Collaborator

@emcastillo emcastillo commented Apr 16, 2020

Fixes #36594

In some cases, using memory that was allocated in another process before any memory-related operation has been done in PyTorch raises errors, because the GPU CUDA context is not fully initialized.

I assume there is an explicit reason for leaving the context uninitialized at first instead of initializing it in `THCudaInit`, where other CUDA calls already happen. I'd like to discuss that in this PR.

Possible better solutions:

Initialize the device context in `fromDLPack` or `from_blob`, probably by creating a dummy one-element array. This feels like a hack, though.

Alternatively, catch the exception in `getDeviceFromPtr`, check whether the context was initialized, and if not, retry the operation. But then we would need to do this check for every device.

This PR instead bypasses the `getDeviceFromPtr` call, which is the one causing the problem, whenever the device is already known. This lets us create the Tensor from the shared-memory storage without initializing the context; the context is initialized later, when the tensor is first accessed.
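For illustration, a minimal C++ sketch of the idea (not the actual diff in this PR): the helper name `tensor_from_external_ptr` is made up for the example, while `at::globalContext().getDeviceFromPtr`, `at::from_blob`, and `TensorOptions` are the existing ATen names referenced above.

```cpp
// Sketch only: assumes a from_blob-style helper that previously always
// resolved the device via getDeviceFromPtr, even when the caller already
// passed a fully specified device (e.g. cuda:1) in the TensorOptions.
#include <ATen/ATen.h>

at::Tensor tensor_from_external_ptr(void* data,
                                    at::IntArrayRef sizes,
                                    const at::TensorOptions& options) {
  at::Device device = options.device();
  if (!device.has_index()) {
    // Device index unknown: ask the backend which device owns this pointer.
    // This is the call that fails when the CUDA context is not yet initialized.
    device = at::globalContext().getDeviceFromPtr(data, device.type());
  }
  // Device fully known: skip getDeviceFromPtr and build the tensor directly on
  // that device; the CUDA context is then initialized lazily on first access.
  return at::from_blob(data, sizes, options.device(device));
}
```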

@dr-ci

dr-ci Bot commented Apr 16, 2020

💊 CI failures summary and remediations

As of commit f7bcd98 (more details on the Dr. CI page):


None of the CI failures appear to be your fault 💚



❄️ 1 failure tentatively classified as flaky

but reruns have not yet been triggered to confirm:

See CircleCI build pytorch_bazel_test (1/1)

Step: "Test" (full log | diagnosis details | 🔁 rerun) ❄️

May 07 08:23:52 TIMEOUT: //:optim_test (Summary)
May 07 08:23:50 [       OK ] TensorTest.SetData (0 ms) 
May 07 08:23:50 [ RUN      ] TensorTest.RequiresGradInplace 
May 07 08:23:50 [       OK ] TensorTest.RequiresGradInplace (0 ms) 
May 07 08:23:50 [----------] 39 tests from TensorTest (31 ms total) 
May 07 08:23:50  
May 07 08:23:50 [----------] Global test environment tear-down 
May 07 08:23:50 [==========] 39 tests from 1 test suite ran. (32 ms total) 
May 07 08:23:50 [  PASSED  ] 39 tests. 
May 07 08:23:50 ================================================================================ 
May 07 08:23:52  
May 07 08:23:52 TIMEOUT: //:optim_test (Summary) 
May 07 08:23:52       /var/lib/jenkins/.cache/bazel/_bazel_jenkins/fdf6d09bf4b4f04a71e2a7dfceb40620/execroot/pytorch/bazel-out/k8-fastbuild/testlogs/optim_test/test.log 
May 07 08:23:52 INFO: From Testing //:optim_test: 
May 07 08:23:52 ==================== Test output for //:optim_test: 
May 07 08:23:52 Running main() from gmock_main.cc 
May 07 08:23:52 Note: Google Test filter = -*_CUDA 
May 07 08:23:52 [==========] Running 28 tests from 1 test suite. 
May 07 08:23:52 [----------] Global test environment set-up. 
May 07 08:23:52 [----------] 28 tests from OptimTest 
May 07 08:23:52 [ RUN      ] OptimTest.OptimizerAccessors 
May 07 08:23:52 [       OK ] OptimTest.OptimizerAccessors (2 ms) 

🚧 1 fixed upstream failure:

These were probably caused by upstream breakages that were already fixed.

Please rebase on the viable/strict branch:

If your commit is newer than viable/strict, you can try basing on an older, stable commit:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase --onto FETCH_HEAD $(git merge-base origin/master HEAD)

If your commit is older than viable/strict:

git fetch https://github.com/pytorch/pytorch viable/strict
git rebase FETCH_HEAD

Check out the recency history of this "viable master" tracking branch.



@emcastillo emcastillo changed the title from Initialize GPU context in THCudaInit to bypass getDeviceFromPtr check when device is known Apr 16, 2020
@mruberry mruberry requested a review from ngimel April 18, 2020 06:36
@mruberry mruberry added the module: cuda (Related to torch.cuda, and CUDA support in general) and triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) labels Apr 18, 2020
@ngimel
Collaborator

ngimel commented May 6, 2020

@emcastillo Can you please rebase? PRs that are too old can't be merged.

@emcastillo
Collaborator Author

Rebased! Thanks 😊

Contributor

@facebook-github-bot facebook-github-bot left a comment

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ngimel merged this pull request in f418339.

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Fixes pytorch#36594

In some cases, using memory that was allocated in another process before any memory-related operation has been done in PyTorch raises errors, because the GPU CUDA context is not fully initialized.

I assume there is an explicit reason for leaving the context uninitialized at first instead of initializing it in `THCudaInit`, where other CUDA calls already happen. I'd like to discuss that in this PR.

Possible better solutions:

Initialize the device context in `fromDLPack` or `from_blob`, probably by creating a dummy one-element array. This feels like a hack, though.

Alternatively, catch the exception in `getDeviceFromPtr`, check whether the context was initialized, and if not, retry the operation. But then we would need to do this check for every device.

This PR instead bypasses the `getDeviceFromPtr` call, which is the one causing the problem, whenever the device is already known. This lets us create the Tensor from the shared-memory storage without initializing the context; the context is initialized later, when the tensor is first accessed.
Pull Request resolved: pytorch#36714

Differential Revision: D21504557

Pulled By: ngimel

fbshipit-source-id: 173ccdeb7c2a2b0ece53dd50be97f2df577a5634

Labels

Merged, module: cuda (Related to torch.cuda, and CUDA support in general), open source, triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)


Development

Successfully merging this pull request may close these issues.

from_dlpack multiprocess CUDA error

5 participants