Guard gloo algorithm creation with DeviceGuard #9371

Closed
apaszke wants to merge 1 commit into master from c10d_device

apaszke commented Jul 12, 2018

Let us avoid creating a context on GPU0 unnecessarily.

apaszke requested a review from pietern July 12, 2018 00:40
apaszke requested a review from teng-li as a code owner July 12, 2018 00:40

pietern left a comment

Nice catch! It must be a call to `cudaMallocHost` in Gloo that triggers creation of that context. Creating an algorithm that has just a single input pointer (as is the case here) involves no other allocations or interactions with CUDA.

facebook-github-bot left a comment

@pietern is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

apaszke deleted the c10d_device branch July 12, 2018 14:17
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
Summary:
Let us avoid creating a context on GPU0 unnecessarily.
Pull Request resolved: pytorch#9371

Reviewed By: pietern

Differential Revision: D8817343

Pulled By: apaszke

fbshipit-source-id: a6cc91a1dd127840486a42c64f97f117475b0d5f

4 participants