Add CUDAGuard to ATen by goldsborough · Pull Request #9277 · pytorch/pytorch

goldsborough · 2018-07-09T18:57:23Z

THCStream was recently moved to ATen by @mruberry: #8997. This PR now introduces a guard class that replaces AutoStream from torch/csrc/ and also uses this new stream interface.

I had to extend the CUDAStream interface with unchecked calls, so that we can reset the stream without throwing an exception in the guard's destructor.
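To illustrate why unchecked calls matter here, the following is a minimal sketch rather than the code from this PR. It mocks the "current stream" state with a plain integer, and every name in it (`unchecked_sketch`, `set_stream`, `set_stream_unchecked`, `StreamGuard`, `fail_next_call`) is hypothetical. The constructor uses the throwing setter, while the destructor uses a non-throwing variant so that a failed reset cannot escape a destructor.

```cpp
#include <cassert>
#include <stdexcept>

namespace unchecked_sketch {

// Hypothetical stand-in for the runtime's "current stream" state.
inline int& current_stream() {
  static int stream = 0;
  return stream;
}

// Test hook: when true, the next setter call fails once.
inline bool& fail_next_call() {
  static bool fail = false;
  return fail;
}

// Checked setter: reports failure by throwing.
inline void set_stream(int stream) {
  if (fail_next_call()) {
    fail_next_call() = false;
    throw std::runtime_error("set_stream failed");
  }
  current_stream() = stream;
}

// Unchecked setter: never throws; failure is reported via the return value.
inline bool set_stream_unchecked(int stream) noexcept {
  if (fail_next_call()) {
    fail_next_call() = false;
    return false;
  }
  current_stream() = stream;
  return true;
}

class StreamGuard {
 public:
  explicit StreamGuard(int stream) : original_(current_stream()) {
    set_stream(stream);  // throwing is acceptable in a constructor
  }
  ~StreamGuard() {
    // Destructors are implicitly noexcept, so use the non-throwing call
    // and deliberately ignore a failed reset.
    (void)set_stream_unchecked(original_);
  }
  StreamGuard(const StreamGuard&) = delete;
  StreamGuard& operator=(const StreamGuard&) = delete;

 private:
  int original_;
};

// Sets stream 1, enters a guarded scope on stream 7, and returns the
// stream that is current after the guard has been destroyed.
inline int stream_after_guard_scope() {
  current_stream() = 1;
  {
    StreamGuard guard(7);
    assert(current_stream() == 7);
  }
  return current_stream();
}

}  // namespace unchecked_sketch
```

If the checked setter were used in the destructor instead, a failure would throw out of an implicitly `noexcept` destructor and terminate the program.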

@colesbury @apaszke @ezyang

Fixes #7800

goldsborough · 2018-07-10T04:24:53Z

@pytorchbot retest this please

goldsborough · 2018-07-10T22:00:13Z

@pytorchbot retest this please

aten/src/ATen/CUDAStreamGuard.h

torch/csrc/cuda/comm.cpp

goldsborough · 2018-07-12T02:49:07Z

After discussing a strategy with @mruberry, we agreed that it would be better for the guard (formerly CUDAStreamGuard) to set not only the current stream, but also the current device, to the one associated with the stream supplied to the guard. In that sense, this kind of guard is really more of an extension of at::DeviceGuard, giving it an understanding of CUDA streams (and maybe, in the future, more CUDA-specific things). As such, we decided to call it CUDAGuard (like DeviceGuard, but for all CUDA things).

Given the above, it might seem to make sense to derive CUDAGuard from DeviceGuard. At the moment DeviceGuard is still very CUDA-specific, so this would be fine. However, in the future DeviceGuard should gain more understanding of other kinds of devices. At that point, a CUDAGuard would no longer follow an is-a relationship with DeviceGuard (e.g. a DeviceGuard can accept a RandomFPGADevice, but it wouldn't make sense for a CUDAGuard to deal with such an object). So instead of inheritance, I think it's better to use composition and implement the device-specific parts of CUDAGuard using a DeviceGuard. So much for my design thoughts here.
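The composition argument above can be sketched as follows. This is an illustrative mock under stated assumptions, not the PR's implementation: device and stream state are plain integers, and the names `composition_sketch`, `Stream`, `DeviceGuard`, and `StreamAwareGuard` are hypothetical stand-ins. The stream-aware guard owns a device guard as a member instead of inheriting from it.

```cpp
#include <array>
#include <cassert>

namespace composition_sketch {

constexpr int kNumDevices = 2;

// Hypothetical global state: the current device and one current stream
// id per device.
inline int& current_device() {
  static int device = 0;
  return device;
}
inline std::array<int, kNumDevices>& current_streams() {
  static std::array<int, kNumDevices> streams{};
  return streams;
}

// A stream knows which device it belongs to.
struct Stream {
  int device;
  int id;
};

// Generic guard: switches the device and restores it on destruction.
class DeviceGuard {
 public:
  explicit DeviceGuard(int device) : original_device_(current_device()) {
    current_device() = device;
  }
  ~DeviceGuard() { current_device() = original_device_; }
  DeviceGuard(const DeviceGuard&) = delete;
  DeviceGuard& operator=(const DeviceGuard&) = delete;

 private:
  int original_device_;
};

// Stream-aware guard built by composition: it has a DeviceGuard, it is
// not a DeviceGuard.
class StreamAwareGuard {
 public:
  explicit StreamAwareGuard(Stream stream)
      : device_guard_(stream.device),
        stream_device_(stream.device),
        original_stream_(current_streams()[stream.device]) {
    current_streams()[stream.device] = stream.id;
  }
  // The body restores the stream first; the DeviceGuard member is
  // destroyed afterwards and restores the device.
  ~StreamAwareGuard() { current_streams()[stream_device_] = original_stream_; }
  StreamAwareGuard(const StreamAwareGuard&) = delete;
  StreamAwareGuard& operator=(const StreamAwareGuard&) = delete;

 private:
  DeviceGuard device_guard_;
  int stream_device_;
  int original_stream_;
};

// Returns true if the guard restored both device and streams.
inline bool guard_sets_and_restores() {
  current_device() = 0;
  current_streams()[0] = 10;
  current_streams()[1] = 20;
  {
    StreamAwareGuard guard(Stream{1, 99});
    assert(current_device() == 1);
    assert(current_streams()[1] == 99);
  }
  return current_device() == 0 && current_streams()[0] == 10 &&
      current_streams()[1] == 20;
}

}  // namespace composition_sketch
```

With this layout, the generic guard could later accept other device kinds without the stream-aware guard inheriting that broader interface.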

NOTE: I changed AT_CHECK in CUDAStream.cpp to AT_ASSERT. AT_ASSERT should be used for asserts without a message, AT_CHECK for asserts with a message.

mruberry · 2018-07-13T18:27:37Z

(summary of offline discussion w/@goldsborough): the entire active stream state must be captured, not just one stream, to preserve the invariant that the stream state resets when the guard is destroyed.

goldsborough · 2018-07-13T23:11:36Z

Now storing the currently active stream for each device upon construction, @mruberry.
@colesbury could you take a look from the PyTorch end?
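The snapshot idea discussed above can be sketched like this. It is a simplified mock, assuming a fixed number of devices and integer stream ids; `snapshot_sketch`, `StreamTable`, and `SnapshotGuard` are hypothetical names, not the PR's types. The guard copies the current stream of every device on construction and restores the whole table on destruction, so a stream change made on another device inside the scope is also undone.

```cpp
#include <array>
#include <cassert>

namespace snapshot_sketch {

constexpr int kNumDevices = 3;
using StreamTable = std::array<int, kNumDevices>;

// Hypothetical global state: one current stream id per device.
inline StreamTable& current_streams() {
  static StreamTable table{};
  return table;
}

class SnapshotGuard {
 public:
  // Captures the current stream of every device, then sets the stream
  // for the requested device.
  SnapshotGuard(int device, int stream) : saved_(current_streams()) {
    current_streams()[device] = stream;
  }
  // Restores the entire table, not only the device passed to the guard.
  ~SnapshotGuard() { current_streams() = saved_; }
  SnapshotGuard(const SnapshotGuard&) = delete;
  SnapshotGuard& operator=(const SnapshotGuard&) = delete;

 private:
  StreamTable saved_;
};

// Changes streams on two devices inside a guarded scope and returns the
// table observed after the guard has been destroyed.
inline StreamTable streams_after_scope() {
  current_streams() = StreamTable{1, 2, 3};
  {
    SnapshotGuard guard(0, 42);
    current_streams()[2] = 77;  // code in the scope touches another device
    assert(current_streams()[0] == 42);
    assert(current_streams()[2] == 77);
  }
  return current_streams();
}

}  // namespace snapshot_sketch
```

A guard that saved only the stream of device 0 would leave device 2 on stream 77 after the scope ends, which is the broken invariant the review comment describes.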

mruberry · 2018-07-13T23:35:11Z

Looks super good to me and I really like the tests you added.

So glad that AutoStream is officially dead.

facebook-github-bot

@goldsborough has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

aten/src/ATen/DeviceGuard.h

aten/src/ATen/CUDAGuard.h

aten/src/ATen/CUDAStream.h

torch/csrc/cuda/comm.cpp

aten/src/ATen/CUDAGuard.h

goldsborough · 2018-07-17T18:31:58Z

@colesbury I've made both DeviceGuard and CUDAGuard movable, and addressed the nits. Thanks for the review! Let me know if this is better now.

goldsborough · 2018-07-18T16:13:18Z

@pytorchbot retest this please

colesbury

lgtm

aten/src/ATen/CUDAGuard.h

+  /// Move-constructs this `CUDAGuard` from another `CUDAGuard`. The
+  /// moved-from `CUDAGuard` is modified such that its destruction has no
+  /// effect (does not reset the stream or device).
+  CUDAGuard(CUDAGuard&& other) noexcept
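The move semantics described in the comment above can be sketched as follows. This is one possible way to make a moved-from guard inert, using an explicit flag; the PR itself may achieve the same effect differently (a later commit in this PR is titled "Use default move constructor"). All names here (`move_sketch`, `MovableGuard`, `active_`) are hypothetical, and the stream state is mocked with an integer.

```cpp
#include <cassert>
#include <utility>

namespace move_sketch {

// Hypothetical stand-in for the "current stream" state.
inline int& current_stream() {
  static int stream = 0;
  return stream;
}

class MovableGuard {
 public:
  explicit MovableGuard(int stream)
      : original_(current_stream()), active_(true) {
    current_stream() = stream;
  }

  // Takes over responsibility for the reset; the moved-from guard is
  // marked inactive so its destructor has no effect.
  MovableGuard(MovableGuard&& other) noexcept
      : original_(other.original_), active_(other.active_) {
    other.active_ = false;
  }

  MovableGuard(const MovableGuard&) = delete;
  MovableGuard& operator=(const MovableGuard&) = delete;
  MovableGuard& operator=(MovableGuard&&) = delete;

  ~MovableGuard() {
    if (active_) {
      current_stream() = original_;
    }
  }

 private:
  int original_;
  bool active_;
};

// Returns the stream observed after a moved-from guard is destroyed.
inline int stream_after_moved_from_guard_dies() {
  current_stream() = 1;
  {
    MovableGuard first(5);
    {
      MovableGuard second(std::move(first));
      assert(current_stream() == 5);
    }
    // `second` performed the reset to the original stream.
    assert(current_stream() == 1);
    current_stream() = 9;
  }
  // `first` was moved from, so its destruction left stream 9 in place.
  return current_stream();
}

}  // namespace move_sketch
```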


aten/src/ATen/CUDAStream.cpp

@@ -1,10 +1,10 @@
-#include "ATen/CUDAStream.h"
+ #include "ATen/CUDAStream.h"


facebook-github-bot

@goldsborough is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Summary: THCStream was recently moved to ATen by mruberry: pytorch/pytorch#8997. This PR now introduces a guard class that replaces `AutoStream` from `torch/csrc/` and also uses this new stream interface.

I had to extend the `CUDAStream` interface with unchecked calls, so that we can reset the stream without throwing an exception in the guard's destructor.

colesbury apaszke ezyang

Fixes pytorch/pytorch#7800

Pull Request resolved: pytorch/pytorch#9277
Differential Revision: D8865183
Pulled By: goldsborough
fbshipit-source-id: 67c9bc09629d92fa5660286b5eec08fde9108cd7

Summary: ezyang noticed that the CUDAStream files lived under ATen/ despite being CUDA-specific, and suggested porting them to ATen/cuda and exposing them with a new CUDAContext. This PR does that. It also:

- Moves ATen's CUDA-specific exceptions for ATen/cudnn to ATen/cuda for consistency
- Moves getDeviceProperties() and getCurrentCUDASparseHandle() to CUDAContext from CUDAHooks

The separation between CUDAContext and CUDAHooks is straightforward. Files that are in CUDA-only builds should rely on CUDAContext, while CUDAHooks is for runtime dispatch in files that can be included in CPU-only builds. A comment in CUDAContext.h explains this pattern. Acquiring device properties and CUDA-specific handles is something only done in builds with CUDA, for example, so I moved them from CUDAHooks to CUDAContext.

This PR will conflict with #9277 and I will merge with master after #9277 goes in.

Pull Request resolved: #9435
Reviewed By: soumith
Differential Revision: D8917236
Pulled By: ezyang
fbshipit-source-id: 219718864234fdd21a2baff1dd3932ff289b5751


goldsborough requested review from apaszke, colesbury, ezyang, gchanan, soumith and zdevito as code owners July 9, 2018 18:57

goldsborough force-pushed the auto-stream branch 2 times, most recently from 11b423a to 8f9994b Compare July 10, 2018 02:06

goldsborough force-pushed the auto-stream branch from 8f9994b to abf6063 Compare July 10, 2018 18:38

mruberry reviewed Jul 11, 2018

aten/src/ATen/CUDAStreamGuard.h Outdated

mruberry reviewed Jul 11, 2018

aten/src/ATen/CUDAStreamGuard.h Outdated

mruberry reviewed Jul 11, 2018

torch/csrc/cuda/comm.cpp Outdated

goldsborough force-pushed the auto-stream branch from abf6063 to bbd206b Compare July 12, 2018 02:40

goldsborough changed the title ~~Add CUDAStreamGuard to ATen~~ Add CUDAGuard to ATen Jul 13, 2018

goldsborough force-pushed the auto-stream branch from bbd206b to 46eada9 Compare July 13, 2018 23:08

mruberry mentioned this pull request Jul 13, 2018

Creates CUDAContext #9435

Closed

facebook-github-bot reviewed Jul 16, 2018

colesbury reviewed Jul 16, 2018

goldsborough force-pushed the auto-stream branch from 46eada9 to 354836e Compare July 17, 2018 18:30

goldsborough requested a review from ebetica as a code owner July 17, 2018 18:30

goldsborough force-pushed the auto-stream branch from 354836e to b452c6c Compare July 17, 2018 20:25

goldsborough added 2 commits July 17, 2018 15:28

Create CUDAStreamGuard

f2d173b

Remove AutoStream

f4cb342

goldsborough added 4 commits July 17, 2018 15:28

Bump THCStream refcount

20eea3e

Use the empty state of CUDAStream instead of optional because Windows

e084682

CUDAStreamGuard -> CUDAGuard

98ec2be

Take snapshots of all current streams in CUDAGuard

774fe5c

goldsborough force-pushed the auto-stream branch 2 times, most recently from 0e99968 to cb96296 Compare July 17, 2018 23:18

Made DeviceGuard and CUDAGuard movable and comment nits/error message improvements

5ba9fd5

goldsborough force-pushed the auto-stream branch from cb96296 to 5ba9fd5 Compare July 18, 2018 05:47

colesbury approved these changes Jul 18, 2018

Use default move constructor

785cdd4

facebook-github-bot reviewed Jul 18, 2018

facebook-github-bot closed this in 3b88650 Jul 18, 2018

ezyang added the merged label Jun 26, 2019

