Conversation
Windows failure is a real import issue which I think I have a local fix for. Going to delay because I expect to merge with master shortly, anyway, for #9277.

OMG thank you so much!
(Several review comments, including on aten/src/ATen/cuda/Exceptions.h and torch/csrc/autograd/profiler.h, were marked as off-topic; those threads are now outdated.)
ezyang left a comment
Thanks, very happy to have less gunk in the CPU Context.
REMINDER: DO NOT MERGE THIS (YET). Thanks for the review @ezyang. I like your thinking and will scrub the qualifiers to a reasonable level. I can probably find a convenient way to lock the AT_CUDNN_CHECK macro up, too ;) As for the globalCUDAContext() verbosity, I also would like to do something about that. Instead of doing it in this PR, however, can we briefly delay a plan there? I suspect we can significantly refactor THCState into ATen now (well, after your allocator PR and this PR are in), and seeing how that works will likely give us a better idea of how we want to expose these CUDA calls.

Yeah, I'm very happy to delay renaming the functions.
Force-pushed from 96df033 to 2abc59b
The clang5-asan error is also occurring for multiple other PRs currently:

08:32:10 test_neg (__main__.TestTorch) ... /var/lib/jenkins/workspace/aten/src/TH/generic/THTensorCopy.cpp:234:1: runtime error: 1.38519e+219 is outside the range of representable values of type 'int'
With #9277 in, this update:
facebook-github-bot left a comment

@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
I've fixed some of the internal errors and rebased internally. Sorry about the lack of visibility. It's testing now.

Thanks for taking it all the way through! I mentioned to Soumith yesterday how much I appreciate it.
Summary: ezyang noticed that the CUDAStream files lived under ATen/ despite being CUDA-specific, and suggested porting them to ATen/cuda and exposing them with a new CUDAContext. This PR does that. It also:

- Moves ATen's CUDA-specific exceptions for ATen/cudnn to ATen/cuda for consistency
- Moves getDeviceProperties() and getCurrentCUDASparseHandle() to CUDAContext from CUDAHooks

The separation between CUDAContext and CUDAHooks is straightforward. Files that are in CUDA-only builds should rely on CUDAContext, while CUDAHooks is for runtime dispatch in files that can be included in CPU-only builds. A comment in CUDAContext.h explains this pattern. Acquiring device properties and CUDA-specific handles is something only done in builds with CUDA, for example, so I moved them from CUDAHooks to CUDAContext.

This PR will conflict with #9277 and I will merge with master after #9277 goes in.

Pull Request resolved: pytorch/pytorch#9435
Reviewed By: soumith
Differential Revision: D8917236
Pulled By: ezyang
fbshipit-source-id: 219718864234fdd21a2baff1dd3932ff289b5751