Creates CUDAContext #9435

Closed
mruberry wants to merge 5 commits into pytorch:master from mruberry:aten_cuda_context

Conversation

@mruberry
Collaborator

@ezyang noticed that the CUDAStream files lived under ATen/ despite being CUDA-specific, and suggested porting them to ATen/cuda and exposing them with a new CUDAContext. This PR does that. It also:

  • Moves ATen's CUDA-specific exceptions for ATen/cudnn to ATen/cuda for consistency
  • Moves getDeviceProperties() and getCurrentCUDASparseHandle() to CUDAContext from CUDAHooks

The separation between CUDAContext and CUDAHooks is straightforward. Files that are in CUDA-only builds should rely on CUDAContext, while CUDAHooks is for runtime dispatch in files that can be included in CPU-only builds. A comment in CUDAContext.h explains this pattern. Acquiring device properties and CUDA-specific handles is something only done in builds with CUDA, for example, so I moved them from CUDAHooks to CUDAContext.

This PR will conflict with #9277 and I will merge with master after #9277 goes in.

@mruberry
Collaborator Author

Windows failure is a real import issue which I think I have a local fix for. Going to delay because I expect to merge with master shortly, anyway, for #9277.

@ezyang
Contributor

ezyang commented Jul 14, 2018

Moves ATen's CUDA-specific exceptions for ATen/cudnn to ATen/cuda for consistency

OMG thank you so much!


Contributor

@ezyang ezyang left a comment


Thanks, very happy to have less gunk in the CPU Context.

@mruberry
Collaborator Author

REMINDER: DO NOT MERGE THIS (YET).

Thanks for the review @ezyang. I like your thinking and will scrub the qualifiers to a reasonable level. I can probably find a convenient way to lock the AT_CUDNN_CHECK macro up, too ;)

As for the globalCUDAContext() verbosity, I would also like to do something about that. Instead of doing it in this PR, however, can we briefly delay planning that? I suspect we can significantly refactor THCState into ATen now (well, after your allocator PR and this PR are in), and seeing how that works will likely give us a better idea of how we want to expose these CUDA calls.

@ezyang
Contributor

ezyang commented Jul 14, 2018

Yeah, I'm very happy to delay renaming the functions.

@mruberry mruberry force-pushed the aten_cuda_context branch from 96df033 to 2abc59b Compare July 19, 2018 00:38
@mruberry
Collaborator Author

The clang5-asan error is also occurring for multiple other PRs currently:

08:32:10 test_neg (__main__.TestTorch) ... /var/lib/jenkins/workspace/aten/src/TH/generic/THTensorCopy.cpp:234:1: runtime error: 1.38519e+219 is outside the range of representable values of type 'int'

@mruberry
Collaborator Author

With #9277 in, this update:

  • Eliminates the CUDAContext object to clarify that there is a single context and reduce verbosity
  • Scrubs the qualified names (per @ezyang's request)
  • Moves CUDAGuard to ATen/cuda for consistency (CUDAGuard was introduced in Add CUDAGuard to ATen #9277)

Contributor

@facebook-github-bot facebook-github-bot left a comment


@ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Contributor

ezyang commented Jul 20, 2018

I've fixed some of the internal errors and rebased internally. Sorry about the lack of visibility. It's testing now.

@mruberry
Collaborator Author

Thanks for taking it all the way through! I mentioned to Soumith yesterday how much I appreciate it.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Jul 20, 2018
Summary:
ezyang noticed that the CUDAStream files lived under ATen/ despite being CUDA-specific, and suggested porting them to ATen/cuda and exposing them with a new CUDAContext. This PR does that. It also:

- Moves ATen's CUDA-specific exceptions for ATen/cudnn to ATen/cuda for consistency
- Moves getDeviceProperties() and getCurrentCUDASparseHandle() to CUDAContext from CUDAHooks

The separation between CUDAContext and CUDAHooks is straightforward. Files that are in CUDA-only builds should rely on CUDAContext, while CUDAHooks is for runtime dispatch in files that can be included in CPU-only builds. A comment in CUDAContext.h explains this pattern. Acquiring device properties and CUDA-specific handles is something only done in builds with CUDA, for example, so I moved them from CUDAHooks to CUDAContext.

This PR will conflict with #9277 and I will merge with master after #9277 goes in.
Pull Request resolved: pytorch/pytorch#9435

Reviewed By: soumith

Differential Revision: D8917236

Pulled By: ezyang

fbshipit-source-id: 219718864234fdd21a2baff1dd3932ff289b5751
ngimel pushed a commit to ngimel/pytorch that referenced this pull request Jul 21, 2018
jramseyer pushed a commit to jramseyer/pytorch that referenced this pull request Jul 30, 2018
goodlux pushed a commit to goodlux/pytorch that referenced this pull request Aug 15, 2018
@mruberry mruberry deleted the aten_cuda_context branch March 16, 2019 04:39