
Make CUDA exceptions unlikely and isolate C10_CUDA_CHECK body#85256

Closed
r-barnes wants to merge 1 commit into pytorch:master from r-barnes:export-D39619861

Conversation

@r-barnes
Contributor

@r-barnes r-barnes commented Sep 19, 2022

This marks CUDA exception checks as unlikely, which might have a positive performance impact.

It further isolates part of C10_CUDA_CHECK into a separate function and file so that the code can be made more expressive in subsequent diffs without bloating functions that use the check or creating readability issues.

Test Plan: Sandcastle

Differential Revision: D39619861

@pytorch-bot

pytorch-bot bot commented Sep 19, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85256

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures, 4 Pending

As of commit a5797e6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D39619861


…h#85256)

Summary: Pull Request resolved: pytorch#85256

Test Plan: Sandcastle

Differential Revision: D39619861

fbshipit-source-id: 8910ada87d8d0fc95e7dfcf8d6b6eb2f90e2625b
```
)
set(C10_CUDA_HEADERS
    CUDAException.h
    CUDAFunctions.h
```
Contributor Author

Just alphabetizing these since I'm here.

```
set(C10_CUDA_SRCS
    CUDAStream.cpp
    CUDACachingAllocator.cpp
    CUDAException.cpp
```
Contributor Author

Add this, alphabetize the rest.

@r-barnes r-barnes requested a review from ezyang September 22, 2022 22:12
@ngimel
Collaborator

ngimel commented Sep 22, 2022

@pytorchbot merge -f "test failure unrelated"

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered with the force (-f) flag. This means your change will be merged immediately, bypassing any CI checks (ETA: 1-5 minutes). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@github-actions
Contributor

Hey @r-barnes.
You've committed this PR, but it does not have both a 'release notes: ...' and 'topics: ...' label. Please add one of each to the PR. The 'release notes: ...' label should represent the part of PyTorch that this PR changes (fx, autograd, distributed, etc) and the 'topics: ...' label should represent the kind of PR it is (not user facing, new feature, bug fix, perf improvement, etc). The list of valid labels can be found here for the 'release notes: ...' and here for the 'topics: ...'.
For changes that are 'topic: not user facing' there is no need for a release notes label.

@r-barnes r-barnes added the `release notes: cuda`, `topic: improvements`, and `topic: not user facing` labels Sep 23, 2022
mehtanirav pushed a commit that referenced this pull request Oct 4, 2022
This marks CUDA exception checks as unlikely, which might have a positive performance impact.

It further isolates part of `C10_CUDA_CHECK` into a separate function and file so that the code can be made more expressive in subsequent diffs without bloating functions that use the check or creating readability issues.

Test Plan: Sandcastle

Differential Revision: D39619861

Pull Request resolved: #85256
Approved by: https://github.com/ezyang, https://github.com/ngimel
pytorchmergebot pushed a commit that referenced this pull request Jan 28, 2023
Fix C10_CUDA_CHECK for failing to capture last cuda error occasionally (#93192)

This error was accidentally introduced by #92227, which was trying to fix #91758 as introduced in #85256.

The unit test `TestCuda.test_events_multi_gpu_elapsed_time` has been failing since that PR was merged (on CUDA 11.8 and CUDA 12.0). That test requires >=2 GPUs, so it's probably not exercised in the OSS CI?
```
python test/test_cuda.py -v -k TestCuda.test_events_multi_gpu_elapsed_time
```

E.g. in https://github.com/pytorch/pytorch/actions/runs/4026926691/jobs/6922406192
```
2023-01-27T19:41:32.2312162Z   test_events_multi_gpu_elapsed_time (__main__.TestCuda) ... skip: detected only one GPU (0.001s)
```

The original C10_CUDA_CHECK before #85256 had an extra `cudaGetLastError` call that captured those CUDA errors, https://github.com/pytorch/pytorch/pull/85256/files#diff-0823e63e781acf56e93a5553ed7feee0db0bda05d86e2560c7b80e87e32e0024L41-L42

This extra `cudaGetLastError` was originally introduced in #17337. As commented here https://github.com/pytorch/pytorch/pull/17337/files#r259104503

> soumith on Feb 21, 2019:
> Without this, a previously raised error was still lingering and falsely being triggered for a subsequent CUDA call. colesbury suggested that this is the right thing to do.
Pull Request resolved: #93192
Approved by: https://github.com/ezyang