Make CUDA exceptions unlikely and isolate C10_CUDA_CHECK body #85256

r-barnes wants to merge 1 commit into pytorch:master
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85256
✅ No Failures, 4 Pending as of commit a5797e6 (Dr. CI)
This pull request was exported from Phabricator. Differential Revision: D39619861
Force-pushed from df3ba51 to a18dd98
Force-pushed from a18dd98 to 57e69a2
Make CUDA exceptions unlikely and isolate C10_CUDA_CHECK body (pytorch#85256)

Summary: Pull Request resolved: pytorch#85256
Test Plan: Sandcastle
Differential Revision: D39619861
fbshipit-source-id: 8910ada87d8d0fc95e7dfcf8d6b6eb2f90e2625b
Force-pushed from 3b9dd5e to a5797e6
```cmake
set(C10_CUDA_HEADERS
    CUDAException.h
    CUDAFunctions.h
```
Just alphabetizing these since I'm here.
```cmake
set(C10_CUDA_SRCS
    CUDAStream.cpp
    CUDACachingAllocator.cpp
    CUDAException.cpp
```
Add this, alphabetize the rest.
@pytorchbot merge -f "test failure unrelated"
@pytorchbot successfully started a merge job. Check the current status here.
Hey @r-barnes. |
This marks CUDA exception checks as unlikely, which might have a positive performance impact. It further isolates part of `C10_CUDA_CHECK` into a separate function and file so that code can be made more expressive in subsequent diffs without bloating functions that use the check or creating readability issues.

Test Plan: Sandcastle
Differential Revision: D39619861
Pull Request resolved: #85256
Approved by: https://github.com/ezyang, https://github.com/ngimel
Fix C10_CUDA_CHECK for failing to capture last cuda error occasionally (#93192)

This error was accidentally introduced by #92227, which was trying to fix #91758 as introduced in #85256.

The unit test `TestCuda.test_events_multi_gpu_elapsed_time` has been failing since that PR was merged (on CUDA 11.8 and CUDA 12.0). That test requires >= 2 GPUs, so it is probably not exercised in the OSS CI:

```
python test/test_cuda.py -v -k TestCuda.test_events_multi_gpu_elapsed_time
```

For example, in https://github.com/pytorch/pytorch/actions/runs/4026926691/jobs/6922406192:

```
2023-01-27T19:41:32.2312162Z test_events_multi_gpu_elapsed_time (__main__.TestCuda) ... skip: detected only one GPU (0.001s)
```

The original C10_CUDA_CHECK before #85256 had an extra `cudaGetLastError` that captures those cuda errors: https://github.com/pytorch/pytorch/pull/85256/files#diff-0823e63e781acf56e93a5553ed7feee0db0bda05d86e2560c7b80e87e32e0024L41-L42

This extra `cudaGetLastError` was originally introduced in #17337. As commented at https://github.com/pytorch/pytorch/pull/17337/files#r259104503:

> soumith on Feb 21, 2019: Without this, a previously raised error was still lingering and falsely being triggered for a subsequent CUDA call. colesbury suggested that this is the right thing to do.

Pull Request resolved: #93192
Approved by: https://github.com/ezyang