C10_CUDA_KERNEL_LAUNCH_CHECK() becomes a no-op #91758
🐛 Describe the bug
C10_CUDA_KERNEL_LAUNCH_CHECK calls cudaGetLastError:
`pytorch/c10/cuda/CUDAException.h`, line 73 at 18b37bb:

```cpp
#define C10_CUDA_KERNEL_LAUNCH_CHECK() C10_CUDA_CHECK(cudaGetLastError())
```
however, the result is discarded!
`pytorch/c10/cuda/CUDAException.h`, lines 27 to 32 at 18b37bb:

```cpp
#define C10_CUDA_CHECK(EXPR)                                  \
  do {                                                        \
    /* We get & disarm the error inside of */                 \
    /* `c10_cuda_check_implementation` */                     \
    C10_UNUSED const cudaError_t __err = EXPR;                \
    c10::cuda::c10_cuda_check_implementation(                 \
```
The comment claims that the error will be obtained inside `c10_cuda_check_implementation`, but that's not always true: many CUDA errors are non-sticky, meaning that `cudaGetLastError` both returns the error *and* resets the error state. Since `__err` is discarded and `c10_cuda_check_implementation` can only re-query the now-cleared state, the macro becomes a no-op for non-sticky errors.
In particular, launching a kernel that requests too many resources (e.g. too many registers for the chosen grid/block size) produces a non-sticky error that this macro silently swallows -- and the kernel is silently not run 😢. It would be good to add a unit test that verifies such errors are caught.
It seems this bug was introduced in #85256 cc @ezyang @gchanan @zou3519 @ngimel @r-barnes .
Versions
n/a