
C10_CUDA_KERNEL_LAUNCH_CHECK() becomes a no-op #91758

@ppwwyyxx

Description

🐛 Describe the bug

C10_CUDA_KERNEL_LAUNCH_CHECK calls cudaGetLastError:

#define C10_CUDA_KERNEL_LAUNCH_CHECK() C10_CUDA_CHECK(cudaGetLastError())

however, the result is discarded!

#define C10_CUDA_CHECK(EXPR) \
  do { \
    /* We get & disarm the error inside of */ \
    /* `c10_cuda_check_implementation` */ \
    C10_UNUSED const cudaError_t __err = EXPR; \
    c10::cuda::c10_cuda_check_implementation( \

The comment claims that the error will be obtained inside c10_cuda_check_implementation, but that is not always true: many CUDA errors are non-sticky, meaning the error state is reset as soon as cudaGetLastError returns it. In that case, this macro becomes a no-op.

In particular, launching a kernel that requests too many resources (too many registers, or a grid/block size that is too large) produces an error that is silently ignored by this macro -- and the kernel is silently not run 😢 . It would be good to add a unit test that catches such errors.
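A sketch of what such a test could look like (untested; requires a GPU and nvcc, and the exact error code may vary by driver): a launch with more than 1024 threads per block fails with a configuration error that, being non-sticky, is exactly the case the macro misses.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>

__global__ void noop_kernel() {}

int main() {
  noop_kernel<<<1, 2048>>>();  // invalid: exceeds the per-block thread limit
  cudaError_t err = cudaGetLastError();
  assert(err != cudaSuccess);                 // the failed launch is reported
  assert(cudaGetLastError() == cudaSuccess);  // and the error was non-sticky
  std::printf("caught: %s\n", cudaGetErrorString(err));
  return 0;
}
```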

It seems this bug was introduced in #85256. cc @ezyang @gchanan @zou3519 @ngimel @r-barnes

Versions

n/a


Labels

high priority, module: cuda, triaged
