
C10_CUDA_KERNEL_LAUNCH_CHECK() becomes a no-op #91758

@ppwwyyxx

Description

🐛 Describe the bug

C10_CUDA_KERNEL_LAUNCH_CHECK calls cudaGetLastError:

#define C10_CUDA_KERNEL_LAUNCH_CHECK() C10_CUDA_CHECK(cudaGetLastError())

however, the result is discarded!

#define C10_CUDA_CHECK(EXPR) \
  do { \
    /* We get & disarm the error inside of */ \
    /* `c10_cuda_check_implementation` */ \
    C10_UNUSED const cudaError_t __err = EXPR; \
    c10::cuda::c10_cuda_check_implementation( \

The comment claims that the error will be obtained inside c10_cuda_check_implementation, but that is not always true: many CUDA errors are non-sticky, meaning the error state is reset as soon as cudaGetLastError returns it. In that case, this macro becomes a no-op.

In particular, launching a kernel that requests too many resources (too many registers, or a grid/block size that is too large) produces an error that is silently ignored by this macro -- and the kernel is silently not run 😢 . It would be good to add a unit test that catches such errors.
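A sketch of what such a test could look like (untested; requires a GPU and nvcc, and the exact error code may vary by driver): a launch with more than 1024 threads per block fails with a configuration error that, being non-sticky, is exactly the case the macro misses.

```cuda
#include <cuda_runtime.h>
#include <cassert>
#include <cstdio>

__global__ void noop_kernel() {}

int main() {
  noop_kernel<<<1, 2048>>>();  // invalid: exceeds the per-block thread limit
  cudaError_t err = cudaGetLastError();
  assert(err != cudaSuccess);                 // the failed launch is reported
  assert(cudaGetLastError() == cudaSuccess);  // and the error was non-sticky
  std::printf("caught: %s\n", cudaGetErrorString(err));
  return 0;
}
```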

It seems this bug was introduced in #85256. cc @ezyang @gchanan @zou3519 @ngimel @r-barnes

Versions

n/a


Labels

high priority, module: cuda, triaged
