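CUDA errors are reported asynchronously. Kernel launches return before the kernel has run, so an error such as a device-side assert may only surface several calls after the operation that actually caused it. This makes failures hard to debug in CI when they can't be reproduced locally. One way to make debugging easier is to (1) synchronize at the end of each test, so that a deferred error is attributed to the test that caused it rather than to whichever test happens to run next, and (2) rerun the failing test with CUDA_LAUNCH_BLOCKING=1, which makes kernel launches synchronous so the error is raised at the exact CUDA call that triggered the assert.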
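A minimal sketch of both ideas, assuming a plain `unittest` setup. `SyncCudaTestCase`, `launch_blocking_env`, and `rerun_failed_test` are hypothetical names for illustration, not existing PyTorch test utilities. The rerun happens in a fresh process because CUDA_LAUNCH_BLOCKING is read when CUDA is initialized, and a device-side assert leaves the current process's CUDA context unusable anyway.

```python
import os
import subprocess
import sys
import unittest

try:
    import torch  # optional: only needed for the end-of-test synchronize
except ImportError:
    torch = None


class SyncCudaTestCase(unittest.TestCase):
    """Hypothetical base class (not a PyTorch API): waits for all queued CUDA
    work at the end of each test so a deferred error fails *this* test."""

    def tearDown(self):
        if torch is not None and torch.cuda.is_available():
            # Blocks until every queued kernel has finished; any pending
            # asynchronous CUDA error is raised here instead of in a later test.
            torch.cuda.synchronize()
        super().tearDown()


def launch_blocking_env(base_env=None):
    """Hypothetical helper: copy of the environment with synchronous launches."""
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def rerun_failed_test(test_id):
    """Hypothetical helper: rerun one test, e.g. 'test_foo.TestBar.test_baz',
    in a new process with CUDA_LAUNCH_BLOCKING=1. Returns the exit code."""
    cmd = [sys.executable, "-m", "unittest", test_id]
    return subprocess.run(cmd, env=launch_blocking_env()).returncode
```

In CI, a failing test id would be passed to `rerun_failed_test`, and the traceback from that second run points at the kernel launch that actually asserted.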
cc @ngimel @mruberry @VitalyFedyunin @walterddr