
TestCommonCUDA.test_noncontiguous_samples_linalg_pinv_hermitian_cuda_float32 fails when TF32 is enabled #67947

@crcrpar

Description


With TF32 enabled, the test fails:

$ pytest test/test_ops.py -k "test_noncontiguous_samples_linalg_pinv_hermitian_cuda_float32"
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /home/mkozuki/ghq/github.com/crcrpar/torch-1, configfile: pytest.ini
plugins: hypothesis-6.14.1
collected 29688 items / 29687 deselected / 1 selected

test/test_ops.py F                                                       [100%]

=================================== FAILURES ===================================
_____ TestCommonCUDA.test_noncontiguous_samples_linalg_pinv_hermitian_cuda_float32 _____
Traceback (most recent call last):
  File "/home/mkozuki/ghq/github.com/crcrpar/torch-1/test/test_ops.py", line 263, in test_noncontiguous_samples
    self.assertEqual(actual_grad, expected_grad)
  File "/home/mkozuki/ghq/github.com/crcrpar/torch-1/torch/testing/_internal/common_utils.py", line 1903, in assertEqual
    super().assertTrue(result, msg=self._get_assert_msg(msg, debug_msg=debug_msg))
  File "/home/mkozuki/anaconda3/envs/torch-1/lib/python3.8/unittest/case.py", line 765, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 50 element(s) (out of 50) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 2.54345703125 (-4442.3525390625 vs. -4444.89599609375), which occurred at index (1, 3, 3).
=========================== short test summary info ============================
FAILED test/test_ops.py::TestCommonCUDA::test_noncontiguous_samples_linalg_pinv_hermitian_cuda_float32 - AssertionError: False is not true : Tensors failed to compare as equal!With rtol=1.3e-06 and atol=1e-05, found 50 element(s) (out of 50) whose difference(s) exceeded the margin of error (including 0 nan comparisons). The greatest difference was 2.54345703125 (-4442.3525390625 vs. -4444.89599609375), which occurred at...
===================== 1 failed, 29687 deselected in 5.60s ======================

while with TF32 disabled via NVIDIA_TF32_OVERRIDE=0, it passes:

$ NVIDIA_TF32_OVERRIDE=0 pytest test/test_ops.py -k "test_noncontiguous_samples_linalg_pinv_hermitian_cuda_float32"
============================= test session starts ==============================
platform linux -- Python 3.8.10, pytest-6.2.4, py-1.10.0, pluggy-0.13.1
rootdir: /home/mkozuki/ghq/github.com/crcrpar/torch-1, configfile: pytest.ini
plugins: hypothesis-6.14.1
collected 29688 items / 29687 deselected / 1 selected

test/test_ops.py .                                                       [100%]

===================== 1 passed, 29687 deselected in 5.80s ======================
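The numbers in the assertion message point to TF32 rather than FP32 rounding. The arithmetic below uses only the values printed in the failing run and compares the observed relative difference with the unit roundoff of each format: float32 has a 24-bit significand and TF32 an 11-bit one (10 stored bits plus the implicit bit). The constant names are illustrative, not PyTorch identifiers.

```python
# Values copied from the assertion message of the failing run.
rtol, atol = 1.3e-6, 1e-5
actual, expected = -4442.3525390625, -4444.89599609375

diff = abs(actual - expected)            # 2.54345703125
allowed = atol + rtol * abs(expected)    # ~5.79e-3, the margin assertEqual permits
rel = diff / abs(expected)               # ~5.72e-4

FP32_UNIT_ROUNDOFF = 2.0 ** -24          # 24-bit significand
TF32_UNIT_ROUNDOFF = 2.0 ** -11          # 11-bit significand (10 stored bits)

print(f"diff={diff:.6g}  allowed={allowed:.3g}  rel={rel:.3g}")
print(f"rel in float32 unit roundoffs: {rel / FP32_UNIT_ROUNDOFF:.0f}")
print(f"rel in TF32 unit roundoffs:    {rel / TF32_UNIT_ROUNDOFF:.2f}")
```

The relative difference is on the order of 10^4 float32 unit roundoffs but only about 1.2 TF32 unit roundoffs, which is plausible for a few chained TF32 matmuls and far too large for float32 arithmetic.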

Expected behavior

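pinv_backward should do its math in FP32 even when TF32 is available. Its current implementation is a chain of matmuls, each of which can run in TF32 when TF32 is enabled: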

Tensor pinv_backward(
    const Tensor& grad,
    const Tensor& pinvA,
    const Tensor& A
) {
  auto m = A.size(-2);
  auto n = A.size(-1);
  auto pinvAh = pinvA.mH();
  auto gradh = grad.mH();
  // optimization to produce matrices of the smallest dimension
  if (m <= n) {
    auto K = gradh.matmul(pinvA);
    auto KpinvAh = K.matmul(pinvAh);
    return - (pinvA.matmul(K)).mH()
           + KpinvAh - (A.matmul(pinvA)).matmul(KpinvAh)
           + (pinvAh.matmul(pinvA)).matmul(gradh - K.matmul(A));
  }
  else {
    auto K = pinvA.matmul(gradh);
    auto pinvAhK = pinvAh.matmul(K);
    return - (K.matmul(pinvA)).mH()
           + (gradh - A.matmul(K)).matmul(pinvA).matmul(pinvAh)
           + pinvAhK - pinvAhK.matmul(pinvA).matmul(A);
  }
}
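To illustrate why this chain of matmuls is sensitive to TF32, the m <= n branch above can be transcribed into pure Python for real-valued matrices and run with an emulated TF32 matmul. This is a sketch under stated assumptions, not PyTorch code: `tf32_round`, `matmul` and `pinv_backward_m_le_n` are hypothetical helpers; TF32 is modeled as rounding each matmul input to 10 stored mantissa bits (round half away from zero is assumed; hardware tie handling may differ) with the output rounded to float32; products are summed in double precision and the elementwise additions are left unrounded, both simplifications.

```python
import struct
from typing import List

Matrix = List[List[float]]


def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def tf32_round(x: float) -> float:
    """Emulate rounding a finite value to TF32 (8-bit exponent, 10 stored mantissa bits).

    The value is first rounded to float32; the low 13 of its 23 mantissa bits are
    then dropped, rounding half away from zero (assumed tie handling).
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits = (bits + 0x1000) & 0xFFFFE000  # add half of the dropped range, clear 13 bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def matmul(a: Matrix, b: Matrix, tf32: bool = False) -> Matrix:
    """Emulated GEMM: inputs rounded to TF32 (or float32), output rounded to float32.

    Simplification: products are summed in double precision rather than in float32.
    """
    rnd = tf32_round if tf32 else to_float32
    inner, cols = len(b), len(b[0])
    return [
        [to_float32(sum(rnd(row[k]) * rnd(b[k][j]) for k in range(inner))) for j in range(cols)]
        for row in a
    ]


def mH(a: Matrix) -> Matrix:
    """Conjugate transpose; for real matrices this is the plain transpose."""
    return [list(col) for col in zip(*a)]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def pinv_backward_m_le_n(grad: Matrix, pinvA: Matrix, A: Matrix, tf32: bool = False) -> Matrix:
    """Real-valued transcription of the m <= n branch of pinv_backward."""
    def mm(x: Matrix, y: Matrix) -> Matrix:
        return matmul(x, y, tf32)

    pinvAh, gradh = mH(pinvA), mH(grad)
    K = mm(gradh, pinvA)
    KpinvAh = mm(K, pinvAh)
    result = sub(KpinvAh, mH(mm(pinvA, K)))           # -(pinvA K)^H + K pinvA^H
    result = sub(result, mm(mm(A, pinvA), KpinvAh))   # - (A pinvA) K pinvA^H
    # + (pinvA^H pinvA) (grad^H - K A)
    return add(result, mm(mm(pinvAh, pinvA), sub(gradh, mm(K, A))))


def max_abs_err(g: Matrix, ref: Matrix) -> float:
    return max(abs(x - y) for rg, rr in zip(g, ref) for x, y in zip(rg, rr))


# Square invertible example: pinv(A) == inv(A), so the exact result is
# -inv(A)^H @ grad @ inv(A)^H; with grad = I and A symmetric that is -inv(A) @ inv(A).
A = [[2.0, 1.0], [1.0, 3.0]]
pinvA = [[0.6, -0.2], [-0.2, 0.4]]   # inv(A) = [[3, -1], [-1, 2]] / 5
grad = [[1.0, 0.0], [0.0, 1.0]]
exact = [[-0.4, 0.2], [0.2, -0.2]]

for use_tf32 in (False, True):
    g = pinv_backward_m_le_n(grad, pinvA, A, tf32=use_tf32)
    print(f"tf32={use_tf32}: max abs error = {max_abs_err(g, exact):.2e}")
```

Because the square case has a closed-form answer, it gives an exact reference: the emulated float32 path should land within float32 rounding of it, while the TF32 path can be expected to drift by an amount on the order of TF32's unit roundoff (2^-11, about 4.9e-4) relative to the entries, far outside rtol=1.3e-6.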

Related PRs

Environment

$ python torch/utils/collect_env.py
Collecting environment information...
PyTorch version: 1.10.0a0+git571a2be
Is debug build: False
CUDA used to build PyTorch: 11.4
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.19.6
Libc version: glibc-2.31

Python version: 3.8.10 (default, Jun  4 2021, 15:09:15)  [GCC 7.5.0] (64-bit runtime)
Python platform: Linux-5.8.0-55-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 11.4.100
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3080
Nvidia driver version: 470.63.01
cuDNN version: Probably one of the following:
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn.so.8.2.2
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_adv_infer.so.8.2.2
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_adv_train.so.8.2.2
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_cnn_infer.so.8.2.2
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_cnn_train.so.8.2.2
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_ops_infer.so.8.2.2
/usr/local/cuda-11.4/targets/x86_64-linux/lib/libcudnn_ops_train.so.8.2.2
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.20.2
[pip3] torch==1.11.0a0+git9e8016d
[conda] blas                      1.0                         mkl
[conda] magma-cuda111             2.5.2                         1    pytorch
[conda] mkl                       2021.2.0           h06a4308_296
[conda] mkl-include               2021.2.0           h06a4308_296
[conda] mkl-service               2.3.0            py38h27cfd23_1
[conda] mkl_fft                   1.3.0            py38h42c9631_2
[conda] mkl_random                1.2.1            py38ha9443f7_2
[conda] numpy                     1.20.2           py38h2d18471_0
[conda] numpy-base                1.20.2           py38hfae3a4d_0
[conda] torch                     1.11.0a0+git9e8016d           dev_0    <develop>

cc @mruberry @zasdfgbnm @ptrblck

Metadata


Assignees

No one assigned

    Labels

    module: tests (issues related to tests, not the torch.testing module)
    module: tf32 (related to tf32 data format)
    triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
