🐛 Describe the bug
After upgrading to torch 1.13.0, `torch.linalg.solve` suddenly gives solutions with much lower precision, regardless of device (CPU or GPU) or dtype (`float64` or `float32`). The errors quickly escalate in my numerical calculations and break my simulations.
Take the following data as an example (I know it is somewhat ill-conditioned, but the change in behavior is real):
```python
import torch
torch.set_default_dtype(torch.float64)
torch.backends.cuda.matmul.allow_tf32 = False
A = torch.tensor([
[ 3.8025705376834739e-07, -9.1719365342788720e-07, -6.7124337949782264e-06, -6.4837019110456791e-05, -7.0869999797614066e-04, -1.0694859984690733e-02, -3.2912231531790004e-01, -6.6347339870464399e+00, -8.2509761085708249e+01, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, 4.4000124553730829e-07, -5.5080918253708871e-07, -5.1498277032055974e-06, -5.7818057148617599e-05, -9.1226448867859551e-04, -2.2619326362175465e-02, -4.4038788530099793e-01, -5.1992675801721502e+00, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, -1.0669700681643825e-10, 4.3768558191229986e-07, -4.3974816153203019e-07, -4.8865127972067992e-06, -7.8116560507683326e-05, -1.7589402883070333e-03, -3.3666362131922367e-02, -3.8659142733749491e-01, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, -7.8216940301197729e-12, -1.5895421888461478e-10, 4.3542984469163267e-07, -4.0043248885844276e-07, -6.6798905178796823e-06, -1.3761857019311234e-04, -2.5943507621790695e-03, -2.9003633389177604e-02, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, -2.4603969583879200e-13, -6.0925772512004975e-12, -1.9886454656863128e-10, 4.3370279880257098e-07, -5.6639032522315289e-07, -1.0649799471193429e-05, -1.9808440853565822e-04, -2.1583707954594099e-03, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, -1.4999959257460881e-15, -3.2831398418930186e-14, -8.8714562886788080e-13, -4.3280772005187299e-11, 4.4148762039828565e-07, -6.8089481270669943e-07, -1.4575015323337058e-05, -1.5597848962814291e-04, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, -3.6858575028157790e-16, -7.2036090445864899e-15, -1.4349791509103240e-13, -2.9849302443991965e-12, 6.3914122655929791e-10, 4.6448551809896547e-07, -6.8453604307207769e-07, -1.0332761488908590e-05, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, -3.7045642770024088e-17, -7.2015144280333478e-16, -1.4158860652466324e-14, -2.8662564585632735e-13, -6.2285079180541528e-12, 1.5090963357302090e-09, 4.8979817748389458e-07, -1.2863401745116974e-07, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, -2.3760629594245614e-18, -4.6007155546998113e-17, -8.9513844792609796e-16, -1.7640414722799569e-14, -3.5935860384434572e-13, -7.9429359080595169e-12, 2.0146206213869421e-09, 4.7959403001188342e-07, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 3.8025705376834739e-07]
])
b = torch.tensor(
[ 6.9677181015078851e+04, 3.9337825712781823e+03, 2.7914109655787729e+02, 1.9895852311404216e+01, 1.3819016836738420e+00, 7.5229947004102571e-02, 1.3433804143281360e-03, -3.1421146091483441e-04, -2.8076324348838071e-05, 0.0000000000000000e+00]
)
```
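For reference, "somewhat ill-conditioned" can be made concrete with `torch.linalg.cond`. A minimal sketch, using a hypothetical stand-in matrix (an upper-triangular-dominant system loosely mimicking the structure above) rather than the exact values from the report:

```python
import torch

torch.set_default_dtype(torch.float64)

# Hypothetical stand-in: an upper-triangular-dominant 10x10 matrix,
# loosely mimicking the structure of the matrix in the report.
g = torch.Generator().manual_seed(0)
A = torch.triu(torch.randn(10, 10, generator=g)) + 10.0 * torch.eye(10)

# 2-norm condition number: ratio of largest to smallest singular value.
# Roughly speaking, solving a linear system can lose up to
# log10(cond) digits of accuracy even with a backward-stable solver.
cond = torch.linalg.cond(A)
print(float(cond))
```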
With torch 1.12.1, the relative errors are around machine precision (a few 1e-16), which is consistent with the precision obtained from `numpy` or `cupy`:
```
In [1]: (A @ torch.linalg.solve(A, b) - b) / b
tensor([ 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 1.6068046486108669e-16,
-3.6894317650011501e-16, 0.0000000000000000e+00, -0.0000000000000000e+00, 3.6202728109145290e-16, nan])
```
However, with torch 1.13.0, the relative errors are huge (up to 5e-11):
```
In [2]: (A @ torch.linalg.solve(A, b) - b) / b
tensor([-2.0884764590602007e-16, 4.6240212075443264e-16, 0.0000000000000000e+00, -1.7856554337026822e-16, -4.1776920863882539e-15,
-8.7255061242277206e-14, 5.0944524844510106e-11, -2.0456409676328997e-11, -4.9269499441339466e-12, nan])
```
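The cross-check against NumPy uses the same relative-residual formula. A sketch of that comparison on a hypothetical stand-in system (the report's exact `A` and `b` would be pasted in its place):

```python
import numpy as np
import torch

torch.set_default_dtype(torch.float64)

# Hypothetical stand-in system (the report's A and b would go here).
rng = np.random.default_rng(0)
A_np = np.triu(rng.standard_normal((10, 10))) + 10.0 * np.eye(10)
b_np = rng.standard_normal(10)

# NumPy reference solve and its element-wise relative residual.
x_np = np.linalg.solve(A_np, b_np)
res_np = np.abs(A_np @ x_np - b_np) / np.abs(b_np)

# torch solve on the same data and its relative residual.
A = torch.from_numpy(A_np)
b = torch.from_numpy(b_np)
x_t = torch.linalg.solve(A, b)
res_t = ((A @ x_t - b).abs() / b.abs()).numpy()

print(res_np.max(), res_t.max())
```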
Below are more comparisons using `torch.float64` and CUDA:
```
In [1]: A = torch.tensor([ ... ], device=torch.device('cuda'))
In [2]: b = torch.tensor([ ... ], device=torch.device('cuda'))
In [3]: (A @ torch.linalg.solve(A, b) - b) / b  # with torch 1.12.1
tensor([ 0.0000e+00, 1.1560e-16, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        -1.8447e-16, 0.0000e+00, 1.7253e-16, 3.6203e-16, nan],
       device='cuda:0')
In [4]: (A @ torch.linalg.solve(A, b) - b) / b  # with torch 1.13.0
tensor([-2.0885e-16, 0.0000e+00, -2.0364e-16, -7.1426e-16, -1.7675e-15,
         4.1875e-14, 4.4228e-11, -1.2897e-11, -3.0743e-12, nan],
       device='cuda:0')
```
And more comparisons using `torch.float32` on the CPU:
```
In [1]: torch.set_default_dtype(torch.float32)
In [2]: torch.backends.cuda.matmul.allow_tf32 = True
In [3]: (A @ torch.linalg.solve(A, b) - b) / b  # with torch 1.12.1
tensor([-1.1212e-07, 0.0000e+00, 0.0000e+00, 0.0000e+00, 8.6265e-08,
         1.9807e-07, -8.6658e-08, 9.2625e-08, -0.0000e+00, nan])
In [4]: (A @ torch.linalg.solve(A, b) - b) / b  # with torch 1.13.0
tensor([-1.1212e-07, 6.2063e-08, -1.0933e-07, -9.5867e-08, -2.3291e-06,
        -4.0902e-05, -2.2294e-02, -2.5929e-03, -1.9909e-03, nan])
```
Versions
For tests with torch 1.12.1, the output is:
```
Collecting environment information...
PyTorch version: 1.12.1+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.15.79.1-microsoft-standard-WSL2-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to:
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3070 Laptop GPU
Nvidia driver version: 527.37
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.5
[pip3] pytorch-memlab==0.2.4
[pip3] torch==1.12.1+cu116
[pip3] torchaudio==0.12.1+cu116
[pip3] torchvision==0.13.1+cu116
[pip3] xitorch==0.3.0
[conda] No relevant packages
```
For tests with torch 1.13.0, the output is:
```
PyTorch version: 1.13.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.79.1-microsoft-standard-WSL2-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3070 Laptop GPU
Nvidia driver version: 527.37
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==1.13.0
[pip3] torchaudio==0.13.0
[pip3] torchvision==0.14.0
[conda] No relevant packages
```
cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @lezcano