🐛 Describe the bug
After upgrading to torch 1.13.0, `torch.linalg.solve` suddenly gives solutions with much lower precision, regardless of device (CPU or GPU) or dtype (`float64` or `float32`). The errors quickly escalate in my numerical calculations and break my simulations.
Take the following data as an example (I know it is somewhat ill-conditioned, but the change in behavior is real):
```python
import torch
torch.set_default_dtype(torch.float64)
torch.backends.cuda.matmul.allow_tf32 = False
A = torch.tensor([
[ 3.8025705376834739e-07, -9.1719365342788720e-07, -6.7124337949782264e-06, -6.4837019110456791e-05, -7.0869999797614066e-04, -1.0694859984690733e-02, -3.2912231531790004e-01, -6.6347339870464399e+00, -8.2509761085708249e+01, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, 4.4000124553730829e-07, -5.5080918253708871e-07, -5.1498277032055974e-06, -5.7818057148617599e-05, -9.1226448867859551e-04, -2.2619326362175465e-02, -4.4038788530099793e-01, -5.1992675801721502e+00, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, -1.0669700681643825e-10, 4.3768558191229986e-07, -4.3974816153203019e-07, -4.8865127972067992e-06, -7.8116560507683326e-05, -1.7589402883070333e-03, -3.3666362131922367e-02, -3.8659142733749491e-01, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, -7.8216940301197729e-12, -1.5895421888461478e-10, 4.3542984469163267e-07, -4.0043248885844276e-07, -6.6798905178796823e-06, -1.3761857019311234e-04, -2.5943507621790695e-03, -2.9003633389177604e-02, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, -2.4603969583879200e-13, -6.0925772512004975e-12, -1.9886454656863128e-10, 4.3370279880257098e-07, -5.6639032522315289e-07, -1.0649799471193429e-05, -1.9808440853565822e-04, -2.1583707954594099e-03, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, -1.4999959257460881e-15, -3.2831398418930186e-14, -8.8714562886788080e-13, -4.3280772005187299e-11, 4.4148762039828565e-07, -6.8089481270669943e-07, -1.4575015323337058e-05, -1.5597848962814291e-04, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, -3.6858575028157790e-16, -7.2036090445864899e-15, -1.4349791509103240e-13, -2.9849302443991965e-12, 6.3914122655929791e-10, 4.6448551809896547e-07, -6.8453604307207769e-07, -1.0332761488908590e-05, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, -3.7045642770024088e-17, -7.2015144280333478e-16, -1.4158860652466324e-14, -2.8662564585632735e-13, -6.2285079180541528e-12, 1.5090963357302090e-09, 4.8979817748389458e-07, -1.2863401745116974e-07, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, -2.3760629594245614e-18, -4.6007155546998113e-17, -8.9513844792609796e-16, -1.7640414722799569e-14, -3.5935860384434572e-13, -7.9429359080595169e-12, 2.0146206213869421e-09, 4.7959403001188342e-07, 0.0000000000000000e+00],
[ 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 3.8025705376834739e-07]
])
b = torch.tensor(
[ 6.9677181015078851e+04, 3.9337825712781823e+03, 2.7914109655787729e+02, 1.9895852311404216e+01, 1.3819016836738420e+00, 7.5229947004102571e-02, 1.3433804143281360e-03, -3.1421146091483441e-04, -2.8076324348838071e-05, 0.0000000000000000e+00]
)
```
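For reference, "somewhat ill-conditioned" can be made concrete with `torch.linalg.cond`. A minimal sketch, using a hypothetical stand-in matrix (an upper-triangular-dominant system loosely mimicking the structure above) rather than the exact values from the report:

```python
import torch

torch.set_default_dtype(torch.float64)

# Hypothetical stand-in: an upper-triangular-dominant 10x10 matrix,
# loosely mimicking the structure of the matrix in the report.
g = torch.Generator().manual_seed(0)
A = torch.triu(torch.randn(10, 10, generator=g)) + 10.0 * torch.eye(10)

# 2-norm condition number: ratio of largest to smallest singular value.
# Roughly speaking, solving a linear system can lose up to
# log10(cond) digits of accuracy even with a backward-stable solver.
cond = torch.linalg.cond(A)
print(float(cond))
```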
With torch 1.12.1, the relative errors are around machine precision (a few 1e-16), which is consistent with the precision obtained from `numpy` or `cupy`:
```
In [1]: (A @ torch.linalg.solve(A, b) - b) / b
tensor([ 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 0.0000000000000000e+00, 1.6068046486108669e-16,
-3.6894317650011501e-16, 0.0000000000000000e+00, -0.0000000000000000e+00, 3.6202728109145290e-16, nan])
```
However, with torch 1.13.0, the relative errors are huge (up to 5e-11):
```
In [2]: (A @ torch.linalg.solve(A, b) - b) / b
tensor([-2.0884764590602007e-16, 4.6240212075443264e-16, 0.0000000000000000e+00, -1.7856554337026822e-16, -4.1776920863882539e-15,
-8.7255061242277206e-14, 5.0944524844510106e-11, -2.0456409676328997e-11, -4.9269499441339466e-12, nan])
```
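The cross-check against NumPy uses the same relative-residual formula. A sketch of that comparison on a hypothetical stand-in system (the report's exact `A` and `b` would be pasted in its place):

```python
import numpy as np
import torch

torch.set_default_dtype(torch.float64)

# Hypothetical stand-in system (the report's A and b would go here).
rng = np.random.default_rng(0)
A_np = np.triu(rng.standard_normal((10, 10))) + 10.0 * np.eye(10)
b_np = rng.standard_normal(10)

# NumPy reference solve and its element-wise relative residual.
x_np = np.linalg.solve(A_np, b_np)
res_np = np.abs(A_np @ x_np - b_np) / np.abs(b_np)

# torch solve on the same data and its relative residual.
A = torch.from_numpy(A_np)
b = torch.from_numpy(b_np)
x_t = torch.linalg.solve(A, b)
res_t = ((A @ x_t - b).abs() / b.abs()).numpy()

print(res_np.max(), res_t.max())
```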
Below are more comparisons using `torch.float64` and CUDA:
```
In [1]: A = torch.tensor([ ... ], device=torch.device('cuda'))
In [2]: b = torch.tensor([ ... ], device=torch.device('cuda'))
In [3]: (A @ torch.linalg.solve(A, b) - b) / b  # with torch 1.12.1
tensor([ 0.0000e+00, 1.1560e-16, 0.0000e+00, 0.0000e+00, 0.0000e+00,
        -1.8447e-16, 0.0000e+00, 1.7253e-16, 3.6203e-16, nan],
       device='cuda:0')
In [4]: (A @ torch.linalg.solve(A, b) - b) / b  # with torch 1.13.0
tensor([-2.0885e-16, 0.0000e+00, -2.0364e-16, -7.1426e-16, -1.7675e-15,
         4.1875e-14, 4.4228e-11, -1.2897e-11, -3.0743e-12, nan],
       device='cuda:0')
```
And more comparisons using `torch.float32` on the CPU:
```
In [1]: torch.set_default_dtype(torch.float32)
In [2]: torch.backends.cuda.matmul.allow_tf32 = True
In [3]: (A @ torch.linalg.solve(A, b) - b) / b  # with torch 1.12.1
tensor([-1.1212e-07, 0.0000e+00, 0.0000e+00, 0.0000e+00, 8.6265e-08,
         1.9807e-07, -8.6658e-08, 9.2625e-08, -0.0000e+00, nan])
In [4]: (A @ torch.linalg.solve(A, b) - b) / b  # with torch 1.13.0
tensor([-1.1212e-07, 6.2063e-08, -1.0933e-07, -9.5867e-08, -2.3291e-06,
        -4.0902e-05, -2.2294e-02, -2.5929e-03, -1.9909e-03, nan])
```
Versions
For tests with torch 1.12.1, the output is:
```
Collecting environment information...
PyTorch version: 1.12.1+cu116
Is debug build: False
CUDA used to build PyTorch: 11.6
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.8.10 (default, Sep 28 2021, 16:10:42) [GCC 9.3.0] (64-bit runtime)
Python platform: Linux-5.15.79.1-microsoft-standard-WSL2-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to:
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3070 Laptop GPU
Nvidia driver version: 527.37
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.5
[pip3] pytorch-memlab==0.2.4
[pip3] torch==1.12.1+cu116
[pip3] torchaudio==0.12.1+cu116
[pip3] torchvision==0.13.1+cu116
[pip3] xitorch==0.3.0
[conda] No relevant packages
```
For tests with torch 1.13.0, the output is:
```
PyTorch version: 1.13.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.5 LTS (x86_64)
GCC version: (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
Clang version: 10.0.0-4ubuntu1
CMake version: version 3.16.3
Libc version: glibc-2.31
Python version: 3.8.10 (default, Jun 22 2022, 20:18:18) [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.15.79.1-microsoft-standard-WSL2-x86_64-with-glibc2.29
Is CUDA available: True
CUDA runtime version: 11.8.89
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3070 Laptop GPU
Nvidia driver version: 527.37
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] numpy==1.23.5
[pip3] torch==1.13.0
[pip3] torchaudio==0.13.0
[pip3] torchvision==0.14.0
[conda] No relevant packages
```
cc @jianyuh @nikitaved @pearu @mruberry @walterddr @IvanYashchuk @xwang233 @lezcano