🐛 Bug
The sum of an expanded tensor differs from the sum of a regular or repeated tensor of the same size and shows a much larger error.
The problem occurs on CPU only, and the magnitude of the error varies between different (Intel) CPUs.
To Reproduce
Steps to reproduce the behavior:
- Create a float32 tensor (on CPU) and expand it to a huge size
- Sum the expanded tensor
- Compare the result to the sum of a repeated or regular tensor of the same size
Script to reproduce the behavior:

```python
import torch

t1 = torch.ones((10000, 10000), dtype=torch.float32, device='cpu') * 1.01
t2 = torch.tensor([[1.01]], dtype=torch.float32, device='cpu').expand((10000, 10000))
t3 = torch.tensor([[1.01]], dtype=torch.float32, device='cpu').repeat((10000, 10000))

print(t1.sum(), t1.mean())
print(t2.sum(), t2.mean())
print(t3.sum(), t3.mean())
```
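One possible explanation (my assumption, not verified against the PyTorch source): `expand` returns a stride-0 view in which every logical element aliases the same storage location, so the CPU reduction walks memory differently than it does for a materialized tensor. A quick check of the strides, plus `.contiguous()` as a workaround that materializes the data before summing:

```python
import torch

# The expanded tensor is a view with stride (0, 0): all elements alias
# one storage location. (Smaller shape here just to illustrate.)
t2 = torch.tensor([[1.01]], dtype=torch.float32).expand((1000, 1000))
print(t2.stride())      # (0, 0)

# .contiguous() copies the data into a normal layout, so the sum then
# matches a regular tensor of the same values (still float32 precision).
t2c = t2.contiguous()
print(t2c.stride())     # (1000, 1)
print(t2c.sum())
```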
Expected behavior
- Consistent results among expanded, repeated, and regular tensors
- Accurate results (up to floating-point precision)
Environment
PC#1:
- PyTorch Version (e.g., 1.0): 1.5.0+cu101
- OS (e.g., Linux): Microsoft Windows 10
- How you installed PyTorch (conda, pip, source): pip
- Python version: 3.7
- CUDA/cuDNN version: 10.1
- CPU: Intel(R) Core(TM) i7-1065G7
PC#2:
- PyTorch Version (e.g., 1.0): 1.4.0
- OS (e.g., Linux): Linux (Ubuntu 16.04 LTS)
- How you installed PyTorch (conda, pip, source): pip
- Python version: 3.6
- CUDA/cuDNN version: 10.1
- CPU: Intel(R) Core(TM) i7-7700K
PC#3 (minor difference between sums):
- PyTorch Version (e.g., 1.0): 1.5.0+cu101
- OS (e.g., Linux): Microsoft Windows 10
- How you installed PyTorch (conda, pip, source): pip
- Python version: 3.6
- CUDA/cuDNN version: 10.1
- CPU: Intel(R) Core(TM) i7-5820K
Additional context
Script output on PC#1 and PC#2:

```
tensor(1.0041e+08) tensor(1.0041)
tensor(1.3292e+08) tensor(1.3292)
tensor(1.0041e+08) tensor(1.0041)
```
Script output on PC#3:

```
tensor(1.0062e+08) tensor(1.0062)
tensor(1.0002e+08) tensor(1.0002)
tensor(1.0062e+08) tensor(1.0062)
```
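For context, the exact sum is 10000 × 10000 × 1.01 = 1.01e8, so the relative errors of the sums reported above can be computed directly (values copied from the outputs):

```python
exact = 10000 * 10000 * 1.01  # 1.01e8

# Sums reported above: regular/repeated (t1/t3) vs. expanded (t2).
for label, observed in [("t1/t3 on PC#1/#2", 1.0041e8),
                        ("t2    on PC#1/#2", 1.3292e8),
                        ("t1/t3 on PC#3   ", 1.0062e8),
                        ("t2    on PC#3   ", 1.0002e8)]:
    print(f"{label}: relative error {abs(observed - exact) / exact:.2%}")
```

The expanded tensor on PC#1/#2 is off by over 30%, far beyond ordinary float32 accumulation error.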
The issue does not occur on CUDA:

```
tensor(1.0100e+08, device='cuda:0') tensor(1.0100, device='cuda:0')
tensor(1.0100e+08, device='cuda:0') tensor(1.0100, device='cuda:0')
tensor(1.0100e+08, device='cuda:0') tensor(1.0100, device='cuda:0')
```
This appears to be related to floating-point precision: the issue does not occur with smaller tensors or with float64 tensors.
I don't know whether other tensor ops are affected.
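The float64 observation can be checked with the `dtype` argument of `Tensor.sum`, which accumulates in the requested dtype without converting the tensor itself (a sketch; the exact float32 drift depends on the CPU's reduction path):

```python
import torch

t = torch.full((5000, 5000), 1.01, dtype=torch.float32)

# float32 accumulation can drift for large element counts...
print(t.sum())

# ...while accumulating in float64 stays accurate:
print(t.sum(dtype=torch.float64))  # ~ 5000 * 5000 * 1.01 = 2.525e7
```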