🐛 Bug
The sum of an expanded tensor differs from the sum of a regular or repeated tensor of the same size and shows a much larger error.
The problem occurs on CPU only, and the magnitude of the error varies between different (Intel) CPUs.
To Reproduce
Steps to reproduce the behavior:
- Create a float32 tensor (on CPU) and expand it to a huge size
- Sum the expanded tensor
- Compare the result to the sum of a repeated or regular tensor of the same size
Script to reproduce the behavior:

```python
import torch

t1 = torch.ones((10000, 10000), dtype=torch.float32, device='cpu') * 1.01
t2 = torch.tensor([[1.01]], dtype=torch.float32, device='cpu').expand((10000, 10000))
t3 = torch.tensor([[1.01]], dtype=torch.float32, device='cpu').repeat((10000, 10000))

print(t1.sum(), t1.mean())
print(t2.sum(), t2.mean())
print(t3.sum(), t3.mean())
```
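One possible explanation (my assumption, not verified against the PyTorch source): `expand` returns a stride-0 view in which every logical element aliases the same storage location, so the CPU reduction walks memory differently than it does for a materialized tensor. A quick check of the strides, plus `.contiguous()` as a workaround that materializes the data before summing:

```python
import torch

# The expanded tensor is a view with stride (0, 0): all elements alias
# one storage location. (Smaller shape here just to illustrate.)
t2 = torch.tensor([[1.01]], dtype=torch.float32).expand((1000, 1000))
print(t2.stride())      # (0, 0)

# .contiguous() copies the data into a normal layout, so the sum then
# matches a regular tensor of the same values (still float32 precision).
t2c = t2.contiguous()
print(t2c.stride())     # (1000, 1)
print(t2c.sum())
```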
Expected behavior
- Consistent results among expanded, repeated, and regular tensors
- Accurate results (up to floating-point precision)
Environment
PC#1:
- PyTorch Version (e.g., 1.0): 1.5.0+cu101
- OS (e.g., Linux): Microsoft Windows 10
- How you installed PyTorch (conda, pip, source): pip
- Python version: 3.7
- CUDA/cuDNN version: 10.1
- CPU: Intel(R) Core(TM) i7-1065G7
PC#2:
- PyTorch Version (e.g., 1.0): 1.4.0
- OS (e.g., Linux): Linux (Ubuntu 16.04 LTS)
- How you installed PyTorch (conda, pip, source): pip
- Python version: 3.6
- CUDA/cuDNN version: 10.1
- CPU: Intel(R) Core(TM) i7-7700K
PC#3 (minor difference between sums):
- PyTorch Version (e.g., 1.0): 1.5.0+cu101
- OS (e.g., Linux): Microsoft Windows 10
- How you installed PyTorch (conda, pip, source): pip
- Python version: 3.6
- CUDA/cuDNN version: 10.1
- CPU: Intel(R) Core(TM) i7-5820K
Additional context
Script output on PC#1 and PC#2:

```
tensor(1.0041e+08) tensor(1.0041)
tensor(1.3292e+08) tensor(1.3292)
tensor(1.0041e+08) tensor(1.0041)
```
Script output on PC#3:

```
tensor(1.0062e+08) tensor(1.0062)
tensor(1.0002e+08) tensor(1.0002)
tensor(1.0062e+08) tensor(1.0062)
```
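For context, the exact sum is 10000 × 10000 × 1.01 = 1.01e8, so the relative errors of the sums reported above can be computed directly (values copied from the outputs):

```python
exact = 10000 * 10000 * 1.01  # 1.01e8

# Sums reported above: regular/repeated (t1/t3) vs. expanded (t2).
for label, observed in [("t1/t3 on PC#1/#2", 1.0041e8),
                        ("t2    on PC#1/#2", 1.3292e8),
                        ("t1/t3 on PC#3   ", 1.0062e8),
                        ("t2    on PC#3   ", 1.0002e8)]:
    print(f"{label}: relative error {abs(observed - exact) / exact:.2%}")
```

The expanded tensor on PC#1/#2 is off by over 30%, far beyond ordinary float32 accumulation error.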
The issue does not occur on CUDA:

```
tensor(1.0100e+08, device='cuda:0') tensor(1.0100, device='cuda:0')
tensor(1.0100e+08, device='cuda:0') tensor(1.0100, device='cuda:0')
tensor(1.0100e+08, device='cuda:0') tensor(1.0100, device='cuda:0')
```
This appears to be related to floating-point precision: the issue does not occur with smaller tensors or with float64 tensors.
I don't know whether other tensor ops are affected.
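The float64 observation can be checked with the `dtype` argument of `Tensor.sum`, which accumulates in the requested dtype without converting the tensor itself (a sketch; the exact float32 drift depends on the CPU's reduction path):

```python
import torch

t = torch.full((5000, 5000), 1.01, dtype=torch.float32)

# float32 accumulation can drift for large element counts...
print(t.sum())

# ...while accumulating in float64 stays accurate:
print(t.sum(dtype=torch.float64))  # ~ 5000 * 5000 * 1.01 = 2.525e7
```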