
Sums of expanded and repeated tensors are different #37234

@stefanopini

🐛 Bug

The sum of an expanded tensor differs from the sum of regular and repeated tensors of the same size, and is far less accurate.
The error occurs on CPU only, and its magnitude varies between different (Intel) CPUs.

To Reproduce

Steps to reproduce the behavior:

  1. Create a float tensor (on CPU) and expand it to a huge size
  2. Sum the expanded tensor
  3. Compare the result to the sum of a repeated or regular tensor of the same size

Script to reproduce the behavior:

import torch

# Regular tensor: 10000x10000 contiguous float32 buffer filled with 1.01
t1 = torch.ones((10000, 10000), dtype=torch.float32, device='cpu') * 1.01
# Expanded tensor: a zero-stride view over a single element, no data copied
t2 = torch.tensor([[1.01]], dtype=torch.float32, device='cpu').expand((10000, 10000))
# Repeated tensor: the element is actually copied into a 10000x10000 buffer
t3 = torch.tensor([[1.01]], dtype=torch.float32, device='cpu').repeat((10000, 10000))

print(t1.sum(), t1.mean())
print(t2.sum(), t2.mean())
print(t3.sum(), t3.mean())
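
For context on the two ops: expand() returns a zero-stride view that reuses the single underlying element without copying, while repeat() materializes a full contiguous copy. A minimal sketch to observe this (the small 4x4 sizes are just for illustration):

import torch

src = torch.tensor([[1.01]], dtype=torch.float32)
expanded = src.expand(4, 4)
repeated = src.repeat(4, 4)

# expand() creates a view: both strides are 0, every element aliases src[0, 0]
print(expanded.stride())  # (0, 0)
# repeat() copies the data into a new contiguous buffer
print(repeated.stride())  # (4, 1)
# the expanded view shares storage with the source tensor
print(expanded.data_ptr() == src.data_ptr())  # True

The sum kernel therefore sees a very different memory layout for t2 than for t1/t3, which presumably sends it down a different (and here less accurate) CPU reduction path.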

Expected behavior

  • Consistent results among expanded, repeated, and regular tensors
  • Accurate results (up to floating-point precision; see the sketch below)
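
For reference, the mathematically exact sum is 10000 × 10000 × 1.01 = 1.01e8, and a single float32 accumulator cannot even register increments of 1.01 once it grows past about 2^25. A minimal sketch of that saturation effect (an illustration of float32 rounding, not necessarily the exact code path PyTorch takes):

import torch

acc = torch.tensor(2.0 ** 25, dtype=torch.float32)  # ~3.36e7
# At this magnitude the spacing between representable float32 values is 4.0,
# so adding 1.01 rounds back to the same value and is lost entirely
print(acc + 1.01 == acc)  # tensor(True)

Any reduction that accumulates naively in float32 will therefore drift away from the true 1.01e8, which is consistent with the inaccurate sums reported below (rounding can push the running total either above or below the exact value).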

Environment

PC#1:

  • PyTorch Version: 1.5.0+cu101
  • OS: Microsoft Windows 10
  • How you installed PyTorch: pip
  • Python version: 3.7
  • CUDA/cuDNN version: 10.1
  • CPU: Intel(R) Core(TM) i7-1065G7

PC#2:

  • PyTorch Version: 1.4.0
  • OS: Linux (Ubuntu 16.04 LTS)
  • How you installed PyTorch: pip
  • Python version: 3.6
  • CUDA/cuDNN version: 10.1
  • CPU: Intel(R) Core(TM) i7-7700K

PC#3 (minor difference between sums):

  • PyTorch Version: 1.5.0+cu101
  • OS: Microsoft Windows 10
  • How you installed PyTorch: pip
  • Python version: 3.6
  • CUDA/cuDNN version: 10.1
  • CPU: Intel(R) Core(TM) i7-5820K

Additional context

Script output on PC#1 and PC#2:

tensor(1.0041e+08) tensor(1.0041)
tensor(1.3292e+08) tensor(1.3292)
tensor(1.0041e+08) tensor(1.0041)

Script output on PC#3

tensor(1.0062e+08) tensor(1.0062)
tensor(1.0002e+08) tensor(1.0002)
tensor(1.0062e+08) tensor(1.0062)

The issue does not occur on CUDA:

tensor(1.0100e+08, device='cuda:0') tensor(1.0100, device='cuda:0')
tensor(1.0100e+08, device='cuda:0') tensor(1.0100, device='cuda:0')
tensor(1.0100e+08, device='cuda:0') tensor(1.0100, device='cuda:0')

The issue seems related to floating-point precision: it does not occur with smaller tensors or with float64 tensors.
I don't know whether it affects other tensor ops.
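
Two plausible workarounds, assuming the inaccuracy comes from the reduction over the zero-stride view (a sketch, not a confirmed fix):

import torch

t2 = torch.tensor([[1.01]], dtype=torch.float32, device='cpu').expand((10000, 10000))

# Workaround 1 (assumption): materialize the view first, so sum() reduces a
# contiguous tensor with the same layout as t1/t3, which behaved consistently
print(t2.contiguous().sum())

# Workaround 2 (assumption): accumulate in float64, which has more than enough
# precision for 1e8 additions of 1.01
print(t2.sum(dtype=torch.float64))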

Labels

triaged: This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
