Vectorize linspace on CPU. #27957
Benchmark (gcc 8.3, Debian Buster, turbo off, Release build, Intel(R) Xeon(R) E-2136):
```python
import timeit
# Time torch.linspace for each dtype at two sizes (n elements, t repetitions).
for dtype in ('torch.double', 'torch.float'):
    for n, t in [(40_000, 200000),
                 (400_000, 20000)]:
        print(f'torch.linspace(0, 10, {n}, dtype={dtype}) for {t} times')
        print(timeit.timeit(f'torch.linspace(0, 10, {n}, dtype={dtype})', setup='import torch', number=t))
```
Before:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 200000 times
11.188449680000303
torch.linspace(0, 10, 400000, dtype=torch.double) for 20000 times
10.69958164000036
torch.linspace(0, 10, 40000, dtype=torch.float) for 200000 times
11.296819276999486
torch.linspace(0, 10, 400000, dtype=torch.float) for 20000 times
10.829777259000366
```
After:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 200000 times
3.704719146999196
torch.linspace(0, 10, 400000, dtype=torch.double) for 20000 times
3.0970425030000115
torch.linspace(0, 10, 40000, dtype=torch.float) for 200000 times
3.9462350260000676
torch.linspace(0, 10, 400000, dtype=torch.float) for 20000 times
3.4302567130007446
```
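For a quick read of the numbers above, the ratios can be computed directly. This is only a small sketch that reuses the reported timings (rounded to three decimals); the dictionary layout and the `speedup` helper are illustrative names, not part of the PR.

```python
# Timings in seconds, copied from the Before/After listings above and rounded.
before = {
    ("double", 40_000): 11.188,
    ("double", 400_000): 10.700,
    ("float", 40_000): 11.297,
    ("float", 400_000): 10.830,
}
after = {
    ("double", 40_000): 3.705,
    ("double", 400_000): 3.097,
    ("float", 40_000): 3.946,
    ("float", 400_000): 3.430,
}


def speedup(key):
    """Return how many times faster the 'after' build is for one configuration."""
    return before[key] / after[key]


for key in before:
    dtype, n = key
    print(f"{dtype:>6} n={n:>7}: {speedup(key):.2f}x")
```

On this machine the reported timings correspond to a speedup of roughly 2.9x to 3.5x for both dtypes.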
ghstack-source-id: d54ff74
Pull Request resolved: #27957
@VitalyFedyunin can you review this?

in progress
VitalyFedyunin left a comment:
This is not going to work properly if the result tensor is non-contiguous (for example, a transposed 2-D tensor with numel == steps).
It is also not going to work properly if the result tensor is large, because cpu_kernel_vec runs in parallel; consider using the serial vectorized variant.
Please write tests for both cases.
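To make the non-contiguous concern concrete, here is a pure-Python model of writing linspace values into a strided 2-D view. It is a sketch under stated assumptions, not PyTorch code: `linspace_value` and `fill_strided` are hypothetical helpers, and the flat `storage` list stands in for a tensor's underlying memory.

```python
def linspace_value(start, end, steps, i):
    """Value of element i, computed from the index alone."""
    if steps == 1:
        return float(start)
    return start + (end - start) * i / (steps - 1)


def fill_strided(storage, shape, strides, start, end):
    """Write linspace values into a 2-D strided view in logical row-major order."""
    rows, cols = shape
    steps = rows * cols
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c                          # logical (row-major) index
            offset = r * strides[0] + c * strides[1]  # position in flat storage
            storage[offset] = linspace_value(start, end, steps, i)


# A 3x2 view obtained by transposing a contiguous 2x3 buffer has strides (1, 3).
storage = [None] * 6
fill_strided(storage, shape=(3, 2), strides=(1, 3), start=0, end=5)
print(storage)

# Reading the view back in logical order recovers the increasing sequence.
logical = [storage[r * 1 + c * 3] for r in range(3) for c in range(2)]
print(logical)
```

The memory order and the logical order differ here, which is why a test on a transposed output should compare against the logical order rather than the raw storage.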
The test already covers a non-contiguous case. The reason it is handled is the code at pytorch/aten/src/ATen/native/cpu/Loops.h, line 210 in 174e1ba.
I've now added a test case for a large tensor. From my understanding,
@VitalyFedyunin Mind reviewing again :)?
Fails correctness check (please include as test):
In [27]: y = torch.linspace(0, 1000000-1, 1000000)
In [28]: correct = True
In [29]: for i in range(y.shape[0]-1):
    ...:     correct = correct and y[i] < y[i+1]
    ...:
In [30]: correct
Out[30]: tensor(False)
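The property being checked is that the output is strictly increasing. A minimal sketch of the same check on plain Python lists follows, so it runs without PyTorch; `reference_linspace` and `is_strictly_increasing` are hypothetical helper names, and the reference derives every element from its own index.

```python
def reference_linspace(start, end, steps):
    """Plain-Python reference: every element is derived from its own index."""
    if steps == 1:
        return [float(start)]
    step = (end - start) / (steps - 1)
    return [start + i * step for i in range(steps)]


def is_strictly_increasing(values):
    """True when every element is smaller than its successor."""
    return all(a < b for a, b in zip(values, values[1:]))


# Same arguments as the failing session above.
y = reference_linspace(0, 1_000_000 - 1, 1_000_000)
print(is_strictly_increasing(y))
```

In a real test the same pairwise comparison would be applied to the tensor returned by `torch.linspace`, ideally next to a comparison against a reference like the one above.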
VitalyFedyunin left a comment:
idx is used concurrently and incorrectly.
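One way a shared index can go wrong is sketched below. This is a deterministic pure-Python stand-in for thread scheduling, not a real data race and not the PR's kernel: the functions, the chunk size, and the execution order are all assumptions chosen for illustration.

```python
def fill_shared_counter(steps, start, step, chunk_order, chunk_size):
    """Buggy pattern: one running idx shared by all chunks."""
    out = [None] * steps
    idx = 0  # shared state, advanced by whichever chunk happens to run next
    for chunk in chunk_order:
        begin = chunk * chunk_size
        for pos in range(begin, min(begin + chunk_size, steps)):
            out[pos] = start + step * idx
            idx += 1
    return out


def fill_from_position(steps, start, step, chunk_order, chunk_size):
    """Order-independent pattern: each value depends only on its own position."""
    out = [None] * steps
    for chunk in chunk_order:
        begin = chunk * chunk_size
        for pos in range(begin, min(begin + chunk_size, steps)):
            out[pos] = start + step * pos
    return out


# Chunks 0..3 of size 2, executed out of order as a parallel scheduler might.
order = [2, 0, 3, 1]
print(fill_shared_counter(8, 0.0, 1.0, order, 2))
print(fill_from_position(8, 0.0, 1.0, order, 2))
```

With the shared counter, the values follow execution order instead of element position, so the result is no longer increasing once chunks run out of order. Deriving each value from the element's own position removes the dependence on scheduling.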
Benchmark (gcc 8.3, Debian Buster, turbo off, Release build, Intel(R) Xeon(R) E-2136):
```python
import timeit
# Time torch.linspace for each dtype at two sizes (n elements, t repetitions).
for dtype in ('torch.double', 'torch.float', 'torch.uint8', 'torch.int8', 'torch.int16', 'torch.int32', 'torch.int64'):
    for n, t in [(40_000, 50000),
                 (400_000, 5000)]:
        print(f'torch.linspace(0, 10, {n}, dtype={dtype}) for {t} times')
        print(timeit.timeit(f'torch.linspace(0, 10, {n}, dtype={dtype})', setup='import torch', number=t))
```
Before:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.3964195849839598
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
1.2374563289922662
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.8631796519621275
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
1.6991038109990768
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.8358083459897898
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.7214750979910605
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.8356257299892604
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.706238206999842
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.7463878280250356
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
1.6172360889613628
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.8656846070080064
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
1.714238062966615
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
1.8272205490502529
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
1.6409171230043285
```
After:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.0077099470072426
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.8227124120458029
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.0058343949494883
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.8376779520185664
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.903041019977536
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.7576498500420712
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.7628699769848026
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.6204477970022708
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
2.0970272019621916
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
1.9493417189805768
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
2.29020385700278
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
2.1212510910118
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.3479344319785014
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
2.156775983981788
```
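The per-dtype picture is easier to see as ratios. The sketch below reuses only the timings listed above (rounded to three decimals, stored as pairs for n = 40,000 and n = 400,000); the `classify` helper and its 6% tolerance are arbitrary choices made for this illustration.

```python
# (n=40_000, n=400_000) timings in seconds, copied from the listings above.
before = {
    "double": (1.396, 1.237), "float": (1.863, 1.699),
    "uint8": (1.836, 1.721), "int8": (1.836, 1.706),
    "int16": (1.746, 1.617), "int32": (1.866, 1.714),
    "int64": (1.827, 1.641),
}
after = {
    "double": (1.008, 0.823), "float": (1.006, 0.838),
    "uint8": (1.903, 1.758), "int8": (1.763, 1.620),
    "int16": (2.097, 1.949), "int32": (2.290, 2.121),
    "int64": (2.348, 2.157),
}


def classify(ratio, tolerance=0.06):
    """Label a before/after ratio; the tolerance is an illustrative threshold."""
    if ratio > 1 + tolerance:
        return "faster"
    if ratio < 1 - tolerance:
        return "slower"
    return "about the same"


for dtype in before:
    ratios = [b / a for b, a in zip(before[dtype], after[dtype])]
    labels = {classify(r) for r in ratios}
    print(dtype, [f"{r:.2f}x" for r in ratios], sorted(labels))
```

Read this way, the floating-point dtypes get faster (about 1.4x to 2.0x), uint8 and int8 stay within about 6%, and int16, int32 and int64 come out slower in these runs.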
Hi! Can you please rebase the stack?
@VitalyFedyunin Done!
Differential Revision: [D20773454](https://our.internmc.facebook.com/intern/diff/D20773454)
@VitalyFedyunin All tests have passed
Benchmark (gcc 8.3, Debian Buster, turbo off, Release build, Intel(R) Xeon(R) E-2136):
```python
import timeit
for dtype in ('torch.double', 'torch.float', 'torch.uint8', 'torch.int8', 'torch.int16', 'torch.int32', 'torch.int64'):
    for n, t in [(40_000, 50000),
                 (400_000, 5000)]:
        print(f'torch.linspace(0, 10, {n}, dtype={dtype}) for {t} times')
        print(timeit.timeit(f'torch.linspace(0, 10, {n}, dtype={dtype})', setup=f'import torch', number=t))
```
Before:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.3964195849839598
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
1.2374563289922662
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.8631796519621275
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
1.6991038109990768
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.8358083459897898
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.7214750979910605
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.8356257299892604
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.706238206999842
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.7463878280250356
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
1.6172360889613628
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.8656846070080064
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
1.714238062966615
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
1.8272205490502529
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
1.6409171230043285
```
After:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.0077099470072426
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.8227124120458029
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.0058343949494883
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.8376779520185664
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.903041019977536
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.7576498500420712
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.7628699769848026
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.6204477970022708
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
2.0970272019621916
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
1.9493417189805768
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
2.29020385700278
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
2.1212510910118
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.3479344319785014
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
2.156775983981788
```
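To make the Before and After totals above easier to compare, they can be reduced to a per-call time and a Before/After ratio. The following is a minimal sketch that uses only the totals reported above, rounded to four decimals; the names `TIMINGS` and `summarize` are hypothetical and not part of this PR. A ratio above 1 means the After run was faster.

```python
# Totals in seconds, copied from the Before/After output above (rounded).
# Key: (dtype, n); value: (number of calls, Before total, After total).
TIMINGS = {
    ("double", 40_000): (50_000, 1.3964, 1.0077),
    ("double", 400_000): (5_000, 1.2375, 0.8227),
    ("float", 40_000): (50_000, 1.8632, 1.0058),
    ("float", 400_000): (5_000, 1.6991, 0.8377),
    ("uint8", 40_000): (50_000, 1.8358, 1.9030),
    ("uint8", 400_000): (5_000, 1.7215, 1.7576),
    ("int8", 40_000): (50_000, 1.8356, 1.7629),
    ("int8", 400_000): (5_000, 1.7062, 1.6204),
    ("int16", 40_000): (50_000, 1.7464, 2.0970),
    ("int16", 400_000): (5_000, 1.6172, 1.9493),
    ("int32", 40_000): (50_000, 1.8657, 2.2902),
    ("int32", 400_000): (5_000, 1.7142, 2.1213),
    ("int64", 40_000): (50_000, 1.8272, 2.3479),
    ("int64", 400_000): (5_000, 1.6409, 2.1568),
}


def summarize(timings):
    """Return (dtype, n, After microseconds per call, Before/After ratio) rows."""
    rows = []
    for (dtype, n), (calls, before, after) in timings.items():
        per_call_us = after / calls * 1e6  # total seconds -> microseconds per call
        ratio = before / after             # > 1 means After is faster
        rows.append((dtype, n, per_call_us, ratio))
    return rows


for dtype, n, per_call_us, ratio in summarize(TIMINGS):
    print(f"{dtype:>6} n={n:<7} after={per_call_us:8.2f} us/call  before/after={ratio:.2f}")
```

On these numbers the ratio is above 1 for `double`, `float` and `int8`, and below 1 for `uint8`, `int16`, `int32` and `int64`.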
Differential Revision: [D20773454](https://our.internmc.facebook.com/intern/diff/D20773454)
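The script above reports one `timeit.timeit` total per configuration, which can be affected by other activity on the machine. As an optional variant (an assumption, not what this PR used), the same statement can be timed several times with `timeit.repeat` and the smallest total kept. The helper name `best_of` and the reduced `number`/`repeat` values are hypothetical, and the `torch` part only runs if `torch` is installed.

```python
import importlib.util
import timeit


def best_of(stmt, setup="pass", number=1000, repeat=5):
    """Return the smallest total (in seconds) over `repeat` runs of `number` calls."""
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(totals)


# Only benchmark torch.linspace when torch is importable in this environment.
if importlib.util.find_spec("torch") is not None:
    for dtype in ("torch.double", "torch.float"):
        stmt = f"torch.linspace(0, 10, 40_000, dtype={dtype})"
        print(stmt, best_of(stmt, setup="import torch", number=100, repeat=3))
```

Taking the minimum follows the guidance in the Python `timeit` documentation; higher totals usually reflect interference from other processes rather than the code under test.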
@VitalyFedyunin Any chance for this one? :)
@VitalyFedyunin merged this pull request in cd48fb5.
Summary:
Pull Request resolved: pytorch#27957
Benchmark (gcc 8.3, Debian Buster, turbo off, Release build, Intel(R) Xeon(R) E-2136):
```python
import timeit
for dtype in ('torch.double', 'torch.float', 'torch.uint8', 'torch.int8', 'torch.int16', 'torch.int32', 'torch.int64'):
    for n, t in [(40_000, 50000),
                 (400_000, 5000)]:
        print(f'torch.linspace(0, 10, {n}, dtype={dtype}) for {t} times')
        print(timeit.timeit(f'torch.linspace(0, 10, {n}, dtype={dtype})', setup=f'import torch', number=t))
```
Before:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.3964195849839598
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
1.2374563289922662
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.8631796519621275
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
1.6991038109990768
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.8358083459897898
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.7214750979910605
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.8356257299892604
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.706238206999842
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.7463878280250356
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
1.6172360889613628
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.8656846070080064
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
1.714238062966615
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
1.8272205490502529
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
1.6409171230043285
```
After:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.0077099470072426
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.8227124120458029
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.0058343949494883
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.8376779520185664
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.903041019977536
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.7576498500420712
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.7628699769848026
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.6204477970022708
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
2.0970272019621916
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
1.9493417189805768
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
2.29020385700278
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
2.1212510910118
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.3479344319785014
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
2.156775983981788
```
Test Plan: Imported from OSS
Differential Revision: D20773454
Pulled By: VitalyFedyunin
fbshipit-source-id: ebeef59a90edde581669cc2afcc3d65929c8ac79