Further parallelize linspace in addition to AVX #38093
Closed
xuhdev wants to merge 9 commits into gh/xuhdev/74/base from
Benchmark (gcc 8.3, Debian Buster, turbo off, Release build, Intel(R) Xeon(R) E-2136, parallelized with OpenMP):
```
import timeit

# Time torch.linspace for each dtype at two sizes, repeating each call `t` times.
for dtype in ('torch.double', 'torch.float', 'torch.uint8', 'torch.int8',
              'torch.int16', 'torch.int32', 'torch.int64'):
    for n, t in [(40_000, 50000),
                 (400_000, 5000)]:
        print(f'torch.linspace(0, 10, {n}, dtype={dtype}) for {t} times')
        # timeit returns the total seconds for all `t` calls, not the per-call time.
        print(timeit.timeit(f'torch.linspace(0, 10, {n}, dtype={dtype})',
                            setup='import torch', number=t))
```
With AVX
========
Before:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.0942596640015836
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.9209065200011537
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.0520610109997506
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.9031864690005023
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.949299545998656
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.82629113800067
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.9547776939980395
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.8259895039991534
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
2.759497356000793
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
2.6285490109985403
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
2.3456633150017296
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
2.2031515989983745
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.559069258000818
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
2.378239962999942
```
After:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
0.8100852870011295
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.18943897200006177
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
0.6679975400002149
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.17846923400065862
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.1431112539976311
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
0.3336703610002587
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.157699686998967
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
0.32964968899977976
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.5379577429994242
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
0.4638638729993545
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.360489848000725
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
0.4033017760011717
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
1.4591587399991113
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
0.44132660000104806
```
Without AVX
===========
Before:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
3.4967273879992717
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
3.330881046000286
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
2.176502857997548
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
2.023505228000431
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
2.117801246000454
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.9885458380013006
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
2.1057261179994384
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.9809251260012388
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
3.187070896001387
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
3.049615387000813
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
3.4874590049985272
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
3.33596555099939
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
4.256659758000751
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
4.100936053000623
```
After:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.9155298300029244
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.598213522000151
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.3183841649988608
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.40136947100108955
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.2191377319977619
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
0.35984685299990815
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.2153874989999167
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
0.35752785600197967
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.750796647000243
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
0.5376063230032742
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.9153429929974664
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
0.5952553579991218
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.281823589000851
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
0.7391443560009066
```
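Since `timeit.timeit` reports the total time for `number` calls, the raw figures above are easiest to compare as per-call times and before/after ratios. The sketch below does only that arithmetic on a subset of the reported rows (double and int64, copied verbatim); it is not a new measurement, and the `TIMINGS`, `per_call_us` and `speedup` names are purely illustrative.

```python
# Subset of the timings reported above (double and int64 only), copied verbatim.
# Each entry: (total seconds before, total seconds after, number of calls).
TIMINGS = {
    ("AVX", "torch.double", 40_000): (1.0942596640015836, 0.8100852870011295, 50_000),
    ("AVX", "torch.double", 400_000): (0.9209065200011537, 0.18943897200006177, 5_000),
    ("AVX", "torch.int64", 40_000): (2.559069258000818, 1.4591587399991113, 50_000),
    ("AVX", "torch.int64", 400_000): (2.378239962999942, 0.44132660000104806, 5_000),
    ("no AVX", "torch.double", 40_000): (3.4967273879992717, 1.9155298300029244, 50_000),
    ("no AVX", "torch.double", 400_000): (3.330881046000286, 0.598213522000151, 5_000),
    ("no AVX", "torch.int64", 40_000): (4.256659758000751, 2.281823589000851, 50_000),
    ("no AVX", "torch.int64", 400_000): (4.100936053000623, 0.7391443560009066, 5_000),
}


def per_call_us(total_s, calls):
    """Average time of a single linspace call, in microseconds."""
    return total_s / calls * 1e6


def speedup(before_s, after_s):
    """How many times faster 'After' is than 'Before' (same number of calls)."""
    return before_s / after_s


if __name__ == "__main__":
    for (config, dtype, n), (before, after, calls) in TIMINGS.items():
        print(f"{config:>6} {dtype:<12} n={n:>7}: "
              f"{per_call_us(before, calls):7.1f} us -> {per_call_us(after, calls):6.1f} us "
              f"({speedup(before, after):.2f}x)")
```

In this subset the 400,000-element calls come out roughly 4.9x to 5.6x faster, while the 40,000-element calls gain only about 1.35x to 1.87x; plausibly, fixed per-call overhead (such as handing work to threads) is a larger share of the shorter calls.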
[ghstack-poisoned]
xuhdev added a commit that referenced this pull request on May 8, 2020
ghstack-source-id: 1855f02
Pull Request resolved: #38093
Author (Collaborator) commented:
If this works, then we should be able to vectorize most other range factories similarly to what has been done here.
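The remark relies on a simple property: the i-th element of `linspace(start, end, steps)` is `start + i * step` with `step = (end - start) / (steps - 1)`, so each value depends only on its own index and any contiguous chunk of the output can be filled without coordinating with the others. The sketch below illustrates that decomposition in plain Python. It is not the PR's kernel (PyTorch's CPU kernels are C++, and the benchmark build parallelizes with OpenMP); `parallel_linspace`, `linspace_chunk`, `num_workers` and `grain_size` are hypothetical names, with `grain_size` chosen to mirror the parameter of ATen's `at::parallel_for`.

```python
from concurrent.futures import ThreadPoolExecutor


def linspace_chunk(start, step, begin, end):
    """Fill indices [begin, end); each value depends only on its own index i."""
    return [start + step * i for i in range(begin, end)]


def parallel_linspace(start, end, steps, num_workers=4, grain_size=1000):
    """Hypothetical chunked linspace: split [0, steps) and fill chunks independently."""
    if steps <= 0:
        return []
    if steps == 1:
        return [float(start)]  # sketch convention: a single point is just `start`
    step = (end - start) / (steps - 1)
    # Chunk size: at least grain_size indices, so small outputs are not split up.
    chunk = max(grain_size, -(-steps // num_workers))  # ceil(steps / num_workers)
    bounds = [(b, min(b + chunk, steps)) for b in range(0, steps, chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map yields results in input order, so concatenation keeps indices in order.
        parts = pool.map(lambda be: linspace_chunk(start, step, *be), bounds)
        return [v for part in parts for v in part]


if __name__ == "__main__":
    print(parallel_linspace(0, 10, 11, num_workers=3, grain_size=2))
```

Because of CPython's GIL, these threads will not actually speed up pure-Python arithmetic; the point is only that the chunks are independent, which is what lets a native kernel spread them across threads and, within each chunk, across SIMD lanes. The sketch ignores integer dtypes and any special handling of the final value.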
xuhdev added a commit that referenced this pull request on May 8, 2020
ghstack-source-id: 15d3f21
Pull Request resolved: #38093
malfet approved these changes on May 8, 2020
xuhdev added a commit that referenced this pull request on May 9, 2020
Benchmark (gcc 8.3, Debian Buster, turbo off, Release build, Intel(R) Xeon(R) E-2136, Parallelization using OpenMP):
```
import timeit
for dtype in ('torch.double', 'torch.float', 'torch.uint8', 'torch.int8', 'torch.int16', 'torch.int32', 'torch.int64'):
for n, t in [(40_000, 50000),
(400_000, 5000)]:
print(f'torch.linspace(0, 10, {n}, dtype={dtype}) for {t} times')
print(timeit.timeit(f'torch.linspace(0, 10, {n}, dtype={dtype})', setup=f'import torch', number=t))
```
With AVX
========
Before:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.0942596640015836
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.9209065200011537
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.0520610109997506
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.9031864690005023
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.949299545998656
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.82629113800067
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.9547776939980395
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.8259895039991534
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
2.759497356000793
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
2.6285490109985403
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
2.3456633150017296
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
2.2031515989983745
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.559069258000818
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
2.378239962999942
```
After:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
0.8100852870011295
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.18943897200006177
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
0.6679975400002149
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.17846923400065862
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.1431112539976311
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
0.3336703610002587
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.157699686998967
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
0.32964968899977976
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.5379577429994242
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
0.4638638729993545
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.360489848000725
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
0.4033017760011717
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
1.4591587399991113
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
0.44132660000104806
```
Without AVX
===========
Before:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
3.4967273879992717
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
3.330881046000286
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
2.176502857997548
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
2.023505228000431
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
2.117801246000454
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.9885458380013006
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
2.1057261179994384
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.9809251260012388
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
3.187070896001387
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
3.049615387000813
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
3.4874590049985272
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
3.33596555099939
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
4.256659758000751
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
4.100936053000623
```
After:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.9155298300029244
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.598213522000151
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.3183841649988608
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.40136947100108955
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.2191377319977619
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
0.35984685299990815
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.2153874989999167
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
0.35752785600197967
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.750796647000243
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
0.5376063230032742
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.9153429929974664
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
0.5952553579991218
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.281823589000851
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
0.7391443560009066
```
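The before/after timings are easier to compare as speedup ratios. The sketch below copies the Without AVX timings reported in this description and divides Before by After for each dtype and size. The `before`/`after` dictionaries and the `speedups` helper are illustrative names, not part of the PR.

```python
# Timings in seconds, copied from the "Without AVX" benchmark output.
# Each tuple is (n=40,000 run 50,000 times, n=400,000 run 5,000 times).
before = {
    "double": (3.4967273879992717, 3.330881046000286),
    "float": (2.176502857997548, 2.023505228000431),
    "uint8": (2.117801246000454, 1.9885458380013006),
    "int8": (2.1057261179994384, 1.9809251260012388),
    "int16": (3.187070896001387, 3.049615387000813),
    "int32": (3.4874590049985272, 3.33596555099939),
    "int64": (4.256659758000751, 4.100936053000623),
}
after = {
    "double": (1.9155298300029244, 0.598213522000151),
    "float": (1.3183841649988608, 0.40136947100108955),
    "uint8": (1.2191377319977619, 0.35984685299990815),
    "int8": (1.2153874989999167, 0.35752785600197967),
    "int16": (1.750796647000243, 0.5376063230032742),
    "int32": (1.9153429929974664, 0.5952553579991218),
    "int64": (2.281823589000851, 0.7391443560009066),
}


def speedups(before, after):
    """Return {dtype: (small_n_speedup, large_n_speedup)} as before / after."""
    return {k: tuple(b / a for b, a in zip(before[k], after[k])) for k in before}


for dtype, (small, large) in speedups(before, after).items():
    print(f"{dtype:>6}: {small:.2f}x (n=40,000)  {large:.2f}x (n=400,000)")
```

With these numbers the speedup is roughly 1.65x to 1.87x for n = 40,000 and roughly 5.0x to 5.7x for n = 400,000, presumably because the larger tensor gives the extra threads more work to amortize their start-up cost.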
ghstack-source-id: 8f86c88
Pull Request resolved: #38093
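The benchmark header notes that parallelization uses OpenMP. The pure-Python sketch below illustrates why linspace splits cleanly across threads. Element `i` is computed directly as `start + step * i`, not by repeatedly adding `step` to a running value, so no chunk depends on another chunk's output. `linspace_chunked` is a hypothetical helper built on `concurrent.futures.ThreadPoolExecutor`. It is not the PR's C++ kernel, and because of Python's GIL it demonstrates the decomposition, not a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def linspace_chunked(start, end, steps, num_workers=4):
    """Illustrative only: fill a linspace by splitting the index range into chunks."""
    if steps < 0 or num_workers < 1:
        raise ValueError("steps must be >= 0 and num_workers >= 1")
    if steps == 0:
        return []
    if steps == 1:
        return [float(start)]
    step = (end - start) / (steps - 1)
    out = [0.0] * steps

    def fill(lo, hi):
        # Each worker writes only to its own slice [lo, hi); every element
        # depends on its index alone, so chunks can run in any order.
        for i in range(lo, hi):
            out[i] = start + step * i

    chunk = -(-steps // num_workers)  # ceiling division
    ranges = [(lo, min(lo + chunk, steps)) for lo in range(0, steps, chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # list() waits for every chunk and re-raises any worker exception.
        list(pool.map(lambda r: fill(*r), ranges))
    return out
```

For example, `linspace_chunked(0, 10, 5)` returns `[0.0, 2.5, 5.0, 7.5, 10.0]`. For other inputs, `start + step * (steps - 1)` can differ from `end` by a rounding error.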
CI failures summary (Dr. CI): as of commit db7e032, there was 1 failure on ci.pytorch.org.
Differential Revision: [D21528099](https://our.internmc.facebook.com/intern/diff/D21528099)
[ghstack-poisoned]
xuhdev added a commit that referenced this pull request on May 12, 2020
ghstack-source-id: bc63be9
Pull Request resolved: #38093
Author (Collaborator): rebased
malfet (Contributor) reviewed on May 12, 2020 and commented: "Looks like you've missed a few changes during the rebase."
malfet (Contributor) reviewed on May 12, 2020.
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request on Apr 24, 2026:
Summary: Pull Request resolved: pytorch#38093
Differential Revision: D21528099
Test Plan: Imported from OSS
Pulled By: malfet
fbshipit-source-id: a6b3904e7860bb6d652a48b2056154509e73157d
Differential Revision: D21528099