Further parallelize linspace in addition to AVX #38093

Closed
xuhdev wants to merge 9 commits into gh/xuhdev/74/base from gh/xuhdev/74/head

xuhdev (Collaborator) commented May 8, 2020

Stack from ghstack:


Differential Revision: D21528099

Benchmark (gcc 8.3, Debian Buster, turbo off, Release build, Intel(R) Xeon(R) E-2136, Parallelization using OpenMP):

```
import timeit

# dtypes are kept as strings so they can be spliced into the timed statement.
for dtype in ('torch.double', 'torch.float', 'torch.uint8', 'torch.int8', 'torch.int16', 'torch.int32', 'torch.int64'):
    # (number of points, number of repetitions); both cases generate the same total number of elements.
    for n, t in [(40_000, 50000),
                 (400_000, 5000)]:
        print(f'torch.linspace(0, 10, {n}, dtype={dtype}) for {t} times')
        print(timeit.timeit(f'torch.linspace(0, 10, {n}, dtype={dtype})', setup='import torch', number=t))
```

With AVX
========

Before:

```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.0942596640015836
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.9209065200011537
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.0520610109997506
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.9031864690005023
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.949299545998656
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.82629113800067
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.9547776939980395
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.8259895039991534
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
2.759497356000793
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
2.6285490109985403
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
2.3456633150017296
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
2.2031515989983745
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.559069258000818
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
2.378239962999942
```

After:

```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
0.8100852870011295
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.18943897200006177
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
0.6679975400002149
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.17846923400065862
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.1431112539976311
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
0.3336703610002587
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.157699686998967
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
0.32964968899977976
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.5379577429994242
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
0.4638638729993545
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.360489848000725
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
0.4033017760011717
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
1.4591587399991113
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
0.44132660000104806
```

Without AVX
===========

Before:

```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
3.4967273879992717
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
3.330881046000286
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
2.176502857997548
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
2.023505228000431
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
2.117801246000454
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.9885458380013006
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
2.1057261179994384
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.9809251260012388
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
3.187070896001387
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
3.049615387000813
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
3.4874590049985272
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
3.33596555099939
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
4.256659758000751
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
4.100936053000623
```

After:

```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.9155298300029244
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.598213522000151
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.3183841649988608
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.40136947100108955
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.2191377319977619
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
0.35984685299990815
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.2153874989999167
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
0.35752785600197967
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.750796647000243
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
0.5376063230032742
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.9153429929974664
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
0.5952553579991218
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.281823589000851
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
0.7391443560009066
```
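
For a quicker read of the output above, the before/after ratio can be computed directly. The snippet below is only a convenience sketch: the timings are copied from the benchmark output for a handful of cases, and the names `timings` and `speedup` are made up for illustration.

```python
# Timings in seconds, copied from the benchmark output above.
# Key: (build, dtype, n); value: (before, after).
timings = {
    ("AVX", "torch.double", 40_000): (1.0942596640015836, 0.8100852870011295),
    ("AVX", "torch.double", 400_000): (0.9209065200011537, 0.18943897200006177),
    ("AVX", "torch.int64", 400_000): (2.378239962999942, 0.44132660000104806),
    ("no AVX", "torch.double", 400_000): (3.330881046000286, 0.598213522000151),
    ("no AVX", "torch.int64", 400_000): (4.100936053000623, 0.7391443560009066),
}


def speedup(before, after):
    """Return how many times faster the 'after' run is than the 'before' run."""
    return before / after


for (build, dtype, n), (before, after) in timings.items():
    print(f"{build:>6} {dtype:<13} n={n:<7} {speedup(before, after):.2f}x")
```

In these cases the larger tensors (n = 400000) gain considerably more than the smaller ones (n = 40000).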

xuhdev added a commit that referenced this pull request May 8, 2020
ghstack-source-id: 1855f02
Pull Request resolved: #38093
xuhdev requested a review from VitalyFedyunin May 8, 2020 01:11
xuhdev (Collaborator, Author) commented May 8, 2020

If this works, then we should be able to vectorize most other range factories similarly to what has been done here.
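
To illustrate why this kind of split should carry over to other range factories, here is a rough pure-Python sketch. It is not the kernel from this PR (that lives in `aten/src/ATen/native/cpu/RangeFactoriesKernel.cpp` and is not shown in this thread); `linspace_chunk` and `parallel_linspace` are hypothetical names, and the thread pool merely stands in for OpenMP. The idea is that element `i` is computed from its index as `start + step * i`, so each chunk can be filled independently of the others.

```python
from concurrent.futures import ThreadPoolExecutor


def linspace_chunk(start, step, begin, end):
    """Return elements [begin, end) of the sequence start + step * i."""
    # Each value depends only on its index, never on the previous element.
    return [start + step * i for i in range(begin, end)]


def parallel_linspace(start, stop, steps, num_workers=4):
    """Build a linspace-like list by filling index ranges in separate tasks."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    if steps == 1:
        return [float(start)]
    step = (stop - start) / (steps - 1)
    chunk = -(-steps // num_workers)  # ceiling division
    bounds = [(b, min(b + chunk, steps)) for b in range(0, steps, chunk)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        parts = list(pool.map(lambda be: linspace_chunk(start, step, *be), bounds))
    # pool.map preserves input order, so the chunks concatenate in index order.
    return [x for part in parts for x in part]


print(parallel_linspace(0, 10, 5))
```

Because of the GIL this sketch will not actually run faster in Python; it only shows the index-based decomposition. Other range factories whose elements have the same closed form (for example an `arange`-style sequence) could presumably reuse the chunk helper unchanged.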

xuhdev requested review from gchanan and ngimel May 8, 2020 17:50
xuhdev added a commit that referenced this pull request May 8, 2020
ghstack-source-id: 15d3f21
Pull Request resolved: #38093
Comment thread aten/src/ATen/native/cpu/RangeFactoriesKernel.cpp Outdated
Comment thread aten/src/ATen/native/cpu/RangeFactoriesKernel.cpp Outdated
malfet added the triaged label May 8, 2020
Benchmark (gcc 8.3, Debian Buster, turbo off, Release build, Intel(R) Xeon(R) E-2136, Parallelization using OpenMP):

```
import timeit
for dtype in ('torch.double', 'torch.float', 'torch.uint8', 'torch.int8', 'torch.int16', 'torch.int32', 'torch.int64'):
    for n, t in [(40_000, 50000),
                (400_000, 5000)]:
        print(f'torch.linspace(0, 10, {n}, dtype={dtype}) for {t} times')
        print(timeit.timeit(f'torch.linspace(0, 10, {n}, dtype={dtype})', setup=f'import torch', number=t))
```

With AVX
========

Before:

```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.0942596640015836
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.9209065200011537
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.0520610109997506
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.9031864690005023
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.949299545998656
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.82629113800067
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.9547776939980395
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.8259895039991534
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
2.759497356000793
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
2.6285490109985403
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
2.3456633150017296
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
2.2031515989983745
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.559069258000818
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
2.378239962999942
```

After:

```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
0.8100852870011295
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.18943897200006177
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
0.6679975400002149
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.17846923400065862
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.1431112539976311
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
0.3336703610002587
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.157699686998967
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
0.32964968899977976
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.5379577429994242
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
0.4638638729993545
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.360489848000725
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
0.4033017760011717
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
1.4591587399991113
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
0.44132660000104806
```

Without AVX
===========

Before:

```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
3.4967273879992717
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
3.330881046000286
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
2.176502857997548
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
2.023505228000431
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
2.117801246000454
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.9885458380013006
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
2.1057261179994384
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.9809251260012388
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
3.187070896001387
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
3.049615387000813
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
3.4874590049985272
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
3.33596555099939
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
4.256659758000751
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
4.100936053000623
```

After:

```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.9155298300029244
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.598213522000151
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.3183841649988608
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.40136947100108955
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.2191377319977619
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
0.35984685299990815
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.2153874989999167
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
0.35752785600197967
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.750796647000243
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
0.5376063230032742
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.9153429929974664
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
0.5952553579991218
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.281823589000851
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
0.7391443560009066
```
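
To make the tables above easier to compare, the following minimal sketch computes the before/after ratio for the "Without AVX" runs. The timings are copied from the output above; the `timings` dictionary and the `speedup` helper are names made up for this illustration and are not part of the PR.

```python
# Timings in seconds copied from the "Without AVX" tables above,
# stored as (before, after) and keyed by (dtype, number of elements).
timings = {
    ("double", 40_000): (3.4967273879992717, 1.9155298300029244),
    ("double", 400_000): (3.330881046000286, 0.598213522000151),
    ("float", 40_000): (2.176502857997548, 1.3183841649988608),
    ("float", 400_000): (2.023505228000431, 0.40136947100108955),
    ("uint8", 40_000): (2.117801246000454, 1.2191377319977619),
    ("uint8", 400_000): (1.9885458380013006, 0.35984685299990815),
    ("int8", 40_000): (2.1057261179994384, 1.2153874989999167),
    ("int8", 400_000): (1.9809251260012388, 0.35752785600197967),
    ("int16", 40_000): (3.187070896001387, 1.750796647000243),
    ("int16", 400_000): (3.049615387000813, 0.5376063230032742),
    ("int32", 40_000): (3.4874590049985272, 1.9153429929974664),
    ("int32", 400_000): (3.33596555099939, 0.5952553579991218),
    ("int64", 40_000): (4.256659758000751, 2.281823589000851),
    ("int64", 400_000): (4.100936053000623, 0.7391443560009066),
}


def speedup(before, after):
    """Return how many times faster the 'after' run is (hypothetical helper)."""
    return before / after


speedups = {key: speedup(b, a) for key, (b, a) in timings.items()}

for (dtype, n), ratio in speedups.items():
    print(f"{dtype:>6} n={n:>7}: {ratio:.2f}x")
```

In these numbers the gain is noticeably larger for 400,000 elements than for 40,000, which is what one would expect if a fixed per-call overhead is amortized over more work per call.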

[ghstack-poisoned]
xuhdev added a commit that referenced this pull request May 9, 2020
ghstack-source-id: 8f86c88
Pull Request resolved: #38093
Contributor

@malfet malfet left a comment

@xuhdev please rebase this PR, as it cannot be cleanly merged into master.

@dr-ci

dr-ci Bot commented May 12, 2020

💊 CI failures summary and remediations

As of commit db7e032 (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-CircleCI failure(s)

ci.pytorch.org: 1 failed



Differential Revision: [D21528099](https://our.internmc.facebook.com/intern/diff/D21528099)

[ghstack-poisoned]
xuhdev added a commit that referenced this pull request May 12, 2020
ghstack-source-id: bc63be9
Pull Request resolved: #38093
@xuhdev xuhdev requested a review from malfet May 12, 2020 20:12
@xuhdev
Collaborator Author

xuhdev commented May 12, 2020

rebased

Contributor

@malfet malfet left a comment

Looks like you've missed a few changes during the rebase

Comment thread aten/src/ATen/native/cpu/RangeFactoriesKernel.cpp Outdated
Comment thread aten/src/ATen/native/cpu/RangeFactoriesKernel.cpp Outdated
Comment thread aten/src/ATen/native/cpu/RangeFactoriesKernel.cpp Outdated
Comment thread aten/src/ATen/native/cpu/RangeFactoriesKernel.cpp Outdated
Comment thread aten/src/ATen/native/cpu/RangeFactoriesKernel.cpp Outdated
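
The kernel diff itself is not reproduced on this page. As a rough illustration of the general idea behind parallelizing `linspace`, the sketch below splits the index range into chunks and computes each element from its index, so no chunk depends on another. This is an assumption-laden Python sketch, not the code in `RangeFactoriesKernel.cpp`: `parallel_linspace`, `linspace_chunk`, `num_workers`, and `grain_size` are hypothetical names, and Python threads are used only to show the decomposition, not to demonstrate a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def linspace_chunk(start, step, begin, end):
    """Compute elements [begin, end) directly from their indices."""
    # Each value depends only on its own index, so chunks are independent.
    return [start + step * i for i in range(begin, end)]


def parallel_linspace(start, end, steps, num_workers=4, grain_size=2048):
    """Hypothetical chunked linspace; returns a list of floats."""
    if steps <= 0:
        return []
    if steps == 1:
        return [float(start)]
    start = float(start)
    step = (float(end) - start) / (steps - 1)
    # Split the index range [0, steps) into chunks of at most grain_size.
    bounds = [(b, min(b + grain_size, steps)) for b in range(0, steps, grain_size)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # map() yields results in submission order, so chunks stay ordered.
        chunks = pool.map(lambda be: linspace_chunk(start, step, be[0], be[1]), bounds)
        out = [value for chunk in chunks for value in chunk]
    # Pin the last element to the requested endpoint to avoid rounding drift.
    out[-1] = float(end)
    return out


print(parallel_linspace(0, 10, 5))
```

Computing each element from its index, rather than by repeatedly adding the step to a running value, is what makes the chunks safe to evaluate in any order.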
Contributor

@malfet malfet left a comment

Will land if CI is green

@facebook-github-bot
Contributor

@malfet merged this pull request in 61bea93.

@xuhdev xuhdev deleted the gh/xuhdev/74/head branch May 13, 2020 18:08
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Pull Request resolved: pytorch#38093

Differential Revision: D21528099

Test Plan: Imported from OSS

Pulled By: malfet

fbshipit-source-id: a6b3904e7860bb6d652a48b2056154509e73157d
Labels

Merged, open source, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

5 participants