Vectorize linspace on CPU. #27957

Closed
xuhdev wants to merge 36 commits into gh/xuhdev/42/base from gh/xuhdev/42/head

@xuhdev
Collaborator

@xuhdev xuhdev commented Oct 15, 2019

Stack from ghstack:

Benchmark (gcc 8.3, Debian Buster, turbo off, Release build, Intel(R) Xeon(R) E-2136):

```python
import timeit
for dtype in ('torch.double', 'torch.float', 'torch.uint8', 'torch.int8', 'torch.int16', 'torch.int32', 'torch.int64'):
    for n, t in [(40_000, 50000),
                 (400_000, 5000)]:
        print(f'torch.linspace(0, 10, {n}, dtype={dtype}) for {t} times')
        print(timeit.timeit(f'torch.linspace(0, 10, {n}, dtype={dtype})', setup='import torch', number=t))
```

Before:

```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.3964195849839598
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
1.2374563289922662
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.8631796519621275
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
1.6991038109990768
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.8358083459897898
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.7214750979910605
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.8356257299892604
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.706238206999842
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.7463878280250356
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
1.6172360889613628
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.8656846070080064
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
1.714238062966615
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
1.8272205490502529
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
1.6409171230043285
```

After:

```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.0077099470072426
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.8227124120458029
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.0058343949494883
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.8376779520185664
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.903041019977536
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.7576498500420712
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.7628699769848026
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.6204477970022708
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
2.0970272019621916
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
1.9493417189805768
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
2.29020385700278
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
2.1212510910118
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.3479344319785014
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
2.156775983981788
```
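
The Before and After timings can be condensed into per-dtype ratios. The sketch below copies the 40,000-element timings verbatim from the two listings and divides them. `speedups` is a helper name chosen here for illustration, and a ratio above 1 means the After build was faster.

```python
# Timings in seconds, copied from the Before/After listings for the
# 40,000-element runs (50,000 calls each).
before = {
    'torch.double': 1.3964195849839598,
    'torch.float': 1.8631796519621275,
    'torch.uint8': 1.8358083459897898,
    'torch.int8': 1.8356257299892604,
    'torch.int16': 1.7463878280250356,
    'torch.int32': 1.8656846070080064,
    'torch.int64': 1.8272205490502529,
}
after = {
    'torch.double': 1.0077099470072426,
    'torch.float': 1.0058343949494883,
    'torch.uint8': 1.903041019977536,
    'torch.int8': 1.7628699769848026,
    'torch.int16': 2.0970272019621916,
    'torch.int32': 2.29020385700278,
    'torch.int64': 2.3479344319785014,
}


def speedups(before, after):
    """Return the before/after ratio per dtype (above 1 means faster after)."""
    return {dtype: before[dtype] / after[dtype] for dtype in before}


ratios = speedups(before, after)
for dtype, ratio in ratios.items():
    print(f'{dtype}: {ratio:.2f}x')
```

On these numbers the ratios for torch.double and torch.float are above 1, while those for torch.int16, torch.int32, and torch.int64 are below 1.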

Differential Revision: D20773454

Benchmark (gcc 8.3, Debian Buster, turbo off, Release build, Intel(R) Xeon(R) E-2136):

```python
import timeit
for dtype in ('torch.double', 'torch.float'):
    for n, t in [(40_000, 200000),
                (400_000, 20000)]:
        print(f'torch.linspace(0, 10, {n}, dtype={dtype}) for {t} times')
        print(timeit.timeit(f'torch.linspace(0, 10, {n}, dtype={dtype})', setup=f'import torch', number=t))
```

Before:

```
torch.linspace(0, 10, 40000, dtype=torch.double) for 200000 times
11.188449680000303
torch.linspace(0, 10, 400000, dtype=torch.double) for 20000 times
10.69958164000036
torch.linspace(0, 10, 40000, dtype=torch.float) for 200000 times
11.296819276999486
torch.linspace(0, 10, 400000, dtype=torch.float) for 20000 times
10.829777259000366
```

After:

```
torch.linspace(0, 10, 40000, dtype=torch.double) for 200000 times
3.704719146999196
torch.linspace(0, 10, 400000, dtype=torch.double) for 20000 times
3.0970425030000115
torch.linspace(0, 10, 40000, dtype=torch.float) for 200000 times
3.9462350260000676
torch.linspace(0, 10, 400000, dtype=torch.float) for 20000 times
3.4302567130007446
```

[ghstack-poisoned]
xuhdev added a commit that referenced this pull request Oct 15, 2019
ghstack-source-id: d54ff74
Pull Request resolved: #27957
@gchanan
Contributor

gchanan commented Oct 16, 2019

@VitalyFedyunin can you review this?

@VitalyFedyunin
Contributor

in progress

Contributor

@VitalyFedyunin VitalyFedyunin left a comment

Not going to work properly if result tensor is non-contiguous (for example, a transposed 2D tensor with numel == steps).

Not going to work properly if result tensor is large (due to parallelism of cpu_kernel_vec, consider using serial vec).

Please write tests for both cases.
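
To make the non-contiguous case concrete, here is a minimal pure-Python sketch (not ATen code) of a transposed 2D view described by a shape and strides. The helper names `write_strided` and `read_strided`, the local `linspace` reference, and the tiny 3x2 shape are all assumptions made up for illustration. It shows that writing linspace values straight into the underlying storage, as a kernel that assumes contiguity would, gives a different logical order than writing through the strides.

```python
def linspace(start, end, steps):
    """Plain-Python reference: element i is start + i * step."""
    step = (end - start) / (steps - 1)
    return [start + i * step for i in range(steps)]


def write_strided(storage, shape, strides, values):
    """Write values in logical row-major order through a 2D strided view."""
    rows, cols = shape
    it = iter(values)
    for r in range(rows):
        for c in range(cols):
            storage[r * strides[0] + c * strides[1]] = next(it)


def read_strided(storage, shape, strides):
    """Read a 2D strided view back in logical row-major order."""
    rows, cols = shape
    return [storage[r * strides[0] + c * strides[1]]
            for r in range(rows) for c in range(cols)]


# Hypothetical example: a 2x3 row-major buffer viewed as its 3x2 transpose,
# so numel == steps == 6 but the view is not contiguous.
shape, strides = (3, 2), (1, 3)
values = linspace(0, 5, 6)

# Stride-aware write: the logical view holds the linspace in order.
strided_storage = [None] * 6
write_strided(strided_storage, shape, strides, values)
correct_view = read_strided(strided_storage, shape, strides)

# Contiguity-assuming write: values are laid out in storage order instead.
naive_storage = list(values)
naive_view = read_strided(naive_storage, shape, strides)

print(correct_view)
print(naive_view)
```

A test for the real kernel would compare the output tensor's logical contents against a contiguous reference in the same way `correct_view` is compared against `values` here.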

@xuhdev
Collaborator Author

xuhdev commented Oct 16, 2019

> Not going to work properly if result tensor is non-contiguous (for example, a transposed 2D tensor with numel == steps).

The test already includes a non-contiguous case. It works because cpu_kernel_vec handles non-contiguous tensors explicitly (which is why I removed the contiguity-specific processing from the original implementation):

if (is_contiguous<traits>(strides)) {

> Not going to work properly if result tensor is large (due to parallelism of cpu_kernel_vec, consider using serial vec).

I've now added a test case for a large tensor. From my understanding, vectorized_loop (called from cpu_kernel_vec) merely loops over Vec objects using SIMD instructions and does not parallelize across CPU cores. Therefore, I don't think using cpu_kernel_vec in this PR will break large tensors.
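
As a rough mental model of the loop described above (an illustrative sketch, not the ATen implementation), a vectorized fill writes the output one fixed-width group of lanes at a time and finishes with a scalar tail. `VEC_WIDTH` and `vectorized_fill` are hypothetical names, and the list slice assignment stands in for a single SIMD store.

```python
VEC_WIDTH = 4  # assumed number of lanes, chosen only for illustration


def vectorized_fill(out, start, step):
    """Fill out[i] = start + step * i, VEC_WIDTH elements per iteration."""
    n = len(out)
    i = 0
    # Main loop: one "vector" of VEC_WIDTH lanes per iteration.
    while i + VEC_WIDTH <= n:
        out[i:i + VEC_WIDTH] = [start + step * (i + lane)
                                for lane in range(VEC_WIDTH)]
        i += VEC_WIDTH
    # Scalar tail for the elements that do not fill a whole vector.
    for j in range(i, n):
        out[j] = start + step * j


out = [None] * 10
vectorized_fill(out, 0.0, 0.5)
print(out)
```

Every value here is derived from its own position (`i + lane` or `j`), so the sketch does not depend on how many elements were written before it.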

xuhdev added a commit that referenced this pull request Oct 16, 2019
ghstack-source-id: 54b1259
Pull Request resolved: #27957
@xuhdev xuhdev requested a review from VitalyFedyunin October 16, 2019 22:14
xuhdev added a commit that referenced this pull request Oct 21, 2019
ghstack-source-id: 054a78e
Pull Request resolved: #27957
@xuhdev
Collaborator Author

xuhdev commented Nov 4, 2019

@VitalyFedyunin Mind reviewing again :)?

@VitalyFedyunin
Contributor

Fails correctness check (please include as test):

```
In [27]: y = torch.linspace(0, 1000000-1, 1000000)

In [28]: correct = True

In [29]: for i in range(y.shape[0]-1):
    ...:     correct = correct and y[i] < y[i+1]
    ...:

In [30]: correct
Out[30]: tensor(False)
```
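
The session above performs roughly a million element comparisons one at a time. The same monotonicity property can be checked in a single pass. The sketch below is plain Python, with `linspace_reference` as a hypothetical stand-in for `torch.linspace` so that it runs without PyTorch; on a tensor `y`, the comparison could instead be written as `(y[1:] > y[:-1]).all()`.

```python
def is_strictly_increasing(values):
    """Return True if each element is smaller than the one after it."""
    return all(a < b for a, b in zip(values, values[1:]))


def linspace_reference(start, end, steps):
    """Position-based reference: element i is start + i * step."""
    step = (end - start) / (steps - 1)
    return [start + i * step for i in range(steps)]


# Same arguments as the failing check in the session above.
y = linspace_reference(0, 1000000 - 1, 1000000)
print(is_strictly_increasing(y))
```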

Comment thread aten/src/ATen/native/cpu/RangeFactoriesKernel.cpp Outdated
Contributor

@VitalyFedyunin VitalyFedyunin left a comment

Concurrently and incorrectly using idx

@VitalyFedyunin VitalyFedyunin self-requested a review November 12, 2019 19:33
Contributor

@VitalyFedyunin VitalyFedyunin left a comment

Concurrently and incorrectly using idx
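
The kernel code under review is not shown in this thread, so the following is only a guess at the kind of bug this comment describes. It assumes, hypothetically, that the kernel advances one running index `idx` that is shared by every chunk of the output. If chunks are not processed in order (simulated here by `chunk_order`; no real threads are used), the values land in the wrong positions, whereas deriving each value from its own position does not depend on the order.

```python
def fill_shared_counter(n, start, step, chunk_order, chunk_size):
    """Buggy pattern (assumed): one running idx shared across all chunks."""
    out = [None] * n
    idx = 0  # shared state, advanced in whatever order chunks happen to run
    for chunk in chunk_order:
        begin = chunk * chunk_size
        end = min(begin + chunk_size, n)
        for pos in range(begin, end):
            out[pos] = start + step * idx
            idx += 1
    return out


def fill_position_based(n, start, step, chunk_order, chunk_size):
    """Order-independent pattern: each value depends only on its position."""
    out = [None] * n
    for chunk in chunk_order:
        begin = chunk * chunk_size
        end = min(begin + chunk_size, n)
        for pos in range(begin, end):
            out[pos] = start + step * pos
    return out


# Simulate the second chunk running before the first one.
order = [1, 0]
buggy = fill_shared_counter(8, 0.0, 1.0, order, chunk_size=4)
fixed = fill_position_based(8, 0.0, 1.0, order, chunk_size=4)
print(buggy)
print(fixed)
```

With real threads the shared counter would additionally be a data race; the simulation only shows the ordering problem, which is enough to break the monotonicity check.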

xuhdev added a commit that referenced this pull request Nov 12, 2019
ghstack-source-id: f616cde
Pull Request resolved: #27957
xuhdev added a commit that referenced this pull request Mar 19, 2020
ghstack-source-id: 145c334
Pull Request resolved: #27957
xuhdev added a commit that referenced this pull request Mar 20, 2020
ghstack-source-id: 38de789
Pull Request resolved: #27957
xuhdev added a commit that referenced this pull request Mar 24, 2020
ghstack-source-id: a6a6da7
Pull Request resolved: #27957
@VitalyFedyunin
Contributor

Hi! Can you please rebase the stack?

xuhdev added a commit that referenced this pull request Mar 31, 2020
ghstack-source-id: 7aea696
Pull Request resolved: #27957
@xuhdev
Collaborator Author

xuhdev commented Mar 31, 2020

@VitalyFedyunin Done!

xuhdev added a commit that referenced this pull request Mar 31, 2020
ghstack-source-id: 5c6a4e9
Pull Request resolved: #27957
@xuhdev
Collaborator Author

xuhdev commented Apr 1, 2020

@VitalyFedyunin All tests have passed

xuhdev added a commit that referenced this pull request Apr 7, 2020
ghstack-source-id: 13c0569
Pull Request resolved: #27957
xuhdev added a commit that referenced this pull request Apr 8, 2020
ghstack-source-id: bc11a5e
Pull Request resolved: #27957
@xuhdev
Collaborator Author

xuhdev commented Apr 22, 2020

@VitalyFedyunin Any chance for this one? :)

xuhdev added a commit that referenced this pull request Apr 23, 2020
ghstack-source-id: ed44553
Pull Request resolved: #27957
@xuhdev xuhdev deleted the gh/xuhdev/42/head branch April 30, 2020 21:40
@facebook-github-bot
Contributor

@VitalyFedyunin merged this pull request in cd48fb5.

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Pull Request resolved: pytorch#27957

Benchmark (gcc 8.3, Debian Buster, turbo off, Release build, Intel(R) Xeon(R) E-2136):

```python
import timeit
for dtype in ('torch.double', 'torch.float', 'torch.uint8', 'torch.int8', 'torch.int16', 'torch.int32', 'torch.int64'):
    for n, t in [(40_000, 50000),
                 (400_000, 5000)]:
        print(f'torch.linspace(0, 10, {n}, dtype={dtype}) for {t} times')
        print(timeit.timeit(f'torch.linspace(0, 10, {n}, dtype={dtype})', setup='import torch', number=t))
```

Before:

```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.3964195849839598
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
1.2374563289922662
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.8631796519621275
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
1.6991038109990768
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.8358083459897898
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.7214750979910605
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.8356257299892604
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.706238206999842
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.7463878280250356
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
1.6172360889613628
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.8656846070080064
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
1.714238062966615
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
1.8272205490502529
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
1.6409171230043285
```

After:

```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.0077099470072426
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.8227124120458029
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.0058343949494883
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.8376779520185664
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.903041019977536
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.7576498500420712
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.7628699769848026
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.6204477970022708
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
2.0970272019621916
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
1.9493417189805768
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
2.29020385700278
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
2.1212510910118
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.3479344319785014
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
2.156775983981788
```
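
As a reading aid, the two result blocks above can be reduced to one before/after ratio per configuration. This is a minimal sketch, not part of the PR: the timings are copied from the output above, while the dictionary layout and the names `before`, `after`, and `speedups` are illustrative assumptions.

```python
# Timings in seconds, copied from the "Before" and "After" blocks above.
# Keys are (dtype, n); each value is the total for one timeit run
# (50000 calls for n = 40000, 5000 calls for n = 400000).
before = {
    ("torch.double", 40_000): 1.3964195849839598,
    ("torch.double", 400_000): 1.2374563289922662,
    ("torch.float", 40_000): 1.8631796519621275,
    ("torch.float", 400_000): 1.6991038109990768,
    ("torch.uint8", 40_000): 1.8358083459897898,
    ("torch.uint8", 400_000): 1.7214750979910605,
    ("torch.int8", 40_000): 1.8356257299892604,
    ("torch.int8", 400_000): 1.706238206999842,
    ("torch.int16", 40_000): 1.7463878280250356,
    ("torch.int16", 400_000): 1.6172360889613628,
    ("torch.int32", 40_000): 1.8656846070080064,
    ("torch.int32", 400_000): 1.714238062966615,
    ("torch.int64", 40_000): 1.8272205490502529,
    ("torch.int64", 400_000): 1.6409171230043285,
}
after = {
    ("torch.double", 40_000): 1.0077099470072426,
    ("torch.double", 400_000): 0.8227124120458029,
    ("torch.float", 40_000): 1.0058343949494883,
    ("torch.float", 400_000): 0.8376779520185664,
    ("torch.uint8", 40_000): 1.903041019977536,
    ("torch.uint8", 400_000): 1.7576498500420712,
    ("torch.int8", 40_000): 1.7628699769848026,
    ("torch.int8", 400_000): 1.6204477970022708,
    ("torch.int16", 40_000): 2.0970272019621916,
    ("torch.int16", 400_000): 1.9493417189805768,
    ("torch.int32", 40_000): 2.29020385700278,
    ("torch.int32", 400_000): 2.1212510910118,
    ("torch.int64", 40_000): 2.3479344319785014,
    ("torch.int64", 400_000): 2.156775983981788,
}

# A ratio above 1 means the "After" run was faster for that configuration.
speedups = {key: before[key] / after[key] for key in before}

for (dtype, n), ratio in speedups.items():
    print(f"{dtype:<12} n={n:<7} before/after = {ratio:.2f}")
```

On these numbers the ratio is above 1 for `torch.double` and `torch.float` and below 1 for `torch.int16`, `torch.int32`, and `torch.int64`.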

Test Plan: Imported from OSS

Differential Revision: D20773454

Pulled By: VitalyFedyunin

fbshipit-source-id: ebeef59a90edde581669cc2afcc3d65929c8ac79
Labels

Merged, open source, triaged (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
