Migrate acos from TH to ATen (CUDA) #29323
Closed
xuhdev wants to merge 3 commits into gh/xuhdev/51/base from
Conversation
Benchmark (Debian Buster, gcc 7.4, Release build, P400, turbo off):
```python
import timeit
for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.acos(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit('torch.acos(a); torch.cuda.synchronize()',
                            setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```
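A single `timeit.timeit` call gives one sample per configuration; when comparing deltas as small as the ones below, `timeit.repeat` with a best-of-N minimum is less sensitive to machine noise. A hedged, CPU-only sketch of that pattern (using `math.acos` as a stand-in so it runs without a GPU):

```python
import timeit

# Collect several samples instead of one; the minimum is the least
# noise-contaminated estimate of the true cost.
times = timeit.repeat('math.acos(0.5)', setup='import math',
                      number=100_000, repeat=3)
print(min(times))  # best-of-3 wall time in seconds
```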
Before:
```
torch.acos(a) a.numel() == 10000 for 20000 times torch.half
0.3783099120009865
torch.acos(a) a.numel() == 10000 for 20000 times torch.float
0.37258279799971206
torch.acos(a) a.numel() == 10000 for 20000 times torch.double
0.5627449999992677
torch.acos(a) a.numel() == 100000 for 20000 times torch.half
0.8581132070012245
torch.acos(a) a.numel() == 100000 for 20000 times torch.float
1.0164795860000595
torch.acos(a) a.numel() == 100000 for 20000 times torch.double
2.644646360999104
```
After:
```
torch.acos(a) a.numel() == 10000 for 20000 times torch.half
0.3873771430007764
torch.acos(a) a.numel() == 10000 for 20000 times torch.float
0.38498222500038537
torch.acos(a) a.numel() == 10000 for 20000 times torch.double
0.5826049269999203
torch.acos(a) a.numel() == 100000 for 20000 times torch.half
0.8118497010000283
torch.acos(a) a.numel() == 100000 for 20000 times torch.float
1.0175845949997893
torch.acos(a) a.numel() == 100000 for 20000 times torch.double
2.658536324999659
```
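For quick reading, the relative change between the two runs can be computed directly from the timings quoted above. A small post-processing sketch (the dict layout is illustrative, not part of the PR):

```python
# Timings (seconds) copied from the Before/After blocks above,
# keyed by (numel, dtype).
before = {
    (10_000, 'half'): 0.3783099120009865,
    (10_000, 'float'): 0.37258279799971206,
    (10_000, 'double'): 0.5627449999992677,
    (100_000, 'half'): 0.8581132070012245,
    (100_000, 'float'): 1.0164795860000595,
    (100_000, 'double'): 2.644646360999104,
}
after = {
    (10_000, 'half'): 0.3873771430007764,
    (10_000, 'float'): 0.38498222500038537,
    (10_000, 'double'): 0.5826049269999203,
    (100_000, 'half'): 0.8118497010000283,
    (100_000, 'float'): 1.0175845949997893,
    (100_000, 'double'): 2.658536324999659,
}
for key, b in sorted(before.items()):
    delta = (after[key] - b) / b * 100.0  # positive = slower after
    print(f'{key}: {delta:+.1f}%')
```

The changes are within a few percent either way, consistent with the migration being performance-neutral.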
Close #24532
[ghstack-poisoned]
This was referenced Nov 6, 2019
xuhdev added a commit that referenced this pull request Nov 6, 2019
ghstack-source-id: cb04416
Pull Request resolved: #29323
xuhdev added a commit that referenced this pull request Nov 6, 2019
ghstack-source-id: 1993d43
Pull Request resolved: #29323
xuhdev added a commit that referenced this pull request Nov 7, 2019
ghstack-source-id: b28c3d4
Pull Request resolved: #29323
VitalyFedyunin approved these changes Nov 8, 2019
zdevito pushed a commit to zdevito/ATen that referenced this pull request Nov 9, 2019
Summary: Pull Request resolved: pytorch/pytorch#29323 (benchmark setup and Before/After results as in the PR description above). Close #24532
Test Plan: Imported from OSS
Differential Revision: D18406806
Pulled By: VitalyFedyunin
fbshipit-source-id: 2d012485f4747fae0ddbcf2e08b1d75ef5274a19
Contributor
@VitalyFedyunin merged this pull request in 6c02067.