Migrate asin and asin_ from the TH to Aten (CUDA)#28482
Closed
xuhdev wants to merge 2 commits into gh/xuhdev/44/base
Conversation
Benchmark (RHEL 7.3, Release, P1000, gcc 8.3):
```python
import timeit
for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.asin(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit('torch.asin(a); torch.cuda.synchronize()',
                            setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```
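A single `timeit.timeit` total like the one above is sensitive to transient noise (clock boosts, background load). A minimal sketch of a more noise-resistant variant uses `timeit.repeat` and keeps the fastest run; since a CUDA build may not be at hand, a cheap CPU stand-in workload (`math.asin` over a list) replaces the `torch.asin` call here:

```python
import timeit


def bench(stmt, setup, number=1000, repeat=5):
    """Run the timed statement several times and keep the fastest total.

    Taking the minimum over repeats is a common way to suppress
    transient noise in microbenchmarks: slowdowns add time, so the
    fastest run is closest to the true cost.
    """
    totals = timeit.repeat(stmt, setup=setup, number=number, repeat=repeat)
    return min(totals)


# Stand-in workload: math.asin on the CPU instead of torch.asin on CUDA.
best = bench('[math.asin(x) for x in data]',
             setup='import math; data = [i / 10_000 for i in range(1_000)]')
print(f'best of 5 repeats: {best:.6f} s for 1000 calls')
```

With a real GPU workload, the statement would also need the `torch.cuda.synchronize()` call shown above, so that kernel launch and completion are both inside the timed region.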
Before:
```
torch.asin(a) a.numel() == 10000 for 20000 times torch.half
0.475854377997166
torch.asin(a) a.numel() == 10000 for 20000 times torch.float
0.4772826389998954
torch.asin(a) a.numel() == 10000 for 20000 times torch.double
0.6297428649995709
torch.asin(a) a.numel() == 100000 for 20000 times torch.half
0.5475849750000634
torch.asin(a) a.numel() == 100000 for 20000 times torch.float
0.6156488769993302
torch.asin(a) a.numel() == 100000 for 20000 times torch.double
2.728912709000724
```
After:
```
torch.asin(a) a.numel() == 10000 for 20000 times torch.half
0.5107104659982724
torch.asin(a) a.numel() == 10000 for 20000 times torch.float
0.509122366001975
torch.asin(a) a.numel() == 10000 for 20000 times torch.double
0.6929216960015765
torch.asin(a) a.numel() == 100000 for 20000 times torch.half
0.5914848840002378
torch.asin(a) a.numel() == 100000 for 20000 times torch.float
0.6518679289983993
torch.asin(a) a.numel() == 100000 for 20000 times torch.double
2.916458261999651
```
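To judge whether the before/after totals amount to a real regression, dividing each total by the call count gives the per-call time, and the ratio of the totals gives the relative change. A small sketch using the `torch.double`, `numel() == 100000` numbers from the tables above:

```python
# Totals (seconds for 20000 calls) taken from the tables above,
# torch.double with a.numel() == 100000.
calls = 20_000
before_total = 2.728912709000724
after_total = 2.916458261999651

before_us = before_total / calls * 1e6   # microseconds per call
after_us = after_total / calls * 1e6
change_pct = (after_total / before_total - 1) * 100

print(f'before: {before_us:.2f} us/call, after: {after_us:.2f} us/call')
print(f'relative change: {change_pct:+.2f}%')  # roughly +6.9% in this run
```

A change of this size is within the run-to-run variation discussed below, which is why the numbers were later re-measured in a more stable environment.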
Close #24537
xuhdev added a commit that referenced this pull request on Oct 23, 2019
ghstack-source-id: 36ddd47
Pull Request resolved: #28482
Contributor
It seems like there's a regression; any idea why?
Collaborator (Author)
It's likely some sort of hardware instability. It's pretty hard to get consistent CUDA benchmarks (as it turns out, even the power source can affect the outcome). I can try to run the benchmark again at another time, but the CUDA build is really time-consuming.
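One way to quantify this kind of instability is to report the spread across repeated runs rather than a single number: if the before/after gap is smaller than a few multiples of the run-to-run spread, it is plausibly noise. A minimal sketch with the standard library (a pure-Python stand-in workload is assumed, since re-running the CUDA benchmark is expensive):

```python
import statistics
import timeit

# Repeat the same measurement several times and summarize the spread.
runs = timeit.repeat('sum(x * x for x in range(1_000))', number=200, repeat=7)
mean = statistics.mean(runs)
spread = statistics.stdev(runs)
cv = spread / mean  # coefficient of variation (relative spread)

print(f'mean {mean:.4f} s, stdev {spread:.4f} s, CV {cv:.1%}')
# If two configurations differ by less than a few CVs,
# the difference may well be environment noise.
```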
VitalyFedyunin approved these changes on Oct 23, 2019
Differential Revision: [D18089074](https://our.internmc.facebook.com/intern/diff/D18089074)
Collaborator (Author)
I added more stabilization to the benchmarking environment, and the runs now produce very close results. See the updated benchmark numbers.
Contributor
Thanks @xuhdev!
zdevito pushed a commit to zdevito/ATen that referenced this pull request on Oct 30, 2019
Summary: Pull Request resolved: pytorch/pytorch#28482. Benchmark (RHEL 7.3, Release, P1000, gcc 8.3) numbers match the PR description above. Close #24537.
Test Plan: Imported from OSS
Differential Revision: D18089074
Pulled By: VitalyFedyunin
fbshipit-source-id: f27515dd1ee73b6e2391ebcc0004af28bcb82234
Contributor
@VitalyFedyunin merged this pull request in a7166ae.
Stack from ghstack:

- #28527 Migrate `sinh` and `sinh_` from the TH to Aten (CUDA)
- #28482 Migrate `asin` and `asin_` from the TH to Aten (CUDA) (this PR)
- #28237 Migrate `sin` and `sin_` from the TH to Aten (CUDA)

Benchmark (Debian Buster, CUDA 9.2, Quadro P400, turbo off, Release, gcc 7.4):
Before:
After:
Close #24537
Differential Revision: D18089074