Migrate sin and sin_ from the TH to Aten (CUDA)#28237

Closed
xuhdev wants to merge 7 commits into gh/xuhdev/43/base from gh/xuhdev/43/head

Conversation

@xuhdev xuhdev commented Oct 17, 2019

Stack from ghstack:

Benchmark (RHEL 7, gcc 8.3.1, P1000):

```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.sin(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit('torch.sin(a); torch.cuda.synchronize()', setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")', number=t))
```

Before:

```
torch.sin(a) a.numel() == 10000 for 20000 times torch.half
0.4649172620011086
torch.sin(a) a.numel() == 10000 for 20000 times torch.float
0.4616892600006395
torch.sin(a) a.numel() == 10000 for 20000 times torch.double
0.5166665920005471
torch.sin(a) a.numel() == 100000 for 20000 times torch.half
0.5376560490003612
torch.sin(a) a.numel() == 100000 for 20000 times torch.float
0.6207812359989475
torch.sin(a) a.numel() == 100000 for 20000 times torch.double
1.873208982999131
```

After:

```
torch.sin(a) a.numel() == 10000 for 20000 times torch.half
0.4796977340010926
torch.sin(a) a.numel() == 10000 for 20000 times torch.float
0.48329569199995603
torch.sin(a) a.numel() == 10000 for 20000 times torch.double
0.5380683220009814
torch.sin(a) a.numel() == 100000 for 20000 times torch.half
0.5299932739999349
torch.sin(a) a.numel() == 100000 for 20000 times torch.float
0.6144487999990815
torch.sin(a) a.numel() == 100000 for 20000 times torch.double
1.8838113630008593
```

Close #24627

Differential Revision: D18089072
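Comparing the two tables shows the migration is roughly performance-neutral. As an illustration only (the before/after numbers are copied from the tables above), the relative deltas can be computed like this:

```python
# Before/after timings (seconds) copied from the benchmark tables:
# key = (numel, dtype), value = (before, after)
pairs = {
    ('10000', 'half'):    (0.4649172620011086, 0.4796977340010926),
    ('10000', 'float'):   (0.4616892600006395, 0.48329569199995603),
    ('10000', 'double'):  (0.5166665920005471, 0.5380683220009814),
    ('100000', 'half'):   (0.5376560490003612, 0.5299932739999349),
    ('100000', 'float'):  (0.6207812359989475, 0.6144487999990815),
    ('100000', 'double'): (1.873208982999131, 1.8838113630008593),
}

# Relative change of each timing: positive means the ATen port is slower.
deltas = {k: (after - before) / before for k, (before, after) in pairs.items()}
for k, d in sorted(deltas.items()):
    print(k, f'{d:+.2%}')
```

Every delta is within about ±5%, and the 100k-element cases are flat or slightly faster, which is consistent with the "no regression" conclusion below.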

xuhdev added a commit that referenced this pull request Oct 17, 2019
Close #24627

ghstack-source-id: c6d9d9f
Pull Request resolved: #28237
xuhdev added a commit that referenced this pull request Oct 18, 2019
Close #24627

ghstack-source-id: 0632ddf
Pull Request resolved: #28237
ifedan commented Oct 18, 2019

Please provide performance metrics for this function before and after your change.

xuhdev commented Oct 21, 2019

Benchmark added. The timing seems pretty unstable, though; I got numbers that differ considerably between runs. Also see the benchmarks on the previous two PRs in this stack. Still, this should be sufficient to show that there is likely no performance regression.

ifedan commented Oct 22, 2019

> Benchmark added. The timing seems pretty unstable, though; I got numbers that differ considerably between runs. Also see the benchmarks on the previous two PRs in this stack. Still, this should be sufficient to show that there is likely no performance regression.

Did you use OMP_NUM_THREADS=1 and MKL_NUM_THREADS=1?

xuhdev commented Oct 22, 2019

> Did you use OMP_NUM_THREADS=1 and MKL_NUM_THREADS=1?

I didn't enable OpenMP (USE_OPENMP=0) and didn't have MKL installed.

xuhdev commented Oct 22, 2019

@ifedan I ended up benchmarking on a different machine over which I have more control of the hardware (e.g., turning off turbo, warming up the GPU, and making sure it's completely unused by others). The results are pretty close and stable now. Please see my update in the benchmark results.
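The stabilization steps described above (warming up before measuring, taking several repeats) can be sketched in a CPU-only harness. This is an illustration rather than the benchmark actually used here: `math.sin` stands in for the CUDA kernel so the sketch runs anywhere, and the `bench` helper is hypothetical.

```python
import timeit

def bench(stmt, setup, number, repeats=5, warmup=1):
    """Time `stmt` after a warm-up pass and report the minimum of several
    repeats, which is less sensitive to run-to-run noise than one measurement."""
    timer = timeit.Timer(stmt, setup=setup)
    for _ in range(warmup):
        timer.timeit(number=number)  # warm-up: populate caches, settle clocks
    return min(timer.repeat(repeat=repeats, number=number))

best = bench('math.sin(0.5)', 'import math', number=10_000)
print(f'best of 5: {best:.6f} s')
```

For a GPU benchmark the same shape applies, with a `torch.cuda.synchronize()` after the warm-up and after each timed run so kernel launches aren't measured asynchronously.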

zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 30, 2019
Summary:
Pull Request resolved: pytorch/pytorch#28237

Test Plan: Imported from OSS

Differential Revision: D18089072

Pulled By: VitalyFedyunin

fbshipit-source-id: 4824804960309fe7fdb16073d021388704986993
@VitalyFedyunin merged this pull request in d0bd8a3.

@facebook-github-bot facebook-github-bot deleted the gh/xuhdev/43/head branch November 3, 2019 15:15
