
Move the CUDA implementation of log1p to ATen. #26923

Closed

xuhdev wants to merge 16 commits into gh/xuhdev/38/base from gh/xuhdev/38/head

Conversation

@xuhdev (Collaborator) commented Sep 26, 2019

Stack from ghstack:

Benchmark (RHEL 7, gcc 8.3.1, P1000):

import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.log1p(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(
            'torch.log1p(a); torch.cuda.synchronize()',
            setup=f'import torch; '
                  f'a = torch.arange({n}, dtype={dtype}, device="cuda") / 100000',
            number=t))

Before:

torch.log1p(a) a.numel() == 10000 for 20000 times torch.half
0.46644441100033873
torch.log1p(a) a.numel() == 10000 for 20000 times torch.float
0.47403449599914893
torch.log1p(a) a.numel() == 10000 for 20000 times torch.double
0.5681769420007186
torch.log1p(a) a.numel() == 100000 for 20000 times torch.half
0.5420387039994239
torch.log1p(a) a.numel() == 100000 for 20000 times torch.float
0.6156843030003074
torch.log1p(a) a.numel() == 100000 for 20000 times torch.double
2.4580643359986425

After:

torch.log1p(a) a.numel() == 10000 for 20000 times torch.half
0.46042340799976955
torch.log1p(a) a.numel() == 10000 for 20000 times torch.float
0.4593915469995409
torch.log1p(a) a.numel() == 10000 for 20000 times torch.double
0.559550654999839
torch.log1p(a) a.numel() == 100000 for 20000 times torch.half
0.513594127000033
torch.log1p(a) a.numel() == 100000 for 20000 times torch.float
0.6011945249993005
torch.log1p(a) a.numel() == 100000 for 20000 times torch.double
2.444039010999404
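
For context on why log1p warrants a dedicated kernel (background, not part of the PR text): computing log(1 + x) directly loses precision for small x, which log1p avoids. A minimal pure-Python illustration using only the standard library:

```python
import math

x = 1e-10
# Forming 1 + x first rounds away most of x's significant bits
# (doubles near 1.0 are spaced about 2.2e-16 apart), so the naive
# expression is accurate to only ~8 significant digits here.
naive = math.log(1 + x)
# math.log1p evaluates log(1 + x) without forming 1 + x explicitly
# and stays accurate to full double precision.
stable = math.log1p(x)
print(naive, stable)
```

The same precision argument applies per element on the GPU, which is why the kernel calls a dedicated log1p rather than composing add and log.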

Fix #24588
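
Unrelated to the mechanics of the port itself, the classic numerically stable formulation of log1p in terms of plain log can be sketched in pure Python. This is an illustration of the underlying algorithm only, not the CUDA kernel code moved by this PR:

```python
import math

def log1p_stable(x: float) -> float:
    """log(1 + x), accurate for small x, using only math.log.

    Classic compensation trick: the rounding error committed when
    forming w = 1 + x is cancelled by the factor x / (w - 1).
    """
    w = 1.0 + x
    if w == 1.0:
        # x vanished entirely in the addition; log(1 + x) ~= x here.
        return x
    return math.log(w) * x / (w - 1.0)

# Tracks math.log1p across magnitudes:
for v in (1e-18, 1e-10, 0.5, 3.0, -0.5):
    assert math.isclose(log1p_stable(v), math.log1p(v), rel_tol=1e-12)
```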

Differential Revision: D17984184

@pytorchbot added the module: cuda, module: internals, and module: operators labels Sep 26, 2019
@pytorchbot added the module: cpu label Sep 26, 2019
xuhdev added a commit that referenced this pull request Sep 26, 2019
Fix #24588

ghstack-source-id: a821eea
Pull Request resolved: #26923
xuhdev added a commit that referenced this pull request Sep 30, 2019
Fix #24588

ghstack-source-id: 5a3306b
Pull Request resolved: #26923
xuhdev added a commit that referenced this pull request Oct 3, 2019
Fix #24588

ghstack-source-id: 525ba92
Pull Request resolved: #26923
@xuhdev requested a review from VitalyFedyunin October 3, 2019 22:26
@cpuhrsch added the triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) label Oct 11, 2019
@VitalyFedyunin (Contributor) left a comment:

LGTM, Please rebase and I will start landing.

@xuhdev (Collaborator, Author) commented Oct 17, 2019

Rebased

@xuhdev deleted the gh/xuhdev/38/head branch October 22, 2019 21:13
@facebook-github-bot (Contributor) commented:

@VitalyFedyunin merged this pull request in 19aeb47.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 22, 2019
Summary:
Pull Request resolved: pytorch/pytorch#26923

Fix #24588

Test Plan: Imported from OSS

Differential Revision: D17984184

Pulled By: VitalyFedyunin

fbshipit-source-id: 3bc2be4f08e800b1de274940f2bd3d5b418b45ee
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
Summary:
Pull Request resolved: pytorch#26923

Fix pytorch#24588

Test Plan: Imported from OSS

Differential Revision: D17984184

Pulled By: VitalyFedyunin

fbshipit-source-id: 3bc2be4f08e800b1de274940f2bd3d5b418b45ee

Labels

Merged
module: cpu (CPU specific problem, e.g. perf, algorithm)
module: cuda (Related to torch.cuda, and CUDA support in general)
module: internals (Related to internal abstractions in c10 and ATen)
open source
triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

8 participants