Move the CUDA implementation of sqrt to ATen.#27372

Closed
xuhdev wants to merge 8 commits into gh/xuhdev/41/base from gh/xuhdev/41/head
Conversation

@xuhdev xuhdev commented Oct 4, 2019

Stack from ghstack:

Benchmark (RHEL 7, gcc 8.3.1, P1000):

import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.sqrt(a) a.numel() == {n} for {t} times {dtype}')
        # torch.cuda.synchronize() is part of the timed statement so that each
        # iteration waits for the asynchronous CUDA kernel to actually finish.
        print(timeit.timeit(
            'torch.sqrt(a); torch.cuda.synchronize()',
            setup=f'import torch; '
                  f'a=torch.arange({n}, dtype={dtype}, device="cuda")',
            number=t))

Before:

torch.sqrt(a) a.numel() == 10000 for 20000 times torch.half
0.47206006300075387
torch.sqrt(a) a.numel() == 10000 for 20000 times torch.float
0.500142054999742
torch.sqrt(a) a.numel() == 10000 for 20000 times torch.double
0.49376482300067437
torch.sqrt(a) a.numel() == 100000 for 20000 times torch.half
0.5250821959998575
torch.sqrt(a) a.numel() == 100000 for 20000 times torch.float
0.6157269270006509
torch.sqrt(a) a.numel() == 100000 for 20000 times torch.double
1.1384222449996741

After:

torch.sqrt(a) a.numel() == 10000 for 20000 times torch.half
0.4811987979992409
torch.sqrt(a) a.numel() == 10000 for 20000 times torch.float
0.4821368369994161
torch.sqrt(a) a.numel() == 10000 for 20000 times torch.double
0.48819217599884723
torch.sqrt(a) a.numel() == 100000 for 20000 times torch.half
0.5438129949998256
torch.sqrt(a) a.numel() == 100000 for 20000 times torch.float
0.6017923809995409
torch.sqrt(a) a.numel() == 100000 for 20000 times torch.double
1.1260791029999382
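
The timing pattern above can be reproduced without a GPU. The sketch below is a hypothetical stand-in harness that mirrors the structure of the PR's benchmark (a timed loop over a square-root workload, with the workload set up outside the timed region), using `math.sqrt` on plain Python floats instead of CUDA tensors; `bench_sqrt` and its parameters are illustrative names, not part of the PR.

```python
import timeit

def bench_sqrt(n, repeats):
    """Return the total time to take sqrt of n values, `repeats` times.

    Mirrors the PR's benchmark shape: the data is built in `setup` (untimed),
    and only the square-root pass is measured, `repeats` times.
    """
    return timeit.timeit(
        'for x in data: math.sqrt(x)',
        setup=f'import math; data = [float(i) for i in range({n})]',
        number=repeats,
    )

if __name__ == '__main__':
    for n in (10_000, 100_000):
        t = bench_sqrt(n, 10)
        print(f'sqrt over {n} values x10: {t:.4f}s')
```

As in the original script, keeping setup out of the timed statement means the reported numbers measure only the operation under test, which is why before/after runs are directly comparable.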

Fix #24638

Differential Revision: D18037944

@pytorchbot pytorchbot added the module: build (Build system issues), module: cpu (CPU specific problem, e.g., perf, algorithm), module: cuda (related to torch.cuda and CUDA support in general), module: internals (related to internal abstractions in c10 and ATen), and module: operators labels Oct 4, 2019
xuhdev added a commit that referenced this pull request Oct 4, 2019
Fix #24638

ghstack-source-id: 17e1081
Pull Request resolved: #27372
xuhdev added a commit that referenced this pull request Oct 8, 2019
Fix #24638

ghstack-source-id: f144e78
Pull Request resolved: #27372
@cpuhrsch cpuhrsch added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Oct 11, 2019
xuhdev added a commit that referenced this pull request Oct 16, 2019
Fix #24638

ghstack-source-id: 401140a
Pull Request resolved: #27372
@xuhdev xuhdev requested a review from ifedan October 18, 2019 17:43
@xuhdev xuhdev deleted the gh/xuhdev/41/head branch October 22, 2019 21:13
@facebook-github-bot
@VitalyFedyunin merged this pull request in 30712f6.

zdevito pushed a commit to zdevito/ATen that referenced this pull request Oct 22, 2019
Summary:
Pull Request resolved: pytorch/pytorch#27372

Fix #24638

Test Plan: Imported from OSS

Differential Revision: D18037944

Pulled By: VitalyFedyunin

fbshipit-source-id: d3dbbc167954c7bbee25be13b5b669433bca6ee5
thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request Feb 4, 2020
Summary:
Pull Request resolved: pytorch#27372

Fix pytorch#24638

Test Plan: Imported from OSS

Differential Revision: D18037944

Pulled By: VitalyFedyunin

fbshipit-source-id: d3dbbc167954c7bbee25be13b5b669433bca6ee5