
Migrate frac from TH to ATen (CUDA) #28953

Closed
xuhdev wants to merge 5 commits into gh/xuhdev/47/base from gh/xuhdev/47/head

Conversation

@xuhdev
Collaborator

@xuhdev xuhdev commented Oct 31, 2019

Stack from ghstack:

Close #24566
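For context, `torch.frac` returns the fractional part of each input element, i.e. `x - trunc(x)`, so the result keeps the sign of the input. A minimal pure-Python sketch of that semantics (illustrative only; the `frac` helper below is a hypothetical stand-in, not the migrated kernel):

```python
import math

def frac(x: float) -> float:
    # Fractional part in the torch.frac sense: x - trunc(x).
    # Truncation (not floor) means the result keeps the input's sign.
    return x - math.trunc(x)

print(frac(2.75))   # 0.75
print(frac(-2.75))  # -0.75
print(frac(3.0))    # 0.0
```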

Benchmark (Debian Buster, CUDA 9.2, Quadro P400, turbo off, Release, gcc
7.4):

```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.frac(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.frac(a); torch.cuda.synchronize()', setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")', number=t))
```
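For readers without a CUDA device, the same timing pattern (a fixed repetition count per problem size via `timeit`) can be exercised against a CPU-only stand-in; this is a hedged sketch, with `workload` as a hypothetical pure-Python substitute for the GPU kernel being benchmarked:

```python
import timeit

def workload(n):
    # Pure-Python stand-in computing a fractional part per element,
    # roughly analogous to what torch.frac does on the GPU.
    return [x - int(x) for x in (i * 0.5 for i in range(n))]

for n, t in [(1_000, 100), (10_000, 100)]:
    elapsed = timeit.timeit(lambda: workload(n), number=t)
    print(f'workload n={n}, {t} reps: {elapsed:.4f}s')
```

Note that the GPU benchmark above calls `torch.cuda.synchronize()` inside the timed statement, since CUDA kernels launch asynchronously; a CPU workload needs no such synchronization.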

Before:

```
torch.frac(a) a.numel() == 10000 for 20000 times torch.half
0.3608182370007853
torch.frac(a) a.numel() == 10000 for 20000 times torch.float
0.3647012189976522
torch.frac(a) a.numel() == 10000 for 20000 times torch.double
0.3889585220022127
torch.frac(a) a.numel() == 100000 for 20000 times torch.half
0.622635444997286
torch.frac(a) a.numel() == 100000 for 20000 times torch.float
0.9595754649999435
torch.frac(a) a.numel() == 100000 for 20000 times torch.double
1.5590267750012572
```

After:

```
torch.frac(a) a.numel() == 10000 for 20000 times torch.half
0.3675256470014574
torch.frac(a) a.numel() == 10000 for 20000 times torch.float
0.3703597319981782
torch.frac(a) a.numel() == 10000 for 20000 times torch.double
0.372184894993552
torch.frac(a) a.numel() == 100000 for 20000 times torch.half
0.60767333900003
torch.frac(a) a.numel() == 100000 for 20000 times torch.float
0.9645607889979146
torch.frac(a) a.numel() == 100000 for 20000 times torch.double
1.5542530329985311
```

Differential Revision: [D18302768](https://our.internmc.facebook.com/intern/diff/D18302768)

xuhdev added a commit that referenced this pull request Oct 31, 2019
ghstack-source-id: 5f9c0f9
Pull Request resolved: #28953
@xuhdev xuhdev requested a review from VitalyFedyunin October 31, 2019 06:00
@xuhdev xuhdev added the module: operators and module: cuda labels Oct 31, 2019
@kostmo
Member

kostmo commented Oct 31, 2019

CircleCI build failures summary

As of commit 4a5421e:

  • 0/4 flaky


xuhdev added a commit that referenced this pull request Oct 31, 2019
ghstack-source-id: a8cea62
Pull Request resolved: #28953
@xuhdev
Collaborator Author

xuhdev commented Nov 6, 2019

@VitalyFedyunin Are you merging this? I'm going to rebase.

@VitalyFedyunin
Contributor

Go ahead with rebase

@xuhdev
Collaborator Author

xuhdev commented Nov 6, 2019

Done

zdevito pushed a commit to zdevito/ATen that referenced this pull request Nov 9, 2019
Summary:
Pull Request resolved: pytorch/pytorch#28953

Test Plan: Imported from OSS

Differential Revision: D18302768

Pulled By: VitalyFedyunin

fbshipit-source-id: 24198838dc903d455155f0819d0c7d58974aaecd
@facebook-github-bot
Contributor

@VitalyFedyunin merged this pull request in 4606deb.

@facebook-github-bot facebook-github-bot deleted the gh/xuhdev/47/head branch November 13, 2019 15:17

Labels

Merged, module: cuda (Related to torch.cuda, and CUDA support in general)

5 participants