Migrate frac from TH to ATen (CUDA)#28953
Closed
xuhdev wants to merge 5 commits into gh/xuhdev/47/base from
Conversation
Close #24566

Benchmark (Debian Buster, CUDA 9.2, Quadro P400, turbo off, Release, gcc 7.4):

```python
import timeit
for n, t in [(10_000, 20000), (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.frac(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.frac(a); torch.cuda.synchronize()',
                            setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```

Before:

```
torch.frac(a) a.numel() == 10000 for 20000 times torch.half
0.3608182370007853
torch.frac(a) a.numel() == 10000 for 20000 times torch.float
0.3647012189976522
torch.frac(a) a.numel() == 10000 for 20000 times torch.double
0.3889585220022127
torch.frac(a) a.numel() == 100000 for 20000 times torch.half
0.622635444997286
torch.frac(a) a.numel() == 100000 for 20000 times torch.float
0.9595754649999435
torch.frac(a) a.numel() == 100000 for 20000 times torch.double
1.5590267750012572
```

After:

```
torch.frac(a) a.numel() == 10000 for 20000 times torch.half
0.3675256470014574
torch.frac(a) a.numel() == 10000 for 20000 times torch.float
0.3703597319981782
torch.frac(a) a.numel() == 10000 for 20000 times torch.double
0.372184894993552
torch.frac(a) a.numel() == 100000 for 20000 times torch.half
0.60767333900003
torch.frac(a) a.numel() == 100000 for 20000 times torch.float
0.9645607889979146
torch.frac(a) a.numel() == 100000 for 20000 times torch.double
1.5542530329985311
```

[ghstack-poisoned]
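For context on what the migrated kernel computes: `torch.frac` returns the fractional part of each element, keeping the sign of the input. A minimal pure-Python model of that semantics (an illustrative sketch, not the ATen kernel itself) is:

```python
import math

def frac(x: float) -> float:
    # Fractional part in the torch.frac sense: x - trunc(x).
    # Unlike x % 1.0, this keeps the sign of the input for negatives.
    return x - math.trunc(x)

print(frac(2.75))   # 0.75
print(frac(-2.75))  # -0.75
```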
xuhdev added a commit that referenced this pull request on Oct 31, 2019
Close #24566 (the commit message repeats the benchmark from the pull request description above). ghstack-source-id: 5f9c0f9. Pull Request resolved: #28953
Member
CircleCI build failures summary, as of commit 4a5421e:
Here are the reasons each build failed. This comment was automatically generated by Dr. CI. Please report bugs/suggestions on the GitHub issue tracker. This comment has been revised 2 time(s).
xuhdev added a commit that referenced this pull request on Oct 31, 2019
Close #24566 (the commit message repeats the benchmark from the pull request description above). ghstack-source-id: a8cea62. Pull Request resolved: #28953
VitalyFedyunin approved these changes on Nov 4, 2019
Collaborator (Author)
@VitalyFedyunin Are you merging this? I'm going to rebase.
Contributor
Go ahead with rebase.
Collaborator (Author)
Done.
zdevito pushed a commit to zdevito/ATen that referenced this pull request on Nov 9, 2019
Summary: Pull Request resolved: pytorch/pytorch#28953. Close #24566 (the commit message repeats the benchmark from the pull request description above). Test Plan: Imported from OSS. Differential Revision: D18302768. Pulled By: VitalyFedyunin. fbshipit-source-id: 24198838dc903d455155f0819d0c7d58974aaecd
Contributor
@VitalyFedyunin merged this pull request in 4606deb.
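One rough way to read the benchmark numbers: dividing elements processed by wall time gives an effective throughput. Note the timings include Python and kernel-launch overhead (the loop synchronizes every iteration), so this is only a lower bound on kernel throughput. A sketch using the posted "After" timing for `torch.double` at `numel == 100000`:

```python
def effective_throughput(numel: int, iters: int, seconds: float) -> float:
    # Elements processed per second over the whole timed loop.
    return numel * iters / seconds

# "After" timing for torch.double, numel == 100_000, 20_000 iterations
# (value copied from the benchmark output above).
rate = effective_throughput(100_000, 20_000, 1.5542530329985311)
print(f"{rate:.2e} elements/s")  # roughly 1.3e9
```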