Optimize LeakyReLU and PReLU 'forward' functions on the CPU #9206
Closed
btgraham wants to merge 1 commit into pytorch:master
apaszke (Contributor) approved these changes on Jul 6, 2018 and left a comment:
Wow, that's nice. It looks like the compiler's vectorization pass can't deal with the original code, but has no issue with the new version: https://godbolt.org/g/j5XJr3 (the result looks similar across many different compiler versions). It might be due to the lack of -ffast-math: in the original, one path doesn't use multiplication and the other one does, so the compiler has to be careful.
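The diff itself is not reproduced in this thread, so the following is only a plain-Python sketch of the two scalar formulations the comment seems to contrast. The function names are hypothetical, and the real kernel is C code rather than Python. The default slope of 0.01 matches the default `negative_slope` of `nn.LeakyReLU`. The first form multiplies on only one branch; the second selects a scale factor and then always multiplies.

```python
def leaky_relu_branchy(x: float, negative_slope: float = 0.01) -> float:
    # Hypothetical "original-style" form: the positive branch returns x
    # untouched, while the other branch performs a multiplication.
    return x if x > 0 else x * negative_slope


def leaky_relu_uniform(x: float, negative_slope: float = 0.01) -> float:
    # Hypothetical "rewritten-style" form: choose a scale factor first,
    # then perform the same multiplication on every element.
    scale = 1.0 if x > 0 else negative_slope
    return x * scale


# Both forms agree on negative, zero, and positive inputs.
samples = [-2.5, -1.0, -0.0, 0.0, 0.5, 3.0]
for x in samples:
    assert leaky_relu_branchy(x) == leaky_relu_uniform(x)
```

Because multiplying by 1.0 leaves a floating-point value unchanged, the two forms return the same results. The second form does the same arithmetic on every element, which is the property the comment suggests the vectorizer can exploit.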
facebook-github-bot (Contributor) left a comment:
@ssnl has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
zdevito pushed a commit to zdevito/ATen that referenced this pull request on Jul 7, 2018
Summary:
This looks like a totally cosmetic change, but for some reason it reduces the runtime by ~50% when running on a single CPU thread.
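```
import os
os.environ['OMP_NUM_THREADS'] = '1'  # Use one CPU thread; set before importing torch
import time

import torch
import torch.nn as nn


def test_net(net, offset):
    """Time 100 forward passes of `net` on random inputs shifted by `offset`."""
    net.eval()
    total = 0.0
    with torch.no_grad():
        for _ in range(100):
            x = torch.randn(100, 100, 100) + offset
            start_time = time.time()
            y = net(x)
            total += time.time() - start_time
    # total is in seconds over 100 runs, so total * 10 is the mean in ms per run
    print(net, total * 10, 'ms')


# Shift the inputs so they are mostly negative, mixed, or mostly positive
for offset in [-1, 0, +1]:
    test_net(nn.LeakyReLU(), offset)
    test_net(nn.PReLU(), offset)
```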
Closes pytorch/pytorch#9206
Reviewed By: yf225
Differential Revision: D8749491
Pulled By: btgraham
fbshipit-source-id: 3db8049dd151c0ba9ae1dd5c05bcc58bcab97e9a
zdevito pushed a commit to zdevito/ATen that referenced this pull request on Jul 13, 2018
goodlux pushed a commit to goodlux/pytorch that referenced this pull request on Aug 15, 2018