parallel max and min for ATen on CPU #10343
cc @colesbury

Can someone take a look at the build failure:

Feel free to ignore that one. Do you have some benchmarks on this?

@ssnl Sure, I wrote a small benchmark for max. The piece of code reduces from

    import torch
    from time import time

    N = 2000
    T = 35820
    warmups = 100
    count = 200

    a = torch.randn(N, T)

    def test_max():
        # Warm-up iterations, excluded from the timing.
        for i in range(warmups):
            b, _ = a.max(dim=1)

        # Timed iterations; report the average per call in milliseconds.
        tstart = time()
        for i in range(count):
            b, _ = a.max(dim=1)
        tend = time()
        print("max reduction : %f ms" % ((tend - tstart) / count * 1000))

    test_max()

I brought this up because I have been optimizing OpenNMT-py,
colesbury
left a comment
Nice speed-ups. LGTM with a few code style comments.
I'm working on changing how reductions are implemented and unifying some of the CPU and CUDA code, but it'll probably take a while, so this speed-up is very welcome.

    template <>
    bool _isnan(float val) {
      return std::isnan(val);
    }

    #define isnan_break(val) \
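
The hunk above shows only a `float` specialization of `_isnan` and the first line of the `isnan_break` macro, so the rest is not visible here. As a rough sketch of how a type-dispatched NaN check might feed a max reduction, the code below assumes that NaN should propagate to the result and that scanning can stop at the first NaN. The names `is_nan_sketch` and `max_propagate_nan` are hypothetical and are not taken from the PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical helper (not the PR's code): types without a NaN value,
// such as integers, always report false.
template <typename T>
inline bool is_nan_sketch(T) {
  return false;
}

// Floating-point specializations defer to std::isnan.
template <>
inline bool is_nan_sketch<float>(float val) {
  return std::isnan(val);
}

template <>
inline bool is_nan_sketch<double>(double val) {
  return std::isnan(val);
}

// Max over a non-empty contiguous range. Assumption: a NaN anywhere in the
// range makes the result NaN, so the loop returns as soon as one is seen.
template <typename T>
T max_propagate_nan(const T* data, std::size_t n) {
  assert(n > 0);
  T best = data[0];
  if (is_nan_sketch(best)) {
    return best;
  }
  for (std::size_t i = 1; i < n; ++i) {
    const T v = data[i];
    if (is_nan_sketch(v)) {
      return v;  // NaN wins; no need to look at the remaining elements.
    }
    if (v > best) {
      best = v;
    }
  }
  return best;
}
```

Calling `max_propagate_nan` on a `float` array that contains a NaN returns NaN, while an `int` array goes through the generic overload and never pays for a NaN test that can succeed.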
facebook-github-bot
left a comment
ezyang has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Optimize max and min reduction for the ATen CPU path; the current code path from the TH module runs sequentially on CPU.
Pull Request resolved: pytorch/pytorch#10343
Differential Revision: D9330799
Pulled By: ezyang
fbshipit-source-id: 5b8271e0ca3e3e73f88a9075aa541c8756001b7c
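
To make the idea in the summary concrete, here is a minimal sketch of a row-wise max over a row-major matrix, the same shape of work as `a.max(dim=1)` in the benchmark. It is not the PR's implementation: it assumes OpenMP for parallelism, the names `RowMax` and `rowwise_max_sketch` are hypothetical, and NaN handling is left out for brevity. Without OpenMP enabled, the pragma is skipped and the loop runs sequentially.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical result type: per-row maximum values and their column indices.
struct RowMax {
  std::vector<float> values;
  std::vector<std::int64_t> indices;
};

// Row-wise max of a row-major rows x cols matrix (illustrative only).
RowMax rowwise_max_sketch(const std::vector<float>& data,
                          std::int64_t rows, std::int64_t cols) {
  assert(rows >= 0 && cols > 0);
  assert(static_cast<std::int64_t>(data.size()) == rows * cols);

  RowMax out{std::vector<float>(static_cast<std::size_t>(rows)),
             std::vector<std::int64_t>(static_cast<std::size_t>(rows))};

  // Each row is reduced independently and writes only to its own output
  // slot, so the outer loop can be split across threads without locking.
#ifdef _OPENMP
#pragma omp parallel for
#endif
  for (std::int64_t r = 0; r < rows; ++r) {
    const float* row = data.data() + r * cols;
    float best = row[0];
    std::int64_t best_idx = 0;
    for (std::int64_t c = 1; c < cols; ++c) {
      if (row[c] > best) {  // strict '>' keeps the first index on ties
        best = row[c];
        best_idx = c;
      }
    }
    out.values[static_cast<std::size_t>(r)] = best;
    out.indices[static_cast<std::size_t>(r)] = best_idx;
  }
  return out;
}
```

Parallelizing over the non-reduced dimension suits the benchmark's shape (2000 rows of 35820 elements), since there are many independent rows and each is long enough to amortize the thread start-up cost.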