Force-pushed from b2e7f05 to 1c3926e.
Timings were retrieved using this benchmark. The command run was [command not preserved]. This is a single-core benchmark. There are significant gains for the non-contiguous cases when the tensor is larger than 10 elements. However, there is a regression for the regular contiguous case, which needs to be resolved before this can be merged.
Force-pushed from ef0bb33 to c48f626.
We found and mitigated the perf issue and will treat it separately. These are the new speedups: [results not preserved]. Command: [not preserved]. Benchmark commit: d7b07460f401363888e6e5343eee7079b70374c8
facebook-github-bot left a comment:
@cpuhrsch has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Force-pushed from aa13038 to 0ac6d73.
ROCm build succeeded separately.
Review thread on aten/src/ATen/cpu/vml.h (outdated; comments marked as off-topic).
Force-pushed from 5554acf to 774bf85.
Summary: This PR ports the vectorization of sigmoid to also enable better performance for non-contiguous arrays. Detailed timings will follow shortly.
Pull Request resolved: pytorch/pytorch#8612
Reviewed By: ezyang
Differential Revision: D8712298
Pulled By: cpuhrsch
fbshipit-source-id: 01a3d06af8d04513edd024ab1d01a6b753fc6f6a
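For readers unfamiliar with why non-contiguous inputs are harder to vectorize, here is a minimal pure-Python sketch of one common strategy: gather a strided run of elements into a small contiguous buffer, run a unit-stride kernel on it, and scatter the results back. This is an illustration only and is not taken from this PR's diff. Whether the PR uses this approach is an assumption, and the names `CHUNK`, `sigmoid_contiguous`, and `sigmoid_strided` are hypothetical.

```python
import math

# Hypothetical buffer size; a real kernel would likely pick a multiple of the SIMD width.
CHUNK = 8


def sigmoid_contiguous(buf):
    """Stand-in for a vectorized kernel that requires unit-stride input."""
    out = []
    for x in buf:
        if x >= 0:
            out.append(1.0 / (1.0 + math.exp(-x)))
        else:
            # Equivalent form that avoids overflow for large negative x.
            e = math.exp(x)
            out.append(e / (1.0 + e))
    return out


def sigmoid_strided(storage, offset, stride, count):
    """Apply sigmoid in place to `count` elements of the flat list `storage`,
    starting at index `offset` and stepping by `stride`."""
    for start in range(0, count, CHUNK):
        n = min(CHUNK, count - start)
        base = offset + start * stride
        # Gather the strided elements into a contiguous buffer.
        buf = [storage[base + i * stride] for i in range(n)]
        # Run the unit-stride kernel on the buffer.
        out = sigmoid_contiguous(buf)
        # Scatter the results back to their strided positions.
        for i, y in enumerate(out):
            storage[base + i * stride] = y


# Usage: transform every second element (the odd indices) of a flat buffer.
storage = [float(i) for i in range(20)]
sigmoid_strided(storage, offset=1, stride=2, count=10)
```

The extra copies cost something per chunk, which fits the observation in the thread that gains only appear once the tensor is larger than a handful of elements.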