Conversation
facebook-github-bot
left a comment
apaszke has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:
- Disable addmm fusion. The reason for this is explained in the comment.
- Tiny change in `stack.h` that lets us avoid constructing an unnecessary temporary `IValue` on the (C++) stack (it will only get created on the interpreter stack directly).
- Fixed a correctness issue in requires_grad propagation.

Pull Request resolved: pytorch/pytorch#11654
Reviewed By: colesbury
Differential Revision: D9813739
Pulled By: apaszke
fbshipit-source-id: 23e83bc8605802f39bfecf447efad9239b9421c3
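For context on the first bullet: addmm fusion rewrites a matrix multiply followed by an add into a single fused call. A minimal numpy sketch of the equivalence (the `alpha`/`beta` parameters mirror BLAS `gemm`/`torch.addmm` conventions; the function names here are illustrative, not PyTorch internals):

```python
import numpy as np

def mm_then_add(a, b, c):
    # Unfused form: a matrix multiply followed by a separate add.
    return a @ b + c

def addmm(c, a, b, alpha=1.0, beta=1.0):
    # Fused form, mirroring the semantics of torch.addmm:
    #   out = beta * c + alpha * (a @ b)
    return beta * c + alpha * (a @ b)

a = np.arange(6.0).reshape(2, 3)
b = np.arange(12.0).reshape(3, 4)
c = np.ones((2, 4))

assert np.allclose(mm_then_add(a, b, c), addmm(c, a, b))
```

The fusion saves one pass over the output tensor, but as the thread below discusses, whether it helps depends on which gemm kernel the fused call ends up dispatching.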
@apaszke, this is causing a ~200ms latency regression for NMT models. Here are the top lines from perf before/after this change: Before:
After:
Is there more info I can provide to help debug this? Also cc @jamesr66a who is familiar with these models and how to optimize them. |
@jmp84 can you please provide me with some example tensor sizes that appear as inputs to your GEMMs? It looks like we started triggering the transposed kernels in some cases which was not the case previously (TN vs NN). |
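The TN/NN distinction refers to which gemm variant BLAS dispatches, which is determined by the memory layout of the operands rather than their shapes. A small numpy illustration of why a transpose changes the kernel choice: transposing is a zero-copy view that swaps strides, so the operand is no longer row-major contiguous and a BLAS wrapper would pass it as a "T" operand instead of "N":

```python
import numpy as np

a = np.zeros((128, 256))
at = a.T  # a view: no data copy, just swapped strides

# A contiguous row-major operand maps to an "N" gemm argument;
# the transposed view is non-contiguous and maps to "T", so a
# MatMul on it triggers a TN/NT kernel instead of NN.
assert a.flags["C_CONTIGUOUS"]
assert not at.flags["C_CONTIGUOUS"]
assert at.strides == (a.strides[1], a.strides[0])
```

Transposed kernels can be significantly slower for some shapes, which is consistent with the regression described above.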
Finally, how do you run those models? Is that via ONNX export or what? |
@apaszke I think what's going on here is that we are not hitting the special quantized implementation of FC in the caffe2 backend due to the ONNX-caffe2 backend emitting "MatMul" + "Add" instead of "FC". This can be seen from |
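To illustrate the issue being described: the caffe2 backend only hits the fast quantized path when it sees a single `FC` op, so a `MatMul` followed by `Add` misses it. A hypothetical peephole pass that re-fuses such a pair might look like the sketch below (the `(op_type, inputs, output)` tuples are a deliberately simplified stand-in for the real ONNX/caffe2 graph representation):

```python
def fuse_matmul_add(ops):
    """Rewrite consecutive ("MatMul", "Add") pairs into a single "FC" op.

    `ops` is a simplified list of (op_type, inputs, output) tuples;
    the real ONNX-caffe2 backend works on richer protobuf graphs.
    """
    fused, i = [], 0
    while i < len(ops):
        op, inputs, out = ops[i]
        nxt = ops[i + 1] if i + 1 < len(ops) else None
        if (op == "MatMul" and nxt is not None and nxt[0] == "Add"
                and out in nxt[1]):
            # The bias is the Add input that is not the MatMul output.
            bias = [x for x in nxt[1] if x != out][0]
            fused.append(("FC", inputs + [bias], nxt[2]))
            i += 2
        else:
            fused.append((op, inputs, out))
            i += 1
    return fused

ops = [
    ("MatMul", ["x", "w"], "mm_out"),
    ("Add", ["mm_out", "b"], "y"),
    ("Relu", ["y"], "z"),
]
assert fuse_matmul_add(ops) == [
    ("FC", ["x", "w", "b"], "y"),
    ("Relu", ["y"], "z"),
]
```

This is only a sketch of the idea; a real pass would also have to check that the intermediate `mm_out` has no other consumers and that the weight/bias layouts match what `FC` expects.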
So for the addmm optimization - does “not helpful at all” mean neutral or negative? If neutral - why not keep it, since it matters for ONNX export?
It means negative in some RNN use cases I saw.