addmm: Reduce constant time overhead #41374
peterbell10 wants to merge 1 commit into pytorch:master
Conversation
💊 CI failures summary and remediations (Dr. CI): As of commit ef2e973 — 💚 Looks good so far! There are no failures yet. 💚 (Automated comment generated by Dr. CI.)
Force-pushed from f6ba74c to ef2e973.
ngimel left a comment:
Awesome, thanks for fixing!
```diff
- result.resize_({ self.size(0), mat2.size(1) });
+ TORCH_CHECK(self.dim() == 2, "self must be a matrix");
+ TORCH_CHECK(mat2.dim() == 2, "mat2 must be a matrix");
+ native::resize_(result, {self.sizes()[0], mat2.sizes()[1]});
```
nit: since you are resizing `result` in `addmm_cpu_impl`, do you need this resize here?
`result` is also the `self` tensor in `addmm`, which must at least broadcast to the correct dimensions.
Ugggh. Is this because `size(n)` is doing checked index access and `[n]` is doing unchecked indexing? Another reason we need to devirtualize `size`, so that the compiler can optimize away the bounds test if you check `dim` earlier...
In addition to checked access, `Tensor.size(n)` also maybe-wraps `n` to support negative indices, for additional overhead.
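To make the extra work concrete, here is a simplified sketch of the difference (illustrative only — `TensorSketch` is a hypothetical stand-in, not the actual ATen implementation):

```cpp
// Illustrative sketch only; TensorSketch is hypothetical and does not
// reproduce the real ATen code paths.
#include <cstdint>
#include <stdexcept>
#include <vector>

struct TensorSketch {
  std::vector<int64_t> sizes_;  // per-dimension sizes

  // Analogous to Tensor::sizes()[n]: raw, unchecked indexing.
  const std::vector<int64_t>& sizes() const { return sizes_; }

  // Analogous to Tensor::size(n): wrap negative dims (maybe_wrap_dim-style),
  // then bounds-check before indexing.
  int64_t size(int64_t dim) const {
    const auto ndim = static_cast<int64_t>(sizes_.size());
    if (dim < 0) {
      dim += ndim;  // allow size(-1) etc.
    }
    if (dim < 0 || dim >= ndim) {
      throw std::out_of_range("dimension out of range");
    }
    return sizes_[dim];
  }
};
```

Per call the difference is tiny, but addmm queries sizes and strides many times, so after an explicit `TORCH_CHECK(self.dim() == 2, ...)` the unchecked `sizes()[n]` form is safe and avoids paying for the wrap-and-check repeatedly.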
facebook-github-bot left a comment:
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Fixes the overhead reported by ngimel in pytorch#40927 (comment). As it turns out, `Tensor.size(n)` has more overhead than `Tensor.sizes()[n]`. Since addmm does a lot of introspection of the input matrix sizes and strides, this added up to a noticeable (~1 us) constant time overhead. With this change, a 1x1 matmul takes 2.85 us on my machine, compared to 2.90 us on PyTorch 1.5.

Pull Request resolved: pytorch#41374
Reviewed By: ailzhang
Differential Revision: D22519924
Pulled By: ngimel
fbshipit-source-id: b29504bee7de79ce42e5e50f91523dde42b073b7
Fixes the overhead reported by @ngimel in #40927 (comment).

As it turns out, `Tensor.size(n)` has more overhead than `Tensor.sizes()[n]`. Since addmm does a lot of introspection of the input matrix sizes and strides, this added up to a noticeable (~1 us) constant time overhead. With this change, a 1x1 matmul takes 2.85 us on my machine, compared to 2.90 us on PyTorch 1.5.
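For context, a microbenchmark along these lines can reproduce this kind of measurement. The sketch below is hypothetical (not the harness actually used for the numbers above) and assumes a libtorch/ATen build:

```cpp
// Hypothetical micro-benchmark sketch for small-matmul constant overhead.
// Assumes a libtorch/ATen build; this is not the harness behind the
// 2.85 us / 2.90 us figures quoted above.
#include <ATen/ATen.h>
#include <chrono>
#include <iostream>

int main() {
  at::Tensor a = at::randn({1, 1});
  at::Tensor b = at::randn({1, 1});

  // Warm up so one-time allocation/dispatch costs don't skew the timing.
  for (int i = 0; i < 1000; ++i) {
    at::mm(a, b);
  }

  constexpr int kIters = 100000;
  const auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < kIters; ++i) {
    at::mm(a, b);
  }
  const auto end = std::chrono::steady_clock::now();

  const double total_ns =
      std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
  std::cout << "1x1 mm: " << total_ns / kIters / 1000.0 << " us/iter\n";
  return 0;
}
```

At this matrix size the BLAS work is negligible, so almost all of the measured time is dispatch and shape/stride introspection — exactly the constant overhead this PR trims.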