addmm: Reduce constant time overhead #41374

Closed

peterbell10 wants to merge 1 commit into pytorch:master from peterbell10:addmm-overhead

Conversation

peterbell10 (Collaborator) commented Jul 13, 2020

Fixes the overhead reported by @ngimel in #40927 (comment)

As it turns out, `Tensor.size(n)` has more overhead than `Tensor.sizes()[n]`. Since `addmm` does a lot of introspection of the input matrix sizes and strides, this added up to a noticeable (~1 us) constant-time overhead.

With this change, a 1x1 matmul takes 2.85 us on my machine compared to 2.90 us on pytorch 1.5.


dr-ci Bot commented Jul 13, 2020

💊 CI failures summary and remediations

As of commit ef2e973 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚



ngimel (Collaborator) left a comment


Awesome, thanks for fixing!

```diff
-  result.resize_({ self.size(0), mat2.size(1) });
   TORCH_CHECK(self.dim() == 2, "self must be a matrix");
   TORCH_CHECK(mat2.dim() == 2, "mat2 must be a matrix");
+  native::resize_(result, {self.sizes()[0], mat2.sizes()[1]});
```
Collaborator

nit: since you are resizing result in addmm_cpu_impl, do you need this resize here?

Collaborator Author

`result` is also the `self` tensor in `addmm`, which must at least broadcast to the correct dimensions.

ezyang (Contributor) commented Jul 14, 2020

> As it turns out, `Tensor.size(n)` has more overhead than `Tensor.sizes()[n]`

Ugggh. Is this because `size(n)` is doing checked index access and `[n]` is doing unchecked indexing? Another reason we need to devirtualize `size`, so that the compiler can optimize away the bounds test if you check `dim` earlier...

ngimel (Collaborator) commented Jul 14, 2020

In addition to checked access, `Tensor.size(n)` also maybe-wraps `n` (handling negative dimension indices), for additional overhead.

peterbell10 (Collaborator Author)

`size(n)` does dim wrapping and bounds checking, but is also not inlined and does runtime dispatch on each call.

facebook-github-bot (Contributor) left a comment


@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor)

@ngimel merged this pull request in e2c4c2f.

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
Fixes the overhead reported by ngimel in pytorch#40927 (comment)

As it turns out, `Tensor.size(n)` has more overhead than `Tensor.sizes()[n]`. Since addmm does a lot of introspection of the input matrix sizes and strides, this added up to a noticeable (~1 us) constant time overhead.

With this change, a 1x1 matmul takes 2.85 us on my machine compared to 2.90 us on pytorch 1.5.

Pull Request resolved: pytorch#41374

Reviewed By: ailzhang

Differential Revision: D22519924

Pulled By: ngimel

fbshipit-source-id: b29504bee7de79ce42e5e50f91523dde42b073b7