
Micro-optimisations for matmul 2.0: Electric boogaloo #75197

Closed

lezcano wants to merge 28 commits into gh/Lezcano/59/base from gh/Lezcano/59/head

Conversation

@lezcano
Collaborator

@lezcano lezcano commented Apr 4, 2022

Stack from ghstack:

This PR implements the bulk of #64387

Part of the optimisations were already merged in #72230

These optimisations include:

- Make the code `const` correct.
- Create `DimVector`s more efficiently (e.g. prefer `append` over `insert`).
- Access the sizes of the tensors via `sizes().front()` / `sizes().back()` / `sizes().end()[-2]`.
- Do not create intermediary tensors / vectors when it can be avoided.
- Call `reshape` rather than `expect_contiguous` + `view`.
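To make the append and size-access patterns concrete, here is a small sketch in Python (the actual code lives in ATen C++): the output shape of a matmul is built by appending to a small vector, broadcast batch dims first, then the row size (`sizes().end()[-2]`) and column size (`sizes().back()`). The helper name and the simplified broadcasting are illustrative assumptions, not the actual ATen implementation.

```python
def matmul_output_shape(a_sizes, b_sizes):
    """Illustrative sketch: build the output shape by appending
    (the DimVector pattern above), never by inserting in the middle.

    Assumes both inputs are at least 2-D and that the batch dims
    broadcast; the real ATen logic also handles 1-D operands.
    """
    batch_a, batch_b = a_sizes[:-2], b_sizes[:-2]
    n = max(len(batch_a), len(batch_b))
    out = []
    for i in range(n):
        # Right-aligned broadcast: missing leading dims count as 1.
        ia = len(batch_a) - n + i
        ib = len(batch_b) - n + i
        da = batch_a[ia] if ia >= 0 else 1
        db = batch_b[ib] if ib >= 0 else 1
        out.append(max(da, db))
    out.append(a_sizes[-2])   # rows of a: sizes().end()[-2]
    out.append(b_sizes[-1])   # cols of b: sizes().back()
    return out
```

For example, `matmul_output_shape([5, 2, 3, 4], [4, 6])` gives `[5, 2, 3, 6]`.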

On top of these, it fixes a correctness issue in `matmul_out`, where the
`out` parameter was not resized correctly when passed to the backends.
Fixing this involves removing the use of `set_` from the calling code, as
requested by @ezyang, and accounts for most of the complexity that this
PR adds.
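As a rough illustration of the `out=` contract the fix restores, consider this toy sketch (Python; the class and function names are hypothetical stand-ins, not the ATen API): the caller resizes `out` to the result shape before dispatching to the backend, rather than relying on the backend to `set_` it, which would silently rebind the tensor the user passed in to different storage.

```python
class ToyTensor:
    """Minimal stand-in for a tensor; only tracks a shape."""
    def __init__(self, shape):
        self.shape = list(shape)

    def resize_(self, shape):
        # Stand-in for resizing the output in place (in ATen this
        # goes through resize_output); here it just records the shape.
        self.shape = list(shape)
        return self

def matmul_out_sketch(a_shape, b_shape, out):
    # The calling code resizes `out` to the result shape *before*
    # handing it to the backend, instead of letting the backend
    # set_() the out tensor to freshly allocated storage.
    result_shape = [a_shape[0], b_shape[1]]
    out.resize_(result_shape)
    return out
```

The key point is that the same tensor object the user passed in comes back, now with the correct shape.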


[ghstack-poisoned]
@facebook-github-bot
Contributor

facebook-github-bot commented Apr 4, 2022

🔗 Helpful links

✅ No Failures (0 Pending)

As of commit d34aa56 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).


@lezcano lezcano requested review from ezyang and swolchok and removed request for IvanYashchuk and nikitaved April 4, 2022 16:25
@lezcano lezcano added the module: linear algebra (Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply, matmul), topic: performance, topic: not user facing, and release notes: performance_as_product labels, and removed the topic: not user facing label Apr 4, 2022
@lezcano lezcano mentioned this pull request Apr 4, 2022
lezcano added a commit that referenced this pull request Apr 4, 2022
ghstack-source-id: 97a8759
Pull Request resolved: #75197
lezcano added a commit that referenced this pull request Apr 4, 2022
ghstack-source-id: 67b1813
Pull Request resolved: #75197
@ezyang
Contributor

ezyang commented Apr 5, 2022

should we wait for #75195 before reviewing this?

lezcano added a commit that referenced this pull request May 11, 2022
With this PR, matmul folds a bmm into a mm or mv if and only if it
can do so without copying. We add tests to make sure that our
algorithm to detect this is accurate.

For the cases where it was copying before, see the comments in #75197

Fixes #76702

[ghstack-poisoned]
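The no-copy condition presumably reduces to a stride test. The sketch below (Python, a simplified illustration rather than the real ATen check, which must also special-case size-0 and size-1 dimensions) asks whether a `(b, n, m)` batch of matrices can be viewed as a single `(b*n, m)` matrix:

```python
def can_fold_batch_into_mm(sizes, strides):
    """A (b, n, m) tensor can be reinterpreted as a (b*n, m) matrix
    without copying iff each batch element is laid out as n
    consecutive rows, i.e. stride(0) == stride(1) * size(1).

    Simplified sketch; degenerate dims (size 0 or 1, whose strides
    are arbitrary) are ignored here.
    """
    b, n, m = sizes
    s0, s1, s2 = strides
    return s0 == s1 * n
```

For a contiguous `(2, 3, 4)` tensor (strides `(12, 4, 1)`) the check passes; after transposing the first two dimensions (sizes `(3, 2, 4)`, strides `(4, 12, 1)`) it fails, so matmul would fall back to bmm rather than copy.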
lezcano added a commit that referenced this pull request May 11, 2022
With this PR, matmul folds a bmm into a mm or mv if and only if it
can do so without copying.

For the cases where it was copying before, see the comments in #75197

For the approach taken, see #75197 (comment)

Fixes #76702

ghstack-source-id: c1018a1
Pull Request resolved: #76828
lezcano added 2 commits May 13, 2022 08:35
Collaborator

@ngimel ngimel left a comment


Can you please rebase to get CI signal?

@lezcano
Collaborator Author

lezcano commented May 14, 2022

I'm starting to think that all these test_jit failures may not be unrelated, as I just rebased onto viable/strict... I'll look further into them on Monday

lezcano added 2 commits May 17, 2022 16:11
@lezcano
Collaborator Author

lezcano commented May 18, 2022

@pytorchbot merge


Labels

cla signed, Merged, module: linear algebra (Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply, matmul), open source, topic: performance (topic category), with-ssh

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants