
Micro-optimisations for matmul 2.0: Electric boogaloo #75197

Closed

lezcano wants to merge 28 commits into gh/Lezcano/59/base from gh/Lezcano/59/head

Conversation

@lezcano
Collaborator

@lezcano lezcano commented Apr 4, 2022

Stack from ghstack:

This PR implements the bulk of #64387

Part of the optimisations were already merged in #72230

These optimisations include:

- Make the code `const` correct.
- Create `DimVector`s more efficiently (e.g. prefer `append` over `insert`).
- Access the sizes of the tensors via `sizes().front()` / `sizes().back()` / `sizes().end()[-2]`.
- Do not create intermediary tensors / vectors when it can be avoided.
- Call `reshape` rather than `expect_contiguous` + `view`.
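To make the append and size-access patterns concrete, here is a small sketch in Python (the actual code lives in ATen C++): the output shape of a matmul is built by appending to a small vector, broadcast batch dims first, then the row size (`sizes().end()[-2]`) and column size (`sizes().back()`). The helper name and the simplified broadcasting are illustrative assumptions, not the actual ATen implementation.

```python
def matmul_output_shape(a_sizes, b_sizes):
    """Illustrative sketch: build the output shape by appending
    (the DimVector pattern above), never by inserting in the middle.

    Assumes both inputs are at least 2-D and that the batch dims
    broadcast; the real ATen logic also handles 1-D operands.
    """
    batch_a, batch_b = a_sizes[:-2], b_sizes[:-2]
    n = max(len(batch_a), len(batch_b))
    out = []
    for i in range(n):
        # Right-aligned broadcast: missing leading dims count as 1.
        ia = len(batch_a) - n + i
        ib = len(batch_b) - n + i
        da = batch_a[ia] if ia >= 0 else 1
        db = batch_b[ib] if ib >= 0 else 1
        out.append(max(da, db))
    out.append(a_sizes[-2])   # rows of a: sizes().end()[-2]
    out.append(b_sizes[-1])   # cols of b: sizes().back()
    return out
```

For example, `matmul_output_shape([5, 2, 3, 4], [4, 6])` gives `[5, 2, 3, 6]`.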

On top of these, it fixes a correctness issue in `matmul_out`, where the
`out` parameter was not resized correctly when passed to the backends.
Fixing this involves removing the use of `set_` from the calling code, as
requested by @ezyang, and accounts for most of the complexity that this
PR adds.
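As a rough illustration of the `out=` contract the fix restores, consider this toy sketch (Python; the class and function names are hypothetical stand-ins, not the ATen API): the caller resizes `out` to the result shape before dispatching to the backend, rather than relying on the backend to `set_` it, which would silently rebind the tensor the user passed in to different storage.

```python
class ToyTensor:
    """Minimal stand-in for a tensor; only tracks a shape."""
    def __init__(self, shape):
        self.shape = list(shape)

    def resize_(self, shape):
        # Stand-in for resizing the output in place (in ATen this
        # goes through resize_output); here it just records the shape.
        self.shape = list(shape)
        return self

def matmul_out_sketch(a_shape, b_shape, out):
    # The calling code resizes `out` to the result shape *before*
    # handing it to the backend, instead of letting the backend
    # set_() the out tensor to freshly allocated storage.
    result_shape = [a_shape[0], b_shape[1]]
    out.resize_(result_shape)
    return out
```

The key point is that the same tensor object the user passed in comes back, now with the correct shape.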


[ghstack-poisoned]
@facebook-github-bot
Contributor

facebook-github-bot commented Apr 4, 2022

🔗 Helpful links

✅ No Failures (0 Pending)

As of commit d34aa56 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI (expand for details).


@lezcano lezcano requested review from ezyang and swolchok and removed request for IvanYashchuk and nikitaved April 4, 2022 16:25
@lezcano lezcano added the module: linear algebra (Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply, matmul), topic: performance, topic: not user facing, and release notes: performance_as_product labels, and removed the topic: not user facing label Apr 4, 2022
@lezcano lezcano mentioned this pull request Apr 4, 2022
lezcano added a commit that referenced this pull request Apr 4, 2022
ghstack-source-id: 97a8759
Pull Request resolved: #75197
lezcano added a commit that referenced this pull request Apr 4, 2022
ghstack-source-id: 67b1813
Pull Request resolved: #75197
@ezyang
Contributor

ezyang commented Apr 5, 2022

should we wait for #75195 before reviewing this?

lezcano added a commit that referenced this pull request May 11, 2022
With this PR, matmul folds a bmm into a mm or mv if and only if it
can do so without copying. We add tests to make sure that our
algorithm to detect this is accurate.

For the cases where it was copying before, see the comments in #75197

Fixes #76702

[ghstack-poisoned]
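The no-copy condition presumably reduces to a stride test. The sketch below (Python, a simplified illustration rather than the real ATen check, which must also special-case size-0 and size-1 dimensions) asks whether a `(b, n, m)` batch of matrices can be viewed as a single `(b*n, m)` matrix:

```python
def can_fold_batch_into_mm(sizes, strides):
    """A (b, n, m) tensor can be reinterpreted as a (b*n, m) matrix
    without copying iff each batch element is laid out as n
    consecutive rows, i.e. stride(0) == stride(1) * size(1).

    Simplified sketch; degenerate dims (size 0 or 1, whose strides
    are arbitrary) are ignored here.
    """
    b, n, m = sizes
    s0, s1, s2 = strides
    return s0 == s1 * n
```

For a contiguous `(2, 3, 4)` tensor (strides `(12, 4, 1)`) the check passes; after transposing the first two dimensions (sizes `(3, 2, 4)`, strides `(4, 12, 1)`) it fails, so matmul would fall back to bmm rather than copy.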
lezcano added a commit that referenced this pull request May 11, 2022
With this PR, matmul folds a bmm into a mm or mv if and only if it
can do so without copying.

For the cases where it was copying before, see the comments in #75197

For the approach taken, see #75197 (comment)

Fixes #76702

ghstack-source-id: c1018a1
Pull Request resolved: #76828
lezcano added 2 commits May 13, 2022 08:35
Collaborator

@ngimel ngimel left a comment


Can you please rebase to get CI signal?

@lezcano
Collaborator Author

lezcano commented May 14, 2022

I'm starting to think that all these test_jit failures may not be unrelated, as I just rebased onto viable/strict... I'll look further into them on Monday

lezcano added 2 commits May 17, 2022 16:11
@lezcano
Collaborator Author

lezcano commented May 18, 2022

@pytorchbot merge


Labels

cla signed, Merged, module: linear algebra (Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply, matmul), open source, topic: performance (topic category), with-ssh

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants