Added cuBLAS path for torch.linalg.lstsq #54725
Closed
IvanYashchuk wants to merge 28 commits into gh/ivanyashchuk/7/base
cuBLAS's gelsBatched is faster than MAGMA's for matrices with fewer than 128 rows. [ghstack-poisoned]
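The routing rule stated in this commit message can be modeled as a simple size check. This is a hypothetical sketch: `choose_lstsq_backend`, `CUBLAS_MAX_ROWS`, and the returned backend labels are illustrative names, not PyTorch internals, and the rows >= cols condition is an assumption about which systems cuBLAS gelsBatched accepts. PyTorch's actual dispatch may weigh other factors, such as batch size.

```python
# Hypothetical model of the size-based routing described in this PR.
# Only the 128-row threshold comes from the PR text; all names are illustrative.

CUBLAS_MAX_ROWS = 128  # cuBLAS gelsBatched reported faster below this row count


def choose_lstsq_backend(rows: int, cols: int) -> str:
    """Return a label for the backend a batched lstsq call might be routed to."""
    if rows <= 0 or cols <= 0:
        raise ValueError("matrix dimensions must be positive")
    # Assumption: gelsBatched only handles square or overdetermined systems.
    if rows >= cols and rows < CUBLAS_MAX_ROWS:
        return "cublas_gelsBatched"
    return "magma_gels"


if __name__ == "__main__":
    for rows, cols in [(32, 32), (64, 64), (128, 128), (256, 256)]:
        print(f"{rows}x{cols} -> {choose_lstsq_backend(rows, cols)}")
```

Keeping the threshold as a named constant makes it easy to retune if later benchmarks move the crossover point.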
This was referenced Mar 25, 2021
💊 CI failures summary and remediations
As of commit c01d3ce (more details on the Dr. CI page):
ci.pytorch.org: 1 failed
This comment was automatically generated by Dr. CI.
cuBLAS's gelsBatched is faster than MAGMA's for matrices with fewer than 128 rows. Ref. #47953 [ghstack-poisoned]
This was referenced May 4, 2021
IvanYashchuk added a commit that referenced this pull request on May 4, 2021
cuBLAS's gelsBatched is faster than MAGMA's for matrices with fewer than 128 rows.
Performance comparison cuSOLVER vs cuBLAS: #54725 (comment).
Performance comparison MAGMA vs cuBLAS: #54725 (comment).
[ghstack-poisoned]
Rest of this stack looks good; how's this diff going, @IvanYashchuk, @xwang233? I'll start landing the other PRs in the stack now.
xwang233 approved these changes on May 6, 2021:
LGTM, thanks for the work!
mruberry approved these changes on May 6, 2021:
Stamped; thanks @IvanYashchuk, @xwang233!
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
cuBLAS's gelsBatched is faster than MAGMA's for matrices with fewer than 128 rows.
Performance comparison cuSOLVER vs cuBLAS: #54725 (comment).
Performance comparison MAGMA vs cuBLAS: #54725 (comment).
Differential Revision: [D28248803](https://our.internmc.facebook.com/intern/diff/D28248803)
[ghstack-poisoned]
IvanYashchuk added a commit that referenced this pull request on May 7, 2021
sab-regime approved these changes on May 8, 2021.
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request on May 19, 2021
Summary:
Pull Request resolved: pytorch#54725

cuBLAS's gelsBatched is faster than MAGMA's for matrices with fewer than 128 rows.
Performance comparison cuSOLVER vs cuBLAS: pytorch#54725 (comment).
Performance comparison MAGMA vs cuBLAS: pytorch#54725 (comment).

Test Plan: Imported from OSS
Reviewed By: albanD
Differential Revision: D28248803
Pulled By: mruberry
fbshipit-source-id: d3661bccb85c6fc1cee3a246ae8233492964f400
pytorchmergebot pushed a commit that referenced this pull request on Mar 22, 2022
…atrices

This PR enables the cuBLAS path for `torch.linalg.lstsq`. Before this PR, only the cuSOLVER path was used for regular PyTorch builds (when built with MAGMA).

Performance results (also previously reported at #54725 (comment)):

```
| input shape                | before current PR | current PR | speedup |
|----------------------------|-------------------|------------|---------|
| torch.Size([32, 32, 32])   | 870               | 440        | 2x      |
| torch.Size([64, 32, 32])   | 1340              | 450        | 3x      |
| torch.Size([32, 64, 64])   | 9040              | 1839       | 5x      |
| torch.Size([64, 64, 64])   | 17000             | 1830       | 9.2x    |
| torch.Size([32, 128, 128]) | 23210             | 8560       | 2.7x    |
| torch.Size([64, 128, 128]) | 40000             | 8662       | 4.6x    |
| torch.Size([32, 256, 256]) | 58160             | 46150      | 1.2x    |
| torch.Size([64, 256, 256]) | 73220             | 52080      | 1.4x    |

Times are in microseconds (us).
```

Pull Request resolved: #74434
Approved by: https://github.com/mruberry
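The speedup column is the ratio of the two timing columns. A small sketch that recomputes it from the raw timings in the table above; the `timings_us` dictionary and the `speedup` helper are illustrative names, not part of the PR.

```python
# Timings (microseconds) copied from the #74434 performance table.
# Keys are input shapes (batch, rows, cols); values are (before, after).
timings_us = {
    (32, 32, 32): (870, 440),
    (64, 32, 32): (1340, 450),
    (32, 64, 64): (9040, 1839),
    (64, 64, 64): (17000, 1830),
    (32, 128, 128): (23210, 8560),
    (64, 128, 128): (40000, 8662),
    (32, 256, 256): (58160, 46150),
    (64, 256, 256): (73220, 52080),
}


def speedup(before_us: float, after_us: float) -> float:
    """Ratio of old to new runtime; a value above 1 means the new path is faster."""
    return before_us / after_us


if __name__ == "__main__":
    for shape, (before, after) in timings_us.items():
        print(f"{shape}: {speedup(before, after):.2f}x")
```

Recomputed to two decimals, the ratios are 1.98, 2.98, 4.92, 9.29, 2.71, 4.62, 1.26 and 1.41, so the speedup column in the table is rounded (the 9.2x and 1.2x entries are rounded down).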
facebook-github-bot pushed a commit that referenced this pull request on Mar 23, 2022
…atrices (#74434)

Summary:
This PR enables the cuBLAS path for `torch.linalg.lstsq`. Before this PR, only the cuSOLVER path was used for regular PyTorch builds (when built with MAGMA).

Performance results (also previously reported at #54725 (comment)):

```
| input shape                | before current PR | current PR | speedup |
|----------------------------|-------------------|------------|---------|
| torch.Size([32, 32, 32])   | 870               | 440        | 2x      |
| torch.Size([64, 32, 32])   | 1340              | 450        | 3x      |
| torch.Size([32, 64, 64])   | 9040              | 1839       | 5x      |
| torch.Size([64, 64, 64])   | 17000             | 1830       | 9.2x    |
| torch.Size([32, 128, 128]) | 23210             | 8560       | 2.7x    |
| torch.Size([64, 128, 128]) | 40000             | 8662       | 4.6x    |
| torch.Size([32, 256, 256]) | 58160             | 46150      | 1.2x    |
| torch.Size([64, 256, 256]) | 73220             | 52080      | 1.4x    |

Times are in microseconds (us).
```

Pull Request resolved: #74434
Approved by: https://github.com/mruberry

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/4de870f6040d2799acc11b9bdeb6508eb6fd33d9
Reviewed By: malfet
Differential Revision: D35047957
fbshipit-source-id: c21c3fdabcf2fc747089e5915fe66561760602c3
shahofblah pushed a commit that referenced this pull request on Mar 25, 2022
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request on Apr 25, 2026
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request on Apr 25, 2026
Stack from ghstack:
cuBLAS's gelsBatched is faster than MAGMA's for matrices with fewer than 128 rows.
Performance comparison cuSOLVER vs cuBLAS: #54725 (comment).
Performance comparison MAGMA vs cuBLAS: #54725 (comment).
Differential Revision: D28248803
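Both cuBLAS gelsBatched and MAGMA's gels treat each matrix in the batch as an independent least-squares problem, minimizing ||A x - b||_2, using a QR factorization (the LAPACK gels convention for full-rank systems with rows >= cols). To show what each batch entry computes, here is a minimal pure-Python Householder QR sketch. `lstsq_qr` and `lstsq_batched` are hypothetical helpers for illustration only; they say nothing about how the GPU libraries schedule or parallelize the work.

```python
import math


def lstsq_qr(A, b):
    """Solve min ||A x - b||_2 for a full-rank A (list of rows) with rows >= cols."""
    m, n = len(A), len(A[0])
    if m < n:
        raise ValueError("this sketch assumes rows >= cols")
    R = [row[:] for row in A]  # working copy, reduced to upper-triangular R
    y = list(b)                # working copy, becomes Q^T b
    for k in range(n):
        # Householder reflector that zeroes column k below the diagonal.
        norm = math.sqrt(sum(R[i][k] ** 2 for i in range(k, m)))
        if norm == 0.0:
            raise ValueError("matrix is rank deficient")
        alpha = -norm if R[k][k] >= 0 else norm  # sign chosen to avoid cancellation
        v = [0.0] * m
        v[k] = R[k][k] - alpha
        for i in range(k + 1, m):
            v[i] = R[i][k]
        vnorm2 = sum(v[i] ** 2 for i in range(k, m))
        # Apply H = I - 2 v v^T / (v^T v) to the remaining columns of R.
        for j in range(k, n):
            s = 2.0 * sum(v[i] * R[i][j] for i in range(k, m)) / vnorm2
            for i in range(k, m):
                R[i][j] -= s * v[i]
        # Apply the same reflector to the right-hand side.
        s = 2.0 * sum(v[i] * y[i] for i in range(k, m)) / vnorm2
        for i in range(k, m):
            y[i] -= s * v[i]
    # Back-substitution on the leading n x n block: R x = (Q^T b)[:n].
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - sum(R[i][j] * x[j] for j in range(i + 1, n))) / R[i][i]
    return x


def lstsq_batched(As, bs):
    """Solve each (A, b) pair independently, mirroring batched gels semantics."""
    return [lstsq_qr(A, b) for A, b in zip(As, bs)]


if __name__ == "__main__":
    # Fit y = c0 + c1 * t to three points; the exact solution is [7/6, 1/2].
    A = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
    b = [1.0, 2.0, 2.0]
    print(lstsq_batched([A], [b]))
```

The batched wrapper is deliberately a plain loop: the point of gelsBatched is to run these independent small solves concurrently on the GPU, which is where its advantage for small matrices comes from.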