Added cuBLAS path for torch.linalg.lstsq by IvanYashchuk · Pull Request #54725 · pytorch/pytorch

IvanYashchuk · 2021-03-25T19:44:52Z

Stack from ghstack:

#54725 Added cuBLAS path for torch.linalg.lstsq
#57317 Add cuSOLVER path for torch.linalg.lstsq

cuBLAS's gelsBatched is faster than MAGMA's for matrices with fewer than
128 rows.
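
As a rough illustration of the claim above, a size-based backend choice could look like the sketch below. This is not the dispatch code from this PR: the helper name `prefer_cublas_gels_batched` and the constant `ROW_THRESHOLD` are hypothetical, and the only fact taken from the description is the 128-row cutoff.

```python
# Hypothetical helper for illustration only; not PyTorch's actual dispatch logic.
# The cutoff comes from the PR description ("rows less than 128").
ROW_THRESHOLD = 128


def prefer_cublas_gels_batched(rows: int, row_threshold: int = ROW_THRESHOLD) -> bool:
    """Return True when the matrices are small enough that the batched
    cuBLAS path is expected to be the faster choice."""
    if rows <= 0:
        raise ValueError("rows must be positive")
    return rows < row_threshold


if __name__ == "__main__":
    # For a batched input of shape (batch, rows, cols), rows is the
    # second-to-last dimension.
    for shape in [(32, 32, 32), (64, 64, 64), (32, 128, 128)]:
        rows = shape[-2]
        print(shape, "-> cuBLAS gelsBatched" if prefer_cublas_gels_batched(rows) else "-> other backend")
```

A real heuristic would likely also take the batch size into account; the linked performance comparisons are the place to look for the measured crossover points.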

Performance comparison cuSOLVER vs cuBLAS: #54725 (comment).
Performance comparison MAGMA vs cuBLAS: #54725 (comment).

Differential Revision: D28248803

facebook-github-bot · 2021-03-25T19:45:22Z

💊 CI failures summary and remediations

As of commit c01d3ce (more details on the Dr. CI page):

1/1 failures possibly* introduced in this PR
- 1/1 non-scanned failure(s)

ci.pytorch.org: 1 failed

Failed: pr/pytorch-linux-bionic-rocm4.1-py3.6

mruberry · 2021-05-06T00:02:00Z

Rest of this stack looks good; how's this diff going, @IvanYashchuk, @xwang233?

I'll start landing the other PRs in the stack now.

xwang233

LGTM, thanks for the work!

mruberry

Stamped; thanks @IvanYashchuk, @xwang233!

mruberry · 2021-05-06T05:45:38Z

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

mruberry · 2021-05-08T23:43:27Z

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2021-05-10T04:20:33Z

@mruberry merged this pull request in e7e7319.

Summary:
Pull Request resolved: pytorch#54725

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128.

Performance comparison cuSOLVER vs cuBLAS: pytorch#54725 (comment).
Performance comparison MAGMA vs cuBLAS: pytorch#54725 (comment).

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D28248803

Pulled By: mruberry

fbshipit-source-id: d3661bccb85c6fc1cee3a246ae8233492964f400

…atrices

This PR enables cuBLAS path for `torch.linalg.lstsq`. Before this PR only cuSOLVER path was used for regular PyTorch builds (when built with MAGMA).

Performance results (also previously reported at #54725 (comment)):
```
|                            | before current PR | current PR | speedup |
|----------------------------|-------------------|------------|---------|
| torch.Size([32, 32, 32])   | 870               | 440        | 2x      |
| torch.Size([64, 32, 32])   | 1340              | 450        | 3x      |
| torch.Size([32, 64, 64])   | 9040              | 1839       | 5x      |
| torch.Size([64, 64, 64])   | 17000             | 1830       | 9.2x    |
| torch.Size([32, 128, 128]) | 23210             | 8560       | 2.7x    |
| torch.Size([64, 128, 128]) | 40000             | 8662       | 4.6x    |
| torch.Size([32, 256, 256]) | 58160             | 46150      | 1.2x    |
| torch.Size([64, 256, 256]) | 73220             | 52080      | 1.4x    |

Times are in microseconds (us).
```
Pull Request resolved: #74434
Approved by: https://github.com/mruberry
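
The speedup column in the table above is the "before" time divided by the "current PR" time. A minimal sketch for recomputing it from the listed timings follows. The names `ROWS` and `compute_speedups` are made up for this example, and the listed values appear to be approximate rather than rounded in a consistent way.

```python
# Timings in microseconds, copied from the table above:
# (input shape, before current PR, current PR, listed speedup)
ROWS = [
    ((32, 32, 32), 870, 440, 2.0),
    ((64, 32, 32), 1340, 450, 3.0),
    ((32, 64, 64), 9040, 1839, 5.0),
    ((64, 64, 64), 17000, 1830, 9.2),
    ((32, 128, 128), 23210, 8560, 2.7),
    ((64, 128, 128), 40000, 8662, 4.6),
    ((32, 256, 256), 58160, 46150, 1.2),
    ((64, 256, 256), 73220, 52080, 1.4),
]


def compute_speedups(rows):
    """Return {shape: before / after} for each table row."""
    return {shape: before / after for shape, before, after, _ in rows}


if __name__ == "__main__":
    recomputed = compute_speedups(ROWS)
    for shape, before, after, listed in ROWS:
        print(f"{shape}: {before} us -> {after} us, "
              f"{recomputed[shape]:.2f}x (listed as {listed}x)")
```

The gain is largest for the 64x64 matrices and shrinks toward the 256x256 inputs, which matches the intent of enabling this path for batches of small matrices.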

…atrices (#74434)

Summary: This PR enables cuBLAS path for `torch.linalg.lstsq`. Before this PR only cuSOLVER path was used for regular PyTorch builds (when built with MAGMA). The performance results are the same as in the commit message above.

Pull Request resolved: #74434
Approved by: https://github.com/mruberry

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/4de870f6040d2799acc11b9bdeb6508eb6fd33d9

Reviewed By: malfet

Differential Revision: D35047957

fbshipit-source-id: c21c3fdabcf2fc747089e5915fe66561760602c3

Added cuBLAS path for torch.linalg.lstsq

e1f1ad5

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. [ghstack-poisoned]

IvanYashchuk added a commit that referenced this pull request Mar 25, 2021

Added cuBLAS path for torch.linalg.lstsq

3a65d70

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. ghstack-source-id: 4ebab5f Pull Request resolved: #54725

facebook-github-bot added the cla signed label Mar 25, 2021

pytorchbot added the open source label Mar 25, 2021

IvanYashchuk requested a review from mruberry March 25, 2021 19:47

Update on "Added cuBLAS path for torch.linalg.lstsq"

aaee06b

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. [ghstack-poisoned]

IvanYashchuk added a commit that referenced this pull request Mar 25, 2021

Added cuBLAS path for torch.linalg.lstsq

3932d71

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. ghstack-source-id: 84c6d16 Pull Request resolved: #54725

Update on "Added cuBLAS path for torch.linalg.lstsq"

f3377ac

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. Ref. #47953 [ghstack-poisoned]

Update on "Added cuBLAS path for torch.linalg.lstsq"

daf7a4b

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. Ref. #47953 [ghstack-poisoned]

IvanYashchuk mentioned this pull request Mar 29, 2021

Linear algebra GPU backend tracking issue [magma/cusolver/cublas] #47953

Open

Update on "Added cuBLAS path for torch.linalg.lstsq"

991c86b

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. Ref. #47953 [ghstack-poisoned]

Update on "Added cuBLAS path for torch.linalg.lstsq"

d4e8e22

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. Ref. #47953 [ghstack-poisoned]

IvanYashchuk added a commit that referenced this pull request Mar 30, 2021

Added cuBLAS path for torch.linalg.lstsq

ef658a3

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. ghstack-source-id: f57250a Pull Request resolved: #54725

Update on "Added cuBLAS path for torch.linalg.lstsq"

33f6cf7

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. Ref. #47953 [ghstack-poisoned]

IvanYashchuk added a commit that referenced this pull request Mar 30, 2021

Added cuBLAS path for torch.linalg.lstsq

c6f5ae2

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. ghstack-source-id: 428f285 Pull Request resolved: #54725

Update on "Added cuBLAS path for torch.linalg.lstsq"

3277233

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. Ref. #47953 [ghstack-poisoned]

Update on "Added cuBLAS path for torch.linalg.lstsq"

c789bbc

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. Ref. #47953 [ghstack-poisoned]

IvanYashchuk added a commit that referenced this pull request Mar 30, 2021

Added cuBLAS path for torch.linalg.lstsq

b0d22ed

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. ghstack-source-id: 2924eff Pull Request resolved: #54725

Update on "Added cuBLAS path for torch.linalg.lstsq"

fc173b8

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. Ref. #47953 [ghstack-poisoned]

This was referenced May 4, 2021

Add CUDA support for torch.ormqr #57316

Closed

Add cuSOLVER path for torch.linalg.lstsq #57317

Closed

Update on "Added cuBLAS path for torch.linalg.lstsq"

0da995f

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. Performance comparison cuSOLVER vs cuBLAS: #54725 (comment). Performance comparison MAGMA vs cuBLAS: #54725 (comment). [ghstack-poisoned]

IvanYashchuk added a commit that referenced this pull request May 4, 2021

Added cuBLAS path for torch.linalg.lstsq

616e58c

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. ghstack-source-id: 2b4e3a6 Pull Request resolved: #54725

xwang233 approved these changes May 6, 2021

mruberry approved these changes May 6, 2021

IvanYashchuk added a commit that referenced this pull request May 7, 2021

Added cuBLAS path for torch.linalg.lstsq

92df9c1

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less than 128. ghstack-source-id: 4df8d67 Pull Request resolved: #54725

sab-regime approved these changes May 8, 2021

facebook-github-bot closed this in e7e7319 May 10, 2021

facebook-github-bot added the Merged label May 10, 2021

facebook-github-bot deleted the gh/ivanyashchuk/7/head branch May 13, 2021 14:17

IvanYashchuk added the module: linear algebra label Jun 8, 2021

IvanYashchuk mentioned this pull request Mar 19, 2022

Enable faster cuBLAS path for torch.linalg.lstsq for batch of small matrices #74434

Closed

cleonard530 mentioned this pull request Mar 6, 2026

Added 'gelsd' to CUDA torch.linalg.lstsq driver so that underdetermined least square systems can be solved on CUDA #176746

Open

IvanYashchuk wants to merge 28 commits into gh/ivanyashchuk/7/base from gh/ivanyashchuk/7/head

6 participants
