Added cuBLAS path for torch.linalg.lstsq #54725

Closed
IvanYashchuk wants to merge 28 commits into gh/ivanyashchuk/7/base from gh/ivanyashchuk/7/head

IvanYashchuk (Collaborator) commented Mar 25, 2021

Stack from ghstack:

cuBLAS's gelsBatched is faster than MAGMA's for matrices with fewer than
128 rows.

Performance comparison cuSOLVER vs cuBLAS: #54725 (comment).
Performance comparison MAGMA vs cuBLAS: #54725 (comment).

Differential Revision: D28248803
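
A minimal sketch of the size-based backend choice described above, assuming a dispatcher that looks only at the row count of the input. The function name `pick_lstsq_backend`, the returned strings, and the `row_threshold` parameter are hypothetical and are not PyTorch's actual dispatch code; only the 128-row cutoff comes from the description.

```python
def pick_lstsq_backend(shape, row_threshold=128):
    """Pick a backend name for a (possibly batched) matrix of the given shape.

    `shape` is a tuple like (batch, rows, cols) or (rows, cols).
    The threshold of 128 rows is taken from the PR description; everything
    else here is an illustrative assumption.
    """
    if len(shape) < 2:
        raise ValueError("expected a matrix shape with at least 2 dimensions")
    rows = shape[-2]  # second-to-last dimension is the row count
    # Small matrices go to the batched cuBLAS routine, larger ones to MAGMA.
    return "cublas" if rows < row_threshold else "magma"


if __name__ == "__main__":
    for shape in [(32, 32, 32), (64, 64, 64), (64, 128, 128), (256, 256)]:
        print(shape, "->", pick_lstsq_backend(shape))
```

Keeping the cutoff as a parameter makes it easy to re-tune against new benchmark numbers.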

facebook-github-bot (Contributor) commented Mar 25, 2021

💊 CI failures summary and remediations

As of commit c01d3ce (more details on the Dr. CI page):


  • 1/1 failures possibly* introduced in this PR
    • 1/1 non-scanned failure(s)

ci.pytorch.org: 1 failed


IvanYashchuk added a commit that referenced this pull request Mar 25, 2021
cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less
than 128.

ghstack-source-id: 84c6d16
Pull Request resolved: #54725
cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less
than 128.

Ref. #47953

IvanYashchuk added a commit that referenced this pull request Mar 30, 2021
cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less
than 128.

ghstack-source-id: f57250a
Pull Request resolved: #54725
IvanYashchuk added a commit that referenced this pull request Mar 30, 2021
cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less
than 128.

ghstack-source-id: 428f285
Pull Request resolved: #54725
IvanYashchuk added a commit that referenced this pull request Mar 30, 2021
cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less
than 128.

ghstack-source-id: 2924eff
Pull Request resolved: #54725
IvanYashchuk added a commit that referenced this pull request May 4, 2021
IvanYashchuk added a commit that referenced this pull request May 4, 2021
IvanYashchuk added a commit that referenced this pull request May 4, 2021
cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less
than 128.

ghstack-source-id: 2b4e3a6
Pull Request resolved: #54725
mruberry (Collaborator) commented May 6, 2021

Rest of this stack looks good; how's this diff going, @IvanYashchuk, @xwang233?

I'll start landing the other PRs in the stack now.

xwang233 (Collaborator) left a comment

LGTM, thanks for the work!

mruberry (Collaborator) left a comment

Stamped; thanks @IvanYashchuk, @xwang233!

mruberry (Collaborator) commented May 6, 2021

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

IvanYashchuk added a commit that referenced this pull request May 7, 2021
IvanYashchuk added a commit that referenced this pull request May 7, 2021
cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less
than 128.

ghstack-source-id: 4df8d67
Pull Request resolved: #54725
mruberry (Collaborator) commented May 8, 2021

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor) commented

@mruberry merged this pull request in e7e7319.

@facebook-github-bot facebook-github-bot deleted the gh/ivanyashchuk/7/head branch May 13, 2021 14:17
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
Pull Request resolved: pytorch#54725

cuBLAS's gelsBatched is faster than MAGMA's for matrices with rows less
than 128.

Performance comparison cuSOLVER vs cuBLAS: pytorch#54725 (comment).
Performance comparison MAGMA vs cuBLAS: pytorch#54725 (comment).

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D28248803

Pulled By: mruberry

fbshipit-source-id: d3661bccb85c6fc1cee3a246ae8233492964f400
IvanYashchuk added the module: linear algebra label Jun 8, 2021
pytorchmergebot pushed a commit that referenced this pull request Mar 22, 2022
…atrices

This PR enables the cuBLAS path for `torch.linalg.lstsq`. Before this PR, only the cuSOLVER path was used for regular PyTorch builds (when built with MAGMA).

Performance results (also previously reported at #54725 (comment)):
```
| input shape                | before current PR | current PR | speedup |
|----------------------------|-------------------|------------|---------|
| torch.Size([32, 32, 32])   | 870               | 440        | 2x      |
| torch.Size([64, 32, 32])   | 1340              | 450        | 3x      |
| torch.Size([32, 64, 64])   | 9040              | 1839       | 5x      |
| torch.Size([64, 64, 64])   | 17000             | 1830       | 9.2x    |
| torch.Size([32, 128, 128]) | 23210             | 8560       | 2.7x    |
| torch.Size([64, 128, 128]) | 40000             | 8662       | 4.6x    |
| torch.Size([32, 256, 256]) | 58160             | 46150      | 1.2x    |
| torch.Size([64, 256, 256]) | 73220             | 52080      | 1.4x    |
Times are in microseconds (us).
```

Pull Request resolved: #74434
Approved by: https://github.com/mruberry
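
As a quick sanity check on the table above, the speedup column can be recomputed as before / after. The timings below are copied from the table; the names `timings_us` and `speedup` are made up for this sketch and do not come from the PR.

```python
# Timings in microseconds, copied from the performance table:
# shape -> (before current PR, current PR)
timings_us = {
    (32, 32, 32): (870, 440),
    (64, 32, 32): (1340, 450),
    (32, 64, 64): (9040, 1839),
    (64, 64, 64): (17000, 1830),
    (32, 128, 128): (23210, 8560),
    (64, 128, 128): (40000, 8662),
    (32, 256, 256): (58160, 46150),
    (64, 256, 256): (73220, 52080),
}


def speedup(before_us, after_us):
    """Ratio of the old time to the new time; values above 1 mean faster."""
    return before_us / after_us


if __name__ == "__main__":
    for shape, (before, after) in timings_us.items():
        print(f"{shape}: {speedup(before, after):.2f}x")
    # Shape with the largest relative improvement.
    best = max(timings_us, key=lambda s: speedup(*timings_us[s]))
    print("largest speedup at shape", best)
```

The recomputed ratios agree with the reported column to within rounding; a few reported values look truncated rather than rounded (for example, 17000 / 1830 is about 9.29 and is listed as 9.2x).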
facebook-github-bot pushed a commit that referenced this pull request Mar 23, 2022
…atrices (#74434)

Pull Request resolved: #74434
Approved by: https://github.com/mruberry

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/4de870f6040d2799acc11b9bdeb6508eb6fd33d9

Reviewed By: malfet

Differential Revision: D35047957

fbshipit-source-id: c21c3fdabcf2fc747089e5915fe66561760602c3
shahofblah pushed a commit that referenced this pull request Mar 25, 2022
…atrices

Pull Request resolved: #74434
Approved by: https://github.com/mruberry
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026
Summary:
Pull Request resolved: pytorch#54725
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026
…atrices

Pull Request resolved: pytorch#74434
Approved by: https://github.com/mruberry
Labels

cla signed, Merged, module: linear algebra, open source
