Add cuSOLVER path for torch.linalg.qr#56256

Closed

IvanYashchuk wants to merge 14 commits into gh/ivanyashchuk/17/base from gh/ivanyashchuk/17/head

IvanYashchuk (Collaborator) commented Apr 16, 2021

Stack from ghstack:

Using the cuSOLVER path cuts the local runtime of `pytest test/test_ops.py -k 'linalg_qr' --durations=5` from about 81 s to about 32 s. See #56256 (comment).

Performance comparison: #56256 (comment).

Differential Revision: D27960154

Using cuSOLVER path with `pytest test/test_ops.py -k 'linalg_qr'
--durations=5` cuts the runtime for these tests by 1 minute locally.
Ref. #51552

[ghstack-poisoned]
facebook-github-bot (Contributor) commented Apr 16, 2021

💊 CI failures summary and remediations

As of commit 9cc5653 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 16, 2021
Using cuSOLVER path with `pytest test/test_ops.py -k 'linalg_qr'
--durations=5` cuts the runtime for these tests by 1 minute locally.
Ref. pytorch#51552

ghstack-source-id: 4f0cbb7
Pull Request resolved: pytorch#56256
IvanYashchuk removed the request for review from ezyang April 16, 2021 10:03
IvanYashchuk added the module: linear algebra label Apr 16, 2021
IvanYashchuk requested a review from mruberry April 16, 2021 10:04
IvanYashchuk (Collaborator, Author)

Time spent running `pytest test/test_ops.py -k 'linalg_qr' --durations=5`:
cuSOLVER:

====================================================== slowest 5 durations =======================================================
8.03s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_linalg_qr_cuda_complex64
2.67s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_linalg_qr_cuda_float32
2.65s call     test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_qr_cpu_complex128
1.73s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_linalg_qr_cuda_complex128
1.37s call     test/test_ops.py::TestOpInfoCUDA::test_duplicate_method_tests_linalg_qr_cuda_float32
================================= 49 passed, 41 skipped, 12294 deselected, 5 warnings in 31.98s ==================================

MAGMA:

====================================================== slowest 5 durations =======================================================
39.57s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_linalg_qr_cuda_complex128
11.12s call     test/test_ops.py::TestGradientsCUDA::test_fn_grad_linalg_qr_cuda_float64
5.31s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_linalg_qr_cuda_float32
5.28s call     test/test_ops.py::TestCommonCUDA::test_variant_consistency_jit_linalg_qr_cuda_complex64
2.75s call     test/test_ops.py::TestGradientsCPU::test_fn_gradgrad_linalg_qr_cpu_complex128
============================ 49 passed, 41 skipped, 12294 deselected, 5 warnings in 81.28s (0:01:21) =============================
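A quick arithmetic check of the two pytest summaries above, as a plain-Python sketch. The durations are copied from the logs; the variable names are illustrative only.

```python
# Durations (seconds) copied from the two pytest summaries above.
cusolver_total, magma_total = 31.98, 81.28

# The slowest MAGMA test, which also appears in the cuSOLVER top 5.
grad_test = "test_fn_grad_linalg_qr_cuda_complex128"
cusolver_grad, magma_grad = 1.73, 39.57

saved = magma_total - cusolver_total     # absolute wall-clock saving of the whole run
speedup = magma_total / cusolver_total   # relative speedup of the whole run
grad_saved = magma_grad - cusolver_grad  # saving on the single gradient check

print(f"total: saved {saved:.2f} s ({speedup:.2f}x faster)")
print(f"{grad_test}: saved {grad_saved:.2f} s")
```

Most of the saving (37.84 of 49.30 s) comes from that single CUDA gradient check for `complex128`.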

IvanYashchuk (Collaborator, Author) commented Apr 16, 2021

Here is a MAGMA vs. cuSOLVER comparison of `torch.linalg.qr` for non-batched square inputs with modes `'complete'`, `'reduced'`, and `'r'`:

| Input shape              | cuSOLVER, 'complete' | MAGMA, 'complete' | cuSOLVER, 'reduced' | MAGMA, 'reduced' | cuSOLVER, 'r' | MAGMA, 'r' |
|--------------------------|----------------------|-------------------|---------------------|------------------|---------------|------------|
| torch.Size([2, 2])       | 0.084                | 8.0               | 0.0774              | 7.6              | 0.0504        | 3.3        |
| torch.Size([8, 8])       | 0.0877               | 7.6               | 0.0872              | 8.1              | 0.0474        | 3.2        |
| torch.Size([16, 16])     | 0.158                | 7.6               | 0.1569              | 8.3              | 0.1577        | 3.3        |
| torch.Size([32, 32])     | 0.4164               | 7.6               | 0.413               | 8.5              | 0.2835        | 3.3        |
| torch.Size([64, 64])     | 0.9334               | 8.0               | 0.9257              | 8.4              | 0.6559        | 3.3        |
| torch.Size([128, 128])   | 2.0622               | 9.3               | 2.045               | 9.8              | 1.554         | 3.9        |
| torch.Size([256, 256])   | 3.5756               | 12.4              | 3.548               | 12.9             | 2.342         | 5.1        |
| torch.Size([512, 512])   | 8.6611               | 17.4              | 8.593               | 18.7             | 5.797         | 8.3        |
| torch.Size([1024, 1024]) | 23.4609              | 36.9              | 23.342              | 37.4             | 15.196        | 15.6       |
| torch.Size([2048, 2048]) | 92.3197              | 118.7             | 92.247              | 120.1            | 54.483        | 43.9       |
| torch.Size([4096, 4096]) | 497.0645             | 694.1             | 494.418             | 695.7            | 277.952       | 243.5      |
| torch.Size([8192, 8192]) | 3267.1995            | 4603.7            | 3250.727            | 4617.3           | 1713.537      | 1536.7     |

Times are in milliseconds (ms).

MAGMA is faster than cuSOLVER only for large inputs (2048 x 2048 and larger) with `mode='r'`. In all other cases cuSOLVER is faster.
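The conclusion above can be checked mechanically. The sketch below is plain Python (no GPU or PyTorch needed); the `timings` layout and the `magma_wins` helper are illustrative names, not part of PyTorch. It transcribes the table and lists every (size, mode) pair in which MAGMA was faster.

```python
# Timings in ms, transcribed from the table above.
# Illustrative layout: n -> {mode: (cusolver_ms, magma_ms)} for an n x n input.
timings = {
    2:    {"complete": (0.084, 8.0),        "reduced": (0.0774, 7.6),      "r": (0.0504, 3.3)},
    8:    {"complete": (0.0877, 7.6),       "reduced": (0.0872, 8.1),      "r": (0.0474, 3.2)},
    16:   {"complete": (0.158, 7.6),        "reduced": (0.1569, 8.3),      "r": (0.1577, 3.3)},
    32:   {"complete": (0.4164, 7.6),       "reduced": (0.413, 8.5),       "r": (0.2835, 3.3)},
    64:   {"complete": (0.9334, 8.0),       "reduced": (0.9257, 8.4),      "r": (0.6559, 3.3)},
    128:  {"complete": (2.0622, 9.3),       "reduced": (2.045, 9.8),       "r": (1.554, 3.9)},
    256:  {"complete": (3.5756, 12.4),      "reduced": (3.548, 12.9),      "r": (2.342, 5.1)},
    512:  {"complete": (8.6611, 17.4),      "reduced": (8.593, 18.7),      "r": (5.797, 8.3)},
    1024: {"complete": (23.4609, 36.9),     "reduced": (23.342, 37.4),     "r": (15.196, 15.6)},
    2048: {"complete": (92.3197, 118.7),    "reduced": (92.247, 120.1),    "r": (54.483, 43.9)},
    4096: {"complete": (497.0645, 694.1),   "reduced": (494.418, 695.7),   "r": (277.952, 243.5)},
    8192: {"complete": (3267.1995, 4603.7), "reduced": (3250.727, 4617.3), "r": (1713.537, 1536.7)},
}


def magma_wins(table):
    """Return the sorted (n, mode) pairs where MAGMA was faster than cuSOLVER."""
    return sorted(
        (n, mode)
        for n, modes in table.items()
        for mode, (cusolver_ms, magma_ms) in modes.items()
        if magma_ms < cusolver_ms
    )


print(magma_wins(timings))  # [(2048, 'r'), (4096, 'r'), (8192, 'r')]
```

The same helper works on batched or non-square measurements if they are recorded in the same layout.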

Using cuSOLVER path with `pytest test/test_ops.py -k 'linalg_qr'
--durations=5` cuts the runtime for these tests by 1 minute locally.
Ref. #51552, #47953

[ghstack-poisoned]
IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 16, 2021
ghstack-source-id: 4f5361a
Pull Request resolved: pytorch#56256
Using cuSOLVER path with `pytest test/test_ops.py -k 'linalg_qr'
--durations=5` cuts the runtime for these tests by 1 minute locally. See #56256 (comment).

Performance comparison: #56256 (comment).

Differential Revision: [D27960154](https://our.internmc.facebook.com/intern/diff/D27960154)

[ghstack-poisoned]
IvanYashchuk added a commit that referenced this pull request Apr 26, 2021
IvanYashchuk added a commit that referenced this pull request Apr 26, 2021
IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 26, 2021
ghstack-source-id: 2f98cde
Pull Request resolved: pytorch#56256
mruberry (Collaborator)

Time to start landing the second part of this stack!

@xwang233 would you take a look at this PR in the stack?

xwang233 (Collaborator) left a comment

The PR is very concise and LGTM.

IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 29, 2021
ghstack-source-id: e94b357
Pull Request resolved: pytorch#56256
IvanYashchuk added a commit that referenced this pull request Apr 29, 2021
IvanYashchuk added a commit that referenced this pull request Apr 29, 2021
IvanYashchuk added a commit to IvanYashchuk/pytorch that referenced this pull request Apr 29, 2021
ghstack-source-id: 574f15d
Pull Request resolved: pytorch#56256
mruberry (Collaborator) left a comment

Stamped

facebook-github-bot (Contributor)

@mruberry merged this pull request in ff59039.

facebook-github-bot deleted the gh/ivanyashchuk/17/head branch May 4, 2021 14:16
krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
Summary:
Pull Request resolved: pytorch#56256

Using cuSOLVER path with `pytest test/test_ops.py -k 'linalg_qr'
--durations=5` cuts the runtime for these tests by 1 minute locally. See pytorch#56256 (comment).

Performance comparison: pytorch#56256 (comment).

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D27960154

Pulled By: mruberry

fbshipit-source-id: 5312330d82337dec2856ec5527156a3a547a0b50
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026

Labels

cla signed, Merged, module: linear algebra, open source

5 participants