Enable faster cuBLAS path for torch.linalg.lstsq for batch of small matrices#74434
IvanYashchuk wants to merge 2 commits into pytorch:master
#endif // AT_MAGMA_ENABLED()
// On ROCm platform we can only use MAGMA here
// If MAGMA is not available, an error will be thrown
gels_magma(a, b, infos);
Why change the behavior on ROCm?
It doesn't change it. It makes it more explicit that the MAGMA function is called on ROCm. Previously the chain of calls was "this_function() -> gels_looped() -> gels_magma()"; now it's "this_function() -> gels_magma()".
Cool, thanks for the explanation
mruberry left a comment
Cool! -- One question for you, @IvanYashchuk
@pytorchbot merge this please
…atrices (#74434)

Summary: This PR enables cuBLAS path for `torch.linalg.lstsq`. Before this PR only cuSOLVER path was used for regular PyTorch builds (when built with MAGMA).

Performance results (also previously reported at #54725 (comment)):
```
|                            | before current PR | current PR | speedup |
|----------------------------|-------------------|------------|---------|
| torch.Size([32, 32, 32])   | 870               | 440        | 2x      |
| torch.Size([64, 32, 32])   | 1340              | 450        | 3x      |
| torch.Size([32, 64, 64])   | 9040              | 1839       | 5x      |
| torch.Size([64, 64, 64])   | 17000             | 1830       | 9.2x    |
| torch.Size([32, 128, 128]) | 23210             | 8560       | 2.7x    |
| torch.Size([64, 128, 128]) | 40000             | 8662       | 4.6x    |
| torch.Size([32, 256, 256]) | 58160             | 46150      | 1.2x    |
| torch.Size([64, 256, 256]) | 73220             | 52080      | 1.4x    |
Times are in microseconds (us).
```

Pull Request resolved: #74434
Approved by: https://github.com/mruberry

Test Plan: contbuild & OSS CI, see https://hud.pytorch.org/commit/pytorch/pytorch/4de870f6040d2799acc11b9bdeb6508eb6fd33d9

Reviewed By: malfet

Differential Revision: D35047957

fbshipit-source-id: c21c3fdabcf2fc747089e5915fe66561760602c3
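For readers who want to sanity-check the speedup column, here is a minimal sketch that recomputes it from the reported timings. It assumes the speedup is simply the ratio of the "before" time to the "current PR" time; the `timings` list and the `speedup` helper are illustrative names, not part of the PR.

```python
# Timings in microseconds, copied from the table in the commit message:
# (batch shape, before current PR, current PR)
timings = [
    ((32, 32, 32), 870, 440),
    ((64, 32, 32), 1340, 450),
    ((32, 64, 64), 9040, 1839),
    ((64, 64, 64), 17000, 1830),
    ((32, 128, 128), 23210, 8560),
    ((64, 128, 128), 40000, 8662),
    ((32, 256, 256), 58160, 46150),
    ((64, 256, 256), 73220, 52080),
]


def speedup(before_us, after_us):
    """Assumed definition: ratio of the old time to the new time."""
    return before_us / after_us


for shape, before_us, after_us in timings:
    print(f"{shape}: {speedup(before_us, after_us):.2f}x")
```

The recomputed ratios agree with the table up to rounding, and the gain shrinks for the 256x256 matrices, which is consistent with the PR targeting batches of small matrices.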
This PR enables cuBLAS path for `torch.linalg.lstsq`. Before this PR only cuSOLVER path was used for regular PyTorch builds (when built with MAGMA).

Performance results (also previously reported at #54725 (comment)):