Enable cusolver potrf batched for Cholesky decomposition when cuda >= 11.3 #57788
If both cuSOLVER and MAGMA are available and the CUDA version is < 11.3, we should continue using batched MAGMA, since it performs better than the single-input cuSOLVER variant called in a loop, right? This PR changes the behavior for versions < 11.3 to use looped cuSOLVER instead of batched MAGMA. Apart from that dispatch issue, everything looks good.

Ohhh, yes, you're right. Let me fix that dispatch logic. 😄
…holesky-batched_cuda11.3
// Implementation of Cholesky decomposition using batched cusolverDn<T>potrfBatched
// Warning: cusolverDn<T>potrfBatched doesn't work quite well when matrix size or batch size is zero.
// If you write your own C++ extension and use this function, make sure you do a zero numel check for the input.
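
The zero numel check mentioned in the warning above can be illustrated with a small sketch. This is not PyTorch's implementation: `should_call_potrf_batched` is a hypothetical helper name, and the input is assumed to be described by its shape `(*batch, n, n)`. It only shows the idea of skipping the batched call when the input has no elements.

```python
from math import prod
from typing import Sequence


def should_call_potrf_batched(shape: Sequence[int]) -> bool:
    """Hypothetical guard: return False when the input has zero elements.

    `shape` is assumed to be (*batch, n, n). If the batch size or the
    matrix size is zero, the product of all dimensions is zero, and the
    caller should skip the batched potrf call entirely.
    """
    if len(shape) < 2:
        raise ValueError("expected at least a 2-D (matrix) shape")
    return prod(shape) != 0


# A 4 x 3 x 3 batch has elements, so the call may proceed.
print(should_call_potrf_batched((4, 3, 3)))
# An empty batch or empty matrices should be skipped.
print(should_call_potrf_batched((0, 3, 3)))
print(should_call_potrf_batched((4, 0, 0)))
```

In a real C++ extension the equivalent check would be made on the tensor itself before calling the cuSOLVER routine; the exact form depends on the extension's code.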
#define USE_CUSOLVER
#endif

// cusolverDn<T>potrfBatched may have numerical issue before cuda 11.3 release,
@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
… 11.3 (pytorch#57788)

Summary: This PR enables the usage of cusolver potrf batched as the backend of Cholesky decomposition (`torch.linalg.cholesky` and `torch.linalg.cholesky_ex`) when cuda version is greater than or equal to 11.3.

Benchmark available at https://github.com/xwang233/code-snippet/tree/master/linalg/cholesky-new. It is seen that cusolver potrf batched performs better than magma potrf batched in most cases.

## cholesky dispatch heuristics:

### before:

- batch size == 1: cusolver potrf
- batch size > 1: magma xpotrf batched

### after:

cuda >= 11.3:
- batch size == 1: cusolver potrf
- batch size > 1: cusolver potrf batched

cuda < 11.3 (not changed):
- batch size == 1: cusolver potrf
- batch size > 1: magma xpotrf batched

---

See also pytorch#42666 pytorch#47953 pytorch#53104 pytorch#53879

Pull Request resolved: pytorch#57788

Reviewed By: ngimel

Differential Revision: D28345530

Pulled By: mruberry

fbshipit-source-id: 3022cf73b2750e1953c0e00a9e8b093dfc551f61
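This PR enables the usage of cusolver potrf batched as the backend of Cholesky decomposition (`torch.linalg.cholesky` and `torch.linalg.cholesky_ex`) when the CUDA version is greater than or equal to 11.3.

Benchmark available at https://github.com/xwang233/code-snippet/tree/master/linalg/cholesky-new. In most of the benchmarked cases, cusolver potrf batched performs better than magma potrf batched.

## cholesky dispatch heuristics:

### before:

- batch size == 1: cusolver potrf
- batch size > 1: magma xpotrf batched

### after:

cuda >= 11.3:
- batch size == 1: cusolver potrf
- batch size > 1: cusolver potrf batched

cuda < 11.3 (not changed):
- batch size == 1: cusolver potrf
- batch size > 1: magma xpotrf batched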
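
The dispatch heuristics described in this PR (batch size 1 goes to cusolver potrf; larger batches go to cusolver potrf batched on cuda >= 11.3 and to magma xpotrf batched otherwise) can be restated as a small Python sketch. This is only an illustration, not the actual ATen dispatch code, which is written in C++: the function name `choose_cholesky_backend`, the `(major, minor)` version tuple, and the returned string labels are all assumptions made for the example.

```python
from typing import Tuple


def choose_cholesky_backend(batch_size: int, cuda_version: Tuple[int, int]) -> str:
    """Hypothetical restatement of the 'after' dispatch heuristics.

    `cuda_version` is assumed to be a (major, minor) tuple, e.g. (11, 3).
    The returned strings are just labels taken from the PR description.
    """
    if batch_size < 1:
        # The heuristics in the PR do not cover this case, so the sketch rejects it.
        raise ValueError("batch_size must be >= 1 in this sketch")
    if batch_size == 1:
        # A single matrix uses cusolver potrf regardless of CUDA version.
        return "cusolver potrf"
    if cuda_version >= (11, 3):
        # Batched cuSOLVER is only used from CUDA 11.3 onward.
        return "cusolver potrf batched"
    # Older CUDA versions keep the previous behavior.
    return "magma xpotrf batched"


print(choose_cholesky_backend(1, (11, 3)))
print(choose_cholesky_backend(8, (11, 3)))
print(choose_cholesky_backend(8, (11, 2)))
```

Comparing version tuples keeps the `>= 11.3` check correct for later major versions such as `(12, 0)`, which a comparison on the minor number alone would get wrong.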
See also #42666 #47953 #53104 #53879