
Enable cusolver potrf batched for Cholesky decomposition when cuda >= 11.3 #57788

Closed
xwang233 wants to merge 4 commits into master from ci-all/cusolver-cholesky-batched_cuda11.3

Conversation

@xwang233
Collaborator

@xwang233 xwang233 commented May 7, 2021

This PR enables the usage of cusolver potrf batched as the backend of Cholesky decomposition (torch.linalg.cholesky and torch.linalg.cholesky_ex) when cuda version is greater than or equal to 11.3.

Benchmark available at https://github.com/xwang233/code-snippet/tree/master/linalg/cholesky-new. It is seen that cusolver potrf batched performs better than magma potrf batched in most cases.

cholesky dispatch heuristics:

before:

  • batch size == 1: cusolver potrf
  • batch size > 1: magma xpotrf batched

after:

cuda >= 11.3:

  • batch size == 1: cusolver potrf
  • batch size > 1: cusolver potrf batched

cuda < 11.3 (not changed):

  • batch size == 1: cusolver potrf
  • batch size > 1: magma xpotrf batched

See also #42666 #47953 #53104 #53879

@facebook-github-bot
Contributor

facebook-github-bot commented May 7, 2021

💊 CI failures summary and remediations

As of commit 003fe38 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

@xwang233
Collaborator Author

xwang233 commented May 7, 2021

reserved

@codecov

codecov Bot commented May 7, 2021

Codecov Report

Merging #57788 (003fe38) into master (747312b) will decrease coverage by 0.00%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master   #57788      +/-   ##
==========================================
- Coverage   76.83%   76.83%   -0.01%     
==========================================
  Files        1986     1986              
  Lines      197430   197430              
==========================================
- Hits       151691   151690       -1     
- Misses      45739    45740       +1     

@ngimel ngimel added the triaged label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) May 7, 2021
@IvanYashchuk
Collaborator

If both cuSOLVER and MAGMA are available and the CUDA version is < 11.3, we should continue using batched MAGMA, as it has better performance than the single-input cuSOLVER variant called in a loop, right? This PR modifies the behavior for versions < 11.3 to use looped cuSOLVER instead of batched MAGMA.

Besides that dispatch issue, everything looks good.

@xwang233
Copy link
Copy Markdown
Collaborator Author

Ohhh, yes, you're right. Let me fix that dispatch logic. 😄


// Implementation of Cholesky decomposition using batched cusolverDn<T>potrfBatched
// Warning: cusolverDn<T>potrfBatched doesn't work quite well when matrix size or batch size is zero.
// If you write your own C++ extension and use this function, make sure you do a zero numel check for the input.
Collaborator


Nice note

#define USE_CUSOLVER
#endif

// cusolverDn<T>potrfBatched may have numerical issue before cuda 11.3 release,
Collaborator


Great comment

Collaborator

@mruberry mruberry left a comment


An excellent CUDA performance PR to cap the many performance improvements realized for the release of torch.linalg in PyTorch 1.9.

cc @ptrblck

@facebook-github-bot
Contributor

@mruberry has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@mruberry merged this pull request in 7faac08.

krshrimali pushed a commit to krshrimali/pytorch that referenced this pull request May 19, 2021
… 11.3 (pytorch#57788)

Summary:
This PR enables the usage of cusolver potrf batched as the backend of Cholesky decomposition (`torch.linalg.cholesky` and `torch.linalg.cholesky_ex`) when cuda version is greater than or equal to 11.3.

Benchmark available at https://github.com/xwang233/code-snippet/tree/master/linalg/cholesky-new. It is seen that cusolver potrf batched performs better than magma potrf batched in most cases.

## cholesky dispatch heuristics:

### before:

- batch size == 1: cusolver potrf
- batch size > 1: magma xpotrf batched

### after:

cuda >= 11.3:
- batch size == 1: cusolver potrf
- batch size > 1: cusolver potrf batched

cuda < 11.3 (not changed):
- batch size == 1: cusolver potrf
- batch size > 1: magma xpotrf batched

 ---

See also pytorch#42666 pytorch#47953 pytorch#53104 pytorch#53879

Pull Request resolved: pytorch#57788

Reviewed By: ngimel

Differential Revision: D28345530

Pulled By: mruberry

fbshipit-source-id: 3022cf73b2750e1953c0e00a9e8b093dfc551f61
@github-actions github-actions Bot deleted the ci-all/cusolver-cholesky-batched_cuda11.3 branch February 11, 2024 01:57
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026
… 11.3 (pytorch#57788)

Labels: cla signed, Merged, open source, triaged

6 participants