[Linalg] Add cusolver syevjBatched path for torch.linalg.eigh when cuda >= 11.3 U1 #62003

Closed
xwang233 wants to merge 9 commits into pytorch:master from xwang233:linalg-eigh-syevj-batched-cuda11.3.1

@xwang233
Collaborator

@xwang233 xwang233 commented Jul 21, 2021

This PR adds the cusolverDn<T>SyevjBatched function to the backend of torch.linalg.eigh (the eigenvalue solver for Hermitian matrices). Based on the heuristics from #53040 (comment) and my local tests, the syevj_batched path is used only when batch_size > 1 and matrix_size <= 32. This should give a large performance boost in those cases.

Since cusolver syevj_batched had known numerical issues before CUDA 11.3 update 1, this PR enables the dispatch only when the CUDA version is at least that release.

See also #42666 #47953 #53040
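
The dispatch condition described above can be sketched as a small standalone predicate. This is only an illustration under stated assumptions, not the code in this PR: the function name, the constant names, and the idea of passing the cuSOLVER version as a runtime argument are all hypothetical. The version threshold 11102 is taken from the code comment quoted in the review in this thread, which identifies it as the cuSOLVER header version for the CUDA 11.3.1 release.

```cpp
#include <cstdint>

// Hypothetical names throughout; this is an illustrative sketch, not the PyTorch source.

// cuSOLVER header version corresponding to CUDA 11.3 update 1
// (value taken from the code comment quoted in this PR).
constexpr int kMinCusolverVersionForSyevjBatched = 11102;

// Largest matrix dimension for which the batched Jacobi path is selected,
// per the heuristic stated in the PR description.
constexpr std::int64_t kMaxMatrixSizeForSyevjBatched = 32;

// Returns true when the syevj_batched path would be chosen:
// a new enough cuSOLVER, more than one matrix in the batch,
// and matrices no larger than 32x32.
constexpr bool use_syevj_batched(std::int64_t batch_size,
                                 std::int64_t matrix_size,
                                 int cusolver_version) {
  return cusolver_version >= kMinCusolverVersionForSyevjBatched &&
         batch_size > 1 &&
         matrix_size <= kMaxMatrixSizeForSyevjBatched;
}

// Compile-time checks of the three conditions.
static_assert(use_syevj_batched(8, 32, 11102), "batched, small, new cuSOLVER");
static_assert(!use_syevj_batched(1, 16, 11102), "single matrix is not batched");
static_assert(!use_syevj_batched(8, 33, 11102), "matrix larger than 32x32");
static_assert(!use_syevj_batched(8, 16, 11101), "cuSOLVER older than 11.3 U1");
```

Keeping the version check inside the same predicate as the size checks means that older toolkits fall through to whichever non-batched path would otherwise be used, without a separate branch.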

@facebook-github-bot
Contributor

facebook-github-bot commented Jul 21, 2021

💊 CI failures summary and remediations

As of commit 39b7245 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

@xwang233 xwang233 changed the title [WIP] [Linalg] Add cusolver syevjBatched path for torch.linalg.eigh when cuda >= 11.3.1 [WIP] [Linalg] Add cusolver syevjBatched path for torch.linalg.eigh when cuda >= 11.3 U1 Jul 21, 2021
@xwang233 xwang233 changed the title [WIP] [Linalg] Add cusolver syevjBatched path for torch.linalg.eigh when cuda >= 11.3 U1 [Linalg] Add cusolver syevjBatched path for torch.linalg.eigh when cuda >= 11.3 U1 Jul 23, 2021
Collaborator

@IvanYashchuk IvanYashchuk left a comment

It's great that the bug is fixed!

#endif

// cusolverDn<T>syevjBatched may have numerical issue before cuda 11.3.1 release,
// (which is cusolver version 11102 in the header), so we only use cusolver potrf batched
Collaborator

Suggested change
// (which is cusolver version 11102 in the header), so we only use cusolver potrf batched
// (which is cusolver version 11102 in the header), so we only use cusolver syevj batched

Collaborator Author

Thanks for finding this! 😂

Co-authored-by: Ivan Yashchuk <IvanYashchuk@users.noreply.github.com>
}

void linalg_eigh_cusolver(const Tensor& eigenvalues, const Tensor& eigenvectors, const Tensor& infos, bool upper, bool compute_eigenvectors) {
// syevj is better than syevd for float32 dtype and matrix sizes 32x32 - 512x512
Collaborator

This comment is misplaced (it now refers to line 1256). Could you also add a comment with a link to the data that motivates the syevjBatched heuristic on line 1254?

@mruberry
Collaborator

fyi @ngimel is merging this one

@xwang233
Collaborator Author

Thank you!

@facebook-github-bot
Contributor

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ngimel merged this pull request in d57ce8c.

laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026
…da >= 11.3 U1 (pytorch#62003)

Summary:
This PR adds the `cusolverDn<T>SyevjBatched` function to the backend of `torch.linalg.eigh` (the eigenvalue solver for Hermitian matrices). Based on the heuristics from pytorch#53040 (comment) and my local tests, the `syevj_batched` path is used only when `batch_size > 1` and `matrix_size <= 32`. This should give a large performance boost in those cases.

Since cusolver `syevj_batched` had known numerical issues before CUDA 11.3 update 1, this PR enables the dispatch only when the CUDA version is at least that release.

See also pytorch#42666 pytorch#47953 pytorch#53040

Pull Request resolved: pytorch#62003

Reviewed By: heitorschueroff

Differential Revision: D30006316

Pulled By: ngimel

fbshipit-source-id: 3a65c5fc9adbbe776524f8957df5442c3d3aeb8e