Fix test_inverse_singular for cublas path; fix cusolver inverse multi-stream issue #47026
xwang233 wants to merge 3 commits into pytorch:master
Conversation
Hi @xwang233! Thank you for your pull request. We require contributors to sign our Contributor License Agreement, and yours needs attention. You currently have a record in our system, but we do not have a signature on file. In order for us to review and merge your code, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g. your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA. If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!
```
auto dataPtr = allocator.allocate(sizeof(int) * batch_size * n);
int* ipiv_array = reinterpret_cast<int*>(dataPtr.get());

Tensor _info1 = at::zeros({batch_size}, self.options().dtype(at::kInt));
```
why are you allocating infos here again?
facebook-github-bot left a comment:
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
…-stream issue (pytorch#47026)

Summary:

### test_inverse_singular for cublas failure

Related pytorch#46616 (comment)
https://app.circleci.com/pipelines/github/pytorch/pytorch/232112/workflows/4131d4ca-cd51-44e3-8e6c-b1c3555c62fa/jobs/8523970/tests

The CUDA 11.1 CI container doesn't have the MAGMA library, so the cuBLAS matrix inverse path is enabled.

```
Oct 27 23:13:47 -- MAGMA not found. Compiling without MAGMA support
```

test_inverse_singular was introduced in pytorch#46625, but I forgot to fix that functionality for the cuBLAS path as well.

### cusolver inverse multi-stream failure

Fixes pytorch#47272.

The original CUDA event record/block-stream ordering was wrong, which could produce NaN in the output tensor. On my machine, the original code observed NaN within about 50k~500k loops; after this change, no NaN was observed in more than 2.5m loops. The performance for batch-2 matrix inverse is still the same as in pytorch#42403.

Pull Request resolved: pytorch#47026
Reviewed By: mruberry
Differential Revision: D24838546
Pulled By: ngimel
fbshipit-source-id: 3b83e4ab8e6b47a8273cba277251765bd6d97911
test_inverse_singular for cublas failure
Related
#46616 (comment)
https://app.circleci.com/pipelines/github/pytorch/pytorch/232112/workflows/4131d4ca-cd51-44e3-8e6c-b1c3555c62fa/jobs/8523970/tests
The CUDA 11.1 CI container doesn't have the MAGMA library, so the cuBLAS matrix inverse path is enabled.
test_inverse_singular was introduced in #46625, but I forgot to fix that functionality for the cuBLAS path as well.
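The test presumably asserts that inverting a singular matrix raises an error rather than silently returning garbage, so every backend path (MAGMA, cuSOLVER, cuBLAS) must report the failure. As a hedged, framework-agnostic illustration of that expected behavior (using NumPy rather than PyTorch; the matrix below is just an example):

```python
import numpy as np

# A singular matrix: the second row is a multiple of the first, so det = 0.
a = np.array([[1.0, 2.0],
              [2.0, 4.0]])

try:
    np.linalg.inv(a)
except np.linalg.LinAlgError as e:
    # A correct inverse path must surface the singularity as an error,
    # not return a tensor full of garbage or NaN.
    print("singular matrix detected:", e)
```

The same contract is what the cuBLAS path was missing: the `info` output of the LU factorization has to be checked and turned into an error.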
cusolver inverse multi-stream failure
Fixes #47272.
The original CUDA event record/block-stream ordering was wrong, which could produce NaN in the output tensor. On my machine, the original code observed NaN within about 50k~500k loops; after this change, no NaN was observed in more than 2.5m loops.
The performance for batch-2 matrix inverse is still the same as in #42403.
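The PR does not show the corrected code here, but the class of bug it describes is well known: an event recorded on the wrong stream (or recorded before the work is enqueued) lets the consumer stream run ahead and read uninitialized output. A hedged sketch of the standard correct pattern, with illustrative names (`worker_stream`, `main_stream`, `inverse_kernel` are not the actual PyTorch identifiers):

```cuda
cudaEvent_t done;
cudaEventCreateWithFlags(&done, cudaEventDisableTiming);

// 1. Enqueue the batched-inverse work on the side stream.
inverse_kernel<<<grid, block, 0, worker_stream>>>(/* ... */);

// 2. Record the event on the SAME stream that ran the work,
//    AFTER the kernel launch, so the event captures its completion.
cudaEventRecord(done, worker_stream);

// 3. Make the consumer stream wait on that event before any kernel
//    on it reads the result; otherwise it can observe partial or
//    uninitialized data (the NaN seen in pytorch#47272).
cudaStreamWaitEvent(main_stream, done, 0);

cudaEventDestroy(done);
```

Recording on the consumer stream, or recording before the launch, makes the wait a no-op and reintroduces the race.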