Cusolver inverse check info by xwang233 · Pull Request #46625 · pytorch/pytorch

xwang233 · 2020-10-20T22:59:49Z

Fixes #46557

xwang233 · 2020-10-20T23:00:03Z

cc @ptrblck

dr-ci · 2020-10-21T00:34:43Z

💊 CI failures summary and remediations

As of commit 0f4a684 (more details on the Dr. CI page):

1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

pytorch_linux_xenial_py3_6_gcc5_4_test (1/1)

Step: "Run tests" <confirmed not flaky by 2 failures>

Oct 21 02:19:24 torch/quantization/fake_quantize.py:217: error: Argument 1 to "int" has incompatible type "Union[Tensor, Module]"; expected "Union[str, bytes, SupportsInt, _SupportsIndex]"  [arg-type]

Oct 21 02:19:24   test_type_hint_examples (__main__.TestTypeHints) ... ok (23.130s) 
Oct 21 02:19:24  
Oct 21 02:19:24 ====================================================================== 
Oct 21 02:19:24 FAIL [67.343s]: test_run_mypy (__main__.TestTypeHints) 
Oct 21 02:19:24 ---------------------------------------------------------------------- 
Oct 21 02:19:24 Traceback (most recent call last): 
Oct 21 02:19:24   File "test_type_hints.py", line 217, in test_run_mypy 
Oct 21 02:19:24     self.fail(f"mypy failed: {stdout} {stderr}") 
Oct 21 02:19:24 AssertionError: mypy failed: torch/quantization/fake_quantize.py:215: error: Value of type "Union[Tensor, Module]" is not indexable  [index] 
Oct 21 02:19:24 torch/quantization/fake_quantize.py:216: error: Argument 1 to "float" has incompatible type "Union[Tensor, Module]"; expected "Union[SupportsFloat, _SupportsIndex, str, bytes, bytearray]"  [arg-type] 
Oct 21 02:19:24 torch/quantization/fake_quantize.py:217: error: Argument 1 to "int" has incompatible type "Union[Tensor, Module]"; expected "Union[str, bytes, SupportsInt, _SupportsIndex]"  [arg-type] 
Oct 21 02:19:24 Found 3 errors in 1 file (checked 1100 source files) 
Oct 21 02:19:24   
Oct 21 02:19:24  
Oct 21 02:19:24 ---------------------------------------------------------------------- 
Oct 21 02:19:24 Ran 4 tests in 107.905s 
Oct 21 02:19:24  
Oct 21 02:19:24 FAILED (failures=1) 
Oct 21 02:19:24  
Oct 21 02:19:24 Generating XML reports... 
Oct 21 02:19:24 Generated XML report: test-reports/dist-gloo/TEST-TestTypeHints-20201021021736.xml

1 failure confirmed as flaky and can be ignored:

pytorch_windows_vs2019_py36_cuda11.0_build

This comment was automatically generated by Dr. CI (expand for details).

This comment has been revised 6 times.

xwang233 · 2020-10-21T00:50:13Z

cc @ailzhang for the XLA failure: the XLA matrix inverse doesn't raise an exception for a singular matrix. https://circleci.com/api/v1.1/project/github/pytorch/pytorch/8356835/output/107/0?file=true&allocation-id=5f8f785ab625b21332b517b0-0-build%2F3CD01BB1
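For readers unfamiliar with what "check info" refers to: LAPACK-style `getrf` routines (cuSOLVER follows the same convention) report an integer `info`, where `info = i > 0` means `U(i,i)` is exactly zero and the matrix is singular. Below is a minimal pure-Python sketch of that idea, not the code from this PR. The helper names `lu_factor_with_info` and `checked_inverse` are hypothetical and exist only for illustration.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def lu_factor_with_info(a: Matrix) -> Tuple[Matrix, List[int], int]:
    """LU-factor a square matrix with partial pivoting (illustrative only).

    Returns (lu, pivots, info). As in LAPACK getrf, info == 0 means success
    and info == k > 0 means the k-th diagonal entry of U is exactly zero.
    """
    n = len(a)
    lu = [row[:] for row in a]  # work on a copy
    pivots = list(range(n))
    info = 0
    for k in range(n):
        # Pick the row with the largest magnitude in column k.
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        pivots[k] = p
        if lu[p][k] == 0.0:
            # Record the first zero pivot and keep going, like getrf does.
            if info == 0:
                info = k + 1
            continue
        if p != k:
            lu[k], lu[p] = lu[p], lu[k]
        for r in range(k + 1, n):
            lu[r][k] /= lu[k][k]
            for c in range(k + 1, n):
                lu[r][c] -= lu[r][k] * lu[k][c]
    return lu, pivots, info


def checked_inverse(a: Matrix) -> Matrix:
    """Invert `a`, raising RuntimeError when the factorization reports info > 0."""
    lu, pivots, info = lu_factor_with_info(a)
    if info != 0:
        raise RuntimeError(f"singular matrix: U({info},{info}) is zero")
    n = len(a)
    inv = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Solve A x = e_j, one column of the inverse at a time.
        b = [1.0 if i == j else 0.0 for i in range(n)]
        for k in range(n):  # apply the recorded row swaps
            p = pivots[k]
            if p != k:
                b[k], b[p] = b[p], b[k]
        for i in range(n):  # forward substitution, L has a unit diagonal
            for c in range(i):
                b[i] -= lu[i][c] * b[c]
        for i in reversed(range(n)):  # back substitution with U
            for c in range(i + 1, n):
                b[i] -= lu[i][c] * b[c]
            b[i] /= lu[i][i]
        for i in range(n):
            inv[i][j] = b[i]
    return inv
```

Calling `checked_inverse([[1.0, 2.0], [2.0, 4.0]])` raises `RuntimeError`, because the factorization reports `info == 2`. If the `info` check were skipped, the back substitution would divide by that zero pivot. This is roughly the behavior a singular-input test would assert on.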

facebook-github-bot

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot · 2020-10-22T06:18:55Z

@ngimel merged this pull request in fe4f90c.

facebook-github-bot · 2020-10-22T06:19:25Z

@ngimel merged this pull request in fe4f90c.

…-stream issue (#47026)

Summary:

### test_inverse_singular for cublas failure

Related #46616 (comment) https://app.circleci.com/pipelines/github/pytorch/pytorch/232112/workflows/4131d4ca-cd51-44e3-8e6c-b1c3555c62fa/jobs/8523970/tests

The CUDA 11.1 CI container doesn't have the MAGMA library, so the cuBLAS matrix inverse path is enabled.

```
Oct 27 23:13:47 -- MAGMA not found. Compiling without MAGMA support
```

test_inverse_singular was introduced in #46625, but I forgot to fix that functionality for the cuBLAS path as well.

### cusolver inverse multi-stream failure

fix #47272

The original CUDA event record/block stream logic was wrong, which could cause NaN in the output tensor. On my machine, the original code produces NaN after about 50k~500k loops. After this change, no NaN is observed in more than 2.5m loops.

The performance for batch 2 matrix inverse is still the same as in #42403.

Pull Request resolved: #47026
Reviewed By: mruberry
Differential Revision: D24838546
Pulled By: ngimel
fbshipit-source-id: 3b83e4ab8e6b47a8273cba277251765bd6d97911

Summary: Fixes pytorch#46557

Pull Request resolved: pytorch#46625
Reviewed By: zou3519
Differential Revision: D24438577
Pulled By: ngimel
fbshipit-source-id: d00e6eb2eae4aa39ca6ecf5914fe9cf37c24b906

xwang233 added 2 commits October 20, 2020 15:54

check info

3b451e4

test

ff19297

xwang233 requested review from IvanYashchuk and ngimel October 20, 2020 23:00

pytorchbot added the open source label Oct 20, 2020

lint

63c2e60

skip XLA

0f4a684

ngimel approved these changes Oct 21, 2020

facebook-github-bot reviewed Oct 21, 2020

IvanYashchuk approved these changes Oct 21, 2020

facebook-github-bot closed this in fe4f90c Oct 22, 2020

facebook-github-bot added the Merged label Oct 22, 2020

This was referenced Oct 28, 2020

Fix test_inverse_singular for cublas path; fix cusolver inverse multi-stream issue #47026

Closed

The return of torch.inverse contains nan sometime #47272

Closed

xwang233 mentioned this pull request Nov 30, 2020

torch.inverse and torch.lu_solve give wrong results for singular matrices #48572

Closed

xwang233 wants to merge 4 commits into pytorch:master from xwang233:cusolver-inverse-check-info
