
Cusolver inverse check info #46625

Closed
xwang233 wants to merge 4 commits into pytorch:master from xwang233:cusolver-inverse-check-info

Conversation

@xwang233
Collaborator

Fixes #46557
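Conceptually, the fix checks the `info` status that cuSOLVER's LU-factorization kernels report and raises on singular input instead of silently returning a result. A minimal Python sketch of that check (hypothetical helper name; the real check lives in PyTorch's C++ CUDA code):

```python
def check_getrf_info(info, batch_idx=None):
    """Mimic the check this PR adds after cuSOLVER's LU factorization.

    cuSOLVER getrf convention: info == 0 means success; info > 0 means
    U(info, info) is exactly zero, i.e. the input is singular; info < 0
    means argument number -info was invalid.
    """
    if info < 0:
        raise RuntimeError(f"getrf: argument {-info} is invalid")
    if info > 0:
        where = f" (batch element {batch_idx})" if batch_idx is not None else ""
        raise RuntimeError(f"inverse: U({info},{info}) is zero, singular U{where}")

# A singular input now raises instead of silently producing output:
try:
    check_getrf_info(2)
except RuntimeError as e:
    print(e)  # inverse: U(2,2) is zero, singular U
```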

@xwang233
Collaborator Author

cc @ptrblck

@dr-ci

dr-ci Bot commented Oct 21, 2020

💊 CI failures summary and remediations

As of commit 0f4a684 (more details on the Dr. CI page):


  • 1/1 failures introduced in this PR

🕵️ 1 new failure recognized by patterns

The following CI failures do not appear to be due to upstream breakages:

See CircleCI build pytorch_linux_xenial_py3_6_gcc5_4_test (1/1)

Step: "Run tests" (full log | diagnosis details | 🔁 rerun) <confirmed not flaky by 2 failures>

```
Oct 21 02:19:24   test_type_hint_examples (__main__.TestTypeHints) ... ok (23.130s)
Oct 21 02:19:24
Oct 21 02:19:24 ======================================================================
Oct 21 02:19:24 FAIL [67.343s]: test_run_mypy (__main__.TestTypeHints)
Oct 21 02:19:24 ----------------------------------------------------------------------
Oct 21 02:19:24 Traceback (most recent call last):
Oct 21 02:19:24   File "test_type_hints.py", line 217, in test_run_mypy
Oct 21 02:19:24     self.fail(f"mypy failed: {stdout} {stderr}")
Oct 21 02:19:24 AssertionError: mypy failed: torch/quantization/fake_quantize.py:215: error: Value of type "Union[Tensor, Module]" is not indexable  [index]
Oct 21 02:19:24 torch/quantization/fake_quantize.py:216: error: Argument 1 to "float" has incompatible type "Union[Tensor, Module]"; expected "Union[SupportsFloat, _SupportsIndex, str, bytes, bytearray]"  [arg-type]
Oct 21 02:19:24 torch/quantization/fake_quantize.py:217: error: Argument 1 to "int" has incompatible type "Union[Tensor, Module]"; expected "Union[str, bytes, SupportsInt, _SupportsIndex]"  [arg-type]
Oct 21 02:19:24 Found 3 errors in 1 file (checked 1100 source files)
Oct 21 02:19:24
Oct 21 02:19:24 ----------------------------------------------------------------------
Oct 21 02:19:24 Ran 4 tests in 107.905s
Oct 21 02:19:24
Oct 21 02:19:24 FAILED (failures=1)
Oct 21 02:19:24
Oct 21 02:19:24 Generating XML reports...
Oct 21 02:19:24 Generated XML report: test-reports/dist-gloo/TEST-TestTypeHints-20201021021736.xml
```

1 failure confirmed as flaky and can be ignored:

  • pytorch_windows_vs2019_py36_cuda11.0_build



@facebook-github-bot
Contributor

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ngimel merged this pull request in fe4f90c.


facebook-github-bot pushed a commit that referenced this pull request Nov 18, 2020
…-stream issue (#47026)

Summary:
### test_inverse_singular for cublas failure

Related
#46616 (comment)
https://app.circleci.com/pipelines/github/pytorch/pytorch/232112/workflows/4131d4ca-cd51-44e3-8e6c-b1c3555c62fa/jobs/8523970/tests

The CUDA 11.1 CI container doesn't have the MAGMA library, so the cuBLAS matrix-inverse path is enabled.
```
Oct 27 23:13:47 -- MAGMA not found. Compiling without MAGMA support
```
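The fallback implied above can be sketched as a tiny dispatch helper (hypothetical names, not PyTorch's actual dispatch code): without MAGMA compiled in, the cuBLAS batched LU path is what runs, so the singular-input check from this PR has to cover that path as well.

```python
def inverse_backend(has_magma: bool) -> str:
    # Hypothetical sketch of the backend choice described above:
    # MAGMA when compiled in, otherwise the cuBLAS batched path
    # (getrfBatched + getriBatched).
    if has_magma:
        return "magma"
    return "cublas"

print(inverse_backend(False))  # cublas
```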

The test_inverse_singular test was introduced in #46625, but I forgot to fix that functionality for the cuBLAS path as well.

### cusolver inverse multi-stream failure

Fixes #47272

The original CUDA event record/stream-block ordering was wrong, which could cause NaN in the output tensor.

On my machine, the original code observes NaN within about 50k~500k loops; after this change, no NaN is observed in more than 2.5M loops.

Performance for batch-2 matrix inverse is unchanged from the numbers in #42403.

Pull Request resolved: #47026

Reviewed By: mruberry

Differential Revision: D24838546

Pulled By: ngimel

fbshipit-source-id: 3b83e4ab8e6b47a8273cba277251765bd6d97911
emcastillo pushed a commit to emcastillo/pytorch that referenced this pull request Mar 16, 2022
Summary:
Fixes pytorch#46557

Pull Request resolved: pytorch#46625

Reviewed By: zou3519

Differential Revision: D24438577

Pulled By: ngimel

fbshipit-source-id: d00e6eb2eae4aa39ca6ecf5914fe9cf37c24b906

Development

Successfully merging this pull request may close these issues.

torch.inverse based on cuSOLVER does not raise error for singular input

5 participants