Conversation
💊 CI failures summary and remediations
As of commit 9dd5bad (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚
This comment was automatically generated by Dr. CI and has been revised 14 times.
I don't think this is the right fix, because the root cause of that failing test case was fp16 gemm, not fp16 gemv. This fix will mask the gemm failure, because gemv for the degenerate case won't go through gemm, but the gemm failure will still remain.
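For context, the degenerate-gemv-through-gemm routing being discussed can be illustrated with a hedged numpy sketch (the real dispatch lives in PyTorch's C++/CUDA backend, so this is only the reference math, not the actual implementation): a gemv `y = alpha * A @ x + beta * y` can be routed through gemm by viewing `x` as an n×1 matrix, which is why a workaround on the gemv side alone would not exercise a broken gemm path.

```python
import numpy as np

# Hedged numpy illustration, not PyTorch's implementation:
# gemv: y = alpha * (A @ x) + beta * y, with x a vector
# gemm: same result, with x viewed as an (n, 1) matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)).astype(np.float32)
x = rng.standard_normal(3).astype(np.float32)
y = rng.standard_normal(4).astype(np.float32)
alpha, beta = 2.0, 0.5

gemv_out = alpha * (A @ x) + beta * y                          # gemv path
gemm_out = (alpha * (A @ x[:, None]) + beta * y[:, None])[:, 0]  # gemm path

assert np.allclose(gemv_out, gemm_out)
```

If only the gemv entry point is patched, code that reaches the gemm path directly (e.g. addmm) would still hit the underlying bug, which is the commenter's point.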
(Just realized that I put the wrong issue number. It is fixed now.) @ngimel I think both addmm and addmv are wrong? And in the issue (#41340), the author reports this for addmv: Traceback (most recent call last):
File "/tmp/easybuild-tmp/eb-1Ebm0K/tmpcR9xV8/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 777, in wrapper
method(*args, **kwargs)
File "/tmp/easybuild-tmp/eb-1Ebm0K/tmpcR9xV8/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 777, in wrapper
method(*args, **kwargs)
File "/tmp/easybuild-tmp/eb-1Ebm0K/tmpcR9xV8/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 241, in instantiated_test
result = test(self, device_arg, dtype)
File "/tmp/easybuild-tmp/eb-1Ebm0K/tmpcR9xV8/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 411, in dep_fn
return fn(slf, device, *args, **kwargs)
File "test_torch.py", line 13909, in test_blas_alpha_beta_empty
torch.addmv(input=input, mat=mat, vec=vec, alpha=alpha, beta=beta))
File "/tmp/easybuild-tmp/eb-1Ebm0K/tmpcR9xV8/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 1080, in assertEqual
exact_device=exact_device)
File "/tmp/easybuild-tmp/eb-1Ebm0K/tmpcR9xV8/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 971, in _compareTensors
return _compare_tensors_internal(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan)
File "/tmp/easybuild-tmp/eb-1Ebm0K/tmpcR9xV8/lib/python3.7/site-packages/torch/testing/__init__.py", line 122, in _compare_tensors_internal
if torch.allclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan):
RuntimeError: CUDA error: an illegal memory access was encountered
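The failing test, test_blas_alpha_beta_empty, exercises addmv with empty dimensions. A hedged numpy sketch of the semantics it checks (reference math only, not the CUDA kernel under discussion): when the reduction dimension of the matrix is empty, the matrix-vector product contributes nothing, so the result should simply be beta times the input, with no illegal memory access.

```python
import numpy as np

# Hedged reference semantics for addmv(input, mat, vec, alpha, beta)
# with an empty reduction dimension: mat is (m, 0), vec is (0,),
# so mat @ vec is all zeros and the result is just beta * input.
m, n = 4, 0
mat = np.zeros((m, n), dtype=np.float32)
vec = np.zeros((n,), dtype=np.float32)
inp = np.arange(m, dtype=np.float32)
alpha, beta = 2.0, 3.0

out = beta * inp + alpha * (mat @ vec)
assert np.array_equal(out, beta * inp)
```

The crash above means the CUDA path produced an illegal memory access instead of this well-defined result.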
Yeah, the failing test is addmv. Addmv/mv call
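For readers following along: mv is conventionally a special case of addmv with beta = 0, so a bug in the addmv path surfaces in mv as well. A hedged numpy sketch of that relationship (reference semantics, not PyTorch's dispatch):

```python
import numpy as np

def addmv(inp, mat, vec, beta=1.0, alpha=1.0):
    # Hedged reference (numpy) semantics of addmv; not the PyTorch kernel.
    return beta * inp + alpha * (mat @ vec)

A = np.arange(6, dtype=np.float32).reshape(2, 3)
x = np.ones(3, dtype=np.float32)

# mv(A, x) behaves like addmv with beta = 0 (the input is ignored).
mv_out = addmv(np.zeros(2, dtype=np.float32), A, x, beta=0.0)
assert np.array_equal(mv_out, A @ x)
```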
@ngimel now addmm is fixed too
ping @ngimel
facebook-github-bot
left a comment
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Thank you for the fix!
This reverts commit aef2890.
This reverts commit 25c6141.
Summary: fixes pytorch#41340. Unfortunately, I still cannot get a K80 to verify the fix, but it should be working. Pull Request resolved: pytorch#41824 Reviewed By: mruberry Differential Revision: D23172775 Pulled By: ngimel fbshipit-source-id: aa6af96fe74e3bb07982c006cb35ecc7f18181bc