Conversation
💊 CI failures summary and remediations
As of commit 9dd5bad (more details on the Dr. CI page): 💚 Looks good so far! There are no failures yet. 💚
This comment was automatically generated by Dr. CI and has been revised 14 times.
I don't think this is the right fix, because the root cause of that failing test case was fp16 gemm, not fp16 gemv. This fix will mask the gemm failure, because gemv for the degenerate case won't go through gemm, but the gemm failure will still remain.
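For context, the degenerate-gemv-through-gemm routing being discussed can be illustrated with a hedged numpy sketch (the real dispatch lives in PyTorch's C++/CUDA backend, so this is only the reference math, not the actual implementation): a gemv `y = alpha * A @ x + beta * y` can be routed through gemm by viewing `x` as an n×1 matrix, which is why a workaround on the gemv side alone would not exercise a broken gemm path.

```python
import numpy as np

# Hedged numpy illustration, not PyTorch's implementation:
# gemv: y = alpha * (A @ x) + beta * y, with x a vector
# gemm: same result, with x viewed as an (n, 1) matrix
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)).astype(np.float32)
x = rng.standard_normal(3).astype(np.float32)
y = rng.standard_normal(4).astype(np.float32)
alpha, beta = 2.0, 0.5

gemv_out = alpha * (A @ x) + beta * y                          # gemv path
gemm_out = (alpha * (A @ x[:, None]) + beta * y[:, None])[:, 0]  # gemm path

assert np.allclose(gemv_out, gemm_out)
```

If only the gemv entry point is patched, code that reaches the gemm path directly (e.g. addmm) would still hit the underlying bug, which is the commenter's point.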
(Just realized that I put the wrong issue number. It is fixed now.) @ngimel I think both addmm and addmv are wrong? And in the issue (#41340), the author reports this for addmv: Traceback (most recent call last):
File "/tmp/easybuild-tmp/eb-1Ebm0K/tmpcR9xV8/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 777, in wrapper
method(*args, **kwargs)
File "/tmp/easybuild-tmp/eb-1Ebm0K/tmpcR9xV8/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 777, in wrapper
method(*args, **kwargs)
File "/tmp/easybuild-tmp/eb-1Ebm0K/tmpcR9xV8/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 241, in instantiated_test
result = test(self, device_arg, dtype)
File "/tmp/easybuild-tmp/eb-1Ebm0K/tmpcR9xV8/lib/python3.7/site-packages/torch/testing/_internal/common_device_type.py", line 411, in dep_fn
return fn(slf, device, *args, **kwargs)
File "test_torch.py", line 13909, in test_blas_alpha_beta_empty
torch.addmv(input=input, mat=mat, vec=vec, alpha=alpha, beta=beta))
File "/tmp/easybuild-tmp/eb-1Ebm0K/tmpcR9xV8/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 1080, in assertEqual
exact_device=exact_device)
File "/tmp/easybuild-tmp/eb-1Ebm0K/tmpcR9xV8/lib/python3.7/site-packages/torch/testing/_internal/common_utils.py", line 971, in _compareTensors
return _compare_tensors_internal(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan)
File "/tmp/easybuild-tmp/eb-1Ebm0K/tmpcR9xV8/lib/python3.7/site-packages/torch/testing/__init__.py", line 122, in _compare_tensors_internal
if torch.allclose(a, b, rtol=rtol, atol=atol, equal_nan=equal_nan):
RuntimeError: CUDA error: an illegal memory access was encountered
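The failing test, test_blas_alpha_beta_empty, exercises addmv with empty dimensions. A hedged numpy sketch of the semantics it checks (reference math only, not the CUDA kernel under discussion): when the reduction dimension of the matrix is empty, the matrix-vector product contributes nothing, so the result should simply be beta times the input, with no illegal memory access.

```python
import numpy as np

# Hedged reference semantics for addmv(input, mat, vec, alpha, beta)
# with an empty reduction dimension: mat is (m, 0), vec is (0,),
# so mat @ vec is all zeros and the result is just beta * input.
m, n = 4, 0
mat = np.zeros((m, n), dtype=np.float32)
vec = np.zeros((n,), dtype=np.float32)
inp = np.arange(m, dtype=np.float32)
alpha, beta = 2.0, 3.0

out = beta * inp + alpha * (mat @ vec)
assert np.array_equal(out, beta * inp)
```

The crash above means the CUDA path produced an illegal memory access instead of this well-defined result.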
Yeah, the failing test is addmv. Addmv/mv call
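For readers following along: mv is conventionally a special case of addmv with beta = 0, so a bug in the addmv path surfaces in mv as well. A hedged numpy sketch of that relationship (reference semantics, not PyTorch's dispatch):

```python
import numpy as np

def addmv(inp, mat, vec, beta=1.0, alpha=1.0):
    # Hedged reference (numpy) semantics of addmv; not the PyTorch kernel.
    return beta * inp + alpha * (mat @ vec)

A = np.arange(6, dtype=np.float32).reshape(2, 3)
x = np.ones(3, dtype=np.float32)

# mv(A, x) behaves like addmv with beta = 0 (the input is ignored).
mv_out = addmv(np.zeros(2, dtype=np.float32), A, x, beta=0.0)
assert np.array_equal(mv_out, A @ x)
```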
@ngimel now addmm is fixed too
ping @ngimel
facebook-github-bot
left a comment
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Thank you for the fix!
This reverts commit aef2890.
This reverts commit 25c6141.
Summary: fixes pytorch#41340. Unfortunately, I still cannot get a K80 to verify the fix, but it should be working. Pull Request resolved: pytorch#41824 Reviewed By: mruberry Differential Revision: D23172775 Pulled By: ngimel fbshipit-source-id: aa6af96fe74e3bb07982c006cb35ecc7f18181bc