[SOW MS3] Workaround MIOPEN output tensor memory format issues #2#1105
jithunnair-amd merged 9 commits into sow_ms3 from
Conversation
@jithunnair-amd The 3 tests pass, but a bunch of tests fail in pytorch-test-1. They don't seem related, but I am not sure. pytorch-test-distributed-1 and pytorch-test-2 are failing due to some Jenkins issue. pytorch-test-distributed-2 has a bunch of NCCL errors, unlikely to be related. pytorch-test-1 has several types of errors, including an attribute error related to tuples, which I don't think is related, but I'm not sure.
Confirmed that the 3 unit tests enabled in this PR pass in CI: http://rocmhead.amd.com:8080/job/pytorch/job/pytorch-test-1/346/consoleText However, this CI failure in the same log does seem related:
Is an unexpected success a bad thing?
@jithunnair-amd I am looking at the unexpected success. There is a comment that explains why they expect a failure: for some reason, cuDNN does not have the desired behavior of following the weight tensor's memory layout. The reason it is passing for us is that we are adding the workaround. I am not sure how to proceed. If we remove the
In terms of strictly matching the unit test expectations, maybe. It seems like we should have it be "expected failure" only for CUDA then, even in our upstream PR. @jeffdaily, would you be okay with that? |
We should look at unexpected successes and at least understand them on a case-by-case basis. There was another recent unexpected success in upstream on ROCm, but it wasn't ROCm's fault, so the test had to be changed to skipped for both platforms. In this case, the comment clearly explains why this is an expected failure for CUDA, and that reason doesn't apply to us. So we need to conditionalize the skip here to differentiate between ROCm and CUDA.
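A platform-conditional marker along these lines could differentiate the two cases. This is a sketch only: the `IS_ROCM` flag and `expectedFailureCUDAOnly` decorator are stand-ins for illustration (PyTorch's test suite has its own ROCm-detection helpers), not the actual test-suite code:

```python
import unittest

# Stand-in platform flag; in PyTorch's test suite this role is played
# by its own ROCm-detection checks.
IS_ROCM = False

def expectedFailureCUDAOnly(fn):
    """Mark a test as an expected failure only on CUDA builds.

    On ROCm the MIOpen workaround makes the test pass, so the
    expectedFailure marker must not apply there.
    """
    if not IS_ROCM:
        return unittest.expectedFailure(fn)
    return fn

class TestMemoryFormat(unittest.TestCase):
    @expectedFailureCUDAOnly
    def test_weight_layout_followed(self):
        # Stand-in for the real check that the output tensor follows
        # the weight tensor's memory layout; fails on CUDA by assumption.
        self.assertTrue(False)

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMemoryFormat)
result = unittest.TextTestRunner(verbosity=0).run(suite)
print(len(result.expectedFailures))  # 1: the failure is expected, not reported
```

With `IS_ROCM = True` the decorator becomes a no-op, so a passing test on ROCm is an ordinary pass rather than an "unexpected success".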
@micmelesse: http://rocmhead.amd.com:8080/job/pytorch/job/pytorch-test-1/373/consoleText |
http://rocmhead.amd.com:8080/job/pytorch/job/pytorch-test-1/377/consoleText: |
Follow up to #1072.
This PR works around the issue by enforcing that the output tensor is contiguous and matches the memory format of the weight tensor.
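The workaround described above can be sketched as a small decision rule: derive the output tensor's memory format from the weight tensor instead of trusting the layout MIOpen returns. All names here are illustrative stand-ins, not the actual PyTorch/MIOpen source:

```python
# Illustrative memory-format labels (NCHW vs. NHWC in PyTorch terms).
CONTIGUOUS = "contiguous"        # NCHW
CHANNELS_LAST = "channels_last"  # NHWC

def output_memory_format(weight_format: str) -> str:
    """Return the memory format the output tensor should be forced into.

    MIOpen may hand back the output in a layout that does not match the
    weight tensor; the workaround makes the output contiguous in the
    weight tensor's memory format.
    """
    if weight_format == CHANNELS_LAST:
        return CHANNELS_LAST
    return CONTIGUOUS

print(output_memory_format(CHANNELS_LAST))  # channels_last
print(output_memory_format(CONTIGUOUS))     # contiguous
```

In actual PyTorch code this corresponds to something like calling `Tensor.contiguous(memory_format=...)` on the output with the format inferred from the weight, rather than returning MIOpen's buffer as-is.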