[pytorch] cuBLAS addmm malfunction test #85084
souravmandal wants to merge 1 commit into pytorch:master
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85084
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures, 4 Pending (as of commit 55b4804).
This comment was automatically generated by Dr. CI and updates every 15 minutes.
This pull request was exported from Phabricator. Differential Revision: D39433029
lezcano left a comment:
We already test matmul quite extensively against NumPy in test_matmul_small_brute_force_{1,2,3}d_Nd. What's the reason for wanting to test addmm on its own?
lezcano left a comment:
In any case, I think it'd be better to use NumPy to compare against, as we are sure that it will return correct results.
Also, this way you could simply skip torch.float16 on CPU, and the code could be heavily simplified. See for example how check_single_matmul does this, together with dynamic tolerances for robustness.
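As a rough illustration of that suggestion, here is a minimal sketch of checking addmm against a float64 NumPy reference. The helper name and tolerance values here are made up for illustration; the actual check_single_matmul helper and its dynamic tolerances live in the test suite and differ in detail:

```python
import numpy as np
import torch

def check_addmm_against_numpy(m, n, k, dtype, device, rtol=1e-2, atol=1e-2):
    # Hypothetical helper: generate inputs in float64, compute a NumPy
    # reference, then run addmm in the dtype/device under test.
    bias = torch.randn(m, n, dtype=torch.float64)
    mat1 = torch.randn(m, k, dtype=torch.float64)
    mat2 = torch.randn(k, n, dtype=torch.float64)

    # Reference result from NumPy, which we trust to be correct.
    expected = bias.numpy() + mat1.numpy() @ mat2.numpy()

    # Result from the dtype/device combination under test.
    result = torch.addmm(bias.to(device=device, dtype=dtype),
                         mat1.to(device=device, dtype=dtype),
                         mat2.to(device=device, dtype=dtype))

    # Convert via .float() first: NumPy has no bfloat16 dtype.
    np.testing.assert_allclose(result.float().cpu().numpy(), expected,
                               rtol=rtol, atol=atol)
```

Skipping torch.float16 on CPU would then just be a matter of not generating that (device, dtype) combination.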
Also, this issue may be relevant: #84538. I think @srossross is looking into implementing this one.
Right, this is not about numeric accuracy; this is ensuring that cuBLAS does not crash or produce wildly incorrect results. As such, we do want to exercise bfloat16, and compare with reasonable bounds to the expected result obtained by computing with another numeric representation.
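A minimal sketch of that idea, with illustrative sizes and tolerances: compute the expected result in float32 on the CPU, run the same addmm through cuBLAS in bfloat16, and assert agreement only within loose bounds, so the check trips on crashes or wildly incorrect output rather than on ordinary rounding error:

```python
import torch

def check_cublas_addmm_sanity(size=1000, rtol=0.1, atol=1.0):
    # Illustrative sanity check, not the PR's exact test code.
    bias = torch.randn(size, size)
    mat1 = torch.randn(size, size)
    mat2 = torch.randn(size, size)

    # Expected result computed in another numeric representation (fp32, CPU).
    expected = torch.addmm(bias, mat1, mat2)

    # The same computation routed through cuBLAS in bfloat16.
    actual = torch.addmm(bias.cuda().bfloat16(),
                         mat1.cuda().bfloat16(),
                         mat2.cuda().bfloat16())

    # Loose bounds: only a crash or a wildly incorrect result should fail.
    torch.testing.assert_close(actual.float().cpu(), expected,
                               rtol=rtol, atol=atol)
```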
This pull request was exported from Phabricator. Differential Revision: D39433029
Force-pushed: 99764e5 to 377700f
This pull request was exported from Phabricator. Differential Revision: D39433029
Force-pushed: 377700f to 3797628
@pytorchbot merge
@pytorchbot successfully started a merge job. Check the current status here.
Merge failed. Reason: The following mandatory check(s) failed (Rule …). Dig deeper by viewing the failures on hud.
Summary: Pull Request resolved: pytorch#85084. Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations.
Test Plan: sample unit test output showing test_cublas_addmm_size_{100,1000,10000}_cpu_{bfloat16,float16,float32} (test_linalg.TestLinalgCPU) all passing; the full listing appears at the end of this thread.
Reviewed By: mikekgfb
Differential Revision: D39433029
fbshipit-source-id: b308ecceb44eab1afb039c98f4e1b6aa8ddb8f53
This pull request was exported from Phabricator. Differential Revision: D39433029
Force-pushed: 3797628 to 55b4804
@pytorchbot merge
@pytorchbot successfully started a merge job. Check the current status here.
@pytorchbot revert -m "broke tests on trunk, https://github.com/pytorch/pytorch/actions/runs/3098347639/jobs/5017166419" -m nosignal
❌ 🤖 pytorchbot command failed: Try …
@pytorchbot revert -m "broke tests on trunk, https://github.com/pytorch/pytorch/actions/runs/3098347639/jobs/5017166419" -c nosignal
@pytorchbot successfully started a revert job. Check the current status here.
@souravmandal your PR has been successfully reverted.
This reverts commit 0297c75. Reverted #85084 on behalf of https://github.com/clee2000 due to broke tests on trunk, https://github.com/pytorch/pytorch/actions/runs/3098347639/jobs/5017166419
Use this link instead: https://github.com/pytorch/pytorch/actions/runs/3098294186
By the way, how much total test time does this PR add? (Though addmm even for 10k×10k matrices should be pretty quick.)
This pull request was exported from Phabricator. Differential Revision: D39433029
Summary: Re-submit for approved PR that was then reverted: pytorch#85084. Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations. Test Plan: same sample unit test output as below. Reviewed By: mikekgfb. Differential Revision: D39433029. fbshipit-source-id: e8f5d5f722047f31d2804932539408b1beb2ad55
Summary: Re-submit for approved PR that was then reverted: #85084. Test Plan, reviewer, and Differential Revision as above. Pull Request resolved: #85432. Approved by: https://github.com/zrphercule
Summary: Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations. Test Plan, reviewer, and Differential Revision as above. Pull Request resolved: #85084. Approved by: https://github.com/zrphercule
Summary: Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations
Test Plan:
Sample unit test output --
[...]
test_cublas_addmm_size_10000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
[...]
Reviewed By: mikekgfb
Differential Revision: D39433029
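Judging from the test names in the plan above, the test is parametrized over matrix size and dtype roughly as in the following sketch. This follows the torch.testing._internal device-type test conventions; the class name, decorator stack, and tolerances here are assumptions, and the exact body in the PR may differ:

```python
import torch
from torch.testing._internal.common_utils import TestCase, run_tests, parametrize
from torch.testing._internal.common_device_type import (
    dtypes, instantiate_device_type_tests)

class TestCublasAddmm(TestCase):
    @dtypes(torch.bfloat16, torch.float16, torch.float32)
    @parametrize("size", [100, 1000, 10000])
    def test_cublas_addmm(self, device, dtype, size):
        # Reference computed in float32 on CPU; large differences on the
        # device under test indicate cuBLAS breakage, not rounding error.
        bias = torch.randn(size, size)
        mat1 = torch.randn(size, size)
        mat2 = torch.randn(size, size)
        expected = torch.addmm(bias, mat1, mat2)
        actual = torch.addmm(bias.to(device=device, dtype=dtype),
                             mat1.to(device=device, dtype=dtype),
                             mat2.to(device=device, dtype=dtype))
        self.assertEqual(actual.float().cpu(), expected, rtol=0.1, atol=1.0)

# Generates per-device classes (e.g. TestCublasAddmmCPU, TestCublasAddmmCUDA)
# and per-parameter names like test_cublas_addmm_size_100_cpu_bfloat16.
instantiate_device_type_tests(TestCublasAddmm, globals())

if __name__ == "__main__":
    run_tests()
```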