
[pytorch] cuBLAS addmm malfunction test #85084

Closed
souravmandal wants to merge 1 commit into pytorch:master from souravmandal:export-D39433029

@souravmandal
Contributor

Summary: Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations

Test Plan:
Sample unit test output --

[...]
test_cublas_addmm_size_10000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
[...]

Reviewed By: mikekgfb

Differential Revision: D39433029

@pytorch-bot

pytorch-bot bot commented Sep 15, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/85084

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures, 4 Pending

As of commit 55b4804:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D39433029

Collaborator

@lezcano lezcano left a comment

We are already testing matmul quite extensively against NumPy in the tests test_matmul_small_brute_force_{1,2,3}d_Nd. What's the reason for wanting to test addmm on its own?

@souravmandal
Contributor Author

We (@mikekgfb @ngimel) have observed in the past that cuBLAS breaks such that it produces random data, or crashes outright, esp. for large input arrays. It would be useful to test whether a new cuBLAS version triggers that.

Collaborator

@lezcano lezcano left a comment

In any case, I think it'd be better to use NumPy to compare against, as we are sure that it will return correct results.
Also, this way, you could simply skip torch.float16 on CPU and the code could be heavily simplified. See for example how check_single_matmul does this, together with dynamic tolerances for robustness.
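
For illustration, here is a minimal sketch of comparing addmm against a NumPy float32 reference with an absolute tolerance that grows with the inner dimension. The helper name `addmm_matches_numpy` and the constants `5e-5` and `1e-4` are assumptions chosen for this example; they are not taken from `check_single_matmul` or from this PR.

```python
import numpy as np
import torch


def addmm_matches_numpy(n, k, m, dtype=torch.float32, device="cpu", seed=0):
    """Compare torch.addmm with a NumPy float32 reference (hypothetical helper)."""
    gen = torch.Generator().manual_seed(seed)
    # Inputs are created in float32 on CPU so NumPy can consume them directly.
    bias = torch.randn(n, m, generator=gen)
    m1 = torch.randn(n, k, generator=gen)
    m2 = torch.randn(k, m, generator=gen)

    # Reference: bias + m1 @ m2, computed by NumPy in float32.
    expected = bias.numpy() + m1.numpy() @ m2.numpy()

    actual = torch.addmm(
        bias.to(device=device, dtype=dtype),
        m1.to(device=device, dtype=dtype),
        m2.to(device=device, dtype=dtype),
    )

    # Assumed tolerances: atol scales with the inner dimension k, because
    # rounding error accumulates over the k products summed per output entry.
    atol = max(k, 1) * 5e-5
    rtol = 1e-4
    return bool(np.allclose(actual.float().cpu().numpy(), expected, rtol=rtol, atol=atol))
```

Lower-precision dtypes would likely need looser constants than the ones assumed here.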

@lezcano
Collaborator

lezcano commented Sep 15, 2022

Also, this issue may be relevant #84538. I think @srossross is looking into implementing this one.

@souravmandal
Contributor Author

The issue with NumPy is that it does not support bfloat16 (ref1, ref2). To simplify, one could make the reference tensors in each call float32 and apply the dtype only to the torch tensors on the GPU.
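
A minimal sketch of that simplification, assuming square inputs and a hypothetical helper name `addmm_error_vs_float32` (neither is taken from the PR's actual test code): the reference is always computed in float32 on CPU, and only the tensors under test are cast to the requested dtype and moved to the device.

```python
import torch


def addmm_error_vs_float32(n, dtype, device):
    """Return the max abs difference between addmm in `dtype` on `device`
    and a float32 CPU reference (hypothetical helper, for illustration)."""
    gen = torch.Generator().manual_seed(0)
    # Reference inputs stay in float32 on CPU.
    bias, m1, m2 = (torch.randn(n, n, generator=gen) for _ in range(3))
    reference = torch.addmm(bias, m1, m2)

    # Only the tensors under test get the target dtype and device.
    result = torch.addmm(
        bias.to(device=device, dtype=dtype),
        m1.to(device=device, dtype=dtype),
        m2.to(device=device, dtype=dtype),
    )
    # Compare in float32 on CPU so the subtraction itself adds no extra rounding.
    return (result.float().cpu() - reference).abs().max().item()
```

A caller would pick `device = "cuda"` when `torch.cuda.is_available()` returns true and loop over `torch.float32`, `torch.float16`, and `torch.bfloat16`, with a bound on the returned error chosen per dtype.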

@mikekgfb
Contributor

Right, this is not about numeric accuracy; it is about ensuring that cuBLAS does not crash or produce wildly incorrect results. As such, we do want to exercise bfloat16 and compare, within reasonable bounds, against the expected result obtained by computing with another numeric representation.
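
To make the distinction concrete, here is a sketch of a coarse sanity check, as opposed to an accuracy check. The function name `looks_sane` and the default bound of `0.1` are assumptions for illustration, not the bounds used in the PR: it rejects non-finite output and output whose overall relative error against the reference is large, while tolerating the rounding noise of low-precision dtypes.

```python
import torch


def looks_sane(result, reference, max_rel_err=0.1):
    """Coarse check for garbage output (hypothetical helper, assumed bound)."""
    result = result.detach().float().cpu()
    reference = reference.detach().float().cpu()
    if result.shape != reference.shape:
        return False
    # NaN or inf anywhere is treated as a malfunction.
    if not torch.isfinite(result).all():
        return False
    # Relative Frobenius-norm error; the clamp avoids dividing by zero.
    denom = torch.linalg.norm(reference).clamp_min(1e-12)
    rel_err = torch.linalg.norm(result - reference) / denom
    return rel_err.item() <= max_rel_err
```

A norm-based bound like this is deliberately loose: random data in the output produces a relative error near or above 1, far from ordinary bfloat16 rounding.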

@zrphercule zrphercule self-requested a review September 19, 2022 21:34
Contributor

@zrphercule zrphercule left a comment

Stamp

@mikekgfb
Contributor

@pytorchbot merge

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered without a flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@pytorchmergebot
Collaborator

Merge failed

Reason: The following mandatory check(s) failed (Rule superuser):

Dig deeper by viewing the failures on hud

Details for Dev Infra team Raised by workflow job

Summary:
Pull Request resolved: pytorch#85084

Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations

Test Plan:
Sample unit test output --

[...]
test_cublas_addmm_size_10000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
[...]

Reviewed By: mikekgfb

Differential Revision: D39433029

fbshipit-source-id: b308ecceb44eab1afb039c98f4e1b6aa8ddb8f53

@mikekgfb
Contributor

@pytorchbot merge

@pytorchmergebot
Collaborator

@pytorchbot successfully started a merge job. Check the current status here.
The merge job was triggered without a flag. This means that your change will be merged once all checks on your PR have passed (ETA: 0-4 Hours). If this is not the intended behavior, feel free to use some of the other merge options in the wiki.
Please reach out to the PyTorch DevX Team with feedback or questions!

@clee2000
Contributor

@pytorchbot revert -m "broke tests on trunk, https://github.com/pytorch/pytorch/actions/runs/3098347639/jobs/5017166419" -m nosignal

@pytorch-bot

pytorch-bot bot commented Sep 21, 2022

❌ 🤖 pytorchbot command failed:

@pytorchbot revert: error: the following arguments are required: -c/--classification

usage: @pytorchbot revert -m MESSAGE -c
                          {nosignal,ignoredsignal,landrace,weird,ghfirst}

Try @pytorchbot --help for more info.

@clee2000
Contributor

@pytorchbot revert -m "broke tests on trunk, https://github.com/pytorch/pytorch/actions/runs/3098347639/jobs/5017166419" -c nosignal

@pytorchmergebot
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Please reach out to the PyTorch DevX Team with feedback or questions!

@pytorchmergebot
Collaborator

@souravmandal your PR has been successfully reverted.

@weiwangmeta
Contributor

@malfet
Contributor

malfet commented Sep 21, 2022

By the way, how much total test time does this PR add? (Though addmm, even for 10k x 10k matrices, should be pretty quick.)
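
One way to get a rough number is sketched below, assuming a hypothetical helper `time_addmm`. For scale, a dense 10,000 x 10,000 product takes about 2n^3 = 2 x 10^12 floating-point operations, so the wall time depends heavily on the device. CUDA kernels launch asynchronously, so the sketch synchronizes before reading the clock.

```python
import time

import torch


def time_addmm(n, dtype=torch.float32, device="cpu", repeats=3):
    """Return the best wall-clock time in seconds of one n x n addmm
    (hypothetical helper, for illustration only)."""
    on_cuda = torch.device(device).type == "cuda"
    # Create inputs in float32 first, then cast, so any dtype works.
    bias, m1, m2 = (
        torch.randn(n, n).to(device=device, dtype=dtype) for _ in range(3)
    )
    timings = []
    for _ in range(repeats):
        if on_cuda:
            torch.cuda.synchronize()  # finish pending work before starting the clock
        start = time.perf_counter()
        torch.addmm(bias, m1, m2)
        if on_cuda:
            torch.cuda.synchronize()  # wait for the kernel to actually complete
        timings.append(time.perf_counter() - start)
    # The minimum is less sensitive to warm-up and scheduling noise.
    return min(timings)
```

This times only the addmm call itself; any reference computation and the host-to-device copies in a real test would add to the total.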

souravmandal added a commit to souravmandal/pytorch that referenced this pull request Sep 21, 2022
Summary:
Re-submit for approved PR that was then reverted: pytorch#85084

Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations

Test Plan:
Sample unit test output --

[...]
test_cublas_addmm_size_10000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
[...]

Reviewed By: mikekgfb

Differential Revision: D39433029

fbshipit-source-id: e8f5d5f722047f31d2804932539408b1beb2ad55
pytorchmergebot pushed a commit that referenced this pull request Sep 21, 2022
Summary:
Re-submit for approved PR that was then reverted: #85084

Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations

Test Plan:
Sample unit test output --

[...]
test_cublas_addmm_size_10000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
[...]

Reviewed By: mikekgfb

Differential Revision: D39433029

Pull Request resolved: #85432
Approved by: https://github.com/zrphercule
mehtanirav pushed a commit that referenced this pull request Oct 4, 2022
Summary: Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations

Test Plan:
Sample unit test output --

[...]
test_cublas_addmm_size_10000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
[...]

Reviewed By: mikekgfb

Differential Revision: D39433029

Pull Request resolved: #85084
Approved by: https://github.com/zrphercule
mehtanirav pushed a commit that referenced this pull request Oct 4, 2022
Summary:
Re-submit for approved PR that was then reverted: #85084

Create unit test to detect cuBLAS breakage via large differences between CPU and GPU addmm invocations

Test Plan:
Sample unit test output --

[...]
test_cublas_addmm_size_10000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_10000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_1000_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_bfloat16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float16 (test_linalg.TestLinalgCPU) ... ok
test_cublas_addmm_size_100_cpu_float32 (test_linalg.TestLinalgCPU) ... ok
[...]

Reviewed By: mikekgfb

Differential Revision: D39433029

Pull Request resolved: #85432
Approved by: https://github.com/zrphercule