Conversation
💊 CI failures summary and remediations
As of commit b137059 (more details on the Dr. CI page):
ci.pytorch.org: 1 failed
This comment was automatically generated by Dr. CI.
Codecov Report
@@ Coverage Diff @@
## master #44240 +/- ##
==========================================
+ Coverage 68.01% 68.03% +0.02%
==========================================
Files 393 393
Lines 50847 50855 +8
==========================================
+ Hits 34583 34601 +18
+ Misses 16264 16254 -10
I will need to rebase and test to make sure the new tests introduced in #43980 are not broken.
Done.
Running 10 times does not do anything, since the test sets the seed deterministically? Or did you find a way to disable that?
@skipCUDAIfNoMagma
@skipCPUIfNoLapack
@dtypes(torch.float, torch.double)
@tf32_on_and_off(5.0)  # values are 40x.xx vs 40x.xx
Ouch! On the other hand, for matrix_exp I'm not particularly surprised.
So far we've disabled TF32 for linear algebra calls, e.g. via MAGMA. Would it make sense to disable it for matrix_exp as well, or at least raise a warning?
According to my test, matrix_exp with TF32 can be 6x faster than with FP32. Unlike MAGMA, which is CPU-bound, the TF32 flag does make a real difference for matrix_exp.
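For reference, a minimal timing sketch of the kind of comparison behind that speedup claim (the 4096 size, iteration count, and helper name are illustrative, not the benchmark actually used):

import torch

def time_matrix_exp(size, allow_tf32, iters=10):
    # Toggle the global TF32 matmul flag, then time matrix_exp.
    torch.backends.cuda.matmul.allow_tf32 = allow_tf32
    m = torch.randn(size, size, device='cuda')
    torch.matrix_exp(m)  # warm-up
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        torch.matrix_exp(m)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters  # ms per call

size = 4096
fp32_ms = time_matrix_exp(size, allow_tf32=False)
tf32_ms = time_matrix_exp(size, allow_tf32=True)
print(f"fp32: {fp32_ms:.2f} ms  tf32: {tf32_ms:.2f} ms  "
      f"speedup: {fp32_ms / tf32_ms:.1f}x")

CUDA kernels launch asynchronously, hence the events and synchronize() calls around the timed loop.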
Here is the test for precision:
import torch

for i in range(5, 15):
    size = 2 ** i
    print(f"size: {size}x{size}")
    m1 = torch.randn(size, size, device='cuda')
    m2 = torch.randn(size, size, device='cuda')
    # Reference results with TF32 disabled (full FP32 matmuls).
    torch.backends.cuda.matmul.allow_tf32 = False
    e1 = m1 @ m2
    e2 = torch.matrix_exp(m1)
    # Same computations with TF32 enabled.
    torch.backends.cuda.matmul.allow_tf32 = True
    r1 = m1 @ m2
    r2 = torch.matrix_exp(m1)
    print("matmul error:", (r1 - e1).abs().max())
    print("exp error:", (r2 - e2).abs().max())
    print("matmul relative error:", (r1 - e1).abs().max() / e1.abs().max())
    print("exp relative error:", (r2 - e2).abs().max() / e2.abs().max())
    print()

Output:

size: 32x32
matmul error: tensor(3.8147e-06, device='cuda:0')
exp error: tensor(4.7684e-05, device='cuda:0')
matmul relative error: tensor(1.3857e-07, device='cuda:0')
exp relative error: tensor(5.0852e-07, device='cuda:0')
size: 64x64
matmul error: tensor(0.0096, device='cuda:0')
exp error: tensor(7.3298, device='cuda:0')
matmul relative error: tensor(0.0004, device='cuda:0')
exp relative error: tensor(0.0063, device='cuda:0')
size: 128x128
matmul error: tensor(0.0167, device='cuda:0')
exp error: tensor(183.8311, device='cuda:0')
matmul relative error: tensor(0.0004, device='cuda:0')
exp relative error: tensor(0.0105, device='cuda:0')
size: 256x256
matmul error: tensor(0.0212, device='cuda:0')
exp error: tensor(11040.5312, device='cuda:0')
matmul relative error: tensor(0.0003, device='cuda:0')
exp relative error: tensor(0.0102, device='cuda:0')
size: 512x512
matmul error: tensor(0.0325, device='cuda:0')
exp error: tensor(14844504., device='cuda:0')
matmul relative error: tensor(0.0003, device='cuda:0')
exp relative error: tensor(0.0215, device='cuda:0')
size: 1024x1024
matmul error: tensor(0.0432, device='cuda:0')
exp error: tensor(3.1523e+11, device='cuda:0')
matmul relative error: tensor(0.0003, device='cuda:0')
exp relative error: tensor(0.0482, device='cuda:0')
size: 2048x2048
matmul error: tensor(0.0715, device='cuda:0')
exp error: tensor(2.8972e+17, device='cuda:0')
matmul relative error: tensor(0.0003, device='cuda:0')
exp relative error: tensor(0.1296, device='cuda:0')
size: 4096x4096
matmul error: tensor(0.1049, device='cuda:0')
exp error: tensor(7.2534e+25, device='cuda:0')
matmul relative error: tensor(0.0003, device='cuda:0')
exp relative error: tensor(0.3424, device='cuda:0')
size: 8192x8192
matmul error: tensor(0.1492, device='cuda:0')
exp error: tensor(2.3980e+37, device='cuda:0')
matmul relative error: tensor(0.0003, device='cuda:0')
exp relative error: tensor(0.5230, device='cuda:0')
size: 16384x16384
matmul error: tensor(0.2262, device='cuda:0')
exp error: tensor(nan, device='cuda:0')
matmul relative error: tensor(0.0003, device='cuda:0')
exp relative error: tensor(nan, device='cuda:0')
I don't think there is an easy way to disable TF32 for matrix_exp. It is implemented using other ATen operators, and at::matmul is where the TF32 flag plays a role. You cannot just disable this flag at the beginning of matrix_exp, because these flags are not thread-safe.
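To illustrate the hazard being described: the flag is process-global, so a naive save/toggle/restore around matrix_exp in one thread also changes what a concurrent matmul in another thread does. A minimal sketch (the function names are illustrative):

import threading
import torch

torch.backends.cuda.matmul.allow_tf32 = True  # user wants TF32 elsewhere

def naive_matrix_exp_fp32():
    # Save/toggle/restore of the *global* flag: not thread-safe.
    saved = torch.backends.cuda.matmul.allow_tf32
    torch.backends.cuda.matmul.allow_tf32 = False
    try:
        torch.matrix_exp(torch.randn(64, 64, device='cuda'))
    finally:
        torch.backends.cuda.matmul.allow_tf32 = saved

def concurrent_matmul():
    # Whether this matmul runs with TF32 now depends on timing,
    # because both threads share the same global flag.
    a = torch.randn(1024, 1024, device='cuda')
    (a @ a).sum()

t1 = threading.Thread(target=naive_matrix_exp_fp32)
t2 = threading.Thread(target=concurrent_matmul)
t1.start(); t2.start()
t1.join(); t2.join()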
@ngimel Can we not acquire the current cuBLAS handle and use the above guard before calling matrix_exp? As long as the functions matrix_exp is composed of don't use multiple cuBLAS handles, maybe that will work?
matrix_exp appears to be very little used, so we'd prefer to avoid a precision headache. If it's difficult to call it with TF32 disabled locally, then throwing a warning seems OK.
Are there other cases of poor precision that aren't covered yet?
@ngimel suggests that all linear algebra computations other than matmul should disable TF32. Most of these operations currently use MAGMA, but PR #42403 is going to move them to cuSolver and cuBLAS calls. Do we need to update that, or put warnings around these operations for how to handle TF32?
I have disabled TF32 on matrix_exp by introducing a thread_local variable to override the TF32 flag.
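The actual change is in ATen C++; as a rough Python analogy of the pattern (every name below is illustrative, not PyTorch API), a thread-local override can veto the global flag for the current thread without the race above:

import threading

_tls = threading.local()   # per-thread state, invisible to other threads
allow_tf32_global = True   # stand-in for the user-facing global flag

class DisableTF32Guard:
    # RAII-style guard: forces TF32 off for the current thread only.
    def __enter__(self):
        self._saved = getattr(_tls, 'override_off', False)
        _tls.override_off = True
    def __exit__(self, *exc):
        _tls.override_off = self._saved

def effective_allow_tf32():
    # What the matmul dispatch would consult: the global flag,
    # unless the current thread has the override set.
    return allow_tf32_global and not getattr(_tls, 'override_off', False)

with DisableTF32Guard():
    assert not effective_allow_tf32()  # off inside the guard...
assert effective_allow_tf32()          # ...restored afterwards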
Can we deal with the cuSolver ops in a later PR? I don't know which ops will be affected now. After @xwang233 has migrated ops to cuSolver, I will run tests to see whether TF32 needs to be disabled there.
facebook-github-bot left a comment
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary: Disable TF32 in some linalg functions. See also pytorch/pytorch#67948, #50453, pytorch/pytorch#44240.
Pull Request resolved: pytorch/pytorch#73460
Reviewed By: albanD
Differential Revision: D34493487
Pulled By: ngimel
fbshipit-source-id: 958cd968ea09df3b5a4d2b4a26aaf0dfddc53981
(cherry picked from commit cd75ec645b86c4b4a66c35696ce891d006f3833b)
Summary:
- The thresholds of some tests are bumped up. Depending on the random generator, these tests sometimes fail with things like "0.0059 is not smaller than 0.005". I ran `test_nn.py` and `test_torch.py` 10+ times to check these are no longer flaky.
- Add `tf32_on_and_off` to the new `matrix_exp` tests.
- Disable TF32 on test suites other than `test_nn.py` and `test_torch.py`.
cc: ptrblck
Pull Request resolved: pytorch#44240
Reviewed By: mruberry
Differential Revision: D23882498
Pulled By: ngimel
fbshipit-source-id: 44a9ec08802c93a2efaf4e01d7487222478b6df8
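For context, the `tf32_on_and_off` decorator used above behaves roughly as sketched below: it runs the test once with TF32 off at its normal tolerance and once with TF32 on at the relaxed tolerance passed in. This is a simplified sketch, not the real helper from torch.testing._internal, which also checks that the device actually supports TF32:

import functools
import torch

def tf32_on_and_off(tf32_precision=1e-5):
    # Simplified sketch: run the wrapped test twice, with TF32 off and on,
    # relaxing self.precision for the TF32 pass, and restore state after.
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(self, *args, **kwargs):
            saved_flag = torch.backends.cuda.matmul.allow_tf32
            saved_precision = self.precision
            try:
                torch.backends.cuda.matmul.allow_tf32 = False
                test_fn(self, *args, **kwargs)   # strict FP32 pass
                torch.backends.cuda.matmul.allow_tf32 = True
                self.precision = tf32_precision  # relaxed tolerance
                test_fn(self, *args, **kwargs)   # TF32 pass
            finally:
                torch.backends.cuda.matmul.allow_tf32 = saved_flag
                self.precision = saved_precision
        return wrapper
    return decorator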