Disable TF32 in some linalg functions #73460
See also #68020. Do we need to adjust some tests now to reflect the better accuracy? We won't see it in our CI, but there should be tests that would catch bad linalg computations if someone were to run them on Ampere cards. |
|
Why did it start happening only now? |
|
For reference, the formula for the svdvals backward is equivalent to the previous one we had, just with a small optimisation for wide matrices. What that PR adds is forward AD support. Could that be what's making this test fail, or is it just the standard flakiness from TF32? |
|
It's just a TF32 precision issue on A100 and 3090. The test passed on V100, which doesn't have TF32. I think your implementation is correct. Relax. 😄 |
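To make the precision point concrete, here is a rough pure-Python sketch of what reduced-mantissa inputs do to a dot product. It assumes TF32 can be approximated by keeping only the top 10 of float32's 23 mantissa bits and truncating the rest; real hardware rounds inputs and accumulates in FP32, so this is an illustration rather than a model of the A100. The helper names `to_tf32` and `dot` are made up for this example.

```python
import struct

def to_tf32(x: float) -> float:
    """Approximate TF32: round x to float32, then drop the low 13 mantissa bits."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # keep sign, 8 exponent bits, top 10 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def dot(a, b, round_inputs=False):
    """Dot product; optionally truncate both inputs to the TF32-like format first."""
    if round_inputs:
        a = [to_tf32(v) for v in a]
        b = [to_tf32(v) for v in b]
    return sum(x * y for x, y in zip(a, b))

# Arbitrary test vectors (not taken from the PR or its tests).
a = [1.0 + i / 1000.0 for i in range(64)]
b = [1.0 - i / 3000.0 for i in range(64)]

exact = dot(a, b)
approx = dot(a, b, round_inputs=True)
rel_err = abs(exact - approx) / abs(exact)
print(f"relative error with TF32-like inputs: {rel_err:.2e}")
```

Under these assumptions the relative error stays below 2^-9 but is far larger than float32 round-off, which is the kind of gap that trips a tight tolerance on hardware with TF32 enabled while passing on a V100.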
|
What about the other functions? This PR fixes a handful of backward functions, not just svd. Were they failing before? |
|
There are some other tests that rely on the code at pytorch/torch/csrc/autograd/FunctionsManual.cpp, lines 2663 to 2666 in e421492, and at pytorch/torch/csrc/autograd/FunctionsManual.cpp, lines 2907 to 2910 in e421492. There may be other linalg operators that failed before this PR due to TF32, but I didn't check them one by one. Since there is a chance to fix the backward, I think it's better to have all of them fixed. |
|
Fwiw, that's a check that I wrote as a "never false positive, fair if it's false negative" (note the tolerances). So if it's firing, I'm fine with making it more lax if that really makes it pass with TF32. |
|
TF32 should not be used for linalg operations |
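On the Python side, the same idea can be sketched as a scoped guard that turns an `allow_tf32` flag off and restores it afterwards. This is only an illustration of the pattern, not the change made in this PR; the helper name `no_tf32` is hypothetical, and a `SimpleNamespace` stands in for the real flag holder so the snippet runs without PyTorch or a GPU.

```python
from contextlib import contextmanager
from types import SimpleNamespace

@contextmanager
def no_tf32(*backends):
    """Temporarily set allow_tf32 = False on each given object, then restore it."""
    saved = [b.allow_tf32 for b in backends]
    try:
        for b in backends:
            b.allow_tf32 = False
        yield
    finally:
        # Restore the previous values even if the body raised.
        for b, old in zip(backends, saved):
            b.allow_tf32 = old

# Stand-in flag holder (an assumption for this example, not a PyTorch object).
fake_matmul = SimpleNamespace(allow_tf32=True)

with no_tf32(fake_matmul):
    inside = fake_matmul.allow_tf32   # flag is off inside the block
after = fake_matmul.allow_tf32        # previous value is back afterwards
```

With PyTorch installed, `torch.backends.cuda.matmul` and `torch.backends.cudnn` both expose an `allow_tf32` attribute and could be passed to a helper like this around precision-sensitive linalg calls.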
|
@ngimel has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
|
Hey @xwang233. |
Summary: Disable TF32 in some linalg functions See also pytorch/pytorch#67948 #50453 pytorch/pytorch#44240 Pull Request resolved: pytorch/pytorch#73460 Reviewed By: albanD Differential Revision: D34493487 Pulled By: ngimel fbshipit-source-id: 958cd968ea09df3b5a4d2b4a26aaf0dfddc53981 (cherry picked from commit cd75ec645b86c4b4a66c35696ce891d006f3833b)
pytorch#73614) Summary: Follow up of pytorch#73460, pytorch#73461 Pull Request resolved: pytorch#73614 Reviewed By: malfet Differential Revision: D34772822 Pulled By: ngimel fbshipit-source-id: 4e2bea0173d1b6b01e857ef63ef5c2d8c3802544 (cherry picked from commit 5994863)
Disable TF32 in some linalg functions
See also #67948 #50453 #44240