Do not use TF32 matmul in linalg and DDP tests #56114
xwang233 wants to merge 10 commits into pytorch:master from
Conversation
💊 CI failures summary and remediations
As of commit a83b61c (more details on the Dr. CI page):
1 failure not recognized by patterns:
🚧 1 fixed upstream failure: These were probably caused by upstream breakages that were already fixed.
Please rebase on the
Codecov Report
@@            Coverage Diff             @@
##           master   #56114      +/-   ##
==========================================
+ Coverage   76.44%   77.00%   +0.56%
==========================================
  Files        1990     1912      -78
  Lines      199690   189561   -10129
==========================================
- Hits       152651   145980    -6671
+ Misses      47039    43581    -3458
ping @ngimel 😄
super(self.__class__, self).setUp()
torch.backends.cuda.matmul.allow_tf32 = False
self.precision_overrides = {
    torch.float: 1e-4,
Does it mean that regular fp32 needs expanded tolerance? How are tests passing currently?
Ohh, yes you're correct. I forgot to delete them. Let me modify this.
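Following up on the thread above: once `allow_tf32` is turned off in `setUp`, the fp32 override that only compensated for TF32 rounding can be dropped. A minimal sketch of that direction, assuming a unittest-style test class (the class name here is illustrative, not the actual test file):

```python
import unittest

import torch

class MatmulToleranceTest(unittest.TestCase):
    # Illustrative test class; the real DDP/linalg test classes differ.
    def setUp(self):
        super().setUp()
        # Force full-precision FP32 CUDA matmul so comparisons on Ampere GPUs
        # are not loosened by TF32 rounding; no fp32 tolerance override needed.
        self._prev_allow_tf32 = torch.backends.cuda.matmul.allow_tf32
        torch.backends.cuda.matmul.allow_tf32 = False

    def tearDown(self):
        # Restore the global flag so other tests see the default behavior.
        torch.backends.cuda.matmul.allow_tf32 = self._prev_allow_tf32
        super().tearDown()
```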
@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Summary:
This PR does several things to relax test tolerance:
- Do not use TF32 in cuda matmul in test_c10d. See pytorch#52941.
- Do not use TF32 in cuda matmul in test_linalg. Increase atol for float and cfloat. See pytorch#50453.

The tolerance is increased because most linear algebra operators are not that stable in single precision.

Pull Request resolved: pytorch#56114
Reviewed By: ailzhang
Differential Revision: D28554467
Pulled By: ngimel
fbshipit-source-id: 90416be8e4c048bedb16903b01315584d344ecdf
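For context on the mechanism: `torch.backends.cuda.matmul.allow_tf32` is a process-wide flag, so a test that only needs full precision for a single comparison can toggle it locally instead of for a whole test class. A minimal, self-contained sketch (not code from this PR; the helper name is made up):

```python
import contextlib

import torch

@contextlib.contextmanager
def tf32_disabled():
    # Temporarily force IEEE FP32 matmul precision, then restore the
    # previous setting. Only affects CUDA matmuls on Ampere or newer GPUs.
    prev = torch.backends.cuda.matmul.allow_tf32
    torch.backends.cuda.matmul.allow_tf32 = False
    try:
        yield
    finally:
        torch.backends.cuda.matmul.allow_tf32 = prev

if torch.cuda.is_available():
    a = torch.randn(256, 256, device="cuda")
    b = torch.randn(256, 256, device="cuda")
    with tf32_disabled():
        full_precision = a @ b  # computed without TF32
```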