
Fix TF32 failures in test_linalg.py #50453

Closed

zasdfgbnm wants to merge 9 commits into master from tf32-failures

Conversation

@zasdfgbnm
Collaborator

On Ampere GPUs, matmuls are computed with TF32 by default when the dtype is torch.float (see https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices), which reduces the precision of the results. Linear algebra, however, usually needs higher precision, so many tests in test_linalg.py fail on Ampere GPUs because of precision issues.

To fix this issue:

  • Most linear algebra methods, except for matmuls, should add NoTF32Guard
  • Expected results in unit tests should be computed with NumPy matmuls instead of PyTorch CUDA matmuls (see the sketch below).
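
For example (a minimal sketch, not taken from this PR; the helper name, shapes, and tolerances are made up for illustration), a test can build its expected value with NumPy so that a possibly-TF32 CUDA matmul is checked against a full-precision reference:

import numpy as np
import torch

def expected_matmul_via_numpy(a, b):
    # Hypothetical helper: compute the reference in float64 NumPy,
    # then move it back to the input's device and dtype for comparison.
    res = np.matmul(a.cpu().numpy().astype(np.float64),
                    b.cpu().numpy().astype(np.float64))
    return torch.from_numpy(res).to(device=a.device, dtype=a.dtype)

# Assumes a CUDA (Ampere) device is available.
a = torch.randn(64, 64, device='cuda')
b = torch.randn(64, 64, device='cuda')
actual = a @ b                              # may run in TF32 on Ampere
expected = expected_matmul_via_numpy(a, b)  # full-precision reference
torch.testing.assert_allclose(actual, expected, atol=1e-2, rtol=1e-2)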

@facebook-github-bot
Contributor

facebook-github-bot commented Jan 13, 2021

💊 CI failures summary and remediations

As of commit 4b8f704 (more details on the Dr. CI page):


  • 2/2 failures possibly* introduced in this PR
    • 1/2 non-CircleCI failure(s)

1 failure not recognized by patterns:

Job: CircleCI docker-pytorch-linux-xenial-cuda10.2-cudnn7-py3-gcc7
Step: Check if image should be built
Action: 🔁 rerun

This comment was automatically generated by Dr. CI. Follow this link to opt out of these comments for your Pull Requests.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

@ngimel
Collaborator

ngimel commented Jan 13, 2021

Linear algebra methods for the most part call MAGMA, and we either create a cuBLAS handle for MAGMA with TF32 disabled, or MAGMA itself creates a default cuBLAS handle ( :-( ) that does not have TF32 enabled. So as far as the implementations are concerned we should be fine? (Problems with cuSOLVER are separate, cc @xwang233.)
Re: tests, I agree: for the "expected" values we should disable TF32, either by setting a switch or by calling CPU matmuls.
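
For reference, a minimal sketch of the two options mentioned above (the switch is the documented torch.backends.cuda.matmul.allow_tf32 flag; the shapes are arbitrary and this is not code from the PR):

import torch

a = torch.randn(128, 128, device='cuda')
b = torch.randn(128, 128, device='cuda')

# Option 1: flip the global switch while computing the expected value.
old = torch.backends.cuda.matmul.allow_tf32
torch.backends.cuda.matmul.allow_tf32 = False
expected_fp32 = a @ b  # full FP32 matmul
torch.backends.cuda.matmul.allow_tf32 = old

# Option 2: compute the reference on the CPU, where TF32 never applies.
expected_cpu = (a.cpu() @ b.cpu()).to(a.device)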

@codecov

codecov Bot commented Jan 13, 2021

Codecov Report

Merging #50453 (6a170d8) into master (5834438) will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master   #50453      +/-   ##
==========================================
- Coverage   80.70%   80.70%   -0.01%     
==========================================
  Files        1905     1905              
  Lines      206813   206814       +1     
==========================================
- Hits       166916   166913       -3     
- Misses      39897    39901       +4     

@zasdfgbnm zasdfgbnm marked this pull request as ready for review January 13, 2021 17:59
@zasdfgbnm
Collaborator Author

This should be ready

@mrshenli mrshenli added the module: linear algebra (Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul) and triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) labels Jan 15, 2021
@mrshenli mrshenli requested a review from mruberry January 15, 2021 03:08
Comment thread test/test_linalg.py

@dtypes(torch.float32, torch.float64, torch.complex64, torch.complex128)
@tf32_on_and_off(1e-3)
def test_kron(self, device, dtype):
Collaborator


This is interesting. @zasdfgbnm, can you elaborate on the impact of TF32 on the Kronecker product? @ngimel is surprised that cuBLAS tries to run a TF32 kernel for its inputs.

Collaborator Author


kron internally uses tensordot, which internally uses matmul, which runs in TF32.
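
A hedged illustration of why this still affects precision even though kron has no reduction: TF32 rounds the inputs of a matmul to a 10-bit mantissa, so even an outer-product-shaped matmul (k = 1) can lose accuracy if cuBLAS picks a TF32 kernel for it. The shapes below are arbitrary and chosen only for illustration:

import torch

a = torch.randn(256, 1, device='cuda', dtype=torch.float32)
b = torch.randn(1, 256, device='cuda', dtype=torch.float32)

torch.backends.cuda.matmul.allow_tf32 = True
outer_tf32 = a @ b   # no summation, but inputs may be rounded to TF32

torch.backends.cuda.matmul.allow_tf32 = False
outer_fp32 = a @ b

# Any difference here comes purely from rounding the inputs to TF32
# (whether cuBLAS actually selects a TF32 kernel for this shape depends
# on its heuristics).
print((outer_tf32 - outer_fp32).abs().max())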

Collaborator Author


I have removed all kron changes in this PR. kron will be fixed in #50927, and this PR can be landed independently.

Comment thread test/test_linalg.py Outdated
def test_pinv(self, device, dtype):
    from torch.testing._internal.common_utils import random_hermitian_pd_matrix

    def _(tensor):
Collaborator


Instead of this function, can you just do np_A and np_A_pinv below? That would prevent excessive copying if they're on CUDA, too.
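
A rough sketch of what that suggestion could look like (np_A and np_A_pinv are the names from the comment above; the matrix size and tolerances are assumed for illustration):

import numpy as np
import torch

A = torch.randn(5, 5, device='cuda', dtype=torch.float32)

# Compute the reference pseudo-inverse entirely in NumPy so it never
# goes through a (possibly TF32) CUDA matmul.
np_A = A.cpu().numpy()
np_A_pinv = np.linalg.pinv(np_A)

actual = torch.pinverse(A)
expected = torch.from_numpy(np_A_pinv).to(A.device)
torch.testing.assert_allclose(actual, expected, atol=1e-4, rtol=1e-4)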

Collaborator Author


fixed


Collaborator

@mruberry mruberry left a comment


@ngimel and I took a look. Looks good. We have one suggestion and one question about kron.

We're wondering if we should guard kron against TF32 as well.

@zasdfgbnm
Collaborator Author

I don't know. It is quite strange to me that the Kronecker product internally uses matmul. From the definition, there is no sum at all...

@zasdfgbnm
Collaborator Author

Why can't kron be implemented as below? It looks simple and fast:

import torch

def mykron(a, b):
    assert a.dim() == b.dim()
    a_view_shape = []
    b_view_shape = []
    ab_view_shape = []
    for i in range(a.dim()):
        # Interleave the sizes so that broadcasting an (..., a_i, 1, ...) view
        # against a (..., 1, b_i, ...) view yields every pairwise product.
        a_view_shape.append(a.size(i))
        a_view_shape.append(1)
        b_view_shape.append(1)
        b_view_shape.append(b.size(i))
        ab_view_shape.append(a.size(i) * b.size(i))
    # Elementwise multiply of the broadcast views, then collapse each
    # (a_i, b_i) pair back into a single dimension of size a_i * b_i.
    return (a.reshape(a_view_shape) * b.reshape(b_view_shape)).reshape(ab_view_shape)

a = torch.randn(5, 5, 5)
b = torch.randn(6, 6, 6)

r1 = torch.kron(a, b)
r2 = mykron(a, b)

diff = (r1 - r2).abs().max()
print(diff)

@zasdfgbnm
Collaborator Author

zasdfgbnm commented Jan 22, 2021

Well, based on my benchmark, it seems that we should use my implementation for kron: https://github.com/zasdfgbnm/things/blob/master/2021Q1/kron.ipynb

I will open a PR

Edit:
PR is #50927
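
For anyone who wants to reproduce the comparison locally, here is a hedged sketch using torch.utils.benchmark (this is not the linked notebook; the sizes are arbitrary):

import torch
import torch.utils.benchmark as benchmark

def mykron(a, b):
    # The reshape-and-broadcast kron from the earlier comment (no matmul involved).
    assert a.dim() == b.dim()
    a_shape, b_shape, ab_shape = [], [], []
    for i in range(a.dim()):
        a_shape += [a.size(i), 1]
        b_shape += [1, b.size(i)]
        ab_shape.append(a.size(i) * b.size(i))
    return (a.reshape(a_shape) * b.reshape(b_shape)).reshape(ab_shape)

a = torch.randn(64, 64, device='cuda')
b = torch.randn(64, 64, device='cuda')

for stmt in ("torch.kron(a, b)", "mykron(a, b)"):
    timer = benchmark.Timer(
        stmt=stmt,
        globals={"torch": torch, "a": a, "b": b, "mykron": mykron})
    print(timer.blocked_autorange())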

@zasdfgbnm
Collaborator Author

I have resolved all review comments

Contributor

@facebook-github-bot facebook-github-bot left a comment


@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@ngimel merged this pull request in ba316a7.

@zasdfgbnm zasdfgbnm deleted the tf32-failures branch January 27, 2021 05:21
facebook-github-bot pushed a commit that referenced this pull request May 20, 2021
Summary:
This PR does several things to relax test tolerance

- Do not use TF32 in cuda matmul in test_c10d. See #52941.
- Do not use TF32 in cuda matmul in test_linalg. Increase atol for float and cfloat. See #50453
    The tolerance is increased because most linear algebra operators are not that stable in single precision.

Pull Request resolved: #56114

Reviewed By: ailzhang

Differential Revision: D28554467

Pulled By: ngimel

fbshipit-source-id: 90416be8e4c048bedb16903b01315584d344ecdf
facebook-github-bot pushed a commit that referenced this pull request Feb 28, 2022
Summary:
Disable TF32 in some linalg functions

See also #67948 #50453 #44240

Pull Request resolved: #73460

Reviewed By: albanD

Differential Revision: D34493487

Pulled By: ngimel

fbshipit-source-id: 958cd968ea09df3b5a4d2b4a26aaf0dfddc53981
pytorchmergebot pushed a commit that referenced this pull request Feb 28, 2022
Summary:
Disable TF32 in some linalg functions

See also #67948 #50453 #44240

Pull Request resolved: #73460

Reviewed By: albanD

Differential Revision: D34493487

Pulled By: ngimel

fbshipit-source-id: 958cd968ea09df3b5a4d2b4a26aaf0dfddc53981
(cherry picked from commit cd75ec6)
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 24, 2026
Summary:
On Ampere GPUs, matmuls are computed with TF32 by default when the dtype is `torch.float` (see https://pytorch.org/docs/stable/notes/cuda.html#tensorfloat-32-tf32-on-ampere-devices), which reduces the precision of the results. Linear algebra, however, usually needs higher precision, so many tests in `test_linalg.py` fail on Ampere GPUs because of precision issues.

To fix this issue:
- Most linear algebra methods, except for matmuls, should add `NoTF32Guard`
- Expected results in unit tests should compute matmuls using numpy instead of pytorch cuda.

Pull Request resolved: pytorch#50453

Reviewed By: glaringlee

Differential Revision: D26023005

Pulled By: ngimel

fbshipit-source-id: f0ea533494fee322b07925565b57e3b0db2570c5
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026
Summary:
This PR does several things to relax test tolerance

- Do not use TF32 in cuda matmul in test_c10d. See pytorch#52941.
- Do not use TF32 in cuda matmul in test_linalg. Increase atol for float and cfloat. See pytorch#50453
    The tolerance is increased because most linear algebra operators are not that stable in single precision.

Pull Request resolved: pytorch#56114

Reviewed By: ailzhang

Differential Revision: D28554467

Pulled By: ngimel

fbshipit-source-id: 90416be8e4c048bedb16903b01315584d344ecdf
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026
Summary:
Disable TF32 in some linalg functions

See also pytorch#67948 pytorch#50453 pytorch#44240

Pull Request resolved: pytorch#73460

Reviewed By: albanD

Differential Revision: D34493487

Pulled By: ngimel

fbshipit-source-id: 958cd968ea09df3b5a4d2b4a26aaf0dfddc53981
(cherry picked from commit cd75ec6)

Labels

cla signed · Merged · module: linear algebra (Issues related to specialized linear algebra operations in PyTorch; includes matrix multiply matmul) · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)

6 participants