Disable TF32 in pinv_jvp and pinv_backward #67948

Closed
crcrpar wants to merge 1 commit into pytorch:master from crcrpar:disable-tf32-pinv-bwd

Conversation

crcrpar (Collaborator) commented Nov 6, 2021

pytorch-probot bot commented Nov 6, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/crcrpar/pytorch/blob/66f27a2c395045d2e94c1b97fe0d0c3c6b1e0590/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla ✅ triggered
linux-vulkan-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-dynamic ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3.6-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers ✅ triggered
linux-xenial-py3.6-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx ✅ triggered
linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
docker-builds ciflow/all 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow 🚫 skipped
linux-xenial-py3-clang5-mobile-code-analysis ciflow/all, ciflow/linux, ciflow/mobile 🚫 skipped
parallelnative-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

facebook-github-bot (Contributor) commented Nov 6, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 66f27a2 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

ngimel (Collaborator) commented Nov 8, 2021

I'll merge this, but can we have a more systematic solution? A lot of linear algebra functions use matmuls in their backward. We currently disable TF32 in the cuBLAS handle that MAGMA uses, but I think not under any other circumstances for linalg functions, and we don't run Ampere in CI.
cc @IvanYashchuk @xwang233 @lezcano, who probably have a better idea of the scope of the problem.
cc @albanD also; I suspect this is happening in a lot of linalg autograd functions.
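The precision gap this thread is worried about can be illustrated without Ampere hardware. TF32 keeps float32's 8-bit exponent but only a 10-bit mantissa (versus float32's 23 bits), so every matmul operand is effectively rounded to about three decimal digits. The sketch below simulates that rounding on the CPU; it is an illustration only, not PyTorch's actual tensor-core path (real TF32 also accumulates products in full float32, which this ignores):

```python
import struct

def round_to_tf32(x: float) -> float:
    """Simulate TF32's reduced precision: keep float32's 8 exponent
    bits but only 10 of its 23 mantissa bits, rounding the 13 dropped
    bits to nearest. CPU-side sketch, not PyTorch's kernel path."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 1 << 12               # round to nearest on the dropped bits
    bits &= ~((1 << 13) - 1)      # zero the 13 low mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def f32(x: float) -> float:
    """Round a Python float to ordinary float32 for comparison."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

x = 1.0 / 3.0
tf32_err = abs(round_to_tf32(x) - x) / x   # on the order of 1e-4
f32_err = abs(f32(x) - x) / x              # on the order of 1e-8
```

Backward formulas such as pinv_backward chain several such matmuls, so a per-operand error around 1e-4 easily exceeds the float32 tolerances used by gradcheck and by the failing test in this PR; that is why the tests only pass with TF32 turned off.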

facebook-github-bot (Contributor)

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

albanD (Collaborator) commented Nov 8, 2021

Do you have more context on when it is OK to use TF32 and when it is not? (e.g., a doc or slides)

ngimel (Collaborator) commented Nov 8, 2021

In linear algebra, never. The only context where it's OK to use TF32 is in linear layers of neural networks.

albanD (Collaborator) commented Nov 8, 2021

Should we change the default, then, to never use it except when it is used by linear or our other layers?

ngimel (Collaborator) commented Nov 8, 2021

There's #67384 discussing it, where using TF32 only in nn.functional.linear is listed as option 3. However, it's far too confusing when nn.functional.linear(x, weight, bias) produces a different result compared to matmul(x, weight.t()) + bias.
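For readers outside the thread: torch.nn.functional.linear(x, weight, bias) is defined as x @ weight.t() + bias, so the two spellings ngimel compares are the same mathematical expression. A minimal pure-Python sketch of that definition (an illustrative stand-in, not PyTorch code) makes the identity concrete:

```python
def linear(x, weight, bias):
    """Mirror of torch.nn.functional.linear's definition:
    out[i][j] = dot(x[i], weight[j]) + bias[j], i.e. x @ weight.T + bias.
    weight has shape (out_features, in_features), as in PyTorch."""
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b
         for w_row, b in zip(weight, bias)]
        for row in x
    ]

out = linear([[1.0, 2.0, 3.0]],
             [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]],
             [0.0, 10.0])
# Identical to matmul(x, weight.T) + bias by construction, which is why
# letting only linear() use TF32 (option 3) would silently break the
# identity users expect between the two spellings.
```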

facebook-github-bot (Contributor)

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor)

@ngimel merged this pull request in c5e5264.

@crcrpar crcrpar deleted the disable-tf32-pinv-bwd branch November 14, 2021 07:35
facebook-github-bot pushed a commit that referenced this pull request Feb 28, 2022
Summary:
Disable TF32 in some linalg functions

See also #67948 #50453 #44240

Pull Request resolved: #73460

Reviewed By: albanD

Differential Revision: D34493487

Pulled By: ngimel

fbshipit-source-id: 958cd968ea09df3b5a4d2b4a26aaf0dfddc53981
pytorchmergebot pushed a commit that referenced this pull request Feb 28, 2022
Summary:
Disable TF32 in some linalg functions

See also #67948 #50453 #44240

Pull Request resolved: #73460

Reviewed By: albanD

Differential Revision: D34493487

Pulled By: ngimel

fbshipit-source-id: 958cd968ea09df3b5a4d2b4a26aaf0dfddc53981
(cherry picked from commit cd75ec6)
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 3, 2022
Summary:
Disable TF32 in some linalg functions

See also pytorch/pytorch#67948 #50453 pytorch/pytorch#44240

Pull Request resolved: pytorch/pytorch#73460

Reviewed By: albanD

Differential Revision: D34493487

Pulled By: ngimel

fbshipit-source-id: 958cd968ea09df3b5a4d2b4a26aaf0dfddc53981
(cherry picked from commit cd75ec645b86c4b4a66c35696ce891d006f3833b)
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026
Summary:
Fixes pytorch#67947

cc ptrblck xwang233 zasdfgbnm

Pull Request resolved: pytorch#67948

Reviewed By: H-Huang

Differential Revision: D32251934

Pulled By: ngimel

fbshipit-source-id: a2b1a118337b38db61350c9e49f1ba19030d70ec
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026
Summary:
Disable TF32 in some linalg functions

See also pytorch#67948 pytorch#50453 pytorch#44240

Pull Request resolved: pytorch#73460

Reviewed By: albanD

Differential Revision: D34493487

Pulled By: ngimel

fbshipit-source-id: 958cd968ea09df3b5a4d2b4a26aaf0dfddc53981
(cherry picked from commit cd75ec6)
Development

Successfully merging this pull request may close these issues.

TestCommonCUDA.test_noncontiguous_samples_linalg_pinv_hermitian_cuda_float32 fails when TF32 is enabled

5 participants