Disable TF32 in pinv_jvp and pinv_backward #67948

Closed
crcrpar wants to merge 1 commit into pytorch:master from crcrpar:disable-tf32-pinv-bwd

Conversation

crcrpar (Collaborator) commented Nov 6, 2021

pytorch-probot bot commented Nov 6, 2021

CI Flow Status

⚛️ CI Flow

Ruleset - Version: v1
Ruleset - File: https://github.com/crcrpar/pytorch/blob/66f27a2c395045d2e94c1b97fe0d0c3c6b1e0590/.github/generated-ciflow-ruleset.json
PR ciflow labels: ciflow/default

Workflows Labels (bold enabled) Status
Triggered Workflows
linux-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/noarch, ciflow/xla ✅ triggered
linux-vulkan-bionic-py3.6-clang9 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/vulkan ✅ triggered
linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3-clang5-mobile-build ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-dynamic ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3-clang5-mobile-custom-build-static ciflow/all, ciflow/default, ciflow/linux, ciflow/mobile ✅ triggered
linux-xenial-py3.6-clang7-asan ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/sanitizers ✅ triggered
linux-xenial-py3.6-clang7-onnx ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux, ciflow/onnx ✅ triggered
linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7 ciflow/all, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
linux-xenial-py3.6-gcc7-bazel-test ciflow/all, ciflow/bazel, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
pytorch-linux-xenial-py3-clang5-android-ndk-r19c-gradle-custom-build-single-full-jit ciflow/all, ciflow/android, ciflow/cpu, ciflow/default, ciflow/linux ✅ triggered
win-vs2019-cpu-py3 ciflow/all, ciflow/cpu, ciflow/default, ciflow/win ✅ triggered
win-vs2019-cuda11.3-py3 ciflow/all, ciflow/cuda, ciflow/default, ciflow/win ✅ triggered
Skipped Workflows
caffe2-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
docker-builds ciflow/all 🚫 skipped
ios-12-5-1-arm64 ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-coreml ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-custom-ops ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-full-jit ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-arm64-metal ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64 ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64-coreml ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
ios-12-5-1-x86-64-full-jit ciflow/all, ciflow/ios, ciflow/macos 🚫 skipped
libtorch-linux-xenial-cuda10.2-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
libtorch-linux-xenial-cuda11.3-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux 🚫 skipped
linux-bionic-cuda10.2-py3.9-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/slow 🚫 skipped
linux-xenial-py3-clang5-mobile-code-analysis ciflow/all, ciflow/linux, ciflow/mobile 🚫 skipped
parallelnative-linux-xenial-py3.6-gcc5.4 ciflow/all, ciflow/cpu, ciflow/linux 🚫 skipped
periodic-libtorch-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/libtorch, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-linux-xenial-cuda10.2-py3-gcc7-slow-gradcheck ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled, ciflow/slow, ciflow/slow-gradcheck 🚫 skipped
periodic-linux-xenial-cuda11.1-py3.6-gcc7 ciflow/all, ciflow/cuda, ciflow/linux, ciflow/scheduled 🚫 skipped
periodic-win-vs2019-cuda11.1-py3 ciflow/all, ciflow/cuda, ciflow/scheduled, ciflow/win 🚫 skipped

You can add a comment to the PR and tag @pytorchbot with the following commands:
# ciflow rerun, "ciflow/default" will always be added automatically
@pytorchbot ciflow rerun

# ciflow rerun with additional labels "-l <ciflow/label_name>", which is equivalent to adding these labels manually and trigger the rerun
@pytorchbot ciflow rerun -l ciflow/scheduled -l ciflow/slow

For more information, please take a look at the CI Flow Wiki.

facebook-github-bot (Contributor) commented Nov 6, 2021

🔗 Helpful links

💊 CI failures summary and remediations

As of commit 66f27a2 (more details on the Dr. CI page):


💚 💚 Looks good so far! There are no failures yet. 💚 💚


This comment was automatically generated by Dr. CI.

Please report bugs/suggestions to the (internal) Dr. CI Users group.

ngimel (Collaborator) commented Nov 8, 2021

I'll merge this, but can we have a more systematic solution? A lot of linear algebra functions use matmuls in their backward. We currently disable TF32 in the cuBLAS handle that MAGMA uses, but I think not under any other circumstances for linalg functions, and we don't run Ampere in CI.
cc @IvanYashchuk @xwang233 @lezcano, who probably have a better idea of the scope of the problem.
cc @albanD also; I suspect this is happening in a lot of linalg autograd functions.
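The precision gap this thread is worried about can be illustrated without Ampere hardware. TF32 keeps float32's 8-bit exponent but only a 10-bit mantissa (versus float32's 23 bits), so every matmul operand is effectively rounded to about three decimal digits. The sketch below simulates that rounding on the CPU; it is an illustration only, not PyTorch's actual tensor-core path (real TF32 also accumulates products in full float32, which this ignores):

```python
import struct

def round_to_tf32(x: float) -> float:
    """Simulate TF32's reduced precision: keep float32's 8 exponent
    bits but only 10 of its 23 mantissa bits, rounding the 13 dropped
    bits to nearest. CPU-side sketch, not PyTorch's kernel path."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 1 << 12               # round to nearest on the dropped bits
    bits &= ~((1 << 13) - 1)      # zero the 13 low mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def f32(x: float) -> float:
    """Round a Python float to ordinary float32 for comparison."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

x = 1.0 / 3.0
tf32_err = abs(round_to_tf32(x) - x) / x   # on the order of 1e-4
f32_err = abs(f32(x) - x) / x              # on the order of 1e-8
```

Backward formulas such as pinv_backward chain several such matmuls, so a per-operand error around 1e-4 easily exceeds the float32 tolerances used by gradcheck and by the failing test in this PR; that is why the tests only pass with TF32 turned off.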

facebook-github-bot (Contributor)

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

albanD (Collaborator) commented Nov 8, 2021

Do you have more context on when it is OK to use TF32 and when it is not? (e.g., a doc or slides)

ngimel (Collaborator) commented Nov 8, 2021

In linear algebra, never. The only context where it's OK to use TF32 is in linear layers of neural networks.

albanD (Collaborator) commented Nov 8, 2021

Should we change the default, then, to never use it except when it is used by linear or our other layers?

ngimel (Collaborator) commented Nov 8, 2021

There's #67384 discussing it, where using TF32 only in nn.functional.linear is listed as option 3. However, it's far too confusing when nn.functional.linear(x, weight, bias) produces a different result compared to matmul(x, weight.t()) + bias.
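For readers outside the thread: torch.nn.functional.linear(x, weight, bias) is defined as x @ weight.t() + bias, so the two spellings ngimel compares are the same mathematical expression. A minimal pure-Python sketch of that definition (an illustrative stand-in, not PyTorch code) makes the identity concrete:

```python
def linear(x, weight, bias):
    """Mirror of torch.nn.functional.linear's definition:
    out[i][j] = dot(x[i], weight[j]) + bias[j], i.e. x @ weight.T + bias.
    weight has shape (out_features, in_features), as in PyTorch."""
    return [
        [sum(xi * wi for xi, wi in zip(row, w_row)) + b
         for w_row, b in zip(weight, bias)]
        for row in x
    ]

out = linear([[1.0, 2.0, 3.0]],
             [[1.0, 1.0, 1.0], [0.0, 1.0, 0.0]],
             [0.0, 10.0])
# Identical to matmul(x, weight.T) + bias by construction, which is why
# letting only linear() use TF32 (option 3) would silently break the
# identity users expect between the two spellings.
```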

facebook-github-bot (Contributor)

@ngimel has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

facebook-github-bot (Contributor)

@ngimel merged this pull request in c5e5264.

@crcrpar crcrpar deleted the disable-tf32-pinv-bwd branch November 14, 2021 07:35
facebook-github-bot pushed a commit that referenced this pull request Feb 28, 2022
Summary:
Disable TF32 in some linalg functions

See also #67948 #50453 #44240

Pull Request resolved: #73460

Reviewed By: albanD

Differential Revision: D34493487

Pulled By: ngimel

fbshipit-source-id: 958cd968ea09df3b5a4d2b4a26aaf0dfddc53981
pytorchmergebot pushed a commit that referenced this pull request Feb 28, 2022
Summary:
Disable TF32 in some linalg functions

See also #67948 #50453 #44240

Pull Request resolved: #73460

Reviewed By: albanD

Differential Revision: D34493487

Pulled By: ngimel

fbshipit-source-id: 958cd968ea09df3b5a4d2b4a26aaf0dfddc53981
(cherry picked from commit cd75ec6)
cyyever pushed a commit to cyyever/pytorch_private that referenced this pull request Mar 3, 2022
Summary:
Disable TF32 in some linalg functions

See also pytorch/pytorch#67948 #50453 pytorch/pytorch#44240

Pull Request resolved: pytorch/pytorch#73460

Reviewed By: albanD

Differential Revision: D34493487

Pulled By: ngimel

fbshipit-source-id: 958cd968ea09df3b5a4d2b4a26aaf0dfddc53981
(cherry picked from commit cd75ec645b86c4b4a66c35696ce891d006f3833b)
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026
Summary:
Fixes pytorch#67947

cc ptrblck xwang233 zasdfgbnm

Pull Request resolved: pytorch#67948

Reviewed By: H-Huang

Differential Revision: D32251934

Pulled By: ngimel

fbshipit-source-id: a2b1a118337b38db61350c9e49f1ba19030d70ec
laurentdupin pushed a commit to laurentdupin/pytorch that referenced this pull request Apr 25, 2026
Summary:
Disable TF32 in some linalg functions

See also pytorch#67948 pytorch#50453 pytorch#44240

Pull Request resolved: pytorch#73460

Reviewed By: albanD

Differential Revision: D34493487

Pulled By: ngimel

fbshipit-source-id: 958cd968ea09df3b5a4d2b4a26aaf0dfddc53981
(cherry picked from commit cd75ec6)
Development

Successfully merging this pull request may close these issues.

TestCommonCUDA.test_noncontiguous_samples_linalg_pinv_hermitian_cuda_float32 fails when TF32 is enabled

5 participants