
bf16 support for per tensor backward #165362

Closed
liangel-02 wants to merge 20 commits into main from gh/liangel-02/2/head

Conversation

@liangel-02
Contributor

@liangel-02 liangel-02 commented Oct 13, 2025

Adding bf16 support for the backward pass of `torch._fake_quantize_learnable_per_tensor_affine()`.

Note that for testing, we modified the seed to avoid increasing tolerance in cases where the difference between Python and C++ downcasting causes tensor mismatches (e.g. 27.87704 vs 27.8408 before downcasting, 27.7500 vs 27.8750 after downcasting, for the Python vs C++ op).
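The kind of mismatch described above is easy to reproduce with a stdlib-only sketch of bfloat16 rounding (an illustration of the numerics, not code from this PR): bf16 keeps only 8 mantissa bits, so near 28 the representable values sit on a 0.125-wide grid, and two fp32 results that differ by a few hundredths can collapse onto the same or adjacent bf16 values.

```python
import struct

def to_bf16(x: float) -> float:
    # Round a Python float (fp32 precision) to the nearest bfloat16 value
    # using round-to-nearest-even on the low 16 bits of the fp32 encoding.
    # NaN/inf handling is omitted for brevity.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    lsb = (bits >> 16) & 1                      # tie-breaking bit
    bits = (bits + 0x7FFF + lsb) & 0xFFFFFFFF   # add rounding bias
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]

# Both fp32 intermediates from the PR description land on the same bf16 grid point:
print(to_bf16(27.87704))  # 27.875
print(to_bf16(27.8408))   # 27.875
# ...while a slightly smaller value snaps to the adjacent grid point, 0.125 away:
print(to_bf16(27.8))      # 27.75
```

This is why a seemingly tiny divergence before the final downcast can turn into a full 0.125 step (here, 27.75 vs 27.875) after it, forcing either a larger tolerance or a different seed in the test.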

Stack from ghstack (oldest at bottom):

Follow up to #165098 - adding bf16 support for the backward pass. To avoid BC breaking changes/losing precision, we upcast the parameters to fp32 after the op gets called, and downcast the gradients to bf16 before returning.

For testing, we upcast to fp32 before calling the reference function.
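The upcast/downcast pattern can be sketched as follows. This is a hypothetical Python illustration of the approach, not the actual ATen kernel; `backward_with_upcast` and `compute_grads` are made-up names standing in for the op's backward entry point and its fp32 gradient math.

```python
import torch

def backward_with_upcast(grad_out, x, scale, zero_point, compute_grads):
    """Sketch: run the backward math in fp32, return grads in the input dtype."""
    orig_dtype = x.dtype
    if orig_dtype == torch.bfloat16:
        # Upcast everything to fp32 so the gradient math keeps full precision.
        grad_out, x = grad_out.float(), x.float()
        scale, zero_point = scale.float(), zero_point.float()
    dX, dScale, dZeroPoint = compute_grads(grad_out, x, scale, zero_point)
    if orig_dtype == torch.bfloat16:
        # Downcast the gradients back to bf16 before returning, so the
        # caller-visible dtypes match the inputs (no BC break).
        dX = dX.to(orig_dtype)
        dScale = dScale.to(orig_dtype)
        dZeroPoint = dZeroPoint.to(orig_dtype)
    return dX, dScale, dZeroPoint
```

Under this scheme the precision loss happens exactly once, at the final downcast, which is also where the Python-vs-C++ casting differences mentioned above show up in tests.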




[ghstack-poisoned]
Follow up to #165098 - adding bf16 support for the backward pass. To avoid BC breaking changes/losing precision, we upcast the parameters to fp32 after the op gets called, and downcast the gradients to bf16 before returning.

For testing, we upcast to fp32 before calling the reference function. We increase the tolerance to 1e-2 for bf16 inputs because of a difference in casting behavior between Python's `x.to(torch.bfloat16)` and C++'s `x.to(at::kBFloat16)` (after comparing intermediate tensors, we found that the numerics diverge only after the final cast). We don't explicitly cast in the C++ op but rather let autograd/the optimizer handle it.

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Oct 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/165362

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (1 Unrelated Failure)

As of commit 6d643da with merge base fbe0d20:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added the `release notes: quantization` label Oct 13, 2025
liangel-02 added a commit that referenced this pull request Oct 13, 2025
ghstack-source-id: d03e828
Pull Request resolved: #165362
liangel-02 added a commit that referenced this pull request Oct 13, 2025
ghstack-source-id: fb073e2
Pull Request resolved: #165362
liangel-02 added a commit that referenced this pull request Oct 13, 2025
ghstack-source-id: 7a6f0c0
Pull Request resolved: #165362
@liangel-02 liangel-02 marked this pull request as draft October 13, 2025 21:10
liangel-02 added a commit that referenced this pull request Oct 14, 2025
ghstack-source-id: 44b30c2
Pull Request resolved: #165362
@liangel-02 liangel-02 marked this pull request as ready for review October 14, 2025 14:31
@liangel-02 liangel-02 requested a review from andrewor14 October 14, 2025 14:32
liangel-02 added a commit that referenced this pull request Oct 14, 2025
ghstack-source-id: b9eb103
Pull Request resolved: #165362
@liangel-02 liangel-02 changed the base branch from gh/liangel-02/2/base to main October 14, 2025 14:33
@liangel-02 liangel-02 changed the base branch from main to gh/liangel-02/2/base October 14, 2025 14:36
liangel-02 added a commit that referenced this pull request Oct 14, 2025
ghstack-source-id: 9776242
Pull Request resolved: #165362
*/
float scale_val = scale[0].item<float>();

bool is_bfloat16 = (X.scalar_type() == at::kBFloat16);
Collaborator

So we shouldn't cast fp16

Contributor Author

we enabled bf16 support for per_tensor alongside per_channel in PR #165325, so if we want to enable fp16, we can do it in a separate PR for both of these ops?

auto dScale = dScale_vec.sum().unsqueeze(0).to(scale_.device());
auto dZeroPoint = dZeroPoint_vec.sum().unsqueeze(0).to(zero_point_.device());

return std::make_tuple(dX, dScale, dZeroPoint);
Collaborator

All of these should be `std::move`d into `make_tuple`, btw

Contributor Author

should we make this change in a follow up PR since this PR doesn't actually touch this code?

@liangel-02 liangel-02 changed the base branch from gh/liangel-02/2/base to main October 14, 2025 20:05
@meta-codesync

meta-codesync bot commented Oct 14, 2025

@liangel-02 has imported this pull request. If you are a Meta employee, you can view this in D84639869.

@liangel-02
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the `ciflow/trunk` (Trigger trunk jobs on your pull request) label Oct 16, 2025
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

Chao1Han pushed a commit to Chao1Han/pytorch that referenced this pull request Oct 21, 2025
Adding bf16 for the backward pass of `torch._fake_quantize_learnable_per_tensor_affine()`.

Note that for testing, we modified the seed to avoid increasing tolerance due to cases where difference in Python vs CPP downcasting causes tensor mismatches. (e.g. 27.87704 vs  27.8408 before downcasting, 27.7500 vs 27.8750 after downcasting for Python vs CPP op)

Pull Request resolved: pytorch#165362
Approved by: https://github.com/andrewor14
zhudada0120 pushed a commit to zhudada0120/pytorch that referenced this pull request Oct 22, 2025
@github-actions github-actions bot deleted the gh/liangel-02/2/head branch November 16, 2025 02:20

Labels

ciflow/trunk: Trigger trunk jobs on your pull request
Merged
release notes: quantization (release notes category)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants