Revert "mul: remove opmath cast sequence (pytorch#9663)" by rajkthakur · Pull Request #9701 · pytorch/xla

rajkthakur · 2025-11-04T19:34:20Z

Commit 2a9138a removed .use_opmathtype_for_compute() from the element-wise 'mul' operation. This breaks the mixed-precision accumulation behavior expected by the Neuron compiler, which traces and compiles on CPU and later executes the binary on Neuron hardware, causing significant accuracy degradation in:

• Llama 3.1 70B models (16.7% throughput drop, accuracy failures)
• Mixtral 8x22B models (accuracy test failures)
• Other transformer models using mixed-precision compilation

Reverts: commit 2a9138a. The other changes are the result of a rebase from r2.9.
Fixes: Model accuracy failures with mixed-precision accumulation #9699
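For readers unfamiliar with the term: the cast sequence restored here upcasts low-precision operands (for example bf16) to f32, multiplies, and casts the result back. The sketch below is a pure-Python illustration, not torch_xla code. The helpers `to_f32`, `to_bf16`, and `chain_product` are hypothetical, and the sketch assumes, purely for illustration, that a compiler which sees f32 multiplies may keep the intermediates of a chain of multiplies in f32 instead of rounding to bf16 after every step.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (f64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to the nearest bfloat16 value (ties to even).

    Illustrative only: NaN and infinity are not handled.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 is the top 16 bits of a float32; round away the low 16 bits.
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def chain_product(factor: float, steps: int, keep_f32: bool) -> float:
    """Multiply 1.0 by `factor` `steps` times and return a bfloat16 result.

    keep_f32=True  -> intermediates are rounded to float32 (assumed fused chain)
    keep_f32=False -> intermediates are rounded to bfloat16 after every multiply
    """
    round_step = to_f32 if keep_f32 else to_bf16
    acc = 1.0
    for _ in range(steps):
        acc = round_step(acc * factor)
    return to_bf16(acc)


factor = 1.0078125  # 1 + 2**-7, exactly representable in bfloat16
exact = factor ** 64
bf16_steps = chain_product(factor, 64, keep_f32=False)
f32_steps = chain_product(factor, 64, keep_f32=True)
print(f"exact={exact:.6f} bf16_steps={bf16_steps} f32_steps={f32_steps}")
```

In this toy case, rounding to bf16 after every step lands on 1.5 while the exact product is about 1.6455; keeping f32 intermediates stays much closer. It says nothing about how the Neuron compiler actually treats the HLO.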

rajkthakur · 2025-11-05T00:53:00Z

Is the torchax test failure expected?

ysiraichi

The TorchAX CI could be fixed by this change
Instead of reverting the commit, I think it would be better to make it backend specific, land it on master, and cherry-pick it to r2.9.

What do you think?

rajkthakur · 2025-11-06T00:11:52Z

While I understand the suggestion from #8545 to make this backend-specific, I believe a full revert is more appropriate, at least for the release branch r2.9:

The change affects the fundamental numerical behavior of torch.mul, which is called extensively throughout any model. Even small precision differences compound in:
• Attention mechanisms with large sequence lengths
• Gradient accumulation/reduction over many steps
XLA-based compilers will likely need to handle this scenario explicitly.
Risk vs. benefit:
• Risk: Our testing shows concrete accuracy regressions that block production workloads (Mixtral, Llama) on the current compiler, and I suspect they could affect other users too.
• Benefit: The original issue [Q][GPU][BF16] torch.mul is lowered to HLO as an f32 multiply #8545 was a question about the seeming unnecessary upcast/downcast that also appears in PyTorch CUDA. The upcast/downcast can be already removed by using PyTorch autocast so there's no need to fix [Q][GPU][BF16] torch.mul is lowered to HLO as an f32 multiply #8545 for r2.9. We can keep this in master branch for further investigation.

Let me know your thoughts?
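The point about accumulation over many steps can be made concrete with a small pure-Python sketch. It is an illustration under stated assumptions, not a model of the Neuron compiler or of torch_xla: `to_f32`, `to_bf16`, and `accumulate` are hypothetical helpers, and the values are chosen so that every f32 result is exact.

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (f64) to the nearest float32 value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def to_bf16(x: float) -> float:
    """Round to the nearest bfloat16 value (ties to even); no NaN/inf handling."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits += 0x7FFF + ((bits >> 16) & 1)
    bits &= 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def accumulate(term: float, steps: int, start: float, bf16_accumulator: bool) -> float:
    """Add `term` to `start` `steps` times and return a bfloat16 result."""
    round_step = to_bf16 if bf16_accumulator else to_f32
    acc = start
    for _ in range(steps):
        acc = round_step(acc + term)
    return to_bf16(acc)


# Each term is a product of two bf16 values: 2**-4 * 2**-4 = 2**-8.
term = to_bf16(0.0625 * 0.0625)

bf16_acc = accumulate(term, 1000, start=1.0, bf16_accumulator=True)
f32_acc = accumulate(term, 1000, start=1.0, bf16_accumulator=False)
print(bf16_acc, f32_acc)
```

With a bf16 accumulator every addition of 2**-8 to 1.0 is a tie that rounds back to 1.0, so the sum never moves; with an f32 accumulator the same 1000 terms reach 4.90625.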

ysiraichi · 2025-11-06T12:57:57Z

Thank you for your detailed analysis.

I'm sorry, but I still think that the best way to go about this is to make it backend specific. Actually, now that I'm thinking about it, I think it would make more sense to do it in the lowering step.

That said, since we don't really want to introduce a lot of changes here, and the infrastructure is already in place, I would say that we should make that use_opmathtype_for_compute() call backend specific.

It's a simple change (something like this condition for TPU) that fixes this problem for the Neuron backend, while leaving the other backends with the "backend-independent" choice of not doing that.
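A rough sketch of what "backend specific" could mean, written in Python for readability even though the real change would live in torch_xla's C++ sources. Everything here is hypothetical: the backend names, the dtype strings, and `lower_mul` are illustrative stand-ins, not the torch_xla API. The Neuron/CPU choice follows the interim title of this PR.

```python
# Hypothetical names throughout; this is not the torch_xla API.
BACKENDS_NEEDING_OPMATH_UPCAST = frozenset({"NEURON", "CPU"})
LOW_PRECISION_DTYPES = frozenset({"bf16", "f16"})


def lower_mul(dtype: str, backend: str) -> list[str]:
    """Return a schematic op sequence for an element-wise multiply.

    Backends in BACKENDS_NEEDING_OPMATH_UPCAST get the upcast/multiply/downcast
    sequence for low-precision inputs; every other backend multiplies directly
    in the input dtype.
    """
    needs_upcast = (
        dtype in LOW_PRECISION_DTYPES
        and backend.upper() in BACKENDS_NEEDING_OPMATH_UPCAST
    )
    if needs_upcast:
        return ["convert:f32", "multiply:f32", f"convert:{dtype}"]
    return [f"multiply:{dtype}"]


print(lower_mul("bf16", "NEURON"))
print(lower_mul("bf16", "TPU"))
```

The design point is that the decision sits in one predicate, so other backends keep the direct low-precision multiply while Neuron and CPU retain the cast sequence.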

The original issue [Q][GPU][BF16] torch.mul is lowered to HLO as an f32 multiply #8545 was a question about the seeming unnecessary upcast/downcast that also appears in PyTorch CUDA. The upcast/downcast can be already removed by using PyTorch autocast so there's no need to fix [Q][GPU][BF16] torch.mul is lowered to HLO as an f32 multiply #8545 for r2.9. We can keep this in master branch for further investigation.

That "seeming unnecessary upcast/downcast" is exactly what we are talking about here. And no, PyTorch autocast does not solve this issue (see this comment).

ysiraichi

Thank you for the PR.
Let me know if you have any questions.

ysiraichi · 2025-11-11T12:36:32Z

Could you rebase this PR, so that there are only your commits?

This reverts commit 2a9138a.

rajkthakur · 2025-11-11T19:01:04Z

Could you rebase this PR, so that there are only your commits?

updated

ysiraichi

LGTM.
Let's just wait for the CI.

rajkthakur · 2025-11-11T22:57:54Z

@ysiraichi CI is completed.

jeffhataws · 2025-11-12T00:53:57Z

Thanks @rajkthakur @ysiraichi. It is merged now and ready to go. @bhavya01 will make a final 2.9 release candidate.

pytorch-bot Bot added the ci-no-td label Nov 4, 2025

jeffhataws requested a review from ysiraichi November 4, 2025 19:41

jeffhataws mentioned this pull request Nov 4, 2025

[torch-xla 2.9] Accuracy regressions for several large models #9699

Closed

ysiraichi requested a review from zhanyong-wan November 5, 2025 13:21

ysiraichi requested changes Nov 5, 2025

rajkthakur force-pushed the fix_logits_error branch from 9eb03b7 to ddfe243 Compare November 8, 2025 04:33

rajkthakur changed the title ~~Revert - "mul: remove opmath cast sequence (#9663)"~~ mul: add opmath cast sequence for Neuron/CPU Nov 8, 2025

jeffhataws requested review from qihqi and ysiraichi November 8, 2025 05:28

ysiraichi requested changes Nov 10, 2025

Comment thread torch_xla/csrc/aten_xla_type.cpp Outdated

Comment thread torch_xla/csrc/aten_xla_type.cpp Outdated

Comment thread torch_xla/csrc/aten_xla_type.cpp Outdated

Comment thread test/test_operations_hlo.py Outdated

rajkthakur changed the title ~~mul: add opmath cast sequence for Neuron/CPU~~ Revert "mul: remove opmath cast sequence (pytorch#9663)" Nov 10, 2025

rajkthakur added 3 commits November 11, 2025 18:57

Add opmath cast sequence for CPU or Neuron

80b4f78

test upcast on cpu

63e7bf1

Revert "mul: remove opmath cast sequence (pytorch#9663)"

214f075

This reverts commit 2a9138a.

rajkthakur force-pushed the fix_logits_error branch from 50f33f2 to 214f075 Compare November 11, 2025 19:00

jeffhataws requested review from bhavya01 and ysiraichi November 11, 2025 21:04

ysiraichi approved these changes Nov 11, 2025

jeffhataws merged commit 66f8859 into pytorch:r2.9 Nov 12, 2025
24 checks passed

jeffhataws merged 3 commits into pytorch:r2.9 from rajkthakur:fix_logits_error
