[inductor] Skip addcdiv decomposition to enable FMA lowering #175310
mlazos wants to merge 22 commits into gh/mlazos/107/base
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/175310
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure, 1 Unrelated Failure as of commit b342d39 with merge base 0b6476f:
NEW FAILURE - one job has failed.
UNSTABLE - one job is marked as unstable, possibly due to flakiness on trunk.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
Add aten.addcdiv, aten.addcdiv_, and aten._foreach_addcdiv.Scalar to decomps_to_exclude so that the FMA-based addcdiv lowering is used instead of decomposition. Also simplify the dynamo handlers for addcdiv to always skip inline decomposition.

This enables bitwise precision parity with eager CUDA for addcdiv operations used in Adam/AdamW optimizers.

ghstack-source-id: 370d088
Pull-Request: #175310
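For illustration, a rough sketch (not the actual diff) of the shape of this change: the three overloads named above get appended to inductor's `decomps_to_exclude` list so they fall through to the FMA-based lowering instead of being decomposed. The surrounding entries and the exact registration code are assumptions.

```python
# Sketch only: appending the addcdiv overloads to a decomps_to_exclude-style
# list, as described in the commit message above; the real change lives in
# inductor's decomposition registration and may differ in detail.
import torch

aten = torch.ops.aten

decomps_to_exclude = [
    # ... existing exclusions elided ...
    aten.addcdiv,                  # out-of-place addcdiv -> FMA-based lowering
    aten.addcdiv_,                 # in-place variant used by optimizer steps
    aten._foreach_addcdiv.Scalar,  # foreach variant used by fused Adam/AdamW
]
```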
@pytorchbot merge -i
Merge started
Your change will be merged while ignoring the following 2 checks: inductor / inductor-test / test (inductor_torchbench, 1, 2, linux.g5.4xlarge.nvidia.gpu), inductor / inductor-test-cuda13 / test (inductor_torchbench, 2, 2, linux.g5.4xlarge.nvidia.gpu)
Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team
Eager CUDA computes `a + alpha * b` as `fma(b, alpha, a)`. Without this, Triton computes `b * alpha` and then adds it to `a` as separate operations, losing the FMA precision guarantee. This affects optimizer weight_decay paths, which use `grad.add(param, alpha=weight_decay)` and `_foreach_add` with alpha.

Authored with Claude.

Pull Request resolved: #175838
Approved by: https://github.com/v0i0
ghstack dependencies: #174912, #175309, #175310
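As a minimal sketch of what this means for the weight_decay path (not taken from the PR's test suite), the snippet below compares the compiled and eager results of `grad.add(param, alpha=weight_decay)` bit-for-bit; the tensor sizes and values are arbitrary and a CUDA device is assumed.

```python
# Minimal sketch, assuming a CUDA device: with the FMA lowering in place,
# torch.compile should reproduce eager's fma(param, weight_decay, grad)
# bit-for-bit for the weight_decay pattern mentioned above.
import torch

def weight_decay_step(grad, param, weight_decay):
    return grad.add(param, alpha=weight_decay)

compiled_step = torch.compile(weight_decay_step)

grad = torch.randn(1 << 16, device="cuda")
param = torch.randn(1 << 16, device="cuda")

eager_out = weight_decay_step(grad, param, 1e-2)
compiled_out = compiled_step(grad, param, 1e-2)

# Compare raw bit patterns so signed zeros or NaNs cannot mask a mismatch.
assert torch.equal(eager_out.view(torch.int32), compiled_out.view(torch.int32))
```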
…#175310)

Add aten.addcdiv, aten.addcdiv_, and aten._foreach_addcdiv.Scalar to decomps_to_exclude so that the FMA-based addcdiv lowering is used instead of decomposition. Also simplify the dynamo handlers for addcdiv to always skip inline decomposition.

This enables bitwise precision parity with eager CUDA for addcdiv operations used in Adam/AdamW optimizers.

Pull Request resolved: pytorch#175310
Approved by: https://github.com/v0i0
ghstack dependencies: pytorch#174912, pytorch#175309
Stack from ghstack (oldest at bottom):
Add aten.addcdiv, aten.addcdiv_, and aten._foreach_addcdiv.Scalar to
decomps_to_exclude so that the FMA-based addcdiv lowering is used
instead of decomposition.
Also simplify the dynamo handlers for addcdiv to always skip inline
decomposition.
This enables bitwise precision parity with eager CUDA for addcdiv
operations used in Adam/AdamW optimizers.
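For a concrete picture of the parity claim, here is a minimal, hypothetical check (not the PR's actual test) that the addcdiv step at the core of the Adam update matches eager CUDA bit-for-bit under torch.compile; the shapes, step size, and eps offset are made up, and a CUDA device is assumed.

```python
# Hypothetical parity check, assuming a CUDA device: the Adam-style update
# param + (-step_size) * (exp_avg / denom) via torch.addcdiv, compiled vs eager.
import torch

def adam_addcdiv(param, exp_avg, denom, step_size):
    return torch.addcdiv(param, exp_avg, denom, value=-step_size)

compiled_addcdiv = torch.compile(adam_addcdiv)

param = torch.randn(4096, device="cuda")
exp_avg = torch.randn(4096, device="cuda")
denom = torch.rand(4096, device="cuda") + 1e-8  # stand-in for sqrt(v_hat) + eps

eager_out = adam_addcdiv(param, exp_avg, denom, 1e-3)
compiled_out = compiled_addcdiv(param, exp_avg, denom, 1e-3)

# Bitwise comparison: with the FMA-based addcdiv lowering these should agree exactly.
assert torch.equal(eager_out.view(torch.int32), compiled_out.view(torch.int32))
```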
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo @Lucaskabela