
[inductor] Add _foreach_addcdiv lowering to match _foreach_addcmul #176237

Closed
mlazos wants to merge 3 commits into gh/mlazos/111/base from gh/mlazos/111/head

Conversation

Contributor

@mlazos mlazos commented Mar 2, 2026

Stack from ghstack (oldest at bottom):

The decomposition for _foreach_addcdiv was removed (to preserve FMA
semantics via the addcdiv lowering), but no foreach-level lowering existed
to replace it. This caused _foreach_addcdiv to fall back to the native
CUDA kernel instead of generating a fused Triton kernel.

Add _foreach_addcdiv_scalar (and its inplace variant) mirroring the
existing _foreach_addcmul_scalar pattern.
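
To make the effect concrete, here is a minimal repro sketch (not from the PR; the shapes and the Adam-style update are illustrative only) of the kind of compiled foreach step this lowering targets:

```python
import torch

# Repro sketch: an Adam-style update that ends in a scalar-valued
# _foreach_addcdiv_. With the foreach lowering in place, inductor can emit a
# fused Triton kernel for the step instead of falling back to the native CUDA
# foreach kernel for the addcdiv.
def foreach_step(params, exp_avgs, exp_avg_sqs, lr=1e-3, eps=1e-8):
    denoms = torch._foreach_sqrt(exp_avg_sqs)
    torch._foreach_add_(denoms, eps)
    # p <- p - lr * exp_avg / denom, via the Scalar overload of addcdiv
    torch._foreach_addcdiv_(params, exp_avgs, denoms, value=-lr)

compiled_step = torch.compile(foreach_step)

device = "cuda" if torch.cuda.is_available() else "cpu"
params = [torch.randn(1024, device=device) for _ in range(4)]
exp_avgs = [torch.randn(1024, device=device) for _ in range(4)]
exp_avg_sqs = [torch.rand(1024, device=device) for _ in range(4)]
compiled_step(params, exp_avgs, exp_avg_sqs)
```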

Authored with Claude.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

[ghstack-poisoned]

pytorch-bot Bot commented Mar 2, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176237

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 7c8f112 with merge base a6beff3:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


pytorch-bot Bot commented Mar 2, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@pytorchmergebot
Collaborator

Starting merge as part of PR stack under #174911

pytorchmergebot pushed a commit that referenced this pull request Mar 3, 2026
…ther decomp (#175839)

The FMA lowerings for addcmul/addcdiv are now unconditional (not gated by
emulate_precision_casts), but the decomposition skip in select_decomp_table()
was still gated by that config. This meant the decompositions would override
the FMA lowerings when emulate_precision_casts=False.

Make the decomp skip unconditional to match the lowerings. Also add
aten.addcmul_ (in-place) to the skip list.

Authored with Claude.

Pull Request resolved: #175839
Approved by: https://github.com/v0i0
ghstack dependencies: #176237
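
As a rough illustration of the gating bug described in that commit (a hedged sketch, not the actual torch/_inductor source; the helper below and its signature are hypothetical):

```python
# Ops with dedicated FMA lowerings in inductor: if they stay in the decomp
# table, the decomposition shadows the lowering and FMA semantics are lost.
FMA_LOWERED_OPS = {"aten.addcmul", "aten.addcmul_", "aten.addcdiv", "aten.addcdiv_"}

def select_decomp_table(all_decomps, emulate_precision_casts):
    # Before the fix (sketch): skip only under the config flag, so with
    # emulate_precision_casts=False the decomps overrode the FMA lowerings.
    #   skip = FMA_LOWERED_OPS if emulate_precision_casts else set()
    # After the fix: skip unconditionally, matching the unconditional lowerings.
    skip = FMA_LOWERED_OPS
    return {op: fn for op, fn in all_decomps.items() if str(op) not in skip}
```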
pytorchmergebot pushed a commit that referenced this pull request Mar 3, 2026
Add CompiledOptimizerBitwiseTests test suite that verifies compiled
optimizers produce bitwise identical results to eager when precision
configs are enabled:
- eager_numerics.division_rounding = True
- eager_numerics.pow_precision = True
- emulate_precision_casts = True

Tests cover Adam and AdamW with various configurations including
amsgrad, maximize, and weight_decay options.

Pull Request resolved: #174911
Approved by: https://github.com/v0i0
ghstack dependencies: #176237, #175839
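
A hedged sketch of what one such bitwise check might look like (the test structure and device handling are assumptions; only emulate_precision_casts is exercised here, while the eager_numerics flags named above are left out since they are introduced by that stack):

```python
import torch
from torch._inductor import config as inductor_config

def make_adam_step(params, lr=1e-3):
    opt = torch.optim.Adam(params, lr=lr)
    def step(grads):
        # Attach grads and run one optimizer step over the param list.
        for p, g in zip(params, grads):
            p.grad = g
        opt.step()
    return step

def check_adam_bitwise():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    base = [torch.randn(64, device=device) for _ in range(3)]
    grads = [torch.randn_like(p) for p in base]

    with inductor_config.patch({"emulate_precision_casts": True}):
        eager_params = [p.clone() for p in base]
        compiled_params = [p.clone() for p in base]

        make_adam_step(eager_params)(grads)
        torch.compile(make_adam_step(compiled_params))(grads)

        for a, b in zip(eager_params, compiled_params):
            # Bitwise-identical results, not merely allclose.
            assert torch.equal(a, b)
```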
github-actions bot deleted the gh/mlazos/111/head branch April 3, 2026 02:26