
[inductor] Add _foreach_addcdiv lowering to match _foreach_addcmul #176237

Closed
mlazos wants to merge 3 commits into gh/mlazos/111/base from gh/mlazos/111/head

Conversation

Contributor

@mlazos mlazos commented Mar 2, 2026

Stack from ghstack (oldest at bottom):

The decomposition for _foreach_addcdiv was removed (to preserve FMA
semantics via the addcdiv lowering), but no foreach-level lowering existed
to replace it. This caused _foreach_addcdiv to fall back to the native
CUDA kernel instead of generating a fused Triton kernel.

Add _foreach_addcdiv_scalar (and its inplace variant) mirroring the
existing _foreach_addcmul_scalar pattern.
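
To make the effect concrete, here is a minimal repro sketch (not from the PR; the shapes and the Adam-style update are illustrative only) of the kind of compiled foreach step this lowering targets:

```python
import torch

# Repro sketch: an Adam-style update that ends in a scalar-valued
# _foreach_addcdiv_. With the foreach lowering in place, inductor can emit a
# fused Triton kernel for the step instead of falling back to the native CUDA
# foreach kernel for the addcdiv.
def foreach_step(params, exp_avgs, exp_avg_sqs, lr=1e-3, eps=1e-8):
    denoms = torch._foreach_sqrt(exp_avg_sqs)
    torch._foreach_add_(denoms, eps)
    # p <- p - lr * exp_avg / denom, via the Scalar overload of addcdiv
    torch._foreach_addcdiv_(params, exp_avgs, denoms, value=-lr)

compiled_step = torch.compile(foreach_step)

device = "cuda" if torch.cuda.is_available() else "cpu"
params = [torch.randn(1024, device=device) for _ in range(4)]
exp_avgs = [torch.randn(1024, device=device) for _ in range(4)]
exp_avg_sqs = [torch.rand(1024, device=device) for _ in range(4)]
compiled_step(params, exp_avgs, exp_avg_sqs)
```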

Authored with Claude.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

[ghstack-poisoned]

pytorch-bot Bot commented Mar 2, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176237

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit 7c8f112 with merge base a6beff3:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

BROKEN TRUNK - The following job failed but was present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.


pytorch-bot Bot commented Mar 2, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@pytorchmergebot
Collaborator

Starting merge as part of PR stack under #174911

pytorchmergebot pushed a commit that referenced this pull request Mar 3, 2026
…ther decomp (#175839)

The FMA lowerings for addcmul/addcdiv are now unconditional (not gated by
emulate_precision_casts), but the decomposition skip in select_decomp_table()
was still gated by that config. This meant the decompositions would override
the FMA lowerings when emulate_precision_casts=False.

Make the decomp skip unconditional to match the lowerings. Also add
aten.addcmul_ (in-place) to the skip list.

Authored with Claude.

Pull Request resolved: #175839
Approved by: https://github.com/v0i0
ghstack dependencies: #176237
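
As a rough illustration of the gating bug described in that commit (a hedged sketch, not the actual torch/_inductor source; the helper below and its signature are hypothetical):

```python
# Ops with dedicated FMA lowerings in inductor: if they stay in the decomp
# table, the decomposition shadows the lowering and FMA semantics are lost.
FMA_LOWERED_OPS = {"aten.addcmul", "aten.addcmul_", "aten.addcdiv", "aten.addcdiv_"}

def select_decomp_table(all_decomps, emulate_precision_casts):
    # Before the fix (sketch): skip only under the config flag, so with
    # emulate_precision_casts=False the decomps overrode the FMA lowerings.
    #   skip = FMA_LOWERED_OPS if emulate_precision_casts else set()
    # After the fix: skip unconditionally, matching the unconditional lowerings.
    skip = FMA_LOWERED_OPS
    return {op: fn for op, fn in all_decomps.items() if str(op) not in skip}
```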
pytorchmergebot pushed a commit that referenced this pull request Mar 3, 2026
Add CompiledOptimizerBitwiseTests test suite that verifies compiled
optimizers produce bitwise identical results to eager when precision
configs are enabled:
- eager_numerics.division_rounding = True
- eager_numerics.pow_precision = True
- emulate_precision_casts = True

Tests cover Adam and AdamW with various configurations including
amsgrad, maximize, and weight_decay options.

Pull Request resolved: #174911
Approved by: https://github.com/v0i0
ghstack dependencies: #176237, #175839
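
A hedged sketch of what one such bitwise check might look like (the test structure and device handling are assumptions; only emulate_precision_casts is exercised here, while the eager_numerics flags named above are left out since they are introduced by that stack):

```python
import torch
from torch._inductor import config as inductor_config

def make_adam_step(params, lr=1e-3):
    opt = torch.optim.Adam(params, lr=lr)
    def step(grads):
        # Attach grads and run one optimizer step over the param list.
        for p, g in zip(params, grads):
            p.grad = g
        opt.step()
    return step

def check_adam_bitwise():
    device = "cuda" if torch.cuda.is_available() else "cpu"
    base = [torch.randn(64, device=device) for _ in range(3)]
    grads = [torch.randn_like(p) for p in base]

    with inductor_config.patch({"emulate_precision_casts": True}):
        eager_params = [p.clone() for p in base]
        compiled_params = [p.clone() for p in base]

        make_adam_step(eager_params)(grads)
        torch.compile(make_adam_step(compiled_params))(grads)

        for a, b in zip(eager_params, compiled_params):
            # Bitwise-identical results, not merely allclose.
            assert torch.equal(a, b)
```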
github-actions bot deleted the gh/mlazos/111/head branch April 3, 2026 02:26