[inductor] Fix pow precision helper for fp64 inputs #175268
mlazos wants to merge 15 commits into gh/mlazos/105/base
Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/175268
Note: Links to docs will display an error until the docs builds have been completed.
❌ 1 New Failure as of commit 406bf9c with merge base 197c376. The following job has failed:
This comment was automatically generated by Dr. CI and updates every 15 minutes.
eellison left a comment:
A couple of easily addressable comments.
```python
def test_pow_precision_fp64(self):
    # Test that pow matches eager bitwise for fp64.
    # libdevice.pow matches CUDA's pow for fp64 (no FTZ issues).
    def fn(base, exp):
        return torch.pow(base, exp)

    base = torch.tensor([0.9, 0.999, 0.5, 0.8], device="cuda", dtype=torch.float64)
    exp = torch.tensor(
        [50.0, 100.0, 10.0, 20.0], device="cuda", dtype=torch.float64
    )
```
Can you please parametrize this across all dtypes? This would also be a good time to unify around @v0i0's test infra.
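For illustration, a minimal sketch of one way to parametrize the test across dtypes using torch's stock `parametrize` helper; the @v0i0 test infra the reviewer mentions is not shown in this thread and may look different:

```python
import torch
from torch.testing._internal.common_utils import parametrize

# Sketch only: the enclosing test class would also need
# instantiate_parametrized_tests applied for this to be collected.
@parametrize("dtype", [torch.float16, torch.bfloat16, torch.float32, torch.float64])
def test_pow_precision(self, dtype):
    def fn(base, exp):
        return torch.pow(base, exp)

    base = torch.tensor([0.9, 0.999, 0.5, 0.8], device="cuda", dtype=dtype)
    exp = torch.tensor([50.0, 100.0, 10.0, 20.0], device="cuda", dtype=dtype)
    self.assertEqual(torch.compile(fn)(base, exp), fn(base, exp))
```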
```python
@maybe_upcast_float32()
def pow(a, b):
    # Check dtype before potential upcast - powf_cuda only supports fp32
    a_dtype = getattr(a, "dtype", None)
```
nit: use `isinstance(var, CSEVariable) and var.dtype in (torch.float16, torch.bfloat16)` to be more explicit.
```python
@staticmethod
@maybe_upcast_float32()
def _pow_impl(a, b):
    if config.eager_numerics.pow_precision:
```
Why don't we just check that the dtypes are float32 here? Could they also be integers?
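A hypothetical sketch of the guard this comment suggests, reusing names from the hunk above; the f-string templates are illustrative and not the PR's actual codegen:

```python
import torch

def _pow_impl(a, b):
    # Take the inline-PTX fast path only when both operands are fp32;
    # fp64 and integer inputs fall through to libdevice.pow, which
    # already matches eager.
    if (
        config.eager_numerics.pow_precision  # config flag assumed from the hunk above
        and getattr(a, "dtype", None) == torch.float32
        and getattr(b, "dtype", None) == torch.float32
    ):
        return f"powf_cuda({a}, {b})"  # fp32-only inline PTX helper
    return f"libdevice.pow({a}, {b})"
```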
Closing - no longer needed.
Stack from ghstack (oldest at bottom):
The powf_cuda inline PTX helper only supports fp32 inputs. For fp64,
fall back to libdevice.pow which already matches eager exactly.
Also adds test_pow_precision_fp64 to verify fp64 pow matches eager.
Co-authored-by: Claude noreply@anthropic.com
cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo
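For reference, a minimal repro sketch of the behavior this PR targets, assuming a CUDA device; with the fix, compiled fp64 pow should be bitwise identical to eager:

```python
import torch

base = torch.tensor([0.9, 0.999, 0.5, 0.8], device="cuda", dtype=torch.float64)
exp = torch.tensor([50.0, 100.0, 10.0, 20.0], device="cuda", dtype=torch.float64)

eager = torch.pow(base, exp)
compiled = torch.compile(lambda b, e: torch.pow(b, e))(base, exp)

# With fp64 routed to libdevice.pow, the compiled result should match
# eager bit for bit.
assert torch.equal(eager, compiled)
```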