
[Inductor] Add proper regression test for Voxtral compilation on MPS#177207

Closed
malfet wants to merge 2 commits into gh/malfet/765/base from gh/malfet/765/head

Conversation

@malfet
Contributor

@malfet malfet commented Mar 11, 2026

Stack from ghstack (oldest at bottom):


  • Remove `test_bfloat_constant`, `test_lowp_reduction`, and `test_lowp_where`, as they don't test anything beyond what existing tests already cover.
  • Add `test_pad_after_gelu` as a regression test for Voxtral compilation on MPS, exercising `pad(gelu(x))` across fp32, fp16, and bfloat16.
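The new test can be sketched roughly as follows. This is a minimal illustration of the `pad(gelu(x))` pattern described above, not the PR's actual test code; function names, shapes, and the device fallback are illustrative assumptions.

```python
# Rough sketch of a pad-after-gelu regression check, assuming the pattern
# described in the PR: pad(gelu(x)) compiled with torch.compile.
# Names, shapes, and tolerances here are illustrative, not the PR's code.
import torch
import torch.nn.functional as F


def pad_after_gelu(x):
    # Pad the last dim by one on the left: a (4, 16) input becomes (4, 17),
    # matching the `% 17` indexing in the generated kernel shown below.
    return F.pad(F.gelu(x), (1, 0))


def run_check(dtype, device):
    x = torch.rand(4, 16, dtype=dtype, device=device)
    eager = pad_after_gelu(x)
    compiled = torch.compile(pad_after_gelu)(x)
    torch.testing.assert_close(eager, compiled)


if __name__ == "__main__":
    # On a machine without MPS this falls back to CPU; the MPS codegen bug
    # itself only reproduces on the "mps" device.
    device = "mps" if torch.backends.mps.is_available() else "cpu"
    for dtype in (torch.float32, torch.float16, torch.bfloat16):
        run_check(dtype, device)
```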

Before #176436 this test would fail with:

```
torch._inductor.exc.InductorError: SyntaxError: failed to compile
    #include <c10/metal/utils.h>
    #include <c10/metal/special_math.h>
    kernel void generated_kernel(
        device bfloat* out_ptr0,
        constant bfloat* in_ptr0,
        uint xindex [[thread_position_in_grid]]
    ) {
        int x0 = (xindex) % (17);
        int x1 = c10::metal::floor_divide(xindex, 17);
        int x2 = xindex;
        auto tmp0 = (-1) + x0;
        auto tmp1 = static_cast<long>(tmp0);
        auto tmp2 = 0;
        auto tmp3 = tmp1 >= tmp2;
        bfloat tmp4;
        if (tmp3) {
            auto tmp_scoped_0 = static_cast<float>(in_ptr0[(-1) + x0 + 16*x1]);
            auto tmp_scoped_1 = static_cast<float>(tmp_scoped_0);
            auto tmp_scoped_2 = 0.5;
            auto tmp_scoped_3 = tmp_scoped_1 * tmp_scoped_2;
            auto tmp_scoped_4 = 0.7071067811865476;
            auto tmp_scoped_5 = tmp_scoped_1 * tmp_scoped_4;
            auto tmp_scoped_6 = c10::metal::erf(tmp_scoped_5);
            auto tmp_scoped_7 = 1.0;
            auto tmp_scoped_8 = tmp_scoped_6 + tmp_scoped_7;
            auto tmp_scoped_9 = tmp_scoped_3 * tmp_scoped_8;
            auto tmp_scoped_10 = static_cast<bfloat>(tmp_scoped_9);
            tmp4 = tmp_scoped_10;
        } else tmp4 = 0.0;
        out_ptr0[x2] = static_cast<bfloat>(tmp4);
    }
 with program_source:4495:23: error: assigning to 'bfloat' from incompatible type 'float'
        } else tmp4 = 0.0;
                      ^~~
```

Authored with Claude.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

[ghstack-poisoned]
@pytorch-bot

pytorch-bot bot commented Mar 11, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/177207

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit bb6423e with merge base ad67e7a:

FLAKY - The following job failed but was likely due to flakiness present on trunk:

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

malfet added a commit that referenced this pull request Mar 11, 2026
ghstack-source-id: 7919b53
Pull-Request: #177207
@malfet malfet added the ciflow/mps Run MPS tests (subset of trunk) label Mar 11, 2026
@malfet malfet requested review from jansel and mergennachin March 11, 2026 22:22
@malfet
Contributor Author

malfet commented Mar 11, 2026

@pytorchbot fix-lint

Contributor

@atalman atalman left a comment


lgtm

@pytorchmergebot
Collaborator

Successfully applied lint patches in https://github.com/pytorch/pytorch/actions/runs/22979241446. Please pull locally before pushing more changes.

pytorchmergebot pushed a commit that referenced this pull request Mar 11, 2026
ghstack-source-id: f075662
Pull-Request: #177207
@malfet
Contributor Author

malfet commented Mar 12, 2026

@pytorchbot merge -f "I do have enough signal on this one"

@pytorchmergebot
Collaborator

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f only as a last resort and instead consider -i/--ignore-current to continue the merge while ignoring current failures. This allows currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status here.

@malfet malfet added autorevert: disable Disable autorevert for a specific PR and removed autorevert: disable Disable autorevert for a specific PR labels Mar 12, 2026
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
[Inductor] Add proper regression test for Voxtral compilation on MPS (pytorch#177207)


Pull Request resolved: pytorch#177207
Approved by: https://github.com/atalman, https://github.com/mergennachin, https://github.com/jansel


5 participants