[Inductor][MPS] Fix half-precision type mismatches in Metal shader codegen #176436

Closed

mergennachin wants to merge 2 commits into main from mps_bf16

Conversation

mergennachin (Contributor) commented Mar 4, 2026

Metal Shading Language rejects implicit float-to-bfloat conversions, so
bare float literals like 0.0 in generated shaders cause compilation
failures when the target variable is bfloat (or half). Three codegen
methods were affected:

  • constant() ignored its dtype parameter and returned raw literals.
  • masked() assigned a bare literal in the else-branch (} else tmp = 0.0;).
  • where() passed a bare literal through the ternary without casting.

All three now emit static_cast<bfloat>(...) / static_cast<half>(...)
where needed. Tests added for half-precision constants, reductions, and
conditionals.
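
In codegen terms, the fix boils down to making these helpers consult their dtype before emitting a literal. A minimal Python sketch of the idea (the string dtype keys and the `emit_constant` helper are illustrative stand-ins, not the real Inductor code, though `DTYPE_TO_METAL` mirrors an actual table in `torch/_inductor/codegen/mps.py`):

```python
# Illustrative mapping from dtype names to Metal scalar types; plain strings
# stand in for torch.dtype objects here.
DTYPE_TO_METAL = {"float32": "float", "float16": "half", "bfloat16": "bfloat"}

def emit_constant(val: float, dtype: str) -> str:
    """Sketch of a dtype-aware constant(): wrap bare float literals in a
    static_cast when the target is half-precision, since MSL rejects
    implicit float-to-bfloat conversions."""
    literal = repr(val)
    if dtype in ("float16", "bfloat16"):
        return f"static_cast<{DTYPE_TO_METAL[dtype]}>({literal})"
    return literal

print(emit_constant(0.0, "bfloat16"))  # static_cast<bfloat>(0.0)
print(emit_constant(1.0, "float32"))   # 1.0
```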

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @kadeng @muchulee8 @amjames @chauhang @aakhundov @coconutruben @jataylo

pytorch-bot commented Mar 4, 2026

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/176436

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit cc0589b with merge base e45dfba:

BROKEN TRUNK - The following jobs failed but were already present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@pytorch-bot pytorch-bot bot added ciflow/inductor ciflow/mps Run MPS tests (subset of trunk) module: inductor labels Mar 4, 2026
pytorch-bot commented Mar 4, 2026

This PR needs a release notes: label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "topic: not user facing"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

malfet (Contributor) commented Mar 5, 2026

Hmm, this quirk is indeed present, but only for bfloat16 (which isn't a real HW type, as far as I understand the architecture):

>>> torch.mps.compile_shader("kernel void foo(device bfloat &x) { x = 0.0;}")
Traceback (most recent call last):
  File "<python-input-7>", line 1, in <module>
    torch.mps.compile_shader("kernel void foo(device bfloat &x) { x = 0.0;}")
    ~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/nshulga/git/pytorch/pytorch/torch/mps/__init__.py", line 163, in compile_shader
    return torch._C._mps_compileShader(source)
           ~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^
SyntaxError: program_source:1:41: error: assigning to 'bfloat' from incompatible type 'float'
kernel void foo(device bfloat &x) { x = 0.0;}
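
For comparison, the same shader compiles once the cast is explicit. A hedged sketch of that check (guarded so it only invokes the Metal compiler on machines where MPS is available):

```python
# The uncast literal fails to compile; the explicitly cast one succeeds.
bad_src = "kernel void foo(device bfloat &x) { x = 0.0; }"
fixed_src = "kernel void foo(device bfloat &x) { x = static_cast<bfloat>(0.0); }"

try:
    import torch
    has_mps = torch.backends.mps.is_available()
except Exception:
    has_mps = False  # lets the sketch run on machines without torch/MPS

if has_mps:
    try:
        torch.mps.compile_shader(bad_src)
        raise AssertionError("expected Metal to reject the bare float literal")
    except SyntaxError:
        pass  # implicit float->bfloat conversion rejected, as shown above
    torch.mps.compile_shader(fixed_src)  # compiles once the cast is explicit
```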

[torch.bfloat16] if MACOS_VERSION < 14.0 else []
)
MPS_DTYPES = [t for t in get_all_dtypes() if t not in MPS_UNSUPPORTED_TYPES]
MPS_HALF_DTYPES = [torch.float16] + ([torch.bfloat16] if MACOS_VERSION >= 14.0 else [])
Review comment (Contributor):

We (and Apple) don't support macOS 13 anymore. But I believe this list already exists

Suggested change
MPS_HALF_DTYPES = [torch.float16] + ([torch.bfloat16] if MACOS_VERSION >= 14.0 else [])
MPS_HALF_DTYPES = [torch.float16, torch.bfloat16]

)

@parametrize("dtype", MPS_HALF_DTYPES)
def test_half_masked(self, dtype):
Review comment (Contributor):

Err, why is it called test_half_masked when you call sum?

return value_to_metal(val)
raw = value_to_metal(val)
if (
dtype in (torch.bfloat16, torch.float16)
Review comment (Contributor):

Do you know if this condition is ever hit? (I would have checked by adding an assert and running the test suite.)

But I would argue that math ops should return float values, and only downcast to the low-precision dtype when the value is written out (as `opmath_t<bfloat> == float`).

Review comment (Contributor Author):

Yes, the condition is hit. Verified empirically: `x + 1.0` with a bf16 input calls `constant(val=1.0, dtype=torch.bfloat16)`. Same for `x * 2`, `x - 0.5`, and `torch.where` with scalar constants.

You're right that math ops should return float values. The current fix does a pointless round-trip: the literal is float, we cast it to bfloat, then Metal implicitly promotes it back to float for arithmetic (since loads are already upcast to float32). I'll drop the cast in `constant()` and just return the raw literal; it's already float-compatible and matches the float32 compute dtype established by `load()`.
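
The revised approach can be sketched as follows; `constant_literal`, `emit_store`, and the string dtype keys are hypothetical illustrations of the idea, not the actual codegen:

```python
DTYPE_TO_METAL = {"float32": "float", "float16": "half", "bfloat16": "bfloat"}

def constant_literal(val: float) -> str:
    # No cast: the literal already matches the float32 compute dtype,
    # since loads are upcast before any arithmetic.
    return repr(val)

def emit_store(ptr: str, idx: str, var: str, out_dtype: str) -> str:
    # The only narrowing conversion happens once, at the write-out.
    return f"{ptr}[{idx}] = static_cast<{DTYPE_TO_METAL[out_dtype]}>({var});"

print(constant_literal(1.0))                             # 1.0
print(emit_store("out_ptr0", "x2", "tmp4", "bfloat16"))
# out_ptr0[x2] = static_cast<bfloat>(tmp4);
```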

@@ -244,12 +251,18 @@ def masked(mask: CSEVariable, body: sympy.Expr, other: CSEVariable) -> str:
with V.kernel.compute.indent():
V.kernel.compute.splice(scoped_body)
V.kernel.compute.writeline(f"{var} = {rc};")
Review comment (Contributor):

Don't you need to do it here as well?

V.kernel.compute.writeline(f"{var} = {rc};")
V.kernel.compute.writeline(f"}} else {var} = {other_str};")
V.kernel.compute.writeline(
f"}} else {var} = static_cast<{DTYPE_TO_METAL[rc.dtype]}>({other_str});"
Review comment (Contributor):

Nit (rc.dtype is often wrong/undefined)

Suggested change
f"}} else {var} = static_cast<{DTYPE_TO_METAL[rc.dtype]}>({other_str});"
f"}} else {var} = static_cast<decltype({var})>({other_str});"
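
The decltype approach can be sketched in isolation (hypothetical `emit_masked_else` helper): the Metal compiler derives the cast target from the destination variable itself, rather than from the often-unset `rc.dtype`.

```python
def emit_masked_else(var: str, other: str) -> str:
    # decltype(tmp4) resolves to tmp4's declared Metal type at compile time,
    # so the codegen never needs to know the dtype itself.
    return f"}} else {var} = static_cast<decltype({var})>({other});"

print(emit_masked_else("tmp4", "0.0"))
# } else tmp4 = static_cast<decltype(tmp4)>(0.0);
```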

Comment on lines +261 to +265
c_str = value_to_metal(c)
if isinstance(b, CSEVariable) and b.dtype in (torch.bfloat16, torch.float16):
    assert b.dtype is not None
    c_str = f"static_cast<{DTYPE_TO_METAL[b.dtype]}>({c_str})"
return f"{a} ? {b} : {c_str}"
Review comment (Contributor):

Nit (i.e. always leave it to the compiler rather than codegen)

Suggested change
c_str = value_to_metal(c)
if isinstance(b, CSEVariable) and b.dtype in (torch.bfloat16, torch.float16):
    assert b.dtype is not None
    c_str = f"static_cast<{DTYPE_TO_METAL[b.dtype]}>({c_str})"
return f"{a} ? {b} : {c_str}"
return f"{a} ? {b} : static_cast<decltype({b})>({value_to_metal(c)})"

Comment on lines +15841 to +15849
def test_half_constant(self):
    for dtype in [torch.float16, torch.bfloat16]:
        if not self.is_dtype_supported(dtype):
            continue
        self.common(
            lambda x: x + 1.0,
            (make_tensor(1024, dtype=dtype, device=self.device),),
            check_lowp=False,
        )
Review comment (Contributor):

Nit (check_lowp indeed checks for torch.half)

Suggested change
def test_half_constant(self):
    for dtype in [torch.float16, torch.bfloat16]:
        if not self.is_dtype_supported(dtype):
            continue
        self.common(
            lambda x: x + 1.0,
            (make_tensor(1024, dtype=dtype, device=self.device),),
            check_lowp=False,
        )
def test_bfloat_constant(self):
    if not self.is_dtype_supported(torch.bfloat16):
        return
    self.common(
        lambda x: x + 1.0,
        (make_tensor(1024, dtype=torch.bfloat16, device=self.device),),
    )

Comment on lines +15851 to +15859
def test_half_reduction(self):
    for dtype in [torch.float16, torch.bfloat16]:
        if not self.is_dtype_supported(dtype):
            continue
        self.common(
            lambda x: x.sum(),
            (make_tensor(1024, dtype=dtype, device=self.device),),
            check_lowp=False,
        )
Review comment (Contributor):

Alternatively use parametrize

Suggested change
def test_half_reduction(self):
    for dtype in [torch.float16, torch.bfloat16]:
        if not self.is_dtype_supported(dtype):
            continue
        self.common(
            lambda x: x.sum(),
            (make_tensor(1024, dtype=dtype, device=self.device),),
            check_lowp=False,
        )
@parametrize("dtype", [torch.float16, torch.bfloat16])
def test_lowp_reduction(self, dtype):
    if not self.is_dtype_supported(dtype):
        return
    self.common(
        lambda x: x.sum(),
        (make_tensor(1024, dtype=dtype, device=self.device),),
        check_lowp=False,
    )

mergennachin (Contributor Author):
@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Mar 6, 2026
pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator):

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / linux-jammy-cuda13.0-py3.10-gcc11 / test (default, 1, 5, lf.linux.g6.4xlarge.experimental.nvidia.gpu)

Details for Dev Infra team Raised by workflow job

@malfet malfet added this to the 2.11.0 milestone Mar 6, 2026
@malfet malfet added the module: regression It used to work, and now it doesn't label Mar 6, 2026
malfet (Contributor) commented Mar 6, 2026

@pytorchbot merge -f "Lint + MPS are green, hopefully other failures are just broken trunk"

pytorchmergebot (Collaborator):

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

pytorchmergebot (Collaborator):

Merge failed

Reason: 1 jobs have failed, first few of them are: trunk / macos-py3-arm64 / test (default, 2, 3, macos-m1-stable)

Details for Dev Infra team Raised by workflow job

…degen

Metal Shading Language rejects implicit float-to-bfloat conversions, so
bare float literals like `0.0` in generated shaders cause compilation
failures when the target variable is `bfloat` (or `half`). Three codegen
methods were affected:

- `constant()` ignored its `dtype` parameter and returned raw literals.
- `masked()` assigned a bare literal in the else-branch (`} else tmp = 0.0;`).
- `where()` passed a bare literal through the ternary without casting.

All three now emit `static_cast<bfloat>(...)` / `static_cast<half>(...)`
where needed. Tests added for half-precision constants, reductions, and
conditionals.
mergennachin (Contributor Author):

@pytorchbot merge

pytorchmergebot (Collaborator):

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging: check the merge workflow status here.

mergennachin (Contributor Author):

@pytorchbot cherry-pick --onto release/2.11 -c critical --fixes "Bug fix for MPS backend using inductor"

pytorchbot (Collaborator):

Cherry picking #176436

Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x 3b161e7a756798e6eb1ab096f4ef1232d163a68d returned non-zero exit code 1

Auto-merging test/inductor/test_torchinductor.py
CONFLICT (content): Merge conflict in test/inductor/test_torchinductor.py
Auto-merging torch/_inductor/codegen/mps.py
error: could not apply 3b161e7a756... [Inductor][MPS] Fix half-precision type mismatches in Metal shader codegen (#176436)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"
Details for Dev Infra team Raised by workflow job

@malfet malfet deleted the mps_bf16 branch March 11, 2026 18:00
malfet pushed a commit that referenced this pull request Mar 11, 2026
…degen (#176436)


Pull Request resolved: #176436
Approved by: https://github.com/malfet

Test plan: Run `python -c "import torch;F=torch.nn.functional;print(torch.compile(lambda x: F.pad(F.gelu(x), [1, 0]))(torch.randn(4, device='mps', dtype=torch.bfloat16)))"`

(cherry picked from commit 3b161e7)
malfet added a commit that referenced this pull request Mar 11, 2026
…degen (#176436) (#177193)


Pull Request resolved: #176436
Approved by: https://github.com/malfet

Test plan: Run `python -c "import torch;F=torch.nn.functional;print(torch.compile(lambda x: F.pad(F.gelu(x), [1, 0]))(torch.randn(4, device='mps', dtype=torch.bfloat16)))"`

(cherry picked from commit 3b161e7)

Co-authored-by: Mergen Nachin <mnachin@meta.com>
malfet added a commit that referenced this pull request Mar 11, 2026
- Remove `test_bfloat_constant`, `test_lowp_reduction`, and `test_lowp_where` as they don't test for anything beyond what existing tests cover.
- Add test_pad_after_gelu as a regression test for Voxtral compilation on MPS, exercising pad(gelu(x)) across fp32, fp16, and bfloat16.

Before #176436 the test would fail with
```
torch._inductor.exc.InductorError: SyntaxError: failed to compile
    #include <c10/metal/utils.h>
    #include <c10/metal/special_math.h>
    kernel void generated_kernel(
        device bfloat* out_ptr0,
        constant bfloat* in_ptr0,
        uint xindex [[thread_position_in_grid]]
    ) {
        int x0 = (xindex) % (17);
        int x1 = c10::metal::floor_divide(xindex, 17);
        int x2 = xindex;
        auto tmp0 = (-1) + x0;
        auto tmp1 = static_cast<long>(tmp0);
        auto tmp2 = 0;
        auto tmp3 = tmp1 >= tmp2;
        bfloat tmp4;
        if (tmp3) {
            auto tmp_scoped_0 = static_cast<float>(in_ptr0[(-1) + x0 + 16*x1]);
            auto tmp_scoped_1 = static_cast<float>(tmp_scoped_0);
            auto tmp_scoped_2 = 0.5;
            auto tmp_scoped_3 = tmp_scoped_1 * tmp_scoped_2;
            auto tmp_scoped_4 = 0.7071067811865476;
            auto tmp_scoped_5 = tmp_scoped_1 * tmp_scoped_4;
            auto tmp_scoped_6 = c10::metal::erf(tmp_scoped_5);
            auto tmp_scoped_7 = 1.0;
            auto tmp_scoped_8 = tmp_scoped_6 + tmp_scoped_7;
            auto tmp_scoped_9 = tmp_scoped_3 * tmp_scoped_8;
            auto tmp_scoped_10 = static_cast<bfloat>(tmp_scoped_9);
            tmp4 = tmp_scoped_10;
        } else tmp4 = 0.0;
        out_ptr0[x2] = static_cast<bfloat>(tmp4);
    }
 with program_source:4495:23: error: assigning to 'bfloat' from incompatible type 'float'
        } else tmp4 = 0.0;
                      ^~~
```

Authored with Claude.


ghstack-source-id: 7919b53
Pull-Request: #177207
pytorchmergebot pushed a commit that referenced this pull request Mar 11, 2026

ghstack-source-id: f075662
Pull-Request: #177207
pytorchmergebot pushed a commit that referenced this pull request Mar 12, 2026
…177207)

----


Pull Request resolved: #177207
Approved by: https://github.com/atalman, https://github.com/mergennachin, https://github.com/jansel
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…degen (pytorch#176436)


Pull Request resolved: pytorch#176436
Approved by: https://github.com/malfet
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…hader codegen (pytorch#176436)"

This reverts commit 4926192.

Reverted pytorch#176436 on behalf of https://github.com/zou3519 due to sorry I need to revert this in order to revert pytorch#176606 ([comment](pytorch#176436 (comment)))
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…degen (pytorch#176436)


Pull Request resolved: pytorch#176436
Approved by: https://github.com/malfet
EmanueleCoradin pushed a commit to EmanueleCoradin/pytorch that referenced this pull request Mar 30, 2026
…ytorch#177207)

----


Pull Request resolved: pytorch#177207
Approved by: https://github.com/atalman, https://github.com/mergennachin, https://github.com/jansel

Labels

ci-no-td, ciflow/inductor, ciflow/mps, ciflow/trunk, Merged, module: inductor, module: regression, release notes: mps, Reverted, topic: bug fixes


5 participants