[MPS] Fix smooth_l1_loss backward for fp16 #166687

malfet wants to merge 5 commits into gh/malfet/582/base

Conversation
🔗 Helpful Links: 🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166687

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV: there is 1 currently active SEV. If your PR is affected, please view it below.

❌ 1 New Failure, 200 Pending: as of commit 81fd733 with merge base 94f2657.

NEW FAILURE - The following job has failed:
UNSTABLE - The following job is marked as unstable, possibly due to flakiness on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.
@pytorchbot merge -f "CI was green before"

Merge started: your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
Or BatchNorm or LayerNorm for Long types Discovered while trying to enable `test_ops.py` for MPS Pull Request resolved: #166215 Approved by: https://github.com/dcci, https://github.com/kulinseth, https://github.com/Skylion007 ghstack dependencies: #166214, #166687
And enable fp16 implementation for CPU, which simplifies OpInfo definitions for the op Pull Request resolved: #166687 Approved by: https://github.com/Skylion007 ghstack dependencies: #166214
This reverts commit 4e7232c.
Looks like this is breaking nightly wheels on aarch64 due to another compiler issue. EDIT - Looks like it's been reverted already. Will raise this with the compiler team on our side.
This reverts commit 9261a1f. Reverted #166215 on behalf of https://github.com/atalman due to sorry need to revert #166687 ([comment](#166215 (comment)))
@pytorchmergebot revert -c nosignal -m "GH job link HUD commit link"
@pytorchbot successfully started a revert job. Check the current status here.
This reverts commit 4e7232c. Reverted #166687 on behalf of https://github.com/atalman due to [GH job link](https://github.com/pytorch/pytorch/actions/runs/19027214755/job/54332952760) [HUD commit link](https://hud.pytorch.org/pytorch/pytorch/commit/95ab09cb54f6ba13eda0160b663e85119f68ac14) ([comment](#166687 (comment)))
@malfet your PR has been successfully reverted.
I'm just going to guard this codepath against
@pytorchbot merge -f "Binary builds are green now"

Merge started: your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Learn more about merging in the wiki. Questions? Feedback? Please reach out to the PyTorch DevX Team.
- Enable fp16 implementation for CPU, by using `convert_to_float` primitives instead of `convert_bfloat16_float` and extending the bf16 implementation to half
- Simplify OpInfo definitions for the backward
Originally this PR used `AT_DISPATCH_ALL_TYPES_AND(kHalf,`, but it caused an ICE with gcc-13 when compiled with SVE128:
```
/opt/rh/gcc-toolset-13/root/usr/bin/c++ -DAT_BUILD_ARM_VEC256_WITH_SLEEF -DAT_PER_OPERATOR_HEADERS -DBUILD_ONEDNN_GRAPH -DCAFFE2_BUILD_MAIN_LIB -DCAFFE2_PERF_WITH_SVE=1 -DCPUINFO_SUPPORTED_PLATFORM=1 -DENABLE_IPC_FABRIC -DFMT_HEADER_ONLY=1 -DFXDIV_USE_INLINE_ASSEMBLY=0 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_POSIX_FALLOCATE=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DKINETO_NAMESPACE=libkineto -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DNNP_CONVOLUTION_ONLY=0 -DNNP_INFERENCE_ONLY=0 -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_MIMALLOC -DUSE_RPC -DUSE_TENSORPIPE -DXNN_LOG_LEVEL=0 -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/pytorch/build/aten/src -I/pytorch/aten/src -I/pytorch/build -I/pytorch -I/pytorch/nlohmann -I/pytorch/moodycamel -I/pytorch/third_party/mimalloc/include -I/pytorch/torch/csrc/api -I/pytorch/torch/csrc/api/include -I/pytorch/caffe2/aten/src/TH -I/pytorch/build/caffe2/aten/src/TH -I/pytorch/build/caffe2/aten/src -I/acl -I/acl/include -I/pytorch/build/caffe2/../aten/src -I/pytorch/torch/csrc -I/pytorch/torch/headeronly -I/pytorch/third_party/miniz-3.0.2 -I/pytorch/third_party/kineto/libkineto/include -I/pytorch/third_party/kineto/libkineto/src -I/pytorch/third_party/cpp-httplib -I/pytorch/aten/src/ATen/.. -I/pytorch/third_party/FXdiv/include -I/pytorch/c10/.. 
-I/pytorch/third_party/pthreadpool/include -I/pytorch/third_party/cpuinfo/include -I/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/include -I/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/src -I/pytorch/aten/src/ATen/native/quantized/cpu/qnnpack/deps/clog/include -I/pytorch/third_party/NNPACK/include -I/pytorch/third_party/FP16/include -I/pytorch/third_party/tensorpipe -I/pytorch/build/third_party/tensorpipe -I/pytorch/third_party/tensorpipe/third_party/libnop/include -I/pytorch/third_party/kleidiai -I/pytorch/third_party/fmt/include -I/pytorch/build/third_party/ideep/mkl-dnn/include -I/pytorch/third_party/ideep/mkl-dnn/src/../include -I/pytorch/third_party/onnx -I/pytorch/build/third_party/onnx -I/pytorch/third_party/flatbuffers/include -isystem /pytorch/build/third_party/gloo -isystem /pytorch/cmake/../third_party/gloo -isystem /pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /pytorch/third_party/protobuf/src -isystem /opt/OpenBLAS/include -isystem /pytorch/third_party/XNNPACK/include -isystem /pytorch/cmake/../third_party/eigen -isystem /pytorch/third_party/ideep/mkl-dnn/include/oneapi/dnnl -isystem /pytorch/third_party/ideep/include -isystem /pytorch/INTERFACE -isystem /pytorch/third_party/nlohmann/include -isystem /pytorch/third_party/concurrentqueue -isystem /pytorch/build/include -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON -DUSE_PYTORCH_QNNPACK -DAT_BUILD_ARM_VEC256_WITH_SLEEF -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -DC10_NODEPRECATED -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=range-loop-construct -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-unused-parameter -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=old-style-cast -faligned-new -Wno-maybe-uninitialized -fno-math-errno 
-fno-trapping-math -Werror=format -Wno-dangling-reference -Wno-error=dangling-reference -Wno-stringop-overflow -DHAVE_SVE_CPU_DEFINITION -DHAVE_SVE256_CPU_DEFINITION -DHAVE_ARM_BF16_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -fPIC -fdiagnostics-color=always -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -D__NEON__ -DBLAS_HAS_SBGEMM -Wall -Wextra -Wdeprecated -Wunused -Wno-unused-parameter -Wno-missing-field-initializers -Wno-array-bounds -Wno-unknown-pragmas -Wno-strict-overflow -Wno-strict-aliasing -Wredundant-move -Wno-interference-size -Wno-maybe-uninitialized -fvisibility=hidden -pthread -fopenmp -O3 -march=armv8-a+sve+bf16 -D__ARM_FEATURE_BF16 -DCPU_CAPABILITY_SVE -msve-vector-bits=256 -DCPU_CAPABILITY=SVE256 -DCPU_CAPABILITY_SVE256 -MD -MT caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp.SVE256.cpp.o -MF caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp.SVE256.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp.SVE256.cpp.o -c /pytorch/build/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp.SVE256.cpp
during RTL pass: expand
In file included from /pytorch/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp:6,
from /pytorch/build/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp.SVE256.cpp:1:
/pytorch/aten/src/ATen/native/cpu/Loops.h: In function ‘void at::native::SVE256::vectorized_loop(char**, int64_t, int64_t, func_t&&, vec_func_t&&) [with func_t = at::native::{anonymous}::smooth_l1_backward_cpu_kernel(at::TensorIterator&, const c10::Scalar&, double)::<lambda()>::<lambda()>::<lambda(scalar_t, scalar_t, scalar_t)>&; vec_func_t = at::native::{anonymous}::smooth_l1_backward_cpu_kernel(at::TensorIterator&, const c10::Scalar&, double)::<lambda()>::<lambda()>::<lambda(at::vec::SVE256::Vectorized<c10::Half>, at::vec::SVE256::Vectorized<c10::Half>, at::vec::SVE256::Vectorized<c10::Half>)>&]’:
/pytorch/aten/src/ATen/native/cpu/Loops.h:200:1: internal compiler error: in expand_insn, at optabs.cc:8185
200 | vectorized_loop(char** C10_RESTRICT data_, int64_t n, int64_t S, func_t&& op, vec_func_t&& vop) {
| ^~~~~~~~~~~~~~~
Please submit a full bug report, with preprocessed source.
See <http://bugzilla.redhat.com/bugzilla> for instructions.
Preprocessed source stored into /tmp/ccgYMlTo.out file, please attach this to your bugreport.
```
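For context, the elementwise math the backward kernel has to compute can be sketched in pure Python. This is an illustrative sketch of the smooth L1 gradient formula only, with made-up names, not the actual MPS or CPU kernel:

```python
# Illustrative sketch (not the PyTorch kernel): elementwise gradient of
# smooth_l1_loss w.r.t. the input, scaled by the incoming gradient.
# `beta` marks the quadratic/linear transition point.
def smooth_l1_backward(grad_out, x, y, beta=1.0):
    grads = []
    for g, xi, yi in zip(grad_out, x, y):
        d = xi - yi
        if abs(d) < beta:
            grads.append(g * d / beta)  # quadratic region: gradient is d / beta
        elif d > 0:
            grads.append(g)             # linear region: gradient is +1
        else:
            grads.append(-g)            # linear region: gradient is -1
    return grads
```

In reduced precision the intermediate `d / beta` is where accuracy is lost, which is why the fix computes in float and converts the result back to half, analogous to what the `convert_to_float` path does for bf16.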
Pull Request resolved: #166687
Approved by: https://github.com/Skylion007
And enable fp16 implementation for CPU

ghstack-source-id: 325abd6
Pull Request resolved: pytorch/pytorch#166687
Stack from ghstack (oldest at bottom):
- smooth_l1_loss backward for fp16 #166687

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01