[ci][cpu] Update compiler to GCC-13 in jammy-aarch64 by fadara01 · Pull Request #166849 · pytorch/pytorch

fadara01 · 2025-11-03T09:23:25Z

Stack from ghstack (oldest at bottom):

This is needed because manylinux has used GCC-13 since #152825.
Because of the current compiler version mismatch, we've seen tests pass in jammy-aarch64 pre-commit CI but fail for wheels built in manylinux.
Related to: #166736
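
To make a mismatch like this visible, a build can report which compiler produced it. The following is a minimal sketch under stated assumptions: `compiler_version_string` and `built_with_gcc_major` are hypothetical helpers, not PyTorch APIs, and they rely only on the version macros that GCC and Clang predefine.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers for illustration; they are not part of PyTorch.

// Returns a human-readable name and version for the compiler that built
// this translation unit, using the compilers' predefined macros.
std::string compiler_version_string() {
#if defined(__clang__)
  // Clang also defines __GNUC__, so it has to be checked first.
  return "Clang " + std::to_string(__clang_major__) + "." +
         std::to_string(__clang_minor__) + "." +
         std::to_string(__clang_patchlevel__);
#elif defined(__GNUC__)
  return "GCC " + std::to_string(__GNUC__) + "." +
         std::to_string(__GNUC_MINOR__) + "." +
         std::to_string(__GNUC_PATCHLEVEL__);
#else
  return "unknown compiler";
#endif
}

// True only when this file was built by GCC (not Clang) with the given
// major version, e.g. built_with_gcc_major(13).
bool built_with_gcc_major(int expected_major) {
  assert(expected_major > 0);
#if defined(__GNUC__) && !defined(__clang__)
  return __GNUC__ == expected_major;
#else
  return false;
#endif
}
```

Logging `compiler_version_string()` in both the jammy and manylinux builds would be one way to confirm that the two environments agree.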

cc @jgong5 @mingfeima @XiaobingSuper @sanchitintel @ashokei @jingxu10 @jerryzh168 @aditew01 @seemethere @malfet @pytorch/pytorch-dev-infra @snadampal @milpuz01 @nikhil-arm

pytorch-bot · 2025-11-03T09:23:29Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/166849

❌ 2 New Failures

As of commit cf99307 with merge base fbd70fb:

NEW FAILURES - The following jobs have failed:

operator_benchmark / aarch64-opbenchmark-test / test (cpu_operator_benchmark_short, 1, 1, linux.arm64.m8g.4xlarge)
    erf__M512_N512_cpu
operator_benchmark / x86-opbenchmark-test / test (cpu_operator_benchmark_short, 1, 1, linux.12xlarge)
    Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

robert-hardwick

Need to update GCC_VERSION variable

robert-hardwick

LGTM

fadara01 · 2025-11-03T10:52:48Z

Actually, we also need to update both manylinux and jammy to GCC14, as per @malfet's comment about manylinux standards on #166736:

> Also, per the manylinux_2_28 standard, all builds should be done with the gcc-14 toolchain (see https://github.com/pypa/manylinux?tab=readme-ov-file#manylinux_2_28-almalinux-8-based). If this is not the case, then it's a bug; please don't hesitate to propose a PR that fixes it.

Let's address the GCC version mismatch for AArch64 between jammy and manylinux first (and get jammy to build with GCC13), then we'll raise PRs to this stack updating both to GCC14, which I think is related to #149828 and #152426

fadara01 · 2025-11-03T11:59:41Z

Oh, OpenBLAS is failing to link with GCC-13 in jammy due to missing -lgfortran

/usr/bin/ld: cannot find -lgfortran: No such file or directory
/usr/bin/ld: cannot find -lgfortran: No such file or directory
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:207: ../libopenblasp-r0.3.30.so] Error 1
make: *** [Makefile:149: shared] Error 2

#80 ERROR: process "/bin/sh -c if [ -n \"${OPENBLAS}\" ]; then bash ./install_openblas.sh; fi" did not complete successfully: exit code: 2

malfet · 2025-11-03T15:35:44Z

> Oh, OpenBLAS is failing to link with GCC-13 in jammy due to missing -lgfortran
>
> /usr/bin/ld: cannot find -lgfortran: No such file or directory
> /usr/bin/ld: cannot find -lgfortran: No such file or directory
> collect2: error: ld returned 1 exit status
> make[1]: *** [Makefile:207: ../libopenblasp-r0.3.30.so] Error 1
> make: *** [Makefile:149: shared] Error 2
>
> #80 ERROR: process "/bin/sh -c if [ -n \"${OPENBLAS}\" ]; then bash ./install_openblas.sh; fi" did not complete successfully: exit code: 2

@fadara01 just install gcc13-gfortran or something (I bet there is a line somewhere in the scripts that does it already)

fadara01 · 2025-11-03T15:36:18Z

> gcc13-gfortran or something (I bet there is a line somewhere in the scripts that does it already)

Yup that's what I'm doing locally

malfet · 2025-11-03T15:39:15Z

> > gcc13-gfortran or something (I bet there is a line somewhere in the scripts that does it already)
>
> Yup that's what I'm doing locally

Alternatively, you can just move CI to a more recent version of Ubuntu

fadara01 · 2025-11-03T17:33:14Z

The current linux-aarch64 / linux-jammy-aarch64-py3.10 / build failure is a known GCC 13 issue from #166687, which also failed in the nightly build:

In file included from /var/lib/jenkins/workspace/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp:6,
                 from /var/lib/jenkins/workspace/build/aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp.SVE256.cpp:1:
/var/lib/jenkins/workspace/aten/src/ATen/native/cpu/Loops.h: In function ‘void at::native::SVE256::vectorized_loop(char**, int64_t, int64_t, func_t&&, vec_func_t&&) [with func_t = at::native::{anonymous}::smooth_l1_backward_cpu_kernel(at::TensorIterator&, const c10::Scalar&, double)::<lambda()>::<lambda()>::<lambda(scalar_t, scalar_t, scalar_t)>&; vec_func_t = at::native::{anonymous}::smooth_l1_backward_cpu_kernel(at::TensorIterator&, const c10::Scalar&, double)::<lambda()>::<lambda()>::<lambda(at::vec::SVE256::Vectorized<c10::Half>, at::vec::SVE256::Vectorized<c10::Half>, at::vec::SVE256::Vectorized<c10::Half>)>&]’:
/var/lib/jenkins/workspace/aten/src/ATen/native/cpu/Loops.h:200:1: internal compiler error: in expand_insn, at optabs.cc:8185
  200 | vectorized_loop(char** C10_RESTRICT data_, int64_t n, int64_t S, func_t&& op, vec_func_t&& vop) {
      | ^~~~~~~~~~~~~~~

fadara01 · 2025-11-03T18:28:01Z

After rebasing, we now have a new failure from a GCC13 warning that's being treated as an error:

In file included from /usr/include/c++/13/bits/stl_uninitialized.h:63,
                 from /usr/include/c++/13/memory:69,
                 from /var/lib/jenkins/workspace/third_party/googletest/googletest/include/gtest/gtest.h:55,
                 from /var/lib/jenkins/workspace/test/cpp/api/inference_mode.cpp:1:
In static member function ‘static _Up* std::__copy_move<_IsMove, true, std::random_access_iterator_tag>::__copy_m(_Tp*, _Tp*, _Up*) [with _Tp = long unsigned int; _Up = long unsigned int; bool _IsMove = false]’,
    inlined from ‘_OI std::__copy_move_a2(_II, _II, _OI) [with bool _IsMove = false; _II = long unsigned int*; _OI = long unsigned int*]’ at /usr/include/c++/13/bits/stl_algobase.h:506:30,
    inlined from ‘_OI std::__copy_move_a1(_II, _II, _OI) [with bool _IsMove = false; _II = long unsigned int*; _OI = long unsigned int*]’ at /usr/include/c++/13/bits/stl_algobase.h:533:42,
    inlined from ‘_OI std::__copy_move_a(_II, _II, _OI) [with bool _IsMove = false; _II = long unsigned int*; _OI = long unsigned int*]’ at /usr/include/c++/13/bits/stl_algobase.h:540:31,
    inlined from ‘_OI std::copy(_II, _II, _OI) [with _II = long unsigned int*; _OI = long unsigned int*]’ at /usr/include/c++/13/bits/stl_algobase.h:633:7,
    inlined from ‘std::vector<bool, _Alloc>::iterator std::vector<bool, _Alloc>::_M_copy_aligned(const_iterator, const_iterator, iterator) [with _Alloc = std::allocator<bool>]’ at /usr/include/c++/13/bits/stl_bvector.h:1305:28,
    inlined from ‘void std::vector<bool, _Alloc>::_M_reallocate(size_type) [with _Alloc = std::allocator<bool>]’ at /usr/include/c++/13/bits/vector.tcc:851:40,
    inlined from ‘void std::vector<bool, _Alloc>::reserve(size_type) [with _Alloc = std::allocator<bool>]’ at /usr/include/c++/13/bits/stl_bvector.h:1093:17,
    inlined from ‘static std::enable_if_t<is_same_v<X, T>, decltype (X::forward(nullptr, (declval<Args>)()...))> torch::autograd::Function<T>::apply(Args&& ...) [with X = InferenceModeTest_TestCustomFunction_Test::TestBody()::MyFunction; Args = {at::Tensor&, int&, at::Tensor&}; T = InferenceModeTest_TestCustomFunction_Test::TestBody()::MyFunction]’ at /var/lib/jenkins/workspace/torch/csrc/autograd/custom_function.h:483:35,
    inlined from ‘virtual void InferenceModeTest_TestCustomFunction_Test::TestBody()’ at /var/lib/jenkins/workspace/test/cpp/api/inference_mode.cpp:644:47:
/usr/include/c++/13/bits/stl_algobase.h:437:30: error: ‘void* __builtin_memmove(void*, const void*, long unsigned int)’ forming offset 8 is out of the bounds [0, 8] [-Werror=array-bounds=]
  437 |             __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
      |             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors

fadara01 · 2025-11-05T17:41:30Z

I think this is the corresponding gcc issue for the new warning we're seeing: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113239

Let's silence it.
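
One way to silence a diagnostic like this locally is a scoped `#pragma GCC diagnostic` around the affected code. The sketch below is illustrative only: the thread does not show how this PR actually suppresses the warning, and `make_flags` is a hypothetical function that merely exercises the same `std::vector<bool>::reserve` path named in the log.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Suppress -Warray-bounds only for the code between push and pop, so the
// warning stays active (and fatal under -Werror) everywhere else.
#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Warray-bounds"
#endif

// Hypothetical example function: reserves storage in a std::vector<bool>
// and fills it with alternating values, starting with true.
std::vector<bool> make_flags(std::size_t n) {
  std::vector<bool> flags;
  flags.reserve(n);
  for (std::size_t i = 0; i < n; ++i) {
    flags.push_back(i % 2 == 0);
  }
  assert(flags.size() == n);
  return flags;
}

#if defined(__GNUC__) && !defined(__clang__)
#pragma GCC diagnostic pop
#endif
```

A scoped pragma may not always take effect for warnings that GCC raises after inlining standard library code, in which case a per-target `-Wno-array-bounds` compile flag is the blunter alternative.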

fadara01 · 2025-11-05T19:29:52Z

@pytorchbot merge -f "AArch64 operator benchmarks are very flakey"

pytorchmergebot · 2025-11-05T19:31:47Z

Merge started

Your change will be merged immediately since you used the force (-f) flag, bypassing any CI checks (ETA: 1-5 minutes). Please use -f as last resort and instead consider -i/--ignore-current to continue the merge ignoring current failures. This will allow currently pending tests to finish and report signal before the merge.

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging

Check the merge workflow status here

pytorchmergebot · 2025-11-05T19:32:06Z

Merge failed

Reason: Command git -C /home/runner/work/pytorch/pytorch cherry-pick -x 1808a854a04d133fb81205b951d25a33b078c903 returned non-zero exit code 1

Auto-merging .ci/docker/build.sh
CONFLICT (content): Merge conflict in .ci/docker/build.sh
Auto-merging .github/workflows/docker-builds.yml
CONFLICT (content): Merge conflict in .github/workflows/docker-builds.yml
error: could not apply 1808a854a04... [ci][cpu] Update compiler to GCC-13 in jammy-aarch64
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git cherry-pick --continue".
hint: You can instead skip this commit with "git cherry-pick --skip".
hint: To abort and get back to the state before "git cherry-pick",
hint: run "git cherry-pick --abort".
hint: Disable this message with "git config set advice.mergeConflict false"

fadara01 · 2025-11-05T22:12:39Z

Darn, after rebasing, we now fail with another GCC13 warning, -Werror=dangling-pointer, introduced by #164991

test_aoti_abi_check/CMakeFiles/test_aoti_abi_check.dir/test_headeronlyarrayref.cpp.o -MF test_aoti_abi_check/CMakeFiles/test_aoti_abi_check.dir/test_headeronlyarrayref.cpp.o.d -o test_aoti_abi_check/CMakeFiles/test_aoti_abi_check.dir/test_headeronlyarrayref.cpp.o -c /var/lib/jenkins/workspace/test/cpp/aoti_abi_check/test_headeronlyarrayref.cpp
In file included from /usr/include/c++/13/bits/stl_uninitialized.h:63,
                 from /usr/include/c++/13/memory:69,
                 from /var/lib/jenkins/workspace/third_party/googletest/googletest/include/gtest/gtest.h:55,
                 from /var/lib/jenkins/workspace/test/cpp/aoti_abi_check/test_headeronlyarrayref.cpp:1:
In static member function ‘static _Up* std::__copy_move<_IsMove, true, std::random_access_iterator_tag>::__copy_m(_Tp*, _Tp*, _Up*) [with _Tp = const int; _Up = int; bool _IsMove = false]’,
    inlined from ‘_OI std::__copy_move_a2(_II, _II, _OI) [with bool _IsMove = false; _II = const int*; _OI = int*]’ at /usr/include/c++/13/bits/stl_algobase.h:506:30,
    inlined from ‘_OI std::__copy_move_a1(_II, _II, _OI) [with bool _IsMove = false; _II = const int*; _OI = int*]’ at /usr/include/c++/13/bits/stl_algobase.h:533:42,
    inlined from ‘_OI std::__copy_move_a(_II, _II, _OI) [with bool _IsMove = false; _II = const int*; _OI = int*]’ at /usr/include/c++/13/bits/stl_algobase.h:540:31,
    inlined from ‘_OI std::copy(_II, _II, _OI) [with _II = const int*; _OI = int*]’ at /usr/include/c++/13/bits/stl_algobase.h:633:7,
    inlined from ‘static _ForwardIterator std::__uninitialized_copy<true>::__uninit_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = const int*; _ForwardIterator = int*]’ at /usr/include/c++/13/bits/stl_uninitialized.h:147:27,
    inlined from ‘_ForwardIterator std::uninitialized_copy(_InputIterator, _InputIterator, _ForwardIterator) [with _InputIterator = const int*; _ForwardIterator = int*]’ at /usr/include/c++/13/bits/stl_uninitialized.h:185:15,
    inlined from ‘_ForwardIterator std::__uninitialized_copy_a(_InputIterator, _InputIterator, _ForwardIterator, allocator<_Tp>&) [with _InputIterator = const int*; _ForwardIterator = int*; _Tp = int]’ at /usr/include/c++/13/bits/stl_uninitialized.h:373:37,
    inlined from ‘void std::vector<_Tp, _Alloc>::_M_range_initialize(_ForwardIterator, _ForwardIterator, std::forward_iterator_tag) [with _ForwardIterator = const int*; _Tp = int; _Alloc = std::allocator<int>]’ at /usr/include/c++/13/bits/stl_vector.h:1692:33,
    inlined from ‘std::vector<_Tp, _Alloc>::vector(_InputIterator, _InputIterator, const allocator_type&) [with _InputIterator = const int*; <template-parameter-2-2> = void; _Tp = int; _Alloc = std::allocator<int>]’ at /usr/include/c++/13/bits/stl_vector.h:708:23,
    inlined from ‘std::vector<T> c10::HeaderOnlyArrayRef<T>::vec() const [with T = int]’ at /var/lib/jenkins/workspace/torch/headeronly/util/HeaderOnlyArrayRef.h:236:64,
    inlined from ‘virtual void TestHeaderOnlyArrayRef_TestFromInitializerList_Test::TestBody()’ at /var/lib/jenkins/workspace/test/cpp/aoti_abi_check/test_headeronlyarrayref.cpp:39:26:
/usr/include/c++/13/bits/stl_algobase.h:437:30: error: using a dangling pointer to an unnamed temporary [-Werror=dangling-pointer=]
  437 |             __builtin_memmove(__result, __first, sizeof(_Tp) * _Num);
      |             ~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/var/lib/jenkins/workspace/test/cpp/aoti_abi_check/test_headeronlyarrayref.cpp: In member function ‘virtual void TestHeaderOnlyArrayRef_TestFromInitializerList_Test::TestBody()’:
/var/lib/jenkins/workspace/test/cpp/aoti_abi_check/test_headeronlyarrayref.cpp:38:52: note: unnamed temporary defined here
   38 |   HeaderOnlyArrayRef<int> arr({1, 2, 3, 4, 5, 6, 7});
      |                                                    ^
cc1plus: all warnings being treated as errors
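
The pattern GCC points at in the note above (a non-owning view built from a braced list and then used on the next line) can be sketched as follows. `IntView` is a hypothetical stand-in, not the real `c10::HeaderOnlyArrayRef`, and the named-storage version is just one possible way to avoid the diagnostic, not necessarily what this PR does.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>
#include <vector>

// Hypothetical non-owning view over a contiguous run of ints.
struct IntView {
  const int* data;
  std::size_t size;

  // Stores a pointer into the list's backing array. That array belongs to a
  // temporary when the caller writes IntView arr({1, 2, 3}), so the pointer
  // is only valid until the end of that declaration.
  IntView(const std::initializer_list<int>& il)
      : data(il.begin()), size(il.size()) {}

  // Stores a pointer into storage owned by the caller's vector.
  IntView(const std::vector<int>& v) : data(v.data()), size(v.size()) {}

  // Copies the viewed elements into an owning vector.
  std::vector<int> vec() const {
    assert(data != nullptr || size == 0);
    return std::vector<int>(data, data + size);
  }
};

std::vector<int> copy_via_view() {
  // Keep the backing storage in a named object that outlives the view,
  // instead of constructing the view directly from a braced list.
  const std::vector<int> backing{1, 2, 3, 4, 5, 6, 7};
  IntView arr(backing);
  return arr.vec();
}
```

Because `backing` is alive for the whole function, `arr.vec()` never reads through a pointer to a destroyed temporary.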

fadara01 · 2025-11-06T00:55:14Z

@pytorchbot merge --ignore-current "operator benchmarks are flakey"

pytorch-bot · 2025-11-06T00:55:16Z

❌ 🤖 pytorchbot command failed:

@pytorchbot: error: unrecognized arguments: operator benchmarks are flakey

usage: @pytorchbot [-h] {merge,revert,rebase,label,drci,cherry-pick} ...

Try @pytorchbot --help for more info.

fadara01 · 2025-11-06T00:56:04Z

@pytorchbot merge --ignore-current

pytorchmergebot · 2025-11-06T00:58:03Z

Merge started

Your change will be merged while ignoring the following 2 checks: operator_benchmark / x86-opbenchmark-test / test (cpu_operator_benchmark_short, 1, 1, linux.12xlarge), operator_benchmark / aarch64-opbenchmark-test / test (cpu_operator_benchmark_short, 1, 1, linux.arm64.m8g.4xlarge)

Update

7b5f743

pytorch-bot Bot added the topic: not user facing label Nov 3, 2025

fadara01 requested review from atalman, malfet and robert-hardwick November 3, 2025 09:25

pytorchbot added the open source label Nov 3, 2025

fadara01 mentioned this pull request Nov 3, 2025

Aarch64 unit test failures from nightly/manylinux build, jammy upgrade to gcc13 needed #166736

Closed

fadara01 added the module: cpu, module: arm, and ciflow/linux-aarch64 labels Nov 3, 2025

robert-hardwick added the module: ci label Nov 3, 2025

robert-hardwick requested changes Nov 3, 2025

Comment thread .ci/docker/build.sh Outdated

Comment thread .ci/docker/build.sh Outdated

Update

695ec47

fadara01 requested review from a team and jeffdaily as code owners November 3, 2025 10:21

fadara01 requested a review from robert-hardwick November 3, 2025 10:23

robert-hardwick approved these changes Nov 3, 2025

malfet approved these changes Nov 3, 2025

Update

8e824ad

fadara01 mentioned this pull request Nov 3, 2025

Silent XNNPACK GCC14 warnings #166873

Closed

Update

c4af11c

Skylion007 approved these changes Nov 3, 2025

fadara01 mentioned this pull request Nov 3, 2025

[ci][cpu] Update AArch64 manylinux compiler to GCC14 #166876

Open

atalman approved these changes Nov 3, 2025

Update

48664f3

pytorchmergebot added the merging label Nov 5, 2025

pytorchmergebot removed the merging label Nov 5, 2025

Update

9031d35

fadara01 added the ciflow/trunk label Nov 5, 2025

Update

cf99307

pytorch-bot Bot added the ciflow/inductor label Nov 5, 2025

pytorchmergebot added the merging label Nov 6, 2025

pytorchmergebot added the Merged label Nov 6, 2025

pytorchmergebot closed this in c08ce30 Nov 6, 2025

pytorchmergebot removed the merging label Nov 6, 2025

github-actions Bot deleted the gh/fadara01/7/head branch December 7, 2025 02:20

7 participants