Disable NEON workaround on Clang 20 and above, and enable it for non-mobile platforms #1942
Dead2 merged 3 commits into zlib-ng:develop from Un1q32:develop
Walkthrough
The code updates conditional compilation logic in the NEON intrinsics header for ARM architectures. It changes the criteria for undefining certain NEON intrinsics, now basing the decision on the Clang compiler version (requiring Clang 20 or newer) for 32-bit ARM targets, instead of previously using an …
Estimated code review effort
🎯 1 (Trivial) | ⏱️ ~2 minutes
Actionable comments posted: 0
🧹 Nitpick comments (1)
cmake/detect-intrinsics.cmake (1)
220-224: Consider narrowing the Apple platform detection.
The current condition uses `__APPLE__`, which covers all Apple platforms (macOS, iOS, tvOS, watchOS). If the performance issue is specific to iOS as mentioned in the PR objectives, consider using more specific macros like `TARGET_OS_IOS` (requires `#include <TargetConditionals.h>`) to avoid unnecessarily restricting macOS builds.
However, if testing shows the issue affects all Apple ARM platforms, the current broad condition is appropriate.
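A minimal sketch of the narrowing suggested in this nitpick, assuming the goal is to distinguish iOS from other Apple targets. The helper name `targets_ios` is hypothetical and does not come from zlib-ng; `TARGET_OS_IOS` is provided by Apple's `TargetConditionals.h`.

```c
#include <stdbool.h>

#if defined(__APPLE__)
#  include <TargetConditionals.h>  /* provides TARGET_OS_IOS and related macros */
#endif

/* Hypothetical helper: reports whether this translation unit targets iOS,
 * rather than any Apple platform as a bare __APPLE__ check would. */
static bool targets_ios(void) {
#if defined(__APPLE__) && defined(TARGET_OS_IOS) && TARGET_OS_IOS
    return true;
#else
    return false;
#endif
}
```

Whether the narrower check is appropriate depends on which Apple platforms actually show the problem, as the comment itself notes.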
📜 Review details
📒 Files selected for processing (3)
arch/arm/neon_intrins.h (1 hunk)
cmake/detect-intrinsics.cmake (1 hunk)
configure (1 hunk)
🧠 Learnings (4)
📓 Common learnings
Learnt from: mtl1979
PR: zlib-ng/zlib-ng#1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: In zlib-ng, the policy for supporting old compilers is to maintain compatibility for up to two years after GitHub removes support for them in CI, or when there is no long-time contributor left testing for the architecture if the architecture was not supported by GitHub Actions at all.
Learnt from: pps83
PR: zlib-ng/zlib-ng#0
File: :0-0
Timestamp: 2025-01-13T18:28:11.751Z
Learning: In zlib-ng, removing preprocessor guards (like HAVE_*) should be described as removing unused conditional compilation checks rather than removing functionality, as the underlying implementation often remains intact.
Learnt from: mtl1979
PR: zlib-ng/zlib-ng#1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: For new architectures like LoongArch64, inline assembly fallbacks are necessary because compilers don't yet have intrinsic functions for all common operations. This requires maintaining complex inline assembly implementations until the compiler ecosystem matures.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.
Learnt from: mtl1979
PR: zlib-ng/zlib-ng#1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: LoongArch64 is still a quite new architecture where compilers don't yet have intrinsic functions for all the common operations, making inline assembly implementations necessary as fallbacks.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In `inffast_tpl.h`, when AVX512 is enabled, the branch involving `chunkcopy_safe` is intentionally eliminated to optimize performance.
cmake/detect-intrinsics.cmake (6)
Learnt from: KungFuJesus
PR: #1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by defined(_MSC_VER) && !defined(__clang__), making them mutually exclusive.
Learnt from: KungFuJesus
PR: #1872
File: arch/x86/x86_intrins.h:0-0
Timestamp: 2025-02-23T16:50:50.925Z
Learning: MSVC does not define __GNUC__, so adding !defined(_MSC_VER) to GCC detection macros is redundant when __GNUC__ is already being checked.
Learnt from: pps83
PR: zlib-ng/zlib-ng#0
File: :0-0
Timestamp: 2025-01-13T18:28:11.751Z
Learning: In zlib-ng, removing preprocessor guards (like HAVE_*) should be described as removing unused conditional compilation checks rather than removing functionality, as the underlying implementation often remains intact.
Learnt from: mtl1979
PR: #1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: For new architectures like LoongArch64, inline assembly fallbacks are necessary because compilers don't yet have intrinsic functions for all common operations. This requires maintaining complex inline assembly implementations until the compiler ecosystem matures.
Learnt from: KungFuJesus
PR: #1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In inffast_tpl.h, when AVX512 is enabled, the branch involving chunkcopy_safe is intentionally eliminated to optimize performance.
Learnt from: mtl1979
PR: #1904
File: arch/riscv/Makefile.in:12-14
Timestamp: 2025-04-15T09:30:10.081Z
Learning: Feature detection modules like riscv_features.c should not be compiled with feature-specific flags (like RVVFLAG) because they need to be compilable on all systems regardless of feature support. These modules perform runtime detection and must initialize feature availability flags to zero on unsupported systems.
configure (3)
Learnt from: pps83
PR: zlib-ng/zlib-ng#0
File: :0-0
Timestamp: 2025-01-13T18:28:11.751Z
Learning: In zlib-ng, removing preprocessor guards (like HAVE_*) should be described as removing unused conditional compilation checks rather than removing functionality, as the underlying implementation often remains intact.
Learnt from: mtl1979
PR: #1904
File: arch/riscv/Makefile.in:12-14
Timestamp: 2025-04-15T09:20:52.333Z
Learning: Optional feature flags in Makefile.in templates should default to empty strings. The configure script should detect feature support and update these variables accordingly during the configuration process.
Learnt from: KungFuJesus
PR: #1872
File: arch/x86/x86_intrins.h:0-0
Timestamp: 2025-02-23T16:50:50.925Z
Learning: MSVC does not define __GNUC__, so adding !defined(_MSC_VER) to GCC detection macros is redundant when __GNUC__ is already being checked.
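The learnings above note that MSVC does not define `__GNUC__` and that MSVC-only code is guarded by `defined(_MSC_VER) && !defined(__clang__)`. A small sketch of how such checks partition compilers into mutually exclusive branches follows. The enum and function names are hypothetical and are not taken from zlib-ng.

```c
/* Hypothetical compiler families, for illustration only. */
enum compiler_family {
    COMPILER_MSVC,
    COMPILER_CLANG,
    COMPILER_GCC,
    COMPILER_UNKNOWN
};

static enum compiler_family compiler_family(void) {
#if defined(_MSC_VER) && !defined(__clang__)
    /* Real MSVC: clang-cl also defines _MSC_VER, so __clang__ is excluded. */
    return COMPILER_MSVC;
#elif defined(__clang__)
    /* Clang also defines __GNUC__, so it must be tested before GCC. */
    return COMPILER_CLANG;
#elif defined(__GNUC__)
    /* MSVC never reaches this branch because it does not define __GNUC__. */
    return COMPILER_GCC;
#else
    return COMPILER_UNKNOWN;
#endif
}
```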
arch/arm/neon_intrins.h (11)
Learnt from: KungFuJesus
PR: #1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by defined(_MSC_VER) && !defined(__clang__), making them mutually exclusive.
Learnt from: pps83
PR: zlib-ng/zlib-ng#0
File: :0-0
Timestamp: 2025-01-13T18:28:11.751Z
Learning: In zlib-ng, removing preprocessor guards (like HAVE_*) should be described as removing unused conditional compilation checks rather than removing functionality, as the underlying implementation often remains intact.
Learnt from: KungFuJesus
PR: #1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In inffast_tpl.h, when AVX512 is enabled, the branch involving chunkcopy_safe is intentionally eliminated to optimize performance.
Learnt from: KungFuJesus
PR: #1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-07T21:18:37.806Z
Learning: In arch/x86/chunkset_avx2.c, when working with AVX2-capable x86 CPUs, unaligned memory access using _mm_loadu_si128 is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
Learnt from: KungFuJesus
PR: #1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In arch/x86/chunkset_avx2.c, when working with AVX2-capable x86 CPUs, unaligned memory access using _mm_loadu_si128 is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
Learnt from: KungFuJesus
PR: #1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.
Learnt from: mtl1979
PR: #1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: For new architectures like LoongArch64, inline assembly fallbacks are necessary because compilers don't yet have intrinsic functions for all common operations. This requires maintaining complex inline assembly implementations until the compiler ecosystem matures.
Learnt from: KungFuJesus
PR: #1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:41:10.063Z
Learning: For SSE2 optimizations, _mm_cvtsi128_si64 should be used instead of _mm_extract_epi64 (SSE4.1) for extracting 64-bit values from 128-bit vectors, as it generates more efficient movq instructions.
Learnt from: KungFuJesus
PR: #1778
File: arch/x86/chunkset_avx2.c:160-171
Timestamp: 2024-10-08T21:51:45.330Z
Learning: In arch/x86/chunkset_avx2.c, within the GET_HALFCHUNK_MAG function, using a conditional branch to select between _mm_loadl_epi64 and _mm_loadu_si128 is not recommended because the branching cost outweighs the savings from the load.
Learnt from: Ag-Cu
PR: #1905
File: arch/riscv/crc32_riscv.c:0-0
Timestamp: 2025-04-22T07:05:36.825Z
Learning: On RISC-V architectures running Linux, misaligned memory accesses are guaranteed to work for userspace applications as documented in the kernel's UABI specifications. While ISO C considers unaligned pointer casts as undefined behavior, the Linux kernel guarantees support for misaligned loads/stores either through hardware support or kernel emulation (with potential performance impact). This does not apply to atomic operations, which must remain aligned.
Learnt from: KungFuJesus
PR: #1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
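Several of the learnings above concern alignment, including the point that ISO C treats unaligned pointer casts as undefined behavior even where the platform tolerates them. A common portable alternative is to copy the bytes with `memcpy`. The sketch below illustrates that general technique; the function name is hypothetical and this is not zlib-ng's own code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: reads a 32-bit value from an arbitrarily aligned
 * address without casting the pointer to uint32_t*. The value is returned
 * in the host's native byte order. */
static uint32_t load_u32_unaligned(const unsigned char *p) {
    uint32_t v;
    memcpy(&v, p, sizeof(v));
    return v;
}
```

Compilers typically lower a fixed-size `memcpy` like this to a single load where the target allows it, though that is an optimization and not a guarantee.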
🔇 Additional comments (4)
cmake/detect-intrinsics.cmake (2)
220-224: Effective approach to address the performance issue.
The compile-time check correctly identifies the problematic Clang version and platform combinations that cause NEON performance degradation. This provides a clear path forward for users while preventing the generation of poorly-performing code.
220-224: Refine the preprocessor condition to avoid false positives.
The current condition `!defined(__clang_major__)` could trigger the error for non-Clang compilers that don't define this macro. The logic should be more precise to only apply to Clang compilers.
Apply this diff to make the condition more precise:
- "#if defined(__arm__) && defined(__clang__) && \\
-     (!defined(__clang_major__) || __clang_major__ < 20) && \\
-     (defined(__ANDROID__) || defined(__APPLE__))
+ "#if defined(__arm__) && defined(__clang__) && \\
+     defined(__clang_major__) && __clang_major__ < 20 && \\
+     (defined(__ANDROID__) || defined(__APPLE__))
This ensures the check only applies when we can definitively identify the Clang version, avoiding potential issues with other compilers or unusual build environments.
⛔ Skipped due to learnings
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:0-0
Timestamp: 2025-02-23T16:50:50.925Z
Learning: MSVC does not define `__GNUC__`, so adding `!defined(_MSC_VER)` to GCC detection macros is redundant when `__GNUC__` is already being checked.
Learnt from: pps83
PR: zlib-ng/zlib-ng#0
File: :0-0
Timestamp: 2025-01-13T18:28:11.751Z
Learning: In zlib-ng, removing preprocessor guards (like HAVE_*) should be described as removing unused conditional compilation checks rather than removing functionality, as the underlying implementation often remains intact.
Learnt from: mtl1979
PR: zlib-ng/zlib-ng#1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: In zlib-ng, the policy for supporting old compilers is to maintain compatibility for up to two years after GitHub removes support for them in CI, or when there is no long-time contributor left testing for the architecture if the architecture was not supported by GitHub Actions at all.
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1797
File: infback.c:58-58
Timestamp: 2024-10-08T10:59:27.889Z
Learning: Ensure that the PR description is up-to-date before commenting on inconsistencies between the PR description and the code.
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1797
File: infback.c:58-58
Timestamp: 2024-10-12T13:02:26.066Z
Learning: Ensure that the PR description is up-to-date before commenting on inconsistencies between the PR description and the code.
Learnt from: mtl1979
PR: zlib-ng/zlib-ng#1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: For new architectures like LoongArch64, inline assembly fallbacks are necessary because compilers don't yet have intrinsic functions for all common operations. This requires maintaining complex inline assembly implementations until the compiler ecosystem matures.
Learnt from: mtl1979
PR: zlib-ng/zlib-ng#1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: LoongArch64 is still a quite new architecture where compilers don't yet have intrinsic functions for all the common operations, making inline assembly implementations necessary as fallbacks.
configure (1)
1282-1286: LGTM: Correct implementation of Clang version enforcement for NEON on 32-bit ARM Android/iOS.
The preprocessor check correctly identifies the target platforms and enforces the minimum Clang 20 requirement as stated in the PR objectives. The condition properly handles the case where `__clang_major__` might not be defined and provides a clear error message.
arch/arm/neon_intrins.h (1)
28-32: LGTM: Consistent enforcement of Clang version requirement.
The preprocessor check is identical to the one added in the configure script, ensuring consistent enforcement of the minimum Clang 20 requirement for 32-bit ARM Android/iOS with NEON. The placement within the `ARM_NEON` guard is appropriate and the error message matches the configure script.
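The review comments above treat an undefined `__clang_major__` in two different ways. The sketch below models both conditions as plain boolean functions so they can be compared case by case; the function and parameter names are hypothetical stand-ins for the preprocessor tests. Because both variants already require `defined(__clang__)`, neither fires for a non-Clang compiler. They differ only for a Clang-like compiler that does not report a major version.

```c
#include <stdbool.h>

/* Models: arm && clang && (!defined(__clang_major__) || major < 20) && mobile */
static bool original_guard(bool arm32, bool clang, bool has_major,
                           int major, bool mobile) {
    return arm32 && clang && (!has_major || major < 20) && mobile;
}

/* Models: arm && clang && defined(__clang_major__) && major < 20 && mobile */
static bool suggested_guard(bool arm32, bool clang, bool has_major,
                            int major, bool mobile) {
    return arm32 && clang && has_major && major < 20 && mobile;
}
```

For example, with `clang` true and `has_major` false, `original_guard` returns true while `suggested_guard` returns false; for Clang 19 both return true, and for Clang 20 both return false.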
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files
@@            Coverage Diff             @@
##           develop    #1942     +/-   ##
===========================================
- Coverage    81.87%   81.84%   -0.04%
===========================================
  Files          162      162
  Lines        13923    13923
  Branches      3122     3122
===========================================
- Hits         11400    11395      -5
- Misses        1549     1551      +2
- Partials       974      977      +3
If Clang 19 breaks a workaround that works for older versions, we should just disable the workaround for Clang 19 and later instead of doing the complete opposite. In its current state, this PR seems to introduce a lot of dead code.

I've only tested with Clang 19 and 20. The workaround was supposed to fix an issue affecting all Clang versions, but Clang 20 fixed it. Clang 18, 17, etc. probably also break; I'll test real quick to make sure, though.

Okay, maybe I've gone insane, because suddenly Clang 19 is working again despite nothing changing since yesterday. I'll just rework the patch to disable the workaround on Clang 20 so it can have better performance, I guess.

This is what I originally imagined this PR as anyway, but then I got issues when building with Clang 19 with the workaround enabled, so I changed it, thinking the code had re-broken at some point and was better off disabled.

Looks good to me, but I am no ARM expert. @ccawley2011 Any comments?
The workaround seems to reduce performance (if not, maybe it should always be applied?), and it isn't needed from Clang 20 onward, so disable it when Clang is new enough.
Also enable the workaround when building with older Clang versions targeting non-mobile platforms such as regular Linux. The original issue included discussion of Alpine Linux breaking, and testing with qemu-user emulation confirms that Clang 19 builds on Alpine are broken without the workaround.
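A minimal sketch of the resulting condition as described here, assuming the workaround is selected purely by compiler, Clang major version, and 32-bit ARM target, with no mobile-platform check. The macro name `USE_NEON_WORKAROUND` and the helper function are hypothetical; the actual header may spell the condition differently.

```c
/* Hypothetical macro: 1 when the workaround would apply, 0 otherwise.
 * __arm__ is defined for 32-bit ARM targets and not for AArch64. */
#if defined(__clang__) && defined(__arm__) && \
    !(defined(__clang_major__) && __clang_major__ >= 20)
#  define USE_NEON_WORKAROUND 1   /* older Clang on 32-bit ARM, any OS */
#else
#  define USE_NEON_WORKAROUND 0   /* Clang 20+, other compilers, other targets */
#endif

/* Hypothetical helper exposing the compile-time decision at run time. */
static int neon_workaround_enabled(void) {
    return USE_NEON_WORKAROUND;
}
```

Under these assumptions, a Clang 19 build for 32-bit ARM Linux gets the workaround, while a Clang 20 build for the same target does not.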