Disable NEON workaround on Clang 20 and above, and enable it for non-mobile platforms #1942

Merged
Dead2 merged 3 commits into zlib-ng:develop from Un1q32:develop on Aug 3, 2025

@Un1q32 Un1q32 commented Jul 22, 2025

The workaround seems to reduce performance (if it doesn't, maybe it should always be applied?), and it isn't needed with Clang 20 and above, so disable it when Clang is new enough.

Also enable the workaround when building with older Clang versions targeting non-mobile platforms such as regular Linux. The original issue included discussion of Alpine Linux breaking, and testing with qemu-user emulation confirms that builds with Clang 19 are broken on Alpine without the workaround.
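As a rough illustration of the gating described above, here is a minimal sketch in C. The flag `NEON_X4_WORKAROUND` and the helpers `needs_neon_workaround` and `report_neon_workaround` are hypothetical names invented for this example. The header changed in this PR undefines specific NEON intrinsics rather than setting a flag, so this only models the condition, not the actual change.

```c
#include <stdio.h>

/* Hypothetical flag, for illustration only: 1 when the workaround would
 * apply to the current build (32-bit ARM, Clang older than 20). */
#if defined(__arm__) && defined(__clang__) && \
    (!defined(__clang_major__) || __clang_major__ < 20)
#  define NEON_X4_WORKAROUND 1
#else
#  define NEON_X4_WORKAROUND 0
#endif

/* Runtime model of the same condition, so it can be exercised on any host.
 * The parameters are assumptions standing in for the compiler's macros. */
int needs_neon_workaround(int is_clang, int is_arm32, int clang_major) {
    return is_clang && is_arm32 && clang_major < 20;
}

/* Prints what the preprocessor decided for this particular build. */
void report_neon_workaround(void) {
    printf("workaround active in this build: %d\n", NEON_X4_WORKAROUND);
}
```

Under this model, a 32-bit ARM build with Clang 19 gets the workaround, while the same build with Clang 20, or any build with a non-Clang compiler, does not.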

Summary by CodeRabbit

  • Refactor
    • Updated internal logic to improve compatibility of NEON intrinsics with older Clang versions on 32-bit ARM devices. No changes to user-facing features or functionality.

coderabbitai bot commented Jul 22, 2025

Walkthrough

The code updates conditional compilation logic in the NEON intrinsics header for ARM architectures. It changes the criteria for undefining certain NEON intrinsics: on 32-bit ARM targets the decision is now based on the Clang compiler version (the intrinsics are undefined only when Clang is older than 20), instead of the previous IS_IOS macro and broader platform checks.

Changes

File(s) | Change Summary
arch/arm/neon_intrins.h | Modified preprocessor directives to undefine specific NEON intrinsics only for 32-bit ARM with Clang versions older than 20; removed IS_IOS macro and platform-specific checks.

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~2 minutes

Suggested labels

Build Env, Architecture


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6135510 and ba7137e.

📒 Files selected for processing (1)
  • arch/arm/neon_intrins.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • arch/arm/neon_intrins.h
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (70)
  • GitHub Check: Windows MSVC 2022 v140 Win32
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC -O3 OSB
  • GitHub Check: macOS Clang Symbol Prefix
  • GitHub Check: Ubuntu GCC MIPS
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
cmake/detect-intrinsics.cmake (1)

220-224: Consider narrowing the Apple platform detection.

The current condition uses __APPLE__, which covers all Apple platforms (macOS, iOS, tvOS, watchOS). If the performance issue is specific to iOS, as mentioned in the PR objectives, consider using a more specific macro such as TARGET_OS_IOS (requires #include <TargetConditionals.h>) to avoid unnecessarily restricting macOS builds.

However, if testing shows the issue affects all Apple ARM platforms, the current broad condition is appropriate.
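
To make the distinction above concrete, here is a minimal sketch in C of separating iOS from other Apple targets. The enum `apple_target_kind` and the function `classify_apple_target` are hypothetical names for this example only and are not part of zlib-ng. `TargetConditionals.h` ships only with Apple SDKs, so the include is guarded.

```c
/* TargetConditionals.h exists only in Apple SDKs, so include it conditionally. */
#if defined(__APPLE__)
#  include <TargetConditionals.h>
#endif

/* Hypothetical classification, used only for this example. */
enum apple_target_kind {
    KIND_NON_APPLE = 0,   /* __APPLE__ is not defined */
    KIND_APPLE_IOS = 1,   /* Apple target where TARGET_OS_IOS is nonzero */
    KIND_APPLE_OTHER = 2  /* any other Apple target, e.g. macOS */
};

/* Reports how the narrower TARGET_OS_IOS check would classify this build,
 * compared with the broad __APPLE__ check. */
enum apple_target_kind classify_apple_target(void) {
#if !defined(__APPLE__)
    return KIND_NON_APPLE;
#elif defined(TARGET_OS_IOS) && TARGET_OS_IOS
    return KIND_APPLE_IOS;
#else
    return KIND_APPLE_OTHER;
#endif
}
```

A check written against `__APPLE__` alone would treat `KIND_APPLE_IOS` and `KIND_APPLE_OTHER` identically, which is the broad behavior the comment above describes.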

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f7bd199 and a0aa13c.

📒 Files selected for processing (3)
  • arch/arm/neon_intrins.h (1 hunks)
  • cmake/detect-intrinsics.cmake (1 hunks)
  • configure (1 hunks)
🧠 Learnings (4)
📓 Common learnings

Learnt from: mtl1979
PR: zlib-ng/zlib-ng#1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: In zlib-ng, the policy for supporting old compilers is to maintain compatibility for up to two years after GitHub removes support for them in CI, or when there is no long-time contributor left testing for the architecture if the architecture was not supported by GitHub Actions at all.

Learnt from: pps83
PR: zlib-ng/zlib-ng#0
File: :0-0
Timestamp: 2025-01-13T18:28:11.751Z
Learning: In zlib-ng, removing preprocessor guards (like HAVE_*) should be described as removing unused conditional compilation checks rather than removing functionality, as the underlying implementation often remains intact.

Learnt from: mtl1979
PR: zlib-ng/zlib-ng#1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: For new architectures like LoongArch64, inline assembly fallbacks are necessary because compilers don't yet have intrinsic functions for all common operations. This requires maintaining complex inline assembly implementations until the compiler ecosystem matures.

Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.

Learnt from: mtl1979
PR: zlib-ng/zlib-ng#1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: LoongArch64 is still a quite new architecture where compilers don't yet have intrinsic functions for all the common operations, making inline assembly implementations necessary as fallbacks.

Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In `inffast_tpl.h`, when AVX512 is enabled, the branch involving `chunkcopy_safe` is intentionally eliminated to optimize performance.

cmake/detect-intrinsics.cmake (6)

Learnt from: KungFuJesus
PR: #1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by defined(_MSC_VER) && !defined(__clang__), making them mutually exclusive.

Learnt from: KungFuJesus
PR: #1872
File: arch/x86/x86_intrins.h:0-0
Timestamp: 2025-02-23T16:50:50.925Z
Learning: MSVC does not define __GNUC__, so adding !defined(_MSC_VER) to GCC detection macros is redundant when __GNUC__ is already being checked.

Learnt from: pps83
PR: zlib-ng/zlib-ng#0
File: :0-0
Timestamp: 2025-01-13T18:28:11.751Z
Learning: In zlib-ng, removing preprocessor guards (like HAVE_*) should be described as removing unused conditional compilation checks rather than removing functionality, as the underlying implementation often remains intact.

Learnt from: mtl1979
PR: #1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: For new architectures like LoongArch64, inline assembly fallbacks are necessary because compilers don't yet have intrinsic functions for all common operations. This requires maintaining complex inline assembly implementations until the compiler ecosystem matures.

Learnt from: KungFuJesus
PR: #1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In inffast_tpl.h, when AVX512 is enabled, the branch involving chunkcopy_safe is intentionally eliminated to optimize performance.

Learnt from: mtl1979
PR: #1904
File: arch/riscv/Makefile.in:12-14
Timestamp: 2025-04-15T09:30:10.081Z
Learning: Feature detection modules like riscv_features.c should not be compiled with feature-specific flags (like RVVFLAG) because they need to be compilable on all systems regardless of feature support. These modules perform runtime detection and must initialize feature availability flags to zero on unsupported systems.

configure (3)

Learnt from: pps83
PR: zlib-ng/zlib-ng#0
File: :0-0
Timestamp: 2025-01-13T18:28:11.751Z
Learning: In zlib-ng, removing preprocessor guards (like HAVE_*) should be described as removing unused conditional compilation checks rather than removing functionality, as the underlying implementation often remains intact.

Learnt from: mtl1979
PR: #1904
File: arch/riscv/Makefile.in:12-14
Timestamp: 2025-04-15T09:20:52.333Z
Learning: Optional feature flags in Makefile.in templates should default to empty strings. The configure script should detect feature support and update these variables accordingly during the configuration process.

Learnt from: KungFuJesus
PR: #1872
File: arch/x86/x86_intrins.h:0-0
Timestamp: 2025-02-23T16:50:50.925Z
Learning: MSVC does not define __GNUC__, so adding !defined(_MSC_VER) to GCC detection macros is redundant when __GNUC__ is already being checked.

arch/arm/neon_intrins.h (11)

Learnt from: KungFuJesus
PR: #1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by defined(_MSC_VER) && !defined(__clang__), making them mutually exclusive.

Learnt from: pps83
PR: zlib-ng/zlib-ng#0
File: :0-0
Timestamp: 2025-01-13T18:28:11.751Z
Learning: In zlib-ng, removing preprocessor guards (like HAVE_*) should be described as removing unused conditional compilation checks rather than removing functionality, as the underlying implementation often remains intact.

Learnt from: KungFuJesus
PR: #1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In inffast_tpl.h, when AVX512 is enabled, the branch involving chunkcopy_safe is intentionally eliminated to optimize performance.

Learnt from: KungFuJesus
PR: #1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-07T21:18:37.806Z
Learning: In arch/x86/chunkset_avx2.c, when working with AVX2-capable x86 CPUs, unaligned memory access using _mm_loadu_si128 is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.

Learnt from: KungFuJesus
PR: #1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In arch/x86/chunkset_avx2.c, when working with AVX2-capable x86 CPUs, unaligned memory access using _mm_loadu_si128 is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.

Learnt from: KungFuJesus
PR: #1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.

Learnt from: mtl1979
PR: #1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: For new architectures like LoongArch64, inline assembly fallbacks are necessary because compilers don't yet have intrinsic functions for all common operations. This requires maintaining complex inline assembly implementations until the compiler ecosystem matures.

Learnt from: KungFuJesus
PR: #1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:41:10.063Z
Learning: For SSE2 optimizations, _mm_cvtsi128_si64 should be used instead of _mm_extract_epi64 (SSE4.1) for extracting 64-bit values from 128-bit vectors, as it generates more efficient movq instructions.

Learnt from: KungFuJesus
PR: #1778
File: arch/x86/chunkset_avx2.c:160-171
Timestamp: 2024-10-08T21:51:45.330Z
Learning: In arch/x86/chunkset_avx2.c, within the GET_HALFCHUNK_MAG function, using a conditional branch to select between _mm_loadl_epi64 and _mm_loadu_si128 is not recommended because the branching cost outweighs the savings from the load.

Learnt from: Ag-Cu
PR: #1905
File: arch/riscv/crc32_riscv.c:0-0
Timestamp: 2025-04-22T07:05:36.825Z
Learning: On RISC-V architectures running Linux, misaligned memory accesses are guaranteed to work for userspace applications as documented in the kernel's UABI specifications. While ISO C considers unaligned pointer casts as undefined behavior, the Linux kernel guarantees support for misaligned loads/stores either through hardware support or kernel emulation (with potential performance impact). This does not apply to atomic operations, which must remain aligned.

Learnt from: KungFuJesus
PR: #1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (95)
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC -O3 OSB add_subdirectory
  • GitHub Check: Ubuntu GCC Native Instructions (AVX)
  • GitHub Check: Ubuntu GCC -O3 OSB
  • GitHub Check: Ubuntu GCC Native Instructions
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Analyze (cpp)
  • GitHub Check: macOS Clang ASAN
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: Windows MSVC 2022 v140 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
🔇 Additional comments (4)
cmake/detect-intrinsics.cmake (2)

220-224: Effective approach to address the performance issue.

The compile-time check correctly identifies the problematic Clang version and platform combinations that cause NEON performance degradation. This provides a clear path forward for users while preventing the generation of poorly-performing code.


220-224: Refine the preprocessor condition to avoid false positives.

The current condition !defined(__clang_major__) could trigger the error for non-Clang compilers that don't define this macro. The logic should be more precise to only apply to Clang compilers.

Apply this diff to make the condition more precise:

-        "#if defined(__arm__) && defined(__clang__) && \\
-            (!defined(__clang_major__) || __clang_major__ < 20) && \\
-            (defined(__ANDROID__) || defined(__APPLE__))
+        "#if defined(__arm__) && defined(__clang__) && \\
+            defined(__clang_major__) && __clang_major__ < 20 && \\
+            (defined(__ANDROID__) || defined(__APPLE__))

This ensures the check only applies when we can definitively identify the Clang version, avoiding potential issues with other compilers or unusual build environments.
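To make the difference between the two conditions concrete, here is a minimal sketch that models each predefined macro as a boolean field and evaluates both forms. The `struct target` type and the function names are hypothetical and exist only for this illustration; they are not part of zlib-ng. Under this model the two conditions agree for every input except a compiler that defines `__clang__` but not `__clang_major__`, since `defined(__clang__)` already gates both forms.

```c
#include <stdbool.h>

/* Hypothetical model: each field stands in for one predefined macro. */
struct target {
    bool arm32;       /* defined(__arm__) */
    bool clang;       /* defined(__clang__) */
    bool has_major;   /* defined(__clang_major__) */
    int  clang_major; /* value of __clang_major__, ignored if !has_major */
    bool mobile;      /* defined(__ANDROID__) || defined(__APPLE__) */
};

/* Condition as written in the removed lines of the diff. */
bool original_condition(struct target t) {
    return t.arm32 && t.clang &&
           (!t.has_major || t.clang_major < 20) && t.mobile;
}

/* Condition as written in the added lines of the diff. */
bool suggested_condition(struct target t) {
    return t.arm32 && t.clang &&
           t.has_major && t.clang_major < 20 && t.mobile;
}
```

A non-Clang compiler is rejected by the `clang` term in both functions, so the only observable change in this model is for a Clang-identifying compiler that lacks a version macro.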

⛔ Skipped due to learnings
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:0-0
Timestamp: 2025-02-23T16:50:50.925Z
Learning: MSVC does not define `__GNUC__`, so adding `!defined(_MSC_VER)` to GCC detection macros is redundant when `__GNUC__` is already being checked.
Learnt from: pps83
PR: zlib-ng/zlib-ng#0
File: :0-0
Timestamp: 2025-01-13T18:28:11.751Z
Learning: In zlib-ng, removing preprocessor guards (like HAVE_*) should be described as removing unused conditional compilation checks rather than removing functionality, as the underlying implementation often remains intact.
Learnt from: mtl1979
PR: zlib-ng/zlib-ng#1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: In zlib-ng, the policy for supporting old compilers is to maintain compatibility for up to two years after GitHub removes support for them in CI, or when there is no long-time contributor left testing for the architecture if the architecture was not supported by GitHub Actions at all.
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1797
File: infback.c:58-58
Timestamp: 2024-10-08T10:59:27.889Z
Learning: Ensure that the PR description is up-to-date before commenting on inconsistencies between the PR description and the code.
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1797
File: infback.c:58-58
Timestamp: 2024-10-12T13:02:26.066Z
Learning: Ensure that the PR description is up-to-date before commenting on inconsistencies between the PR description and the code.
Learnt from: mtl1979
PR: zlib-ng/zlib-ng#1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: For new architectures like LoongArch64, inline assembly fallbacks are necessary because compilers don't yet have intrinsic functions for all common operations. This requires maintaining complex inline assembly implementations until the compiler ecosystem matures.
Learnt from: mtl1979
PR: zlib-ng/zlib-ng#1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: LoongArch64 is still a quite new architecture where compilers don't yet have intrinsic functions for all the common operations, making inline assembly implementations necessary as fallbacks.
configure (1)

1282-1286: LGTM: Correct implementation of Clang version enforcement for NEON on 32-bit ARM Android/iOS.

The preprocessor check correctly identifies the target platforms and enforces the minimum Clang 20 requirement as stated in the PR objectives. The condition properly handles the case where __clang_major__ might not be defined and provides a clear error message.

arch/arm/neon_intrins.h (1)

28-32: LGTM: Consistent enforcement of Clang version requirement.

The preprocessor check is identical to the one added in the configure script, ensuring consistent enforcement of the minimum Clang 20 requirement for 32-bit ARM Android/iOS with NEON. The placement within the ARM_NEON guard is appropriate and the error message matches the configure script.

@codecov

codecov bot commented Jul 22, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.84%. Comparing base (f7bd199) to head (ba7137e).
⚠️ Report is 4 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1942      +/-   ##
===========================================
- Coverage    81.87%   81.84%   -0.04%     
===========================================
  Files          162      162              
  Lines        13923    13923              
  Branches      3122     3122              
===========================================
- Hits         11400    11395       -5     
- Misses        1549     1551       +2     
- Partials       974      977       +3     

@Un1q32 Un1q32 changed the title from "Disallow NEON when on 32-bit Android and iOS with Clang versions less than 20" to "Disallow NEON when on 32-bit arm with Clang versions less than 20" Jul 22, 2025
@Un1q32 Un1q32 changed the title from "Disallow NEON when on 32-bit arm with Clang versions less than 20" to "Disable NEON when on 32-bit arm with Clang versions less than 20" Jul 22, 2025
@mtl1979
Collaborator

mtl1979 commented Jul 23, 2025

If Clang 19 breaks a workaround that works for older versions, we should just disable the workaround for Clang 19 and later instead of doing the complete opposite. In current state, it seems this PR introduces a lot of dead code.

@Un1q32
Contributor Author

Un1q32 commented Jul 24, 2025

> If Clang 19 breaks a workaround that works for older versions, we should just disable the workaround for Clang 19 and later instead of doing the complete opposite. In current state, it seems this PR introduces a lot of dead code.

I've only tested with Clang 19 and 20. The workaround was supposed to fix an issue affecting all Clang versions, but Clang 20 fixed it. Clang 18, 17, etc. probably also break; I'll test real quick to make sure, though.

@Un1q32
Contributor Author

Un1q32 commented Jul 24, 2025

Okay maybe I've gone insane because suddenly Clang 19 is working again despite nothing changing from yesterday, I'll just rework the patch to disable the workaround on Clang 20 so it can have better performance I guess.
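As a rough illustration of that rework, the sketch below gates a workaround on the Clang major version for 32-bit ARM. The macro name `NEON_WORKAROUND_ACTIVE` and the helper function are hypothetical, chosen only for this example; which intrinsics the real header undefines is not shown here.

```c
/*
 * NEON_WORKAROUND_ACTIVE is a hypothetical name used only in this sketch.
 * On non-Clang compilers __clang_major__ is undefined and evaluates to 0
 * inside #if, but defined(__clang__) is false, so the branch is skipped.
 */
#if defined(__arm__) && defined(__clang__) && (__clang_major__ < 20)
#  define NEON_WORKAROUND_ACTIVE 1  /* 32-bit ARM built with Clang 19 or older */
#else
#  define NEON_WORKAROUND_ACTIVE 0  /* Clang 20+, other compilers, other targets */
#endif

/* Expose the compile-time decision so it can be inspected at run time. */
int neon_workaround_active(void) {
    return NEON_WORKAROUND_ACTIVE;
}
```

Keeping the decision in a single macro means any code that depends on the workaround can test one name instead of repeating the compiler and target checks.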

@Un1q32 Un1q32 changed the title from "Disable NEON when on 32-bit arm with Clang versions less than 20" to "Disable NEON workaround on Clang 20 and above, and enable it for non-mobile platforms" Jul 24, 2025
@Un1q32
Contributor Author

Un1q32 commented Jul 24, 2025

This is what I originally imagined this PR as anyway, but then I got issues when building with Clang 19 with the workaround enabled, so I changed it thinking the code had re-broken at some point and was better off disabled.

@Dead2
Member

Dead2 commented Jul 27, 2025

Looks good to me, but I am no ARM expert.

@ccawley2011 Any comments?

Member

@Dead2 Dead2 left a comment

LGTM

@Dead2 Dead2 merged commit 3a52db3 into zlib-ng:develop Aug 3, 2025
148 of 150 checks passed
@Dead2 Dead2 mentioned this pull request Nov 5, 2025