Skip to content

Explicit SSE2 vectorization of Chorba CRC method#1872

Merged
Dead2 merged 1 commit intozlib-ng:developfrom
KungFuJesus:sse2_chorba
Mar 28, 2025
Merged

Explicit SSE2 vectorization of Chorba CRC method#1872
Dead2 merged 1 commit intozlib-ng:developfrom
KungFuJesus:sse2_chorba

Conversation

@KungFuJesus
Copy link
Copy Markdown
Collaborator

@KungFuJesus KungFuJesus commented Feb 16, 2025

The version that's currently in the generic implementation for 32768 byte buffers leverages the stack. It manages to autovectorize but unfortunately the trips to the stack hurt its performance for CPUs which need this the most. This version is explicitly SIMD vectorized and doesn't use trips to the stack. In my testing it's ~10% faster than the "small" variant, and about 42% faster than the "32768" variant.

Summary by CodeRabbit

  • New Features

    • Introduced an optimized CRC32 routine that leverages advanced SIMD (SSE2) instructions for improved performance on modern x86 systems.
    • Added support for a new CRC32 calculation method, crc32_chorba_sse2, enhancing efficiency for larger datasets.
    • Updated the build configuration to include the new chorba_sse2 functionality across different environments.
  • Tests

    • Added new benchmarks and test cases to validate the performance and reliability of the optimized CRC32 functionality.
    • Introduced a benchmark for the chorba_sse2 implementation.
    • Added a test case for the chorba_sse2 CRC32 functionality.
    • Added a new function declaration for crc32_braid_internal to enhance CRC32 functionality.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Feb 16, 2025

Walkthrough

The pull request introduces a new CRC32 calculation implementation using SSE2 intrinsics, named chorba_sse2. It updates the build configuration across CMake, Makefiles, and configure scripts to include the new source file and corresponding targets. The source and header files declare and assign the new functions for computing CRC32. Additionally, new tests and benchmarks validate the functionality, and the function pointer table is updated to select the new implementation when applicable.

Changes

Files Change Summary
CMakeLists.txt, arch/x86/Makefile.in, configure, win32/Makefile.msc Updated build configuration to include the new chorba_sse2 source file and corresponding object targets; replaced references to the previous SSE2 object file with the new one.
arch/x86/chorba_sse2.c Introduced a new CRC32 implementation using SSE2 intrinsics with functions chorba_small_nondestructive_sse2 and crc32_chorba_sse2 that process data using SIMD operations and handle data alignment.
arch/x86/x86_functions.h, functable.c, crc32.h Added declaration for crc32_chorba_sse2 and crc32_braid_internal; updated the function table to assign crc32_chorba_sse2 as the CRC32 function when SSE2 is enabled.
arch/x86/x86_intrins.h, test/benchmarks/benchmark_crc32.cc, test/test_crc32.cc Added new inline functions and macros for 128-/64-bit conversions; inserted benchmark and test cases for the chorba_sse2 implementation under appropriate preprocessor conditions, ensuring testing only when SSE2 features are available.
test/test_crc32.cc New test case added for chorba_sse2 implementation, ensuring coverage under specific CPU feature checks.

Sequence Diagram(s)

sequenceDiagram
    participant Init as init_functable
    participant FT as Function Table
    participant CRC as crc32_chorba_sse2
    Init->>FT: Setup function table
    FT-->>Init: Assign crc32 pointer to crc32_chorba_sse2 (if SSE2 enabled)
Loading
sequenceDiagram
    participant App as Application
    participant FT as Function Table
    participant CRC as crc32_chorba_sse2
    App->>FT: Invoke CRC32 function via pointer
    FT->>CRC: Call crc32_chorba_sse2
    CRC-->>App: Return computed CRC32 value
Loading

Possibly related PRs

  • Clean up crc32_braid. #1873: The changes in the main PR, which involve updating the SSE2_SRCS variable in CMakeLists.txt to include chorba_sse2.c, are related as both involve modifications to the CRC32 functionality, specifically the integration of the chorba_sse2 implementation in the build process.
  • Chorba #1837: The changes in the main PR, which involve updating the SSE2_SRCS variable to include chorba_sse2.c, are related as both involve the implementation and optimization of CRC32 calculations using the Chorba algorithm, specifically in the context of SSE2 architecture.

Suggested reviewers

  • Dead2
  • nmoinvaz
  • mtl1979
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
arch/x86/chorba_sse2.c (1)

28-805: Modularize or annotate complex SSE logic
The function chorba_small_nondestructive_sse is highly unrolled, making it difficult to maintain or extend. Consider segmenting repeated rounds into helper functions or adding inline comments for clarity. Also confirm that the leftover handling with memcpy(final, ...) is safe for all edge cases, given final is only 72 bytes.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e8d8049 and 7a567b1.

📒 Files selected for processing (9)
  • CMakeLists.txt (2 hunks)
  • arch/generic/crc32_c.c (0 hunks)
  • arch/x86/Makefile.in (2 hunks)
  • arch/x86/chorba_sse2.c (1 hunks)
  • arch/x86/x86_functions.h (2 hunks)
  • configure (1 hunks)
  • crc32_c.h (1 hunks)
  • functable.c (1 hunks)
  • test/benchmarks/benchmark_crc32.cc (1 hunks)
💤 Files with no reviewable changes (1)
  • arch/generic/crc32_c.c
✅ Files skipped from review due to trivial changes (1)
  • crc32_c.h
🧰 Additional context used
🧠 Learnings (1)
arch/x86/x86_functions.h (1)
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
🪛 Cppcheck (2.10-2)
arch/x86/chorba_sse2.c

[error] 807-807: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.

(unknownMacro)

🔇 Additional comments (12)
arch/x86/chorba_sse2.c (3)

16-26: Validate SSE load/store alignment usage in macros
The READ_NEXT macro uses _mm_load_si128, which requires 16-byte alignment. Verify that all call sites or buffer pointers are indeed 16-byte aligned to prevent potential undefined behavior on some systems.


807-835: Configure the unknown PREFIX macro
A static analysis tool flagged an unknown macro on line 807. If PREFIX is defined elsewhere or via compiler flags, ensure it is properly included to silence the warning. Also verify that the alignment calculation at lines 813-814 guarantees 16-byte alignment in all scenarios.

🧰 Tools
🪛 Cppcheck (2.10-2)

[error] 807-807: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.

(unknownMacro)


1-13:

Details

✅ Verification successful

Ensure consistency of thresholds with external references
These macros define buffer sizes and are potentially interrelated with other thresholds (e.g., CHORBA_LARGE_THRESHOLD used later). Ensure that upstream or neighboring modules share compatible constants to avoid mismatched optimizations or unintended behavior.

You can run a search to locate all references to these bitbuffer macros and confirm consistent usage:


🏁 Script executed:

#!/bin/bash
# Searching for definitions or references of bitbuffersize or CHORBA_LARGE_THRESHOLD across codebase
rg 'bitbuffersize|CHORBA_LARGE_THRESHOLD'

Length of output: 4876


Review Verification: Constants are Consistent

  • Verified that bitbuffersizebytes, bitbuffersizezwords, and bitbuffersizeqwords are defined identically in both arch/x86/chorba_sse2.c and arch/generic/crc32_chorba_c.c.
  • Confirmed that the upstream threshold CHORBA_LARGE_THRESHOLD (set as sizeof(z_word_t) * 64 * 1024 in crc32_c.h) is used consistently in conditions across the codebase.
  • The intentional difference between the 16 KB-based bit buffer macros and the 64 KB-based threshold appears to be by design. Please continue to ensure that any future modifications maintain this interrelated configuration across modules.
test/benchmarks/benchmark_crc32.cc (1)

61-61: Benchmark addition is valid
Registering the chorba_sse benchmark correctly integrates the new SSE2-optimized CRC32 routine. No issues found.

arch/x86/Makefile.in (2)

37-37: Build targets for SSE2 Chorba
Adding chorba_sse2.o chorba_sse2.lo is consistent with parallel SSE2 targets. No further concerns.


75-80: Verify compiler flags for SSE2 compilation
The newly introduced lines use $(SSE2FLAG) consistently with existing SSE2 builds. Ensure no additional flags (e.g., -march=) are needed for your target environments.

arch/x86/x86_functions.h (2)

72-74: Well-structured addition of SSE2-optimized CRC32 function!

The new function declaration and its conditional compilation guard are properly placed and follow the existing patterns for CRC32 functions.


95-96: Appropriate macro redefinition for SSE2 optimization!

The native_crc32 macro is correctly redefined to use the new SSE2-optimized implementation when SSE2 support is available.

functable.c (1)

85-85: Correct function pointer assignment for SSE2-optimized CRC32!

The new crc32_chorba_sse function is properly assigned to the function table when SSE2 support is available.

configure (1)

1573-1574: Proper integration of new SSE2-optimized object files!

The chorba_sse2 object files are correctly added to both static and shared library builds, maintaining consistency with the existing build system.

CMakeLists.txt (2)

915-916: Explicitly Enabling the New SSE2 Chorba CRC Implementation

These new lines add the definition -DX86_CHORBA_SSE_CRC and update the SSE2_SRCS variable to include the new source file chorba_sse2.c along with the other SSE2-related files. This is in line with the PR objectives to introduce an explicitly vectorized Chorba CRC implementation.


925-925: Documenting the New SSE2 Chorba CRC Feature

The added call to add_feature_info(X86_CHORBA_SSE_CRC 1 "Support SSE2-accelerated crc32, using \"${SSE2FLAG}\"") documents the new capability. It clearly informs users about the availability of the SSE2-accelerated CRC32 function enabled by the added macro.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (3)
arch/x86/chorba_sse2.c (3)

16-26: Fix trailing whitespace in macro definitions.

The macros are well-defined but contain trailing whitespace that should be removed to comply with the coding style.

-#define NEXT_ROUND(invec, a, b, c, d) do { \
+#define NEXT_ROUND(invec, a, b, c, d) do { \
         a = _mm_xor_si128(_mm_slli_epi64(invec, 17), _mm_slli_epi64(invec, 55)); \
         b = _mm_xor_si128(_mm_xor_si128(_mm_srli_epi64(invec, 47), _mm_srli_epi64(invec, 9)), _mm_slli_epi64(invec, 19)); \
         c = _mm_xor_si128(_mm_srli_epi64(invec, 45), _mm_slli_epi64(invec, 44)); \
-        d  = _mm_srli_epi64(invec, 20); \
+        d = _mm_srli_epi64(invec, 20); \
         } while (0);

28-805: Verify the performance impact of the SSE2 implementation.

The implementation looks solid with careful handling of data alignment and efficient use of SSE2 intrinsics. However, we should verify:

  1. The performance improvement claims (10% over "small" variant, 42% over "32768" variant)
  2. The impact of memory alignment requirements on different platforms

Consider adding performance benchmarks to validate the improvements across different CPU architectures and data sizes.

🧰 Tools
🪛 GitHub Actions: Lint

[warning] 103-103: Trailing whitespace.


[warning] 113-113: Trailing whitespace.


[warning] 129-129: Trailing whitespace.


[warning] 136-136: Trailing whitespace.


[warning] 185-185: Trailing whitespace.


[warning] 212-212: Trailing whitespace.


[warning] 222-222: Trailing whitespace.


[warning] 229-229: Trailing whitespace.


[warning] 275-275: Trailing whitespace.


[warning] 298-298: Trailing whitespace.


[warning] 311-311: Trailing whitespace.


[warning] 312-312: Trailing whitespace.


[warning] 356-356: Trailing whitespace.


[warning] 384-384: Trailing whitespace.


[warning] 393-393: Trailing whitespace.


[warning] 394-394: Trailing whitespace.


[warning] 395-395: Trailing whitespace.


[warning] 416-416: Trailing whitespace.


[warning] 441-441: Trailing whitespace.


[warning] 471-471: Trailing whitespace.


[warning] 479-479: Trailing whitespace.


[warning] 480-480: Trailing whitespace.


[warning] 481-481: Trailing whitespace.


[warning] 524-524: Trailing whitespace.


[warning] 550-550: Trailing whitespace.


[warning] 572-572: Trailing whitespace.


[warning] 573-573: Trailing whitespace.


[warning] 603-603: Trailing whitespace.


[warning] 619-619: Trailing whitespace.


[warning] 633-633: Trailing whitespace.


[warning] 634-634: Trailing whitespace.


[warning] 635-635: Trailing whitespace.


[warning] 674-674: Trailing whitespace.


[warning] 691-691: Trailing whitespace.


[warning] 718-718: Trailing whitespace.


[warning] 720-720: Trailing whitespace.


[warning] 721-721: Trailing whitespace.


[warning] 722-722: Trailing whitespace.


[warning] 761-761: Trailing whitespace.


[warning] 776-776: Trailing whitespace.

🪛 GitHub Actions: Libpng

[error] 38-38: call to undeclared function '_mm_cvtsi64x_si128'; ISO C99 and later do not support implicit function declarations


[error] 38-38: initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int'

🪛 GitHub Actions: Static Analysis

[error] 38-38: Call to undeclared function '_mm_cvtsi64x_si128'; ISO C99 and later do not support implicit function declarations.


[error] 38-38: Initializing '__m128i' with an expression of incompatible type 'int'.


1-836: Clean up trailing whitespace throughout the file.

Multiple instances of trailing whitespace were detected by the linter. This should be cleaned up to maintain consistent code style.

Run the following command to remove trailing whitespace:

#!/bin/bash
# Remove trailing whitespace
sed -i 's/[[:space:]]*$//' arch/x86/chorba_sse2.c
🧰 Tools
🪛 Cppcheck (2.10-2)

[error] 807-807: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.

(unknownMacro)

🪛 GitHub Actions: Lint

[warning] 27-27: Trailing whitespace.


[warning] 103-103: Trailing whitespace.


[warning] 113-113: Trailing whitespace.


[warning] 129-129: Trailing whitespace.


[warning] 136-136: Trailing whitespace.


[warning] 185-185: Trailing whitespace.


[warning] 212-212: Trailing whitespace.


[warning] 222-222: Trailing whitespace.


[warning] 229-229: Trailing whitespace.


[warning] 275-275: Trailing whitespace.


[warning] 298-298: Trailing whitespace.


[warning] 311-311: Trailing whitespace.


[warning] 312-312: Trailing whitespace.


[warning] 356-356: Trailing whitespace.


[warning] 384-384: Trailing whitespace.


[warning] 393-393: Trailing whitespace.


[warning] 394-394: Trailing whitespace.


[warning] 395-395: Trailing whitespace.


[warning] 416-416: Trailing whitespace.


[warning] 441-441: Trailing whitespace.


[warning] 471-471: Trailing whitespace.


[warning] 479-479: Trailing whitespace.


[warning] 480-480: Trailing whitespace.


[warning] 481-481: Trailing whitespace.


[warning] 524-524: Trailing whitespace.


[warning] 550-550: Trailing whitespace.


[warning] 572-572: Trailing whitespace.


[warning] 573-573: Trailing whitespace.


[warning] 603-603: Trailing whitespace.


[warning] 619-619: Trailing whitespace.


[warning] 633-633: Trailing whitespace.


[warning] 634-634: Trailing whitespace.


[warning] 635-635: Trailing whitespace.


[warning] 674-674: Trailing whitespace.


[warning] 691-691: Trailing whitespace.


[warning] 718-718: Trailing whitespace.


[warning] 720-720: Trailing whitespace.


[warning] 721-721: Trailing whitespace.


[warning] 722-722: Trailing whitespace.


[warning] 761-761: Trailing whitespace.


[warning] 776-776: Trailing whitespace.

🪛 GitHub Actions: Libpng

[error] 38-38: call to undeclared function '_mm_cvtsi64x_si128'; ISO C99 and later do not support implicit function declarations


[error] 38-38: initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int'

🪛 GitHub Actions: Static Analysis

[error] 38-38: Call to undeclared function '_mm_cvtsi64x_si128'; ISO C99 and later do not support implicit function declarations.


[error] 38-38: Initializing '__m128i' with an expression of incompatible type 'int'.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7a567b1 and 00abf62.

📒 Files selected for processing (10)
  • CMakeLists.txt (2 hunks)
  • arch/generic/crc32_c.c (0 hunks)
  • arch/x86/Makefile.in (2 hunks)
  • arch/x86/chorba_sse2.c (1 hunks)
  • arch/x86/x86_functions.h (2 hunks)
  • arch/x86/x86_intrins.h (1 hunks)
  • configure (1 hunks)
  • crc32_c.h (1 hunks)
  • functable.c (1 hunks)
  • test/benchmarks/benchmark_crc32.cc (1 hunks)
💤 Files with no reviewable changes (1)
  • arch/generic/crc32_c.c
🚧 Files skipped from review as they are similar to previous changes (7)
  • test/benchmarks/benchmark_crc32.cc
  • arch/x86/Makefile.in
  • functable.c
  • arch/x86/x86_functions.h
  • crc32_c.h
  • CMakeLists.txt
  • configure
🧰 Additional context used
🪛 Cppcheck (2.10-2)
arch/x86/chorba_sse2.c

[error] 807-807: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.

(unknownMacro)

🪛 GitHub Actions: Lint
arch/x86/chorba_sse2.c

[warning] 27-27: Trailing whitespace.


[warning] 103-103: Trailing whitespace.


[warning] 113-113: Trailing whitespace.


[warning] 129-129: Trailing whitespace.


[warning] 136-136: Trailing whitespace.


[warning] 185-185: Trailing whitespace.


[warning] 212-212: Trailing whitespace.


[warning] 222-222: Trailing whitespace.


[warning] 229-229: Trailing whitespace.


[warning] 275-275: Trailing whitespace.


[warning] 298-298: Trailing whitespace.


[warning] 311-311: Trailing whitespace.


[warning] 312-312: Trailing whitespace.


[warning] 356-356: Trailing whitespace.


[warning] 384-384: Trailing whitespace.


[warning] 393-393: Trailing whitespace.


[warning] 394-394: Trailing whitespace.


[warning] 395-395: Trailing whitespace.


[warning] 416-416: Trailing whitespace.


[warning] 441-441: Trailing whitespace.


[warning] 471-471: Trailing whitespace.


[warning] 479-479: Trailing whitespace.


[warning] 480-480: Trailing whitespace.


[warning] 481-481: Trailing whitespace.


[warning] 524-524: Trailing whitespace.


[warning] 550-550: Trailing whitespace.


[warning] 572-572: Trailing whitespace.


[warning] 573-573: Trailing whitespace.


[warning] 603-603: Trailing whitespace.


[warning] 619-619: Trailing whitespace.


[warning] 633-633: Trailing whitespace.


[warning] 634-634: Trailing whitespace.


[warning] 635-635: Trailing whitespace.


[warning] 674-674: Trailing whitespace.


[warning] 691-691: Trailing whitespace.


[warning] 718-718: Trailing whitespace.


[warning] 720-720: Trailing whitespace.


[warning] 721-721: Trailing whitespace.


[warning] 722-722: Trailing whitespace.


[warning] 761-761: Trailing whitespace.


[warning] 776-776: Trailing whitespace.

🪛 GitHub Actions: Libpng
arch/x86/chorba_sse2.c

[error] 38-38: call to undeclared function '_mm_cvtsi64x_si128'; ISO C99 and later do not support implicit function declarations


[error] 38-38: initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int'

🪛 GitHub Actions: Static Analysis
arch/x86/chorba_sse2.c

[error] 38-38: Call to undeclared function '_mm_cvtsi64x_si128'; ISO C99 and later do not support implicit function declarations.


[error] 38-38: Initializing '__m128i' with an expression of incompatible type 'int'.

⏰ Context from checks skipped due to timeout of 90000ms (28)
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
🔇 Additional comments (2)
arch/x86/x86_intrins.h (1)

92-96:

Details

❓ Verification inconclusive

LGTM! Verify the intrinsic compatibility across MSVC versions.

The macro definition provides compatibility for the _mm_cvtsi64x_si128 intrinsic in Visual Studio, consistent with other MSVC-specific fixes in this file.


🌐 Web query:

What is the difference between _mm_cvtsi64x_si128 and _mm_cvtsi64_si128 intrinsics in MSVC?

💡 Result:

The _mm_cvtsi64_si128 and _mm_cvtsi64x_si128 intrinsics in MSVC both move 64-bit integers into SSE registers but differ in their architectural compatibility and parameter types:

Key Differences

Intrinsic Architecture Support Parameter Type Behavior
_mm_cvtsi64_si128 x86-32 only __int64 Moves a 64-bit integer to the lower 64 bits of __m128i, zeroing upper bits[6].
_mm_cvtsi64x_si128 x86-32 and x86-64 long long/int64_t Designed for cross-platform compatibility; behaves identically to _mm_cvtsi64_si128 on x86-32 but may adapt to x86-64 calling conventions[10].

Technical Details

  • _mm_cvtsi64_si128:

    • Explicitly uses the __int64 type (MSVC-specific).
    • Generates the MOVQ instruction to copy a 64-bit integer into an SSE register[6].
    • Not supported on x86-64 targets where inline assembly is unavailable[1].
  • _mm_cvtsi64x_si128:

    • Uses standard long long/int64_t types for portability.
    • Ensures compatibility when compiling for both x86-32 and x86-64 architectures[10].
    • On x86-32, functionally identical to _mm_cvtsi64_si128.

Practical Use Cases

  • Use _mm_cvtsi64_si128 when targeting x86-32 specifically with MSVC.
  • Prefer _mm_cvtsi64x_si128 for cross-compiler/cross-arch code involving int64_t.

Both intrinsics ultimately map to similar low-level operations but address differences in type systems and architectural constraints[6][10].

Citations:


LGTM with a caution! The macro mapping of _mm_cvtsi64x_si128 to _mm_cvtsi64_si128 correctly addresses the missing definition in Visual Studio and aligns with other MSVC-specific fixes in the file. Based on the observed differences:

  • _mm_cvtsi64_si128 is historically intended for x86-32 builds (accepting a __int64), while
  • _mm_cvtsi64x_si128 is designed for cross-platform support (using long long/int64_t).

Please verify that this substitution works correctly across all targeted MSVC architectures—especially on x86-64—and that any parameter type nuances do not lead to unexpected behavior.

arch/x86/chorba_sse2.c (1)

1-14: LGTM! Clear conditional compilation and includes.

The header section is well-organized with proper include guards and necessary headers for SSE2 operations.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
arch/x86/chorba_sse2.c (1)

1-27: LGTM! Well-structured macros for SSE2 operations.

The macros effectively encapsulate complex SSE2 operations, making the code more maintainable and readable.

Remove trailing whitespace on line 27.

🧰 Tools
🪛 GitHub Actions: Lint

[warning] 27-27: Trailing whitespace.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 00abf62 and 2b259d9.

📒 Files selected for processing (10)
  • CMakeLists.txt (2 hunks)
  • arch/generic/crc32_c.c (0 hunks)
  • arch/x86/Makefile.in (2 hunks)
  • arch/x86/chorba_sse2.c (1 hunks)
  • arch/x86/x86_functions.h (2 hunks)
  • arch/x86/x86_intrins.h (1 hunks)
  • configure (1 hunks)
  • crc32_c.h (1 hunks)
  • functable.c (1 hunks)
  • test/benchmarks/benchmark_crc32.cc (1 hunks)
💤 Files with no reviewable changes (1)
  • arch/generic/crc32_c.c
🚧 Files skipped from review as they are similar to previous changes (8)
  • functable.c
  • arch/x86/Makefile.in
  • test/benchmarks/benchmark_crc32.cc
  • arch/x86/x86_intrins.h
  • arch/x86/x86_functions.h
  • configure
  • CMakeLists.txt
  • crc32_c.h
🧰 Additional context used
🪛 Cppcheck (2.10-2)
arch/x86/chorba_sse2.c

[error] 806-806: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.

(unknownMacro)

🪛 GitHub Actions: Lint
arch/x86/chorba_sse2.c

[warning] 27-27: Trailing whitespace.


[warning] 112-112: Trailing whitespace.


[warning] 128-128: Trailing whitespace.


[warning] 221-221: Trailing whitespace.


[warning] 355-355: Trailing whitespace.


[warning] 383-383: Trailing whitespace.


[warning] 760-760: Trailing whitespace.


[warning] 775-775: Trailing whitespace.

🪛 GitHub Actions: Libpng
arch/x86/chorba_sse2.c

[error] 38-38: Call to undeclared function '_mm_cvtsi64x_si128'; ISO C99 and later do not support implicit function declarations.


[error] 38-38: Initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int'.

🪛 GitHub Actions: Static Analysis
arch/x86/chorba_sse2.c

[error] 38-38: Call to undeclared function '_mm_cvtsi64x_si128'; ISO C99 and later do not support implicit function declarations.


[error] 38-38: Initializing '__m128i' with an expression of incompatible type 'int'.

🪛 GitHub Actions: Package Check
arch/x86/chorba_sse2.c

[warning] 38-38: Implicit declaration of function ‘_mm_cvtsi64x_si128’; did you mean ‘_mm_cvtsi32_si128’?


[error] 38-38: Incompatible types when initializing type ‘__m128i’ using type ‘int’.


[warning] 729-729: Implicit declaration of function ‘_mm_cvtsi128_si64’; did you mean ‘_mm_cvtsi128_si32’?

⏰ Context from checks skipped due to timeout of 90000ms (30)
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
🔇 Additional comments (1)
arch/x86/chorba_sse2.c (1)

806-834: Add documentation about alignment requirements.

The function handles memory alignment carefully, but the alignment requirements and their impact on performance should be documented.

Add a comment explaining the alignment logic:

+/* 
+ * Ensures 16-byte alignment for SSE2 operations:
+ * 1. Processes unaligned prefix using scalar implementation
+ * 2. Main body uses aligned SSE2 operations for optimal performance
+ * 3. Processes remaining bytes using scalar implementation
+ */
 Z_INTERNAL uint32_t PREFIX(crc32_chorba_sse)(uint32_t crc, const uint8_t *buf, size_t len) {

The PREFIX macro documentation was already requested in a previous review.

🧰 Tools
🪛 Cppcheck (2.10-2)

[error] 806-806: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.

(unknownMacro)

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (1)
arch/x86/chorba_sse2.c (1)

9-9: ⚠️ Potential issue

Fix include path for x86_intrins.h.

The angled brackets in the include path are causing build failures. Use quotes instead.

-#include <x86_intrins.h>
+#include "x86_intrins.h"
🧰 Tools
🪛 GitHub Actions: Libpng

[error] 9-9: 'x86_intrins.h' file not found with include; use "quotes" instead

🪛 GitHub Actions: CodeQL

[error] 9-9: fatal error: x86_intrins.h: No such file or directory

🪛 GitHub Actions: Static Analysis

[error] 9-9: fatal error: x86_intrins.h: No such file or directory

🪛 GitHub Actions: Link

[error] 9-9: fatal error: x86_intrins.h: No such file or directory

🧹 Nitpick comments (2)
arch/x86/chorba_sse2.c (2)

11-13: Add documentation for buffer size macros.

The buffer size macros would benefit from documentation explaining their purpose and relationships.

+/* Size of the bit buffer in bytes */
 #define bitbuffersizebytes (16 * 1024 * sizeof(z_word_t))
+/* Size of the bit buffer in z_word_t units */
 #define bitbuffersizezwords (bitbuffersizebytes / sizeof(z_word_t))
+/* Size of the bit buffer in uint64_t units */
 #define bitbuffersizeqwords (bitbuffersizebytes / sizeof(uint64_t))

17-27: Add documentation for SIMD operation macros.

The READ_NEXT and NEXT_ROUND macros would benefit from documentation explaining their purpose and parameters.

+/* Loads two 128-bit vectors from the input buffer at the given offset */
 #define READ_NEXT(in, off, a, b) do { \
         a = _mm_load_si128((__m128i*)(in + off / sizeof(uint64_t))); \
         b = _mm_load_si128((__m128i*)(in + off / sizeof(uint64_t) + 2)); \
         } while (0);

+/* Performs a round of CRC computation using SIMD operations
+ * invec: Input vector
+ * a, b, c, d: Output vectors containing shifted and XORed values
+ */
 #define NEXT_ROUND(invec, a, b, c, d) do { \
         a = _mm_xor_si128(_mm_slli_epi64(invec, 17), _mm_slli_epi64(invec, 55)); \
         b = _mm_xor_si128(_mm_xor_si128(_mm_srli_epi64(invec, 47), _mm_srli_epi64(invec, 9)), _mm_slli_epi64(invec, 19)); \
         c = _mm_xor_si128(_mm_srli_epi64(invec, 45), _mm_slli_epi64(invec, 44)); \
         d  = _mm_srli_epi64(invec, 20); \
         } while (0);
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2b259d9 and 6bf51a8.

📒 Files selected for processing (10)
  • CMakeLists.txt (2 hunks)
  • arch/generic/crc32_c.c (0 hunks)
  • arch/x86/Makefile.in (2 hunks)
  • arch/x86/chorba_sse2.c (1 hunks)
  • arch/x86/x86_functions.h (2 hunks)
  • arch/x86/x86_intrins.h (1 hunks)
  • configure (1 hunks)
  • crc32_c.h (1 hunks)
  • functable.c (1 hunks)
  • test/benchmarks/benchmark_crc32.cc (1 hunks)
💤 Files with no reviewable changes (1)
  • arch/generic/crc32_c.c
🚧 Files skipped from review as they are similar to previous changes (8)
  • functable.c
  • arch/x86/x86_intrins.h
  • test/benchmarks/benchmark_crc32.cc
  • arch/x86/Makefile.in
  • arch/x86/x86_functions.h
  • configure
  • CMakeLists.txt
  • crc32_c.h
🧰 Additional context used
🪛 Cppcheck (2.10-2)
arch/x86/chorba_sse2.c

[error] 807-807: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.

(unknownMacro)

🪛 GitHub Actions: Lint
arch/x86/chorba_sse2.c

[warning] 28-28: Trailing whitespace.


[warning] 113-113: Trailing whitespace.


[warning] 129-129: Trailing whitespace.


[warning] 222-222: Trailing whitespace.


[warning] 356-356: Trailing whitespace.


[warning] 384-384: Trailing whitespace.


[warning] 761-761: Trailing whitespace.


[warning] 776-776: Trailing whitespace.

🪛 GitHub Actions: Libpng
arch/x86/chorba_sse2.c

[error] 9-9: 'x86_intrins.h' file not found with include; use "quotes" instead

🪛 GitHub Actions: CodeQL
arch/x86/chorba_sse2.c

[error] 9-9: fatal error: x86_intrins.h: No such file or directory

🪛 GitHub Actions: Static Analysis
arch/x86/chorba_sse2.c

[error] 9-9: fatal error: x86_intrins.h: No such file or directory

🪛 GitHub Actions: Link
arch/x86/chorba_sse2.c

[error] 9-9: fatal error: x86_intrins.h: No such file or directory

⏰ Context from checks skipped due to timeout of 90000ms (59)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu 20.04 Clang 6
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC

@codecov
Copy link
Copy Markdown

codecov bot commented Feb 16, 2025

Codecov Report

Attention: Patch coverage is 98.93993% with 3 lines in your changes missing coverage. Please review.

Project coverage is 83.06%. Comparing base (50e9ca0) to head (94f0a09).
Report is 10 commits behind head on develop.

Files with missing lines Patch % Lines
arch/x86/chorba_sse2.c 99.64% 1 Missing ⚠️
test/benchmarks/benchmark_crc32.cc 0.00% 0 Missing and 1 partial ⚠️
test/test_crc32.cc 0.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##           develop    #1872       +/-   ##
============================================
+ Coverage    42.10%   83.06%   +40.95%     
============================================
  Files           71      139       +68     
  Lines         7353    12654     +5301     
  Branches      1265     2823     +1558     
============================================
+ Hits          3096    10511     +7415     
+ Misses        4035     1208     -2827     
- Partials       222      935      +713     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@KungFuJesus KungFuJesus force-pushed the sse2_chorba branch 3 times, most recently from c2b7a86 to 3cd408f Compare February 16, 2025 19:48
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
arch/x86/chorba_sse2.c (1)

30-40: ⚠️ Potential issue

Fix SSE2 intrinsics compatibility issues.

The code uses Intel-specific 64-bit SSE2 intrinsics that may cause compatibility issues.

Replace the Intel-specific intrinsic with its portable equivalent:

-    __m128i next12 = _mm_cvtsi64x_si128(next1);
+    __m128i next12 = _mm_set_epi64x(0, next1);
🧹 Nitpick comments (4)
arch/x86/chorba_sse2.c (4)

1-16: Add documentation for the header includes.

The file includes necessary headers but lacks documentation explaining their purpose.

Add a comment block explaining the purpose of each header:

 #ifdef X86_CHORBA_SSE_CRC
 
+/* Headers for CRC32 computation */
 #include "zbuild.h"
 #include "crc32_braid_p.h"
 #include "crc32_braid_tbl.h"
 #include "crc32_c.h"
+
+/* Headers for SSE2 intrinsics */
 #include <emmintrin.h>
 #include <immintrin.h>
 #include <smmintrin.h>
 #include "x86_intrins.h"

808-835: Add documentation for the wrapper function.

The wrapper function lacks documentation explaining its purpose and parameters.

Add a documentation block:

+/* Wrapper function for CRC32 computation using SSE2-optimized Chorba algorithm
+ * Handles buffer alignment and dispatches to appropriate implementation based on size
+ * @param crc: Initial CRC value
+ * @param buf: Input buffer
+ * @param len: Length of the buffer in bytes
+ * @return: Computed CRC32 value
+ */
 Z_INTERNAL uint32_t PREFIX(crc32_chorba_sse)(uint32_t crc, const uint8_t *buf, size_t len) {
🧰 Tools
🪛 Cppcheck (2.10-2)

[error] 808-808: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.

(unknownMacro)


47-48: Improve comment clarity and add performance metrics.

The comment about the for loop performance is unclear and lacks specific metrics.

Enhance the comment with more details:

-    /* This is weird, doing for vs while drops 10% off the exec time */
+    /* Using a for loop instead of a while loop improves performance by ~10%
+     * This is likely due to better compiler optimizations for for loops
+     */

821-827: Add documentation for the threshold values.

The code uses magic numbers for thresholds without explaining their significance.

Add comments explaining the threshold values:

+        /* Use different implementations based on input size:
+         * > CHORBA_LARGE_THRESHOLD (32768): Use 118960-byte optimized version
+         * > 72 bytes: Use SSE2-optimized small version
+         * <= 72 bytes: Use base implementation
+         */
         if(aligned_len > CHORBA_LARGE_THRESHOLD)
             c = crc32_chorba_118960_nondestructive(c, (z_word_t*) aligned_buf, aligned_len);
         else if (aligned_len > 72)
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6bf51a8 and 3cd408f.

📒 Files selected for processing (10)
  • CMakeLists.txt (2 hunks)
  • arch/generic/crc32_c.c (0 hunks)
  • arch/x86/Makefile.in (2 hunks)
  • arch/x86/chorba_sse2.c (1 hunks)
  • arch/x86/x86_functions.h (2 hunks)
  • arch/x86/x86_intrins.h (1 hunks)
  • configure (1 hunks)
  • crc32_c.h (1 hunks)
  • functable.c (1 hunks)
  • test/benchmarks/benchmark_crc32.cc (1 hunks)
💤 Files with no reviewable changes (1)
  • arch/generic/crc32_c.c
🚧 Files skipped from review as they are similar to previous changes (7)
  • test/benchmarks/benchmark_crc32.cc
  • arch/x86/Makefile.in
  • functable.c
  • crc32_c.h
  • arch/x86/x86_intrins.h
  • CMakeLists.txt
  • configure
🧰 Additional context used
🧠 Learnings (1)
arch/x86/x86_functions.h (1)
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
🪛 Cppcheck (2.10-2)
arch/x86/chorba_sse2.c

[error] 808-808: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.

(unknownMacro)

⏰ Context from checks skipped due to timeout of 90000ms (182)
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Symbol Prefix
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS Clang Symbol Prefix
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: macOS Clang Native Instructions (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows ClangCl Win32
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC S390X No vectorized CRC32
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
🔇 Additional comments (2)
arch/x86/x86_functions.h (2)

72-74: LGTM!

The function declaration for crc32_chorba_sse follows the existing pattern and is correctly guarded by the X86_CHORBA_SSE_CRC macro.


95-98: LGTM!

The macro definition for native_crc32 is correctly placed and follows the existing pattern for SSE2 optimizations.

@KungFuJesus KungFuJesus force-pushed the sse2_chorba branch 3 times, most recently from dbfb1d6 to 81145dc Compare February 16, 2025 20:02
Copy link
Copy Markdown

@github-advanced-security github-advanced-security bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (2)
arch/x86/chorba_sse2.c (2)

30-806: ⚠️ Potential issue

Add function documentation and fix SSE2 intrinsics compatibility issues.

  1. Add function documentation:
+/* Computes CRC32 checksum using SSE2-optimized Chorba algorithm for small buffers
+ * @param crc: Initial CRC value
+ * @param buf: Input buffer (must be 64-bit aligned)
+ * @param len: Length of the buffer in bytes
+ * @return: Computed CRC32 value
+ */
 Z_INTERNAL uint32_t chorba_small_nondestructive_sse(uint32_t crc, const uint64_t* buf, size_t len) {
  1. Fix SSE2 intrinsics compatibility issues:
-    __m128i next12 = _mm_cvtsi64x_si128(next1);
+    __m128i next12 = _mm_set_epi64x(0, next1);
🧰 Tools
🪛 GitHub Actions: Package Check

[warning] 40-40: Implicit declaration of function ‘_mm_cvtsi64x_si128’; did you mean ‘_mm_cvtsi16_si128’?


[error] 40-40: Incompatible types when initializing type ‘__m128i’ using type ‘int’.


[warning] 731-731: Implicit declaration of function ‘_mm_cvtsi128_si64’; did you mean ‘_mm_cvtsi128_si16’?


808-835: 🛠️ Refactor suggestion

Add documentation for the wrapper function and fix PREFIX macro.

  1. Add function documentation:
+/* Wrapper function for CRC32 computation using SSE2-optimized Chorba algorithm
+ * Handles buffer alignment and dispatches to appropriate implementation based on size
+ * @param crc: Initial CRC value
+ * @param buf: Input buffer
+ * @param len: Length of the buffer in bytes
+ * @return: Computed CRC32 value
+ */
 uint32_t PREFIX(crc32_chorba_sse)(uint32_t crc, const uint8_t *buf, size_t len) {
  1. Add a comment explaining the PREFIX macro:
+/* PREFIX macro is defined by the build system to namespace the symbols */
🧰 Tools
🪛 Cppcheck (2.10-2)

[error] 808-808: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.

(unknownMacro)

🧹 Nitpick comments (1)
arch/x86/chorba_sse2.c (1)

18-29: Add documentation for macros.

Add documentation to explain the purpose and parameters of the macros:

+/* Loads 256 bits (32 bytes) of data from the input buffer into two 128-bit vectors
+ * @param in: Input buffer
+ * @param off: Offset into the buffer
+ * @param a: First 128-bit vector output
+ * @param b: Second 128-bit vector output
+ */
 #define READ_NEXT(in, off, a, b) do { \
         a = _mm_load_si128((__m128i*)(in + off / sizeof(uint64_t))); \
         b = _mm_load_si128((__m128i*)(in + off / sizeof(uint64_t) + 2)); \
         } while (0);

+/* Performs one round of the Chorba CRC calculation using SSE2 instructions
+ * @param invec: Input 128-bit vector
+ * @param a: First output vector for shifted and XORed values
+ * @param b: Second output vector for shifted and XORed values
+ * @param c: Third output vector for shifted and XORed values
+ * @param d: Fourth output vector for shifted values
+ */
 #define NEXT_ROUND(invec, a, b, c, d) do { \
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3cd408f and 3b43214.

📒 Files selected for processing (11)
  • CMakeLists.txt (2 hunks)
  • arch/generic/crc32_c.c (0 hunks)
  • arch/x86/Makefile.in (2 hunks)
  • arch/x86/chorba_sse2.c (1 hunks)
  • arch/x86/x86_functions.h (2 hunks)
  • arch/x86/x86_intrins.h (1 hunks)
  • configure (1 hunks)
  • crc32_c.h (1 hunks)
  • functable.c (1 hunks)
  • test/benchmarks/benchmark_crc32.cc (1 hunks)
  • win32/Makefile.msc (3 hunks)
💤 Files with no reviewable changes (1)
  • arch/generic/crc32_c.c
🚧 Files skipped from review as they are similar to previous changes (8)
  • arch/x86/Makefile.in
  • test/benchmarks/benchmark_crc32.cc
  • arch/x86/x86_functions.h
  • functable.c
  • CMakeLists.txt
  • arch/x86/x86_intrins.h
  • configure
  • crc32_c.h
🧰 Additional context used
🪛 Cppcheck (2.10-2)
arch/x86/chorba_sse2.c

[error] 808-808: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.

(unknownMacro)

🪛 GitHub Actions: Package Check
arch/x86/chorba_sse2.c

[warning] 40-40: Implicit declaration of function ‘_mm_cvtsi64x_si128’; did you mean ‘_mm_cvtsi16_si128’?


[error] 40-40: Incompatible types when initializing type ‘__m128i’ using type ‘int’.


[warning] 731-731: Implicit declaration of function ‘_mm_cvtsi128_si64’; did you mean ‘_mm_cvtsi128_si16’?

⏰ Context from checks skipped due to timeout of 90000ms (59)
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: EL9 Clang S390X DFLTCC MSAN
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: Ubuntu GCC PPC64LE
🔇 Additional comments (1)
win32/Makefile.msc (1)

38-39: LGTM!

The changes correctly integrate the new SSE2-optimized Chorba CRC implementation by:

  1. Adding the -DX86_CHORBA_SSE_CRC flag to enable the feature
  2. Including chorba_sse2.obj in the build

Also applies to: 64-64

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

♻️ Duplicate comments (4)
arch/x86/chorba_sse2.c (4)

1-16: 🛠️ Refactor suggestion

Add file documentation and fix includes.

The file needs documentation to explain its purpose and implementation details.

Add file documentation:

+/* Implementation of CRC32 checksum calculation using SSE2 vectorization.
+ * This implementation uses the Chorba algorithm optimized for SSE2 instructions
+ * to improve performance on CPUs that support SSE2.
+ */
 #ifdef X86_CHORBA_SSE_CRC

40-40: ⚠️ Potential issue

Fix SSE2 intrinsics compatibility issues.

The code uses Intel-specific 64-bit SSE2 intrinsics that are causing compilation errors.

Replace the Intel-specific intrinsics with their portable equivalents:

-    __m128i next12 = _mm_cvtsi64x_si128(next1);
+    __m128i next12 = _mm_set_epi64x(0, next1);

-    next1 = _mm_cvtsi128_si64(next12);
-    next2 = _mm_cvtsi128_si64(_mm_unpackhi_epi64(next12, next12));
-    next3 = _mm_cvtsi128_si64(next34);
-    next4 = _mm_cvtsi128_si64(_mm_unpackhi_epi64(next34, next34));
-    next5 = _mm_cvtsi128_si64(next56);
+    next1 = _mm_extract_epi64(next12, 0);
+    next2 = _mm_extract_epi64(next12, 1);
+    next3 = _mm_extract_epi64(next34, 0);
+    next4 = _mm_extract_epi64(next34, 1);
+    next5 = _mm_extract_epi64(next56, 0);

Also applies to: 731-735

🧰 Tools
🪛 GitHub Actions: NMake

[warning] 40-40: '_mm_cvtsi64x_si128' undefined; assuming extern returning int


[error] 40-40: 'initializing': cannot convert from 'int' to '__m128i'

🪛 GitHub Actions: Package Check

[warning] 40-40: Implicit declaration of function ‘_mm_cvtsi64x_si128’; did you mean ‘_mm_cvtsi16_si128’?


[error] 40-40: Incompatible types when initializing type ‘__m128i’ using type ‘int’.


808-835: 🛠️ Refactor suggestion

Add documentation for the wrapper function and PREFIX macro.

The wrapper function and PREFIX macro need documentation.

+/* PREFIX macro is defined by the build system to namespace the symbols */
+
+/* Wrapper function for CRC32 computation using SSE2-optimized Chorba algorithm
+ * Handles buffer alignment and dispatches to appropriate implementation based on size
+ * @param crc: Initial CRC value
+ * @param buf: Input buffer
+ * @param len: Length of the buffer in bytes
+ * @return: Computed CRC32 value
+ */
 Z_INTERNAL uint32_t PREFIX(crc32_chorba_sse)(uint32_t crc, const uint8_t *buf, size_t len) {
🧰 Tools
🪛 Cppcheck (2.10-2)

[error] 808-808: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.

(unknownMacro)


30-39: 🛠️ Refactor suggestion

Add function documentation.

The function needs documentation to explain its purpose and parameters.

Add function documentation:

+/* Computes CRC32 checksum using SSE2-optimized Chorba algorithm
+ * @param crc: Initial CRC value
+ * @param buf: Input buffer (must be 64-bit aligned)
+ * @param len: Length of the buffer in bytes
+ * @return: Computed CRC32 value
+ */
 Z_INTERNAL uint32_t chorba_small_nondestructive_sse(uint32_t crc, const uint64_t* buf, size_t len) {
🧹 Nitpick comments (1)
arch/x86/chorba_sse2.c (1)

18-28: Add documentation for macros.

The READ_NEXT and NEXT_ROUND macros need documentation to explain their purpose and parameters.

Add macro documentation:

+/* Loads two 128-bit vectors from the input buffer
+ * @param in: Input buffer
+ * @param off: Offset into the buffer
+ * @param a: First output vector
+ * @param b: Second output vector
+ */
 #define READ_NEXT(in, off, a, b) do { \

+/* Performs a round of CRC transformation using SSE2 instructions
+ * @param invec: Input vector
+ * @param a,b,c,d: Output vectors containing transformed values
+ */
 #define NEXT_ROUND(invec, a, b, c, d) do { \
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3b43214 and 17091a9.

📒 Files selected for processing (11)
  • CMakeLists.txt (2 hunks)
  • arch/generic/crc32_c.c (0 hunks)
  • arch/x86/Makefile.in (2 hunks)
  • arch/x86/chorba_sse2.c (1 hunks)
  • arch/x86/x86_functions.h (2 hunks)
  • arch/x86/x86_intrins.h (1 hunks)
  • configure (1 hunks)
  • crc32_c.h (1 hunks)
  • functable.c (1 hunks)
  • test/benchmarks/benchmark_crc32.cc (1 hunks)
  • win32/Makefile.msc (3 hunks)
💤 Files with no reviewable changes (1)
  • arch/generic/crc32_c.c
🚧 Files skipped from review as they are similar to previous changes (8)
  • crc32_c.h
  • arch/x86/x86_functions.h
  • arch/x86/Makefile.in
  • test/benchmarks/benchmark_crc32.cc
  • arch/x86/x86_intrins.h
  • functable.c
  • CMakeLists.txt
  • configure
🧰 Additional context used
🪛 Cppcheck (2.10-2)
arch/x86/chorba_sse2.c

[error] 808-808: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.

(unknownMacro)

🪛 GitHub Actions: NMake
arch/x86/chorba_sse2.c

[warning] 40-40: '_mm_cvtsi64x_si128' undefined; assuming extern returning int


[warning] 731-731: '_mm_cvtsi128_si64' undefined; assuming extern returning int


[error] 40-40: 'initializing': cannot convert from 'int' to '__m128i'

🪛 GitHub Actions: Package Check
arch/x86/chorba_sse2.c

[warning] 40-40: Implicit declaration of function ‘_mm_cvtsi64x_si128’; did you mean ‘_mm_cvtsi16_si128’?


[error] 40-40: Incompatible types when initializing type ‘__m128i’ using type ‘int’.


[warning] 731-731: Implicit declaration of function ‘_mm_cvtsi128_si64’; did you mean ‘_mm_cvtsi128_si16’?

⏰ Context from checks skipped due to timeout of 90000ms (55)
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS GCC
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS GCC
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS GCC
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS GCC
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS GCC
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Windows MSVC 2019 v140 Win32
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: macOS GCC (ARM64)
  • GitHub Check: macOS GCC
  • GitHub Check: macOS GCC Symbol Prefix & Compat
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
🔇 Additional comments (1)
win32/Makefile.msc (1)

38-39: LGTM!

The changes correctly integrate the new SSE2-optimized CRC implementation into the build process by:

  1. Adding the -DX86_CHORBA_SSE_CRC preprocessor definition to enable the feature.
  2. Adding chorba_sse2.obj to the list of object files to be compiled.

Also applies to: 64-64

@KungFuJesus KungFuJesus force-pushed the sse2_chorba branch 2 times, most recently from 83edd75 to 35164b2 Compare February 17, 2025 03:55
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
arch/x86/chorba_sse2.c (1)

26-822: Large function implementation is performance-driven but could impact maintainability.
The SSE2-based Chorba method is highly specialized with frequent unrolled loops and XOR patterns for speed. Although it is large and arguably harder to read or extend, it appears correct for a performance-critical routine. Testing coverage is strongly recommended to prevent regressions.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 37f59ba and 0407b79.

📒 Files selected for processing (11)
  • CMakeLists.txt (1 hunks)
  • arch/x86/Makefile.in (2 hunks)
  • arch/x86/chorba_sse2.c (1 hunks)
  • arch/x86/x86_functions.h (3 hunks)
  • arch/x86/x86_intrins.h (1 hunks)
  • configure (1 hunks)
  • crc32.h (1 hunks)
  • functable.c (1 hunks)
  • test/benchmarks/benchmark_crc32.cc (1 hunks)
  • test/test_crc32.cc (1 hunks)
  • win32/Makefile.msc (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
  • test/benchmarks/benchmark_crc32.cc
  • crc32.h
  • functable.c
  • win32/Makefile.msc
  • arch/x86/Makefile.in
  • CMakeLists.txt
  • configure
  • arch/x86/x86_functions.h
🧰 Additional context used
🧠 Learnings (2)
arch/x86/chorba_sse2.c (7)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-18T02:53:23.621Z
Learning: In zlib-ng, when macro names and parameters are self-explanatory and scalar versions of the code provide context, additional documentation for the macros may not be necessary.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:830-831
Timestamp: 2025-02-21T01:37:54.508Z
Learning: The alignment calculation `((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15` is safe from overflow as it's mathematically bounded between 1 and 16, making the use of uintptr_t appropriate for this pointer arithmetic.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:41:10.063Z
Learning: For SSE2 optimizations, `_mm_cvtsi128_si64` should be used instead of `_mm_extract_epi64` (SSE4.1) for extracting 64-bit values from 128-bit vectors, as it generates more efficient movq instructions.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.
arch/x86/x86_intrins.h (2)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:0-0
Timestamp: 2025-02-23T16:50:50.925Z
Learning: MSVC does not define `__GNUC__`, so adding `!defined(_MSC_VER)` to GCC detection macros is redundant when `__GNUC__` is already being checked.
⏰ Context from checks skipped due to timeout of 90000ms (136)
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu GCC S390X No vectorized CRC32
  • GitHub Check: Ubuntu GCC S390X
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: Ubuntu GCC PPC
  • GitHub Check: Ubuntu GCC MIPS64
  • GitHub Check: Ubuntu GCC MIPS
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Ubuntu GCC ARM HF Compat No Opt
  • GitHub Check: Ubuntu GCC ARM HF No NEON
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu GCC S390X No vectorized CRC32
  • GitHub Check: Ubuntu GCC S390X
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: Ubuntu GCC PPC
  • GitHub Check: Ubuntu GCC MIPS64
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Ubuntu GCC ARM HF Compat No Opt
  • GitHub Check: Ubuntu GCC ARM HF No NEON
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu GCC S390X No vectorized CRC32
  • GitHub Check: Ubuntu GCC S390X
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: Ubuntu GCC PPC
  • GitHub Check: Ubuntu GCC MIPS64
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Ubuntu GCC ARM HF Compat No Opt
  • GitHub Check: Ubuntu GCC ARM HF No NEON
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu GCC S390X No vectorized CRC32
  • GitHub Check: Ubuntu GCC S390X
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: Ubuntu GCC PPC
  • GitHub Check: Ubuntu GCC MIPS64
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Ubuntu GCC ARM HF Compat No Opt
  • GitHub Check: Ubuntu GCC ARM HF No NEON
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu GCC S390X No vectorized CRC32
  • GitHub Check: Ubuntu GCC S390X
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: Ubuntu GCC PPC
  • GitHub Check: Ubuntu GCC MIPS64
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Ubuntu GCC ARM HF Compat No Opt
  • GitHub Check: Ubuntu GCC ARM HF No NEON
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu GCC S390X No vectorized CRC32
  • GitHub Check: Ubuntu GCC S390X
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: Ubuntu GCC PPC
  • GitHub Check: Ubuntu GCC MIPS64
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Ubuntu GCC ARM HF Compat No Opt
  • GitHub Check: Ubuntu GCC ARM HF No NEON
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: macOS GCC Symbol Prefix
  • GitHub Check: Ubuntu GCC S390X No vectorized CRC32
  • GitHub Check: Ubuntu GCC S390X
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: Ubuntu GCC PPC
  • GitHub Check: Ubuntu GCC MIPS64
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Ubuntu GCC ARM HF Compat No Opt
  • GitHub Check: Ubuntu GCC ARM HF No NEON
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC S390X No vectorized CRC32
  • GitHub Check: Ubuntu GCC S390X
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC No Power8
  • GitHub Check: Ubuntu GCC PPC
  • GitHub Check: Ubuntu GCC MIPS64
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Ubuntu GCC ARM HF Compat No Opt
  • GitHub Check: Ubuntu GCC ARM HF No NEON
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC S390X No vectorized CRC32
  • GitHub Check: Ubuntu GCC S390X
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC
  • GitHub Check: Ubuntu GCC MIPS64
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Ubuntu GCC ARM HF Compat No Opt
  • GitHub Check: Ubuntu GCC ARM HF No NEON
  • GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
  • GitHub Check: Windows MSVC 2019 v140 Win64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC S390X No vectorized CRC32
  • GitHub Check: Ubuntu GCC S390X
  • GitHub Check: Ubuntu GCC PPC64LE
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu GCC PPC
  • GitHub Check: Ubuntu GCC MIPS64
  • GitHub Check: Ubuntu GCC AARCH64
  • GitHub Check: Ubuntu GCC ARM HF Compat No Opt
  • GitHub Check: Ubuntu GCC ARM HF No NEON
🔇 Additional comments (7)
test/test_crc32.cc (1)

270-272: No concerns; logic is consistent with prior suggestions.
Enabling SSE2-based Chorba tests if WITHOUT_CHORBA is unset, X86_SSE2 is defined, and NO_CHORBA_SSE2 is unset aligns well with the existing pattern for other CRC variants.

arch/x86/x86_intrins.h (3)

93-111: MSVC 32-bit fallback approach looks good.
Defining _mm_cvtsi128_si64x and _mm_cvtsi64x_si128 via a union for 32-bit MSVC is a safe solution when native intrinsics aren’t available. This avoids strict alignment pitfalls and is consistent with the feedback that we skip additional MSVC Clang checks.


114-117: Clang-specific macros are appropriately isolated.
Redirecting to _mm_cvtsi64_si128 and _mm_cvtsi128_si64 for Clang is concise and avoids conflicts with MSVC-specific paths.


119-126: GCC 32-bit handling with union is coherent.
Providing _mm_cvtsi128_si64x and _mm_cvtsi64x_si128 for GCC 32-bit ensures a reliable fallback when compiler-provided intrinsics are missing. This approach is consistent with the rest of the file.

arch/x86/chorba_sse2.c (3)

1-2: Conditional compilation is correct.
Limiting SSE2 Chorba CRC logic to cases where X86_SSE2 is defined (and WITHOUT_CHORBA is not) ensures this code is only built for SSE2-capable x86 targets.


14-25: Macros leverage guaranteed 16-byte alignment.
These macros rely on SSE-friendly alignment. In debug scenarios, consider adding optional assert checks for alignment if needed, but the current approach is acceptable for performance builds.


824-851: Wrapper function handles alignment and thresholds effectively.
The alignment offset calculation is bounded and safe. Selecting between large SSE2, small SSE2, or a fallback approach aligns with user expectations. Post-condition (c ^ 0xffffffff) is standard for CRC finalization.

@Dead2 Dead2 added optimization Architecture Architecture specific labels Mar 19, 2025
Copy link
Copy Markdown
Member

@Dead2 Dead2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although we found this to be a little slower than the generic Chorba on cpus with double-pumped SSE2 (Like Zacate for example), it is however good for the rest of the SSE2+.

@KungFuJesus
Copy link
Copy Markdown
Collaborator Author

I just removed an extra function call or two (one to memcpy, the other to braid). It was faster for me on modern hardware, I suspect it'll be the same on older.

@KungFuJesus
Copy link
Copy Markdown
Collaborator Author

Although we found this to be a little slower than the generic Chorba on cpus with double-pumped SSE2 (Like Zacate for example), it is however good for the rest of the SSE2+.

Did we want to try to notch out that variant when assigning from the function table or did we just want to issue that caveat so those who are bit by that can build without it?

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
arch/x86/chorba_sse2.c (3)

43-44: Clarify or parameterize the magic constants in the loop condition.

The expression (i + 256 + 40 + 32 + 32) < len includes multiple hardcoded offsets. Defining or documenting these values (e.g., through named constants) would enhance readability and maintainability.


65-80: Consider pruning large blocks of commented code.

The extensive commented-out code blocks, while possibly useful for reference, can clutter the file and make it harder to maintain. Removing or moving them to dedicated documentation might improve overall readability.

-        /*
-        chorba1 ^= next1;
-        ...
-        chorba8 ^= chorba2;
-        */
+        /* If needed, maintain a short comment
+         * describing the original scalar logic. */

26-850: Consider splitting large function logic for better maintainability.

chorba_small_nondestructive_sse2 is lengthy and contains multiple loops, repeated patterns, and extensive commented code. Extracting sub-steps (e.g., repeated XOR+shift steps) into helper functions can simplify debugging and ease future modifications.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0407b79 and 2d4c022.

📒 Files selected for processing (11)
  • CMakeLists.txt (1 hunks)
  • arch/x86/Makefile.in (2 hunks)
  • arch/x86/chorba_sse2.c (1 hunks)
  • arch/x86/x86_functions.h (3 hunks)
  • arch/x86/x86_intrins.h (1 hunks)
  • configure (1 hunks)
  • crc32.h (1 hunks)
  • functable.c (1 hunks)
  • test/benchmarks/benchmark_crc32.cc (1 hunks)
  • test/test_crc32.cc (1 hunks)
  • win32/Makefile.msc (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
  • crc32.h
  • arch/x86/Makefile.in
  • test/benchmarks/benchmark_crc32.cc
  • win32/Makefile.msc
  • CMakeLists.txt
  • configure
  • functable.c
🧰 Additional context used
🧠 Learnings (3)
arch/x86/x86_functions.h (2)
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-03-26T15:10:53.588Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
arch/x86/x86_intrins.h (2)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: MSVC does not define `__GNUC__`, so adding `!defined(_MSC_VER)` to GCC detection macros is redundant when `__GNUC__` is already being checked.
arch/x86/chorba_sse2.c (7)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In zlib-ng, when macro names and parameters are self-explanatory and scalar versions of the code provide context, additional documentation for the macros may not be necessary.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:830-831
Timestamp: 2025-03-26T15:10:53.588Z
Learning: The alignment calculation `((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15` is safe from overflow as it's mathematically bounded between 1 and 16, making the use of uintptr_t appropriate for this pointer arithmetic.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: For SSE2 optimizations, `_mm_cvtsi128_si64` should be used instead of `_mm_extract_epi64` (SSE4.1) for extracting 64-bit values from 128-bit vectors, as it generates more efficient movq instructions.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-03-26T15:10:53.588Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.
🧬 Code Definitions (3)
test/test_crc32.cc (1)
arch/x86/x86_functions.h (1)
  • crc32_chorba_sse2 (29-29)
arch/x86/x86_functions.h (1)
arch/generic/generic_functions.h (1)
  • crc32 (14-14)
arch/x86/chorba_sse2.c (3)
arch/x86/x86_intrins.h (6)
  • _mm_cvtsi64x_si128 (107-109)
  • _mm_cvtsi64x_si128 (107-107)
  • _mm_cvtsi128_si64x (101-105)
  • _mm_cvtsi128_si64x (101-101)
  • _mm_cvtsi128_si64x (120-124)
  • _mm_cvtsi128_si64x (120-120)
crc32.h (1)
  • crc32_braid_internal (18-18)
arch/x86/x86_functions.h (1)
  • crc32_chorba_sse2 (29-29)
⏰ Context from checks skipped due to timeout of 90000ms (87)
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: macOS Clang (Target 10.10)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Windows NMake x64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: macOS Clang (Target 10.10)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Windows NMake x64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: macOS Clang (Target 10.10)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Windows NMake x64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: macOS Clang (Target 10.10)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Windows NMake x64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: macOS Clang (Target 10.10)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Windows NMake x64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: macOS Clang (Target 10.10)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Windows NMake x64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: macOS Clang (Target 10.10)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Windows NMake x64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: macOS Clang (Target 10.10)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Windows NMake x64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: macOS Clang (Target 10.10)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Windows NMake x64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
  • GitHub Check: Ubuntu GCC PPC64
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Windows MSVC 2022 v142 Win64
  • GitHub Check: macOS Clang (Target 10.10)
  • GitHub Check: Windows MSVC 2022 v143 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
  • GitHub Check: Ubuntu GCC SPARC64
  • GitHub Check: Windows NMake x64
  • GitHub Check: Ubuntu GCC AARCH64 ASAN
🔇 Additional comments (12)
test/test_crc32.cc (1)

270-272: Good conditional addition of the SSE2-based Chorba test.

This guard appropriately checks for WITHOUT_CHORBA, X86_SSE2, and NO_CHORBA_SSE2 to enable the optimized CRC32 test only when supported and not explicitly disabled. This aligns with the feedback on defaulting to the SSE2 variant unless it’s intentionally excluded.

arch/x86/x86_functions.h (3)

9-16: Well-defined workaround for the MSVC stack-corruption bug.

Defining NO_CHORBA_SSE2 for MSVC v1920–v1929 (32-bit) is a practical measure that prevents the broken compiler from emitting problematic SSE2 code. The embedded comment and reference link provide helpful context. Ensure it doesn't accidentally disable SSE2 optimizations for other valid toolchains that share version numbering but lack the bug.


28-30: Correct selective declaration of the SSE2 Chorba CRC function.

Exposing crc32_chorba_sse2 only when WITHOUT_CHORBA is not defined avoids compilation issues for builds that intentionally exclude Chorba. This is consistent with the other CPU-specific CRC variants.


102-105: Appropriate conditional override of native_crc32.

By undefining and then redefining native_crc32 to crc32_chorba_sse2, you ensure the SSE2 Chorba CRC impl is used when not excluded by WITHOUT_CHORBA or NO_CHORBA_SSE2. This follows the established approach of linking native_crc32 to the fastest available CRC method.

arch/x86/x86_intrins.h (3)

93-110: MSVC-specific inline implementations of _mm_cvtsi128_si64x and _mm_cvtsi64x_si128.

  1. Uses a union-based approach for reading and writing 64-bit data from 128-bit registers, aiding compatibility with 32-bit MSVC builds that lack native intrinsics for 64-bit extraction.
  2. The fallback to stack memory is acceptable given MSVC’s limitations, though performance might be suboptimal compared to a direct register move on x64.

114-117: Clang macros aligned with existing intrinsics.

Redirecting _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x to Clang’s built-in _mm_cvtsi64_si128 and _mm_cvtsi128_si64 is straightforward and avoids duplication. No concerns here.


119-126: GCC 32-bit fallback using a union for strict-aliasing safety.

This union-based approach avoids direct pointer-casting, reducing aliasing conflicts. The macro _mm_cvtsi64x_si128(a) likewise sets the high 64 bits to 0. This implementation is coherent with the compiler-specific constraints.

arch/x86/chorba_sse2.c (5)

1-2: Good approach to conditionally enable SSE2 Chorba CRC code.

Wrapping the implementation with #if !defined(WITHOUT_CHORBA) && defined(X86_SSE2) aligns with the existing optimization strategies and makes SSE2 optimizations automatically enabled when available.


14-17: Confirmed: Aligned loads are fine here.

READ_NEXT uses _mm_load_si128 for aligned loading, which is valid as long as the caller enforces 16-byte alignment. The usage is consistent with the design and the explicit alignment checks in the wrapper function.


19-24: NEXT_ROUND macro looks consistent.

The XOR and shift operations appear correctly mapped to SSE2 instructions without extraneous steps. This pattern aligns well with the performance goals for the Chorba CRC calculation.


805-810: Good SSE2 intrinsics usage for final extraction.

Using _mm_cvtsi128_si64x works well with SSE2 and avoids SSE4.1-only intrinsics. This maintains broad CPU compatibility while retaining performance benefits.


852-859:

Details

✅ Verification successful

Switch to a more portable type and verify external dependencies.

  1. Prefer using size_t instead of unsigned long for algn_diff to ensure consistent behavior on different platforms:
-    unsigned long algn_diff = ((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15;
+    size_t algn_diff = ((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15;
  1. Validate that crc32_chorba_118960_nondestructive is declared and accessible.

🏁 Script executed:

#!/bin/bash
# Check if the function 'crc32_chorba_118960_nondestructive' is declared and accessible in the repository.
rg -S "crc32_chorba_118960_nondestructive" .

Length of output: 551


Action Required: Update variable type for portability and verify dependency usage

  • Replace the declaration of algn_diff with a size_t to ensure consistent behavior across platforms:

    -    unsigned long algn_diff = ((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15;
    +    size_t algn_diff = ((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15;
  • The function crc32_chorba_118960_nondestructive has been confirmed to be declared and available in the repository.

The version that's currently in the generic implementation for 32768
byte buffers leverages the stack. It manages to autovectorize but
unfortunately the trips to the stack hurt its performance for CPUs which
need this the most. This version is explicitly SIMD vectorized and
doesn't use trips to the stack.  In my testing it's ~10% faster than the
"small" variant, and about 42% faster than the "32768" variant.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
arch/x86/chorba_sse2.c (2)

26-850: The chorba_small_nondestructive_sse2 implementation is highly optimized but lacks documentation.

The SSE2 implementation of the Chorba algorithm is well-engineered, with:

  • Efficient data loading through aligned reads
  • Parallel processing of multiple 64-bit words
  • Careful unrolling of the main loop to maximize throughput
  • Appropriate handling of remaining data

However, the function would benefit from documentation explaining its purpose, parameters, and alignment requirements.

+/* Computes CRC32 checksum using SSE2-optimized Chorba algorithm
+ * Processes data in 32-byte chunks with unrolled loops for maximum throughput
+ * @param crc: Initial CRC value
+ * @param buf: Input buffer (must be 16-byte aligned)
+ * @param len: Length of the buffer in bytes (must be multiple of 8)
+ * @return: Computed CRC32 value
+ */
 Z_INTERNAL uint32_t chorba_small_nondestructive_sse2(uint32_t crc, const uint64_t* buf, size_t len) {

852-879: The wrapper function needs documentation and uses a magic number.

The crc32_chorba_sse2 function:

  1. Handles alignment of input buffer correctly
  2. Dispatches to the appropriate implementation based on buffer size
  3. Properly handles pre and post-conditioning of CRC value

However, the function should document the CHORBA_LARGE_THRESHOLD constant (32768) as mentioned in the PR description and add function documentation.

+/* Wrapper function for CRC32 computation using SSE2-optimized Chorba algorithm
+ * Handles buffer alignment and dispatches to appropriate implementation based on size
+ * @param crc: Initial CRC value
+ * @param buf: Input buffer
+ * @param len: Length of the buffer in bytes
+ * @return: Computed CRC32 value
+ */
 Z_INTERNAL uint32_t crc32_chorba_sse2(uint32_t crc, const uint8_t *buf, size_t len) {
     uint32_t c;
     uint64_t* aligned_buf;
     size_t aligned_len;
 
     c = (~crc) & 0xffffffff;
     unsigned long algn_diff = ((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15;
     if (algn_diff < len) {
         if (algn_diff) {
             c = crc32_braid_internal(c, buf, algn_diff);
         }
         aligned_buf = (uint64_t*) (buf + algn_diff);
         aligned_len = len - algn_diff;
+        /* Threshold for large buffer optimization (32768 bytes) */
+        #ifndef CHORBA_LARGE_THRESHOLD
+        #define CHORBA_LARGE_THRESHOLD 32768
+        #endif
         if(aligned_len > CHORBA_LARGE_THRESHOLD) {
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 2d4c022 and 94f0a09.

📒 Files selected for processing (11)
  • CMakeLists.txt (1 hunks)
  • arch/x86/Makefile.in (2 hunks)
  • arch/x86/chorba_sse2.c (1 hunks)
  • arch/x86/x86_functions.h (3 hunks)
  • arch/x86/x86_intrins.h (1 hunks)
  • configure (1 hunks)
  • crc32.h (1 hunks)
  • functable.c (1 hunks)
  • test/benchmarks/benchmark_crc32.cc (1 hunks)
  • test/test_crc32.cc (1 hunks)
  • win32/Makefile.msc (2 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • crc32.h
  • win32/Makefile.msc
  • CMakeLists.txt
  • functable.c
  • configure
  • arch/x86/Makefile.in
🧰 Additional context used
🧠 Learnings (3)
arch/x86/x86_functions.h (2)
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-03-26T15:10:53.588Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
arch/x86/x86_intrins.h (2)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: MSVC does not define `__GNUC__`, so adding `!defined(_MSC_VER)` to GCC detection macros is redundant when `__GNUC__` is already being checked.
arch/x86/chorba_sse2.c (7)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In zlib-ng, when macro names and parameters are self-explanatory and scalar versions of the code provide context, additional documentation for the macros may not be necessary.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:830-831
Timestamp: 2025-03-26T15:10:53.588Z
Learning: The alignment calculation `((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15` is safe from overflow as it's mathematically bounded between 1 and 16, making the use of uintptr_t appropriate for this pointer arithmetic.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: For SSE2 optimizations, `_mm_cvtsi128_si64` should be used instead of `_mm_extract_epi64` (SSE4.1) for extracting 64-bit values from 128-bit vectors, as it generates more efficient movq instructions.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-03-26T15:10:53.588Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.
🧬 Code Definitions (3)
arch/x86/x86_functions.h (1)
arch/generic/generic_functions.h (1)
  • crc32 (14-14)
test/test_crc32.cc (1)
arch/x86/x86_functions.h (1)
  • crc32_chorba_sse2 (29-29)
arch/x86/chorba_sse2.c (3)
arch/x86/x86_intrins.h (6)
  • _mm_cvtsi64x_si128 (107-109)
  • _mm_cvtsi64x_si128 (107-107)
  • _mm_cvtsi128_si64x (101-105)
  • _mm_cvtsi128_si64x (101-101)
  • _mm_cvtsi128_si64x (120-124)
  • _mm_cvtsi128_si64x (120-120)
crc32.h (1)
  • crc32_braid_internal (18-18)
arch/x86/x86_functions.h (1)
  • crc32_chorba_sse2 (29-29)
🔇 Additional comments (12)
test/test_crc32.cc (1)

270-272: Conditional Test for chorba_sse2 Looks Good

This addition properly gates the SSE2-based Chorba CRC test behind relevant macros, ensuring it’s only built and run when applicable.

test/benchmarks/benchmark_crc32.cc (1)

71-75: Benchmark Condition is Consistent

Hooking up the SSE2 benchmark for Chorba CRC is consistent with the test logic, using the correct macros to avoid conflicting builds.

arch/x86/x86_functions.h (3)

9-16: Safe Fallback for MSVC Stack Corruption Issue

Defining NO_CHORBA_SSE2 for affected MSVC versions is a pragmatic way to work around the compiler bug. The inline comment references the MSVC ticket, aiding future maintenance.


28-30: New SSE2 Chorba CRC Declaration is Clear

The prototype appropriately wraps the declaration under !defined(WITHOUT_CHORBA), preventing unwanted compilation when Chorba is disabled.


102-105: Correct Macro Override for Native CRC

Undefining and redefining native_crc32 to crc32_chorba_sse2 ensures this optimized implementation is used by default when no other macro bars its usage.

arch/x86/x86_intrins.h (3)

93-112: MSVC-Specific Union-Based Approach

Using a union to convert between __m128i and int64_t avoids strict aliasing issues. This is a valid fallback for 32-bit MSVC targets lacking direct 64-bit register moves.


114-117: Simple Clang Macros

Defining _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x as Clang aliases is straightforward and avoids duplication.


119-126: GCC 32-Bit Workaround is Effective

The union-based method for _mm_cvtsi128_si64x ensures safe extraction on GCC 32-bit builds. Including the _mm_cvtsi64x_si128 macro complements this solution.

arch/x86/chorba_sse2.c (4)

1-11: Appropriate includes and conditional compilation.

The file is correctly guarded with conditional compilation that only enables the code when CHORBA algorithm is available and SSE2 is supported. The necessary headers for CRC32 calculation and SSE2 operations are properly included.


14-24: The READ_NEXT and NEXT_ROUND macros are concise and efficient.

These macros implement the core operations for the SSE2-optimized Chorba algorithm:

  • READ_NEXT loads aligned data from memory efficiently
  • NEXT_ROUND performs the bit-manipulation operations required by the Chorba algorithm using SIMD instructions

The implementation leverages SSE2 effectively to process multiple elements in parallel.


811-832: Efficient aligned memory copy instead of calling memcpy.

The implementation efficiently handles the remainder of the input buffer by using SSE2 instructions for the aligned copy operations instead of relying on a memcpy call. This avoids the function call overhead and potentially allows for better compiler optimization.


805-810:

Details

✅ Verification successful

Potential issue with 32-bit compilation.

The code uses _mm_cvtsi128_si64x which might not be properly defined on 32-bit platforms. In the PR comments, there was discussion about MSVC having issues with these functions on 32-bit architecture.


🏁 Script executed:

#!/bin/bash
# Check if x86_intrins.h provides proper fallbacks for these functions
grep -A10 "_mm_cvtsi128_si64" arch/x86/x86_intrins.h

Length of output: 632


Action: Confirming 32-bit Fallback for _mm_cvtsi128_si64x Usage

After verifying the implementation in arch/x86/x86_intrins.h, the intrinsic _mm_cvtsi128_si64x now has a proper fallback—even on 32-bit platforms. The header defines an inline function that converts a __m128i value to a 64-bit integer via a union, and for some configurations it maps _mm_cvtsi128_si64x to _mm_cvtsi128_si64. This approach appears to address the previously noted 32-bit MSVC concerns.

  • File: arch/x86/chorba_sse2.c (Lines 805–810) uses _mm_cvtsi128_si64x as expected.
  • File: arch/x86/x86_intrins.h provides the necessary fallback definitions.
  • Next Step: Please verify in your MSVC 32-bit testing that these definitions are picked up correctly.

@Dead2
Copy link
Copy Markdown
Member

Dead2 commented Mar 28, 2025

Although we found this to be a little slower than the generic Chorba on cpus with double-pumped SSE2 (Like Zacate for example), it is however good for the rest of the SSE2+.

Did we want to try to notch out that variant when assigning from the function table or did we just want to issue that caveat so those who are bit by that can build without it?

I don't think we need to bother with that really. First of all it is an extremely narrow userbase, quite possibly zero users will use zlib-ng on this really old cpu. Also it is not that much slower, so combined with the rest of the SSE2 optimizations it is nevertheless a solid win.
So adding complexity to cpuid checking for enabling only partial SSE2, for something that very few or possibly nobody will benefit from, I don't think it is actually worth it.

@Dead2 Dead2 merged commit 724dc0c into zlib-ng:develop Mar 28, 2025
142 of 151 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Apr 14, 2025
@Dead2 Dead2 mentioned this pull request Nov 5, 2025
@coderabbitai coderabbitai bot mentioned this pull request Apr 3, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Architecture Architecture specific optimization

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants