Explicit SSE2 vectorization of Chorba CRC method#1872
Conversation
WalkthroughThe pull request introduces a new CRC32 calculation implementation using SSE2 intrinsics, named Changes
Sequence Diagram(s)sequenceDiagram
participant Init as init_functable
participant FT as Function Table
participant CRC as crc32_chorba_sse2
Init->>FT: Setup function table
FT-->>Init: Assign crc32 pointer to crc32_chorba_sse2 (if SSE2 enabled)
sequenceDiagram
participant App as Application
participant FT as Function Table
participant CRC as crc32_chorba_sse2
App->>FT: Invoke CRC32 function via pointer
FT->>CRC: Call crc32_chorba_sse2
CRC-->>App: Return computed CRC32 value
Possibly related PRs
Suggested reviewers
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
arch/x86/chorba_sse2.c (1)
28-805: Modularize or annotate complex SSE logic
The functionchorba_small_nondestructive_sseis highly unrolled, making it difficult to maintain or extend. Consider segmenting repeated rounds into helper functions or adding inline comments for clarity. Also confirm that the leftover handling withmemcpy(final, ...)is safe for all edge cases, givenfinalis only 72 bytes.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (9)
CMakeLists.txt(2 hunks)arch/generic/crc32_c.c(0 hunks)arch/x86/Makefile.in(2 hunks)arch/x86/chorba_sse2.c(1 hunks)arch/x86/x86_functions.h(2 hunks)configure(1 hunks)crc32_c.h(1 hunks)functable.c(1 hunks)test/benchmarks/benchmark_crc32.cc(1 hunks)
💤 Files with no reviewable changes (1)
- arch/generic/crc32_c.c
✅ Files skipped from review due to trivial changes (1)
- crc32_c.h
🧰 Additional context used
🧠 Learnings (1)
arch/x86/x86_functions.h (1)
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
🪛 Cppcheck (2.10-2)
arch/x86/chorba_sse2.c
[error] 807-807: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.
(unknownMacro)
🔇 Additional comments (12)
arch/x86/chorba_sse2.c (3)
16-26: Validate SSE load/store alignment usage in macros
TheREAD_NEXTmacro uses_mm_load_si128, which requires 16-byte alignment. Verify that all call sites or buffer pointers are indeed 16-byte aligned to prevent potential undefined behavior on some systems.
807-835: Configure the unknown PREFIX macro
A static analysis tool flagged an unknown macro on line 807. IfPREFIXis defined elsewhere or via compiler flags, ensure it is properly included to silence the warning. Also verify that the alignment calculation at lines 813-814 guarantees 16-byte alignment in all scenarios.🧰 Tools
🪛 Cppcheck (2.10-2)
[error] 807-807: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.
(unknownMacro)
1-13:Details
✅ Verification successful
Ensure consistency of thresholds with external references
These macros define buffer sizes and are potentially interrelated with other thresholds (e.g.,CHORBA_LARGE_THRESHOLDused later). Ensure that upstream or neighboring modules share compatible constants to avoid mismatched optimizations or unintended behavior.You can run a search to locate all references to these bitbuffer macros and confirm consistent usage:
🏁 Script executed:
#!/bin/bash # Searching for definitions or references of bitbuffersize or CHORBA_LARGE_THRESHOLD across codebase rg 'bitbuffersize|CHORBA_LARGE_THRESHOLD'Length of output: 4876
Review Verification: Constants are Consistent
- Verified that
bitbuffersizebytes,bitbuffersizezwords, andbitbuffersizeqwordsare defined identically in botharch/x86/chorba_sse2.candarch/generic/crc32_chorba_c.c.- Confirmed that the upstream threshold
CHORBA_LARGE_THRESHOLD(set assizeof(z_word_t) * 64 * 1024incrc32_c.h) is used consistently in conditions across the codebase.- The intentional difference between the 16 KB-based bit buffer macros and the 64 KB-based threshold appears to be by design. Please continue to ensure that any future modifications maintain this interrelated configuration across modules.
test/benchmarks/benchmark_crc32.cc (1)
61-61: Benchmark addition is valid
Registering thechorba_ssebenchmark correctly integrates the new SSE2-optimized CRC32 routine. No issues found.arch/x86/Makefile.in (2)
37-37: Build targets for SSE2 Chorba
Addingchorba_sse2.o chorba_sse2.lois consistent with parallel SSE2 targets. No further concerns.
75-80: Verify compiler flags for SSE2 compilation
The newly introduced lines use$(SSE2FLAG)consistently with existing SSE2 builds. Ensure no additional flags (e.g.,-march=) are needed for your target environments.arch/x86/x86_functions.h (2)
72-74: Well-structured addition of SSE2-optimized CRC32 function!The new function declaration and its conditional compilation guard are properly placed and follow the existing patterns for CRC32 functions.
95-96: Appropriate macro redefinition for SSE2 optimization!The native_crc32 macro is correctly redefined to use the new SSE2-optimized implementation when SSE2 support is available.
functable.c (1)
85-85: Correct function pointer assignment for SSE2-optimized CRC32!The new crc32_chorba_sse function is properly assigned to the function table when SSE2 support is available.
configure (1)
1573-1574: Proper integration of new SSE2-optimized object files!The chorba_sse2 object files are correctly added to both static and shared library builds, maintaining consistency with the existing build system.
CMakeLists.txt (2)
915-916: Explicitly Enabling the New SSE2 Chorba CRC ImplementationThese new lines add the definition
-DX86_CHORBA_SSE_CRCand update theSSE2_SRCSvariable to include the new source filechorba_sse2.calong with the other SSE2-related files. This is in line with the PR objectives to introduce an explicitly vectorized Chorba CRC implementation.
925-925: Documenting the New SSE2 Chorba CRC FeatureThe added call to
add_feature_info(X86_CHORBA_SSE_CRC 1 "Support SSE2-accelerated crc32, using \"${SSE2FLAG}\"")documents the new capability. It clearly informs users about the availability of the SSE2-accelerated CRC32 function enabled by the added macro.
7a567b1 to
00abf62
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (3)
arch/x86/chorba_sse2.c (3)
16-26: Fix trailing whitespace in macro definitions.The macros are well-defined but contain trailing whitespace that should be removed to comply with the coding style.
-#define NEXT_ROUND(invec, a, b, c, d) do { \ +#define NEXT_ROUND(invec, a, b, c, d) do { \ a = _mm_xor_si128(_mm_slli_epi64(invec, 17), _mm_slli_epi64(invec, 55)); \ b = _mm_xor_si128(_mm_xor_si128(_mm_srli_epi64(invec, 47), _mm_srli_epi64(invec, 9)), _mm_slli_epi64(invec, 19)); \ c = _mm_xor_si128(_mm_srli_epi64(invec, 45), _mm_slli_epi64(invec, 44)); \ - d = _mm_srli_epi64(invec, 20); \ + d = _mm_srli_epi64(invec, 20); \ } while (0);
28-805: Verify the performance impact of the SSE2 implementation.The implementation looks solid with careful handling of data alignment and efficient use of SSE2 intrinsics. However, we should verify:
- The performance improvement claims (10% over "small" variant, 42% over "32768" variant)
- The impact of memory alignment requirements on different platforms
Consider adding performance benchmarks to validate the improvements across different CPU architectures and data sizes.
🧰 Tools
🪛 GitHub Actions: Lint
[warning] 103-103: Trailing whitespace.
[warning] 113-113: Trailing whitespace.
[warning] 129-129: Trailing whitespace.
[warning] 136-136: Trailing whitespace.
[warning] 185-185: Trailing whitespace.
[warning] 212-212: Trailing whitespace.
[warning] 222-222: Trailing whitespace.
[warning] 229-229: Trailing whitespace.
[warning] 275-275: Trailing whitespace.
[warning] 298-298: Trailing whitespace.
[warning] 311-311: Trailing whitespace.
[warning] 312-312: Trailing whitespace.
[warning] 356-356: Trailing whitespace.
[warning] 384-384: Trailing whitespace.
[warning] 393-393: Trailing whitespace.
[warning] 394-394: Trailing whitespace.
[warning] 395-395: Trailing whitespace.
[warning] 416-416: Trailing whitespace.
[warning] 441-441: Trailing whitespace.
[warning] 471-471: Trailing whitespace.
[warning] 479-479: Trailing whitespace.
[warning] 480-480: Trailing whitespace.
[warning] 481-481: Trailing whitespace.
[warning] 524-524: Trailing whitespace.
[warning] 550-550: Trailing whitespace.
[warning] 572-572: Trailing whitespace.
[warning] 573-573: Trailing whitespace.
[warning] 603-603: Trailing whitespace.
[warning] 619-619: Trailing whitespace.
[warning] 633-633: Trailing whitespace.
[warning] 634-634: Trailing whitespace.
[warning] 635-635: Trailing whitespace.
[warning] 674-674: Trailing whitespace.
[warning] 691-691: Trailing whitespace.
[warning] 718-718: Trailing whitespace.
[warning] 720-720: Trailing whitespace.
[warning] 721-721: Trailing whitespace.
[warning] 722-722: Trailing whitespace.
[warning] 761-761: Trailing whitespace.
[warning] 776-776: Trailing whitespace.
🪛 GitHub Actions: Libpng
[error] 38-38: call to undeclared function '_mm_cvtsi64x_si128'; ISO C99 and later do not support implicit function declarations
[error] 38-38: initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int'
🪛 GitHub Actions: Static Analysis
[error] 38-38: Call to undeclared function '_mm_cvtsi64x_si128'; ISO C99 and later do not support implicit function declarations.
[error] 38-38: Initializing '__m128i' with an expression of incompatible type 'int'.
1-836: Clean up trailing whitespace throughout the file.Multiple instances of trailing whitespace were detected by the linter. This should be cleaned up to maintain consistent code style.
Run the following command to remove trailing whitespace:
#!/bin/bash # Remove trailing whitespace sed -i 's/[[:space:]]*$//' arch/x86/chorba_sse2.c🧰 Tools
🪛 Cppcheck (2.10-2)
[error] 807-807: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.
(unknownMacro)
🪛 GitHub Actions: Lint
[warning] 27-27: Trailing whitespace.
[warning] 103-103: Trailing whitespace.
[warning] 113-113: Trailing whitespace.
[warning] 129-129: Trailing whitespace.
[warning] 136-136: Trailing whitespace.
[warning] 185-185: Trailing whitespace.
[warning] 212-212: Trailing whitespace.
[warning] 222-222: Trailing whitespace.
[warning] 229-229: Trailing whitespace.
[warning] 275-275: Trailing whitespace.
[warning] 298-298: Trailing whitespace.
[warning] 311-311: Trailing whitespace.
[warning] 312-312: Trailing whitespace.
[warning] 356-356: Trailing whitespace.
[warning] 384-384: Trailing whitespace.
[warning] 393-393: Trailing whitespace.
[warning] 394-394: Trailing whitespace.
[warning] 395-395: Trailing whitespace.
[warning] 416-416: Trailing whitespace.
[warning] 441-441: Trailing whitespace.
[warning] 471-471: Trailing whitespace.
[warning] 479-479: Trailing whitespace.
[warning] 480-480: Trailing whitespace.
[warning] 481-481: Trailing whitespace.
[warning] 524-524: Trailing whitespace.
[warning] 550-550: Trailing whitespace.
[warning] 572-572: Trailing whitespace.
[warning] 573-573: Trailing whitespace.
[warning] 603-603: Trailing whitespace.
[warning] 619-619: Trailing whitespace.
[warning] 633-633: Trailing whitespace.
[warning] 634-634: Trailing whitespace.
[warning] 635-635: Trailing whitespace.
[warning] 674-674: Trailing whitespace.
[warning] 691-691: Trailing whitespace.
[warning] 718-718: Trailing whitespace.
[warning] 720-720: Trailing whitespace.
[warning] 721-721: Trailing whitespace.
[warning] 722-722: Trailing whitespace.
[warning] 761-761: Trailing whitespace.
[warning] 776-776: Trailing whitespace.
🪛 GitHub Actions: Libpng
[error] 38-38: call to undeclared function '_mm_cvtsi64x_si128'; ISO C99 and later do not support implicit function declarations
[error] 38-38: initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int'
🪛 GitHub Actions: Static Analysis
[error] 38-38: Call to undeclared function '_mm_cvtsi64x_si128'; ISO C99 and later do not support implicit function declarations.
[error] 38-38: Initializing '__m128i' with an expression of incompatible type 'int'.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
CMakeLists.txt(2 hunks)arch/generic/crc32_c.c(0 hunks)arch/x86/Makefile.in(2 hunks)arch/x86/chorba_sse2.c(1 hunks)arch/x86/x86_functions.h(2 hunks)arch/x86/x86_intrins.h(1 hunks)configure(1 hunks)crc32_c.h(1 hunks)functable.c(1 hunks)test/benchmarks/benchmark_crc32.cc(1 hunks)
💤 Files with no reviewable changes (1)
- arch/generic/crc32_c.c
🚧 Files skipped from review as they are similar to previous changes (7)
- test/benchmarks/benchmark_crc32.cc
- arch/x86/Makefile.in
- functable.c
- arch/x86/x86_functions.h
- crc32_c.h
- CMakeLists.txt
- configure
🧰 Additional context used
🪛 Cppcheck (2.10-2)
arch/x86/chorba_sse2.c
[error] 807-807: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.
(unknownMacro)
🪛 GitHub Actions: Lint
arch/x86/chorba_sse2.c
[warning] 27-27: Trailing whitespace.
[warning] 103-103: Trailing whitespace.
[warning] 113-113: Trailing whitespace.
[warning] 129-129: Trailing whitespace.
[warning] 136-136: Trailing whitespace.
[warning] 185-185: Trailing whitespace.
[warning] 212-212: Trailing whitespace.
[warning] 222-222: Trailing whitespace.
[warning] 229-229: Trailing whitespace.
[warning] 275-275: Trailing whitespace.
[warning] 298-298: Trailing whitespace.
[warning] 311-311: Trailing whitespace.
[warning] 312-312: Trailing whitespace.
[warning] 356-356: Trailing whitespace.
[warning] 384-384: Trailing whitespace.
[warning] 393-393: Trailing whitespace.
[warning] 394-394: Trailing whitespace.
[warning] 395-395: Trailing whitespace.
[warning] 416-416: Trailing whitespace.
[warning] 441-441: Trailing whitespace.
[warning] 471-471: Trailing whitespace.
[warning] 479-479: Trailing whitespace.
[warning] 480-480: Trailing whitespace.
[warning] 481-481: Trailing whitespace.
[warning] 524-524: Trailing whitespace.
[warning] 550-550: Trailing whitespace.
[warning] 572-572: Trailing whitespace.
[warning] 573-573: Trailing whitespace.
[warning] 603-603: Trailing whitespace.
[warning] 619-619: Trailing whitespace.
[warning] 633-633: Trailing whitespace.
[warning] 634-634: Trailing whitespace.
[warning] 635-635: Trailing whitespace.
[warning] 674-674: Trailing whitespace.
[warning] 691-691: Trailing whitespace.
[warning] 718-718: Trailing whitespace.
[warning] 720-720: Trailing whitespace.
[warning] 721-721: Trailing whitespace.
[warning] 722-722: Trailing whitespace.
[warning] 761-761: Trailing whitespace.
[warning] 776-776: Trailing whitespace.
🪛 GitHub Actions: Libpng
arch/x86/chorba_sse2.c
[error] 38-38: call to undeclared function '_mm_cvtsi64x_si128'; ISO C99 and later do not support implicit function declarations
[error] 38-38: initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int'
🪛 GitHub Actions: Static Analysis
arch/x86/chorba_sse2.c
[error] 38-38: Call to undeclared function '_mm_cvtsi64x_si128'; ISO C99 and later do not support implicit function declarations.
[error] 38-38: Initializing '__m128i' with an expression of incompatible type 'int'.
⏰ Context from checks skipped due to timeout of 90000ms (28)
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
🔇 Additional comments (2)
arch/x86/x86_intrins.h (1)
92-96:Details
❓ Verification inconclusive
LGTM! Verify the intrinsic compatibility across MSVC versions.
The macro definition provides compatibility for the
_mm_cvtsi64x_si128intrinsic in Visual Studio, consistent with other MSVC-specific fixes in this file.
🌐 Web query:
What is the difference between _mm_cvtsi64x_si128 and _mm_cvtsi64_si128 intrinsics in MSVC?💡 Result:
The
_mm_cvtsi64_si128and_mm_cvtsi64x_si128intrinsics in MSVC both move 64-bit integers into SSE registers but differ in their architectural compatibility and parameter types:Key Differences
Intrinsic Architecture Support Parameter Type Behavior _mm_cvtsi64_si128x86-32 only __int64Moves a 64-bit integer to the lower 64 bits of __m128i, zeroing upper bits[6]._mm_cvtsi64x_si128x86-32 and x86-64 long long/int64_tDesigned for cross-platform compatibility; behaves identically to _mm_cvtsi64_si128on x86-32 but may adapt to x86-64 calling conventions[10].Technical Details
_mm_cvtsi64_si128:
- Explicitly uses the
__int64type (MSVC-specific).- Generates the
MOVQinstruction to copy a 64-bit integer into an SSE register[6].- Not supported on x86-64 targets where inline assembly is unavailable[1].
_mm_cvtsi64x_si128:
- Uses standard
long long/int64_ttypes for portability.- Ensures compatibility when compiling for both x86-32 and x86-64 architectures[10].
- On x86-32, functionally identical to
_mm_cvtsi64_si128.Practical Use Cases
- Use
_mm_cvtsi64_si128when targeting x86-32 specifically with MSVC.- Prefer
_mm_cvtsi64x_si128for cross-compiler/cross-arch code involvingint64_t.Both intrinsics ultimately map to similar low-level operations but address differences in type systems and architectural constraints[6][10].
Citations:
- 1: https://devblogs.microsoft.com/cppblog/new-intrinsic-support-in-visual-studio-2008/
- 2: https://assets.ctfassets.net/9pcn2syx7zns/1bexmTsBaXHArazquMviK4/5c1524d460e1173d160fcba93157c3a6/compiler-intrinsics-and-assembly.pdf
- 3: https://stackoverflow.blog/2020/07/08/improving-performance-with-simd-intrinsics-in-three-use-cases/
- 4: https://learn.microsoft.com/en-us/cpp/intrinsics/x64-amd64-intrinsics-list?view=msvc-170
- 5: https://developercommunity.visualstudio.com/t/_mm_cvtsi64_si128-gets-lowered-to-vmovq/10834249?space=8&ftype=problem&preview2=true&sort=newest&viewtype=all
- 6: http://portal.nacad.ufrj.br/online/intel/compiler_c/common/core/GUID-CA07DE5F-9DD1-4435-B0D3-8D6B15277666.htm
- 7: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html
- 8: https://developercommunity.visualstudio.com/t/_mm_cvtsi64_si128-gets-lowered-to-vmovq/10834249?scope=follow&sort=newest
- 9: https://www.intel.com/content/www/us/en/docs/cpp-compiler/developer-guide-reference/2021-8/shift-intrinsics.html
- 10: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=32869
LGTM with a caution! The macro mapping of
_mm_cvtsi64x_si128to_mm_cvtsi64_si128correctly addresses the missing definition in Visual Studio and aligns with other MSVC-specific fixes in the file. Based on the observed differences:
_mm_cvtsi64_si128is historically intended for x86-32 builds (accepting a__int64), while_mm_cvtsi64x_si128is designed for cross-platform support (usinglong long/int64_t).Please verify that this substitution works correctly across all targeted MSVC architectures—especially on x86-64—and that any parameter type nuances do not lead to unexpected behavior.
arch/x86/chorba_sse2.c (1)
1-14: LGTM! Clear conditional compilation and includes.The header section is well-organized with proper include guards and necessary headers for SSE2 operations.
00abf62 to
2b259d9
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
arch/x86/chorba_sse2.c (1)
1-27: LGTM! Well-structured macros for SSE2 operations.The macros effectively encapsulate complex SSE2 operations, making the code more maintainable and readable.
Remove trailing whitespace on line 27.
🧰 Tools
🪛 GitHub Actions: Lint
[warning] 27-27: Trailing whitespace.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
CMakeLists.txt(2 hunks)arch/generic/crc32_c.c(0 hunks)arch/x86/Makefile.in(2 hunks)arch/x86/chorba_sse2.c(1 hunks)arch/x86/x86_functions.h(2 hunks)arch/x86/x86_intrins.h(1 hunks)configure(1 hunks)crc32_c.h(1 hunks)functable.c(1 hunks)test/benchmarks/benchmark_crc32.cc(1 hunks)
💤 Files with no reviewable changes (1)
- arch/generic/crc32_c.c
🚧 Files skipped from review as they are similar to previous changes (8)
- functable.c
- arch/x86/Makefile.in
- test/benchmarks/benchmark_crc32.cc
- arch/x86/x86_intrins.h
- arch/x86/x86_functions.h
- configure
- CMakeLists.txt
- crc32_c.h
🧰 Additional context used
🪛 Cppcheck (2.10-2)
arch/x86/chorba_sse2.c
[error] 806-806: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.
(unknownMacro)
🪛 GitHub Actions: Lint
arch/x86/chorba_sse2.c
[warning] 27-27: Trailing whitespace.
[warning] 112-112: Trailing whitespace.
[warning] 128-128: Trailing whitespace.
[warning] 221-221: Trailing whitespace.
[warning] 355-355: Trailing whitespace.
[warning] 383-383: Trailing whitespace.
[warning] 760-760: Trailing whitespace.
[warning] 775-775: Trailing whitespace.
🪛 GitHub Actions: Libpng
arch/x86/chorba_sse2.c
[error] 38-38: Call to undeclared function '_mm_cvtsi64x_si128'; ISO C99 and later do not support implicit function declarations.
[error] 38-38: Initializing '__m128i' (vector of 2 'long long' values) with an expression of incompatible type 'int'.
🪛 GitHub Actions: Static Analysis
arch/x86/chorba_sse2.c
[error] 38-38: Call to undeclared function '_mm_cvtsi64x_si128'; ISO C99 and later do not support implicit function declarations.
[error] 38-38: Initializing '__m128i' with an expression of incompatible type 'int'.
🪛 GitHub Actions: Package Check
arch/x86/chorba_sse2.c
[warning] 38-38: Implicit declaration of function ‘_mm_cvtsi64x_si128’; did you mean ‘_mm_cvtsi32_si128’?
[error] 38-38: Incompatible types when initializing type ‘__m128i’ using type ‘int’.
[warning] 729-729: Implicit declaration of function ‘_mm_cvtsi128_si64’; did you mean ‘_mm_cvtsi128_si32’?
⏰ Context from checks skipped due to timeout of 90000ms (30)
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
🔇 Additional comments (1)
arch/x86/chorba_sse2.c (1)
806-834: Add documentation about alignment requirements.The function handles memory alignment carefully, but the alignment requirements and their impact on performance should be documented.
Add a comment explaining the alignment logic:
+/* + * Ensures 16-byte alignment for SSE2 operations: + * 1. Processes unaligned prefix using scalar implementation + * 2. Main body uses aligned SSE2 operations for optimal performance + * 3. Processes remaining bytes using scalar implementation + */ Z_INTERNAL uint32_t PREFIX(crc32_chorba_sse)(uint32_t crc, const uint8_t *buf, size_t len) {The PREFIX macro documentation was already requested in a previous review.
🧰 Tools
🪛 Cppcheck (2.10-2)
[error] 806-806: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.
(unknownMacro)
2b259d9 to
91d590e
Compare
91d590e to
6bf51a8
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (1)
arch/x86/chorba_sse2.c (1)
9-9:⚠️ Potential issueFix include path for x86_intrins.h.
The angled brackets in the include path are causing build failures. Use quotes instead.
-#include <x86_intrins.h> +#include "x86_intrins.h"🧰 Tools
🪛 GitHub Actions: Libpng
[error] 9-9: 'x86_intrins.h' file not found with include; use "quotes" instead
🪛 GitHub Actions: CodeQL
[error] 9-9: fatal error: x86_intrins.h: No such file or directory
🪛 GitHub Actions: Static Analysis
[error] 9-9: fatal error: x86_intrins.h: No such file or directory
🪛 GitHub Actions: Link
[error] 9-9: fatal error: x86_intrins.h: No such file or directory
🧹 Nitpick comments (2)
arch/x86/chorba_sse2.c (2)
11-13: Add documentation for buffer size macros.The buffer size macros would benefit from documentation explaining their purpose and relationships.
+/* Size of the bit buffer in bytes */ #define bitbuffersizebytes (16 * 1024 * sizeof(z_word_t)) +/* Size of the bit buffer in z_word_t units */ #define bitbuffersizezwords (bitbuffersizebytes / sizeof(z_word_t)) +/* Size of the bit buffer in uint64_t units */ #define bitbuffersizeqwords (bitbuffersizebytes / sizeof(uint64_t))
17-27: Add documentation for SIMD operation macros.The READ_NEXT and NEXT_ROUND macros would benefit from documentation explaining their purpose and parameters.
+/* Loads two 128-bit vectors from the input buffer at the given offset */ #define READ_NEXT(in, off, a, b) do { \ a = _mm_load_si128((__m128i*)(in + off / sizeof(uint64_t))); \ b = _mm_load_si128((__m128i*)(in + off / sizeof(uint64_t) + 2)); \ } while (0); +/* Performs a round of CRC computation using SIMD operations + * invec: Input vector + * a, b, c, d: Output vectors containing shifted and XORed values + */ #define NEXT_ROUND(invec, a, b, c, d) do { \ a = _mm_xor_si128(_mm_slli_epi64(invec, 17), _mm_slli_epi64(invec, 55)); \ b = _mm_xor_si128(_mm_xor_si128(_mm_srli_epi64(invec, 47), _mm_srli_epi64(invec, 9)), _mm_slli_epi64(invec, 19)); \ c = _mm_xor_si128(_mm_srli_epi64(invec, 45), _mm_slli_epi64(invec, 44)); \ d = _mm_srli_epi64(invec, 20); \ } while (0);
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
CMakeLists.txt(2 hunks)arch/generic/crc32_c.c(0 hunks)arch/x86/Makefile.in(2 hunks)arch/x86/chorba_sse2.c(1 hunks)arch/x86/x86_functions.h(2 hunks)arch/x86/x86_intrins.h(1 hunks)configure(1 hunks)crc32_c.h(1 hunks)functable.c(1 hunks)test/benchmarks/benchmark_crc32.cc(1 hunks)
💤 Files with no reviewable changes (1)
- arch/generic/crc32_c.c
🚧 Files skipped from review as they are similar to previous changes (8)
- functable.c
- arch/x86/x86_intrins.h
- test/benchmarks/benchmark_crc32.cc
- arch/x86/Makefile.in
- arch/x86/x86_functions.h
- configure
- CMakeLists.txt
- crc32_c.h
🧰 Additional context used
🪛 Cppcheck (2.10-2)
arch/x86/chorba_sse2.c
[error] 807-807: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.
(unknownMacro)
🪛 GitHub Actions: Lint
arch/x86/chorba_sse2.c
[warning] 28-28: Trailing whitespace.
[warning] 113-113: Trailing whitespace.
[warning] 129-129: Trailing whitespace.
[warning] 222-222: Trailing whitespace.
[warning] 356-356: Trailing whitespace.
[warning] 384-384: Trailing whitespace.
[warning] 761-761: Trailing whitespace.
[warning] 776-776: Trailing whitespace.
🪛 GitHub Actions: Libpng
arch/x86/chorba_sse2.c
[error] 9-9: 'x86_intrins.h' file not found with include; use "quotes" instead
🪛 GitHub Actions: CodeQL
arch/x86/chorba_sse2.c
[error] 9-9: fatal error: x86_intrins.h: No such file or directory
🪛 GitHub Actions: Static Analysis
arch/x86/chorba_sse2.c
[error] 9-9: fatal error: x86_intrins.h: No such file or directory
🪛 GitHub Actions: Link
arch/x86/chorba_sse2.c
[error] 9-9: fatal error: x86_intrins.h: No such file or directory
⏰ Context from checks skipped due to timeout of 90000ms (59)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu 20.04 Clang 6
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu 20.04 Clang 6
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu 20.04 Clang 6
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu 20.04 Clang 6
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu 20.04 Clang 6
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu 20.04 Clang 6
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu 20.04 Clang 6
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu 20.04 Clang 6
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu 20.04 Clang 6
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu 20.04 Clang 6
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #1872 +/- ##
============================================
+ Coverage 42.10% 83.06% +40.95%
============================================
Files 71 139 +68
Lines 7353 12654 +5301
Branches 1265 2823 +1558
============================================
+ Hits 3096 10511 +7415
+ Misses 4035 1208 -2827
- Partials 222 935 +713 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
c2b7a86 to
3cd408f
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (1)
arch/x86/chorba_sse2.c (1)
30-40:⚠️ Potential issueFix SSE2 intrinsics compatibility issues.
The code uses Intel-specific 64-bit SSE2 intrinsics that may cause compatibility issues.
Replace the Intel-specific intrinsic with its portable equivalent:
- __m128i next12 = _mm_cvtsi64x_si128(next1); + __m128i next12 = _mm_set_epi64x(0, next1);
🧹 Nitpick comments (4)
arch/x86/chorba_sse2.c (4)
1-16: Add documentation for the header includes.The file includes necessary headers but lacks documentation explaining their purpose.
Add a comment block explaining the purpose of each header:
#ifdef X86_CHORBA_SSE_CRC +/* Headers for CRC32 computation */ #include "zbuild.h" #include "crc32_braid_p.h" #include "crc32_braid_tbl.h" #include "crc32_c.h" + +/* Headers for SSE2 intrinsics */ #include <emmintrin.h> #include <immintrin.h> #include <smmintrin.h> #include "x86_intrins.h"
808-835: Add documentation for the wrapper function.The wrapper function lacks documentation explaining its purpose and parameters.
Add a documentation block:
+/* Wrapper function for CRC32 computation using SSE2-optimized Chorba algorithm + * Handles buffer alignment and dispatches to appropriate implementation based on size + * @param crc: Initial CRC value + * @param buf: Input buffer + * @param len: Length of the buffer in bytes + * @return: Computed CRC32 value + */ Z_INTERNAL uint32_t PREFIX(crc32_chorba_sse)(uint32_t crc, const uint8_t *buf, size_t len) {🧰 Tools
🪛 Cppcheck (2.10-2)
[error] 808-808: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.
(unknownMacro)
47-48: Improve comment clarity and add performance metrics.The comment about the for loop performance is unclear and lacks specific metrics.
Enhance the comment with more details:
- /* This is weird, doing for vs while drops 10% off the exec time */ + /* Using a for loop instead of a while loop improves performance by ~10% + * This is likely due to better compiler optimizations for for loops + */
821-827: Add documentation for the threshold values.The code uses magic numbers for thresholds without explaining their significance.
Add comments explaining the threshold values:
+ /* Use different implementations based on input size: + * > CHORBA_LARGE_THRESHOLD (32768): Use 118960-byte optimized version + * > 72 bytes: Use SSE2-optimized small version + * <= 72 bytes: Use base implementation + */ if(aligned_len > CHORBA_LARGE_THRESHOLD) c = crc32_chorba_118960_nondestructive(c, (z_word_t*) aligned_buf, aligned_len); else if (aligned_len > 72)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (10)
CMakeLists.txt(2 hunks)arch/generic/crc32_c.c(0 hunks)arch/x86/Makefile.in(2 hunks)arch/x86/chorba_sse2.c(1 hunks)arch/x86/x86_functions.h(2 hunks)arch/x86/x86_intrins.h(1 hunks)configure(1 hunks)crc32_c.h(1 hunks)functable.c(1 hunks)test/benchmarks/benchmark_crc32.cc(1 hunks)
💤 Files with no reviewable changes (1)
- arch/generic/crc32_c.c
🚧 Files skipped from review as they are similar to previous changes (7)
- test/benchmarks/benchmark_crc32.cc
- arch/x86/Makefile.in
- functable.c
- crc32_c.h
- arch/x86/x86_intrins.h
- CMakeLists.txt
- configure
🧰 Additional context used
🧠 Learnings (1)
arch/x86/x86_functions.h (1)
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
🪛 Cppcheck (2.10-2)
arch/x86/chorba_sse2.c
[error] 808-808: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.
(unknownMacro)
⏰ Context from checks skipped due to timeout of 90000ms (182)
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win32
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: Windows MSVC 2019 v141 Win64
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS Clang Symbol Prefix
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win32
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: Windows MSVC 2019 v141 Win64
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS Clang Symbol Prefix
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win32
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: Windows MSVC 2019 v141 Win64
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win32
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: Windows MSVC 2019 v141 Win64
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win32
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: Windows MSVC 2019 v141 Win64
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win32
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: Windows MSVC 2019 v141 Win64
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win32
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: Windows MSVC 2019 v141 Win64
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win32
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: Windows MSVC 2019 v141 Win64
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win32
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: Windows MSVC 2019 v141 Win64
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: macOS Clang Native Instructions (ARM64)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows ClangCl Win32
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: Windows MSVC 2019 v141 Win64
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC AARCH64 Compat No Opt UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No ARMv8 UBSAN
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC S390X No vectorized CRC32
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
🔇 Additional comments (2)
arch/x86/x86_functions.h (2)
72-74: LGTM!The function declaration for
crc32_chorba_ssefollows the existing pattern and is correctly guarded by theX86_CHORBA_SSE_CRCmacro.
95-98: LGTM!The macro definition for
native_crc32is correctly placed and follows the existing pattern for SSE2 optimizations.
dbfb1d6 to
81145dc
Compare
81145dc to
e3a1fed
Compare
There was a problem hiding this comment.
CodeQL found more than 20 potential problems in the proposed changes. Check the Files changed tab for more details.
e3a1fed to
3b43214
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
♻️ Duplicate comments (2)
arch/x86/chorba_sse2.c (2)
30-806:⚠️ Potential issueAdd function documentation and fix SSE2 intrinsics compatibility issues.
- Add function documentation:
+/* Computes CRC32 checksum using SSE2-optimized Chorba algorithm for small buffers + * @param crc: Initial CRC value + * @param buf: Input buffer (must be 64-bit aligned) + * @param len: Length of the buffer in bytes + * @return: Computed CRC32 value + */ Z_INTERNAL uint32_t chorba_small_nondestructive_sse(uint32_t crc, const uint64_t* buf, size_t len) {
- Fix SSE2 intrinsics compatibility issues:
- __m128i next12 = _mm_cvtsi64x_si128(next1); + __m128i next12 = _mm_set_epi64x(0, next1);🧰 Tools
🪛 GitHub Actions: Package Check
[warning] 40-40: Implicit declaration of function ‘_mm_cvtsi64x_si128’; did you mean ‘_mm_cvtsi16_si128’?
[error] 40-40: Incompatible types when initializing type ‘__m128i’ using type ‘int’.
[warning] 731-731: Implicit declaration of function ‘_mm_cvtsi128_si64’; did you mean ‘_mm_cvtsi128_si16’?
808-835: 🛠️ Refactor suggestionAdd documentation for the wrapper function and fix PREFIX macro.
- Add function documentation:
+/* Wrapper function for CRC32 computation using SSE2-optimized Chorba algorithm + * Handles buffer alignment and dispatches to appropriate implementation based on size + * @param crc: Initial CRC value + * @param buf: Input buffer + * @param len: Length of the buffer in bytes + * @return: Computed CRC32 value + */ uint32_t PREFIX(crc32_chorba_sse)(uint32_t crc, const uint8_t *buf, size_t len) {
- Add a comment explaining the PREFIX macro:
+/* PREFIX macro is defined by the build system to namespace the symbols */🧰 Tools
🪛 Cppcheck (2.10-2)
[error] 808-808: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.
(unknownMacro)
🧹 Nitpick comments (1)
arch/x86/chorba_sse2.c (1)
18-29: Add documentation for macros.Add documentation to explain the purpose and parameters of the macros:
+/* Loads 256 bits (32 bytes) of data from the input buffer into two 128-bit vectors + * @param in: Input buffer + * @param off: Offset into the buffer + * @param a: First 128-bit vector output + * @param b: Second 128-bit vector output + */ #define READ_NEXT(in, off, a, b) do { \ a = _mm_load_si128((__m128i*)(in + off / sizeof(uint64_t))); \ b = _mm_load_si128((__m128i*)(in + off / sizeof(uint64_t) + 2)); \ } while (0); +/* Performs one round of the Chorba CRC calculation using SSE2 instructions + * @param invec: Input 128-bit vector + * @param a: First output vector for shifted and XORed values + * @param b: Second output vector for shifted and XORed values + * @param c: Third output vector for shifted and XORed values + * @param d: Fourth output vector for shifted values + */ #define NEXT_ROUND(invec, a, b, c, d) do { \
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
CMakeLists.txt(2 hunks)arch/generic/crc32_c.c(0 hunks)arch/x86/Makefile.in(2 hunks)arch/x86/chorba_sse2.c(1 hunks)arch/x86/x86_functions.h(2 hunks)arch/x86/x86_intrins.h(1 hunks)configure(1 hunks)crc32_c.h(1 hunks)functable.c(1 hunks)test/benchmarks/benchmark_crc32.cc(1 hunks)win32/Makefile.msc(3 hunks)
💤 Files with no reviewable changes (1)
- arch/generic/crc32_c.c
🚧 Files skipped from review as they are similar to previous changes (8)
- arch/x86/Makefile.in
- test/benchmarks/benchmark_crc32.cc
- arch/x86/x86_functions.h
- functable.c
- CMakeLists.txt
- arch/x86/x86_intrins.h
- configure
- crc32_c.h
🧰 Additional context used
🪛 Cppcheck (2.10-2)
arch/x86/chorba_sse2.c
[error] 808-808: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.
(unknownMacro)
🪛 GitHub Actions: Package Check
arch/x86/chorba_sse2.c
[warning] 40-40: Implicit declaration of function ‘_mm_cvtsi64x_si128’; did you mean ‘_mm_cvtsi16_si128’?
[error] 40-40: Incompatible types when initializing type ‘__m128i’ using type ‘int’.
[warning] 731-731: Implicit declaration of function ‘_mm_cvtsi128_si64’; did you mean ‘_mm_cvtsi128_si16’?
⏰ Context from checks skipped due to timeout of 90000ms (59)
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS GCC
- GitHub Check: macOS GCC Symbol Prefix (ARM64)
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS GCC
- GitHub Check: macOS GCC Symbol Prefix (ARM64)
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS GCC
- GitHub Check: macOS GCC Symbol Prefix (ARM64)
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS GCC
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: macOS GCC Symbol Prefix (ARM64)
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS GCC
- GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: macOS GCC Symbol Prefix (ARM64)
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS GCC
- GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: macOS GCC Symbol Prefix (ARM64)
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: macOS GCC
- GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: macOS GCC Symbol Prefix (ARM64)
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: macOS GCC
- GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: macOS GCC Symbol Prefix (ARM64)
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: macOS GCC
- GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: macOS GCC Symbol Prefix (ARM64)
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: EL9 Clang S390X DFLTCC MSAN
- GitHub Check: macOS GCC
- GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: macOS GCC Symbol Prefix (ARM64)
- GitHub Check: Ubuntu GCC PPC64LE
🔇 Additional comments (1)
win32/Makefile.msc (1)
38-39: LGTM!The changes correctly integrate the new SSE2-optimized Chorba CRC implementation by:
- Adding the
-DX86_CHORBA_SSE_CRCflag to enable the feature- Including
chorba_sse2.objin the buildAlso applies to: 64-64
3b43214 to
17091a9
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
♻️ Duplicate comments (4)
arch/x86/chorba_sse2.c (4)
1-16: 🛠️ Refactor suggestionAdd file documentation and fix includes.
The file needs documentation to explain its purpose and implementation details.
Add file documentation:
+/* Implementation of CRC32 checksum calculation using SSE2 vectorization. + * This implementation uses the Chorba algorithm optimized for SSE2 instructions + * to improve performance on CPUs that support SSE2. + */ #ifdef X86_CHORBA_SSE_CRC
40-40:⚠️ Potential issueFix SSE2 intrinsics compatibility issues.
The code uses Intel-specific 64-bit SSE2 intrinsics that are causing compilation errors.
Replace the Intel-specific intrinsics with their portable equivalents:
- __m128i next12 = _mm_cvtsi64x_si128(next1); + __m128i next12 = _mm_set_epi64x(0, next1); - next1 = _mm_cvtsi128_si64(next12); - next2 = _mm_cvtsi128_si64(_mm_unpackhi_epi64(next12, next12)); - next3 = _mm_cvtsi128_si64(next34); - next4 = _mm_cvtsi128_si64(_mm_unpackhi_epi64(next34, next34)); - next5 = _mm_cvtsi128_si64(next56); + next1 = _mm_extract_epi64(next12, 0); + next2 = _mm_extract_epi64(next12, 1); + next3 = _mm_extract_epi64(next34, 0); + next4 = _mm_extract_epi64(next34, 1); + next5 = _mm_extract_epi64(next56, 0);Also applies to: 731-735
🧰 Tools
🪛 GitHub Actions: NMake
[warning] 40-40: '_mm_cvtsi64x_si128' undefined; assuming extern returning int
[error] 40-40: 'initializing': cannot convert from 'int' to '__m128i'
🪛 GitHub Actions: Package Check
[warning] 40-40: Implicit declaration of function ‘_mm_cvtsi64x_si128’; did you mean ‘_mm_cvtsi16_si128’?
[error] 40-40: Incompatible types when initializing type ‘__m128i’ using type ‘int’.
808-835: 🛠️ Refactor suggestionAdd documentation for the wrapper function and PREFIX macro.
The wrapper function and PREFIX macro need documentation.
+/* PREFIX macro is defined by the build system to namespace the symbols */ + +/* Wrapper function for CRC32 computation using SSE2-optimized Chorba algorithm + * Handles buffer alignment and dispatches to appropriate implementation based on size + * @param crc: Initial CRC value + * @param buf: Input buffer + * @param len: Length of the buffer in bytes + * @return: Computed CRC32 value + */ Z_INTERNAL uint32_t PREFIX(crc32_chorba_sse)(uint32_t crc, const uint8_t *buf, size_t len) {🧰 Tools
🪛 Cppcheck (2.10-2)
[error] 808-808: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.
(unknownMacro)
30-39: 🛠️ Refactor suggestionAdd function documentation.
The function needs documentation to explain its purpose and parameters.
Add function documentation:
+/* Computes CRC32 checksum using SSE2-optimized Chorba algorithm + * @param crc: Initial CRC value + * @param buf: Input buffer (must be 64-bit aligned) + * @param len: Length of the buffer in bytes + * @return: Computed CRC32 value + */ Z_INTERNAL uint32_t chorba_small_nondestructive_sse(uint32_t crc, const uint64_t* buf, size_t len) {
🧹 Nitpick comments (1)
arch/x86/chorba_sse2.c (1)
18-28: Add documentation for macros.The
READ_NEXTandNEXT_ROUNDmacros need documentation to explain their purpose and parameters.Add macro documentation:
+/* Loads two 128-bit vectors from the input buffer + * @param in: Input buffer + * @param off: Offset into the buffer + * @param a: First output vector + * @param b: Second output vector + */ #define READ_NEXT(in, off, a, b) do { \ +/* Performs a round of CRC transformation using SSE2 instructions + * @param invec: Input vector + * @param a,b,c,d: Output vectors containing transformed values + */ #define NEXT_ROUND(invec, a, b, c, d) do { \
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
CMakeLists.txt(2 hunks)arch/generic/crc32_c.c(0 hunks)arch/x86/Makefile.in(2 hunks)arch/x86/chorba_sse2.c(1 hunks)arch/x86/x86_functions.h(2 hunks)arch/x86/x86_intrins.h(1 hunks)configure(1 hunks)crc32_c.h(1 hunks)functable.c(1 hunks)test/benchmarks/benchmark_crc32.cc(1 hunks)win32/Makefile.msc(3 hunks)
💤 Files with no reviewable changes (1)
- arch/generic/crc32_c.c
🚧 Files skipped from review as they are similar to previous changes (8)
- crc32_c.h
- arch/x86/x86_functions.h
- arch/x86/Makefile.in
- test/benchmarks/benchmark_crc32.cc
- arch/x86/x86_intrins.h
- functable.c
- CMakeLists.txt
- configure
🧰 Additional context used
🪛 Cppcheck (2.10-2)
arch/x86/chorba_sse2.c
[error] 808-808: There is an unknown macro here somewhere. Configuration is required. If PREFIX is a macro then please configure it.
(unknownMacro)
🪛 GitHub Actions: NMake
arch/x86/chorba_sse2.c
[warning] 40-40: '_mm_cvtsi64x_si128' undefined; assuming extern returning int
[warning] 731-731: '_mm_cvtsi128_si64' undefined; assuming extern returning int
[error] 40-40: 'initializing': cannot convert from 'int' to '__m128i'
🪛 GitHub Actions: Package Check
arch/x86/chorba_sse2.c
[warning] 40-40: Implicit declaration of function ‘_mm_cvtsi64x_si128’; did you mean ‘_mm_cvtsi16_si128’?
[error] 40-40: Incompatible types when initializing type ‘__m128i’ using type ‘int’.
[warning] 731-731: Implicit declaration of function ‘_mm_cvtsi128_si64’; did you mean ‘_mm_cvtsi128_si16’?
⏰ Context from checks skipped due to timeout of 90000ms (55)
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS GCC
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS GCC
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS GCC
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS GCC
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS GCC
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS GCC
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS GCC
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS GCC
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS GCC
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Windows MSVC 2019 v140 Win32
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: macOS GCC (ARM64)
- GitHub Check: macOS GCC
- GitHub Check: macOS GCC Symbol Prefix & Compat
- GitHub Check: Ubuntu GCC AARCH64 ASAN
🔇 Additional comments (1)
win32/Makefile.msc (1)
38-39: LGTM!The changes correctly integrate the new SSE2-optimized CRC implementation into the build process by:
- Adding the
-DX86_CHORBA_SSE_CRCpreprocessor definition to enable the feature.- Adding
chorba_sse2.objto the list of object files to be compiled.Also applies to: 64-64
83edd75 to
35164b2
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
arch/x86/chorba_sse2.c (1)
26-822: Large function implementation is performance-driven but could impact maintainability.
The SSE2-based Chorba method is highly specialized with frequent unrolled loops and XOR patterns for speed. Although it is large and arguably harder to read or extend, it appears correct for a performance-critical routine. Testing coverage is strongly recommended to prevent regressions.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
CMakeLists.txt(1 hunks)arch/x86/Makefile.in(2 hunks)arch/x86/chorba_sse2.c(1 hunks)arch/x86/x86_functions.h(3 hunks)arch/x86/x86_intrins.h(1 hunks)configure(1 hunks)crc32.h(1 hunks)functable.c(1 hunks)test/benchmarks/benchmark_crc32.cc(1 hunks)test/test_crc32.cc(1 hunks)win32/Makefile.msc(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (8)
- test/benchmarks/benchmark_crc32.cc
- crc32.h
- functable.c
- win32/Makefile.msc
- arch/x86/Makefile.in
- CMakeLists.txt
- configure
- arch/x86/x86_functions.h
🧰 Additional context used
🧠 Learnings (2)
arch/x86/chorba_sse2.c (7)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-18T02:53:23.621Z
Learning: In zlib-ng, when macro names and parameters are self-explanatory and scalar versions of the code provide context, additional documentation for the macros may not be necessary.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:830-831
Timestamp: 2025-02-21T01:37:54.508Z
Learning: The alignment calculation `((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15` is safe from overflow as it's mathematically bounded between 1 and 16, making the use of uintptr_t appropriate for this pointer arithmetic.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:41:10.063Z
Learning: For SSE2 optimizations, `_mm_cvtsi128_si64` should be used instead of `_mm_extract_epi64` (SSE4.1) for extracting 64-bit values from 128-bit vectors, as it generates more efficient movq instructions.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.
arch/x86/x86_intrins.h (2)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:0-0
Timestamp: 2025-02-23T16:50:50.925Z
Learning: MSVC does not define `__GNUC__`, so adding `!defined(_MSC_VER)` to GCC detection macros is redundant when `__GNUC__` is already being checked.
⏰ Context from checks skipped due to timeout of 90000ms (136)
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC Symbol Prefix
- GitHub Check: Ubuntu GCC S390X No vectorized CRC32
- GitHub Check: Ubuntu GCC S390X
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: Ubuntu GCC PPC
- GitHub Check: Ubuntu GCC MIPS64
- GitHub Check: Ubuntu GCC MIPS
- GitHub Check: Ubuntu GCC AARCH64
- GitHub Check: Ubuntu GCC ARM HF Compat No Opt
- GitHub Check: Ubuntu GCC ARM HF No NEON
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC Symbol Prefix
- GitHub Check: Ubuntu GCC S390X No vectorized CRC32
- GitHub Check: Ubuntu GCC S390X
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: Ubuntu GCC PPC
- GitHub Check: Ubuntu GCC MIPS64
- GitHub Check: Ubuntu GCC AARCH64
- GitHub Check: Ubuntu GCC ARM HF Compat No Opt
- GitHub Check: Ubuntu GCC ARM HF No NEON
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC Symbol Prefix
- GitHub Check: Ubuntu GCC S390X No vectorized CRC32
- GitHub Check: Ubuntu GCC S390X
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: Ubuntu GCC PPC
- GitHub Check: Ubuntu GCC MIPS64
- GitHub Check: Ubuntu GCC AARCH64
- GitHub Check: Ubuntu GCC ARM HF Compat No Opt
- GitHub Check: Ubuntu GCC ARM HF No NEON
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC Symbol Prefix
- GitHub Check: Ubuntu GCC S390X No vectorized CRC32
- GitHub Check: Ubuntu GCC S390X
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: Ubuntu GCC PPC
- GitHub Check: Ubuntu GCC MIPS64
- GitHub Check: Ubuntu GCC AARCH64
- GitHub Check: Ubuntu GCC ARM HF Compat No Opt
- GitHub Check: Ubuntu GCC ARM HF No NEON
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC Symbol Prefix
- GitHub Check: Ubuntu GCC S390X No vectorized CRC32
- GitHub Check: Ubuntu GCC S390X
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: Ubuntu GCC PPC
- GitHub Check: Ubuntu GCC MIPS64
- GitHub Check: Ubuntu GCC AARCH64
- GitHub Check: Ubuntu GCC ARM HF Compat No Opt
- GitHub Check: Ubuntu GCC ARM HF No NEON
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC Symbol Prefix
- GitHub Check: Ubuntu GCC S390X No vectorized CRC32
- GitHub Check: Ubuntu GCC S390X
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: Ubuntu GCC PPC
- GitHub Check: Ubuntu GCC MIPS64
- GitHub Check: Ubuntu GCC AARCH64
- GitHub Check: Ubuntu GCC ARM HF Compat No Opt
- GitHub Check: Ubuntu GCC ARM HF No NEON
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: macOS GCC Symbol Prefix
- GitHub Check: Ubuntu GCC S390X No vectorized CRC32
- GitHub Check: Ubuntu GCC S390X
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: Ubuntu GCC PPC
- GitHub Check: Ubuntu GCC MIPS64
- GitHub Check: Ubuntu GCC AARCH64
- GitHub Check: Ubuntu GCC ARM HF Compat No Opt
- GitHub Check: Ubuntu GCC ARM HF No NEON
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC S390X No vectorized CRC32
- GitHub Check: Ubuntu GCC S390X
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC No Power8
- GitHub Check: Ubuntu GCC PPC
- GitHub Check: Ubuntu GCC MIPS64
- GitHub Check: Ubuntu GCC AARCH64
- GitHub Check: Ubuntu GCC ARM HF Compat No Opt
- GitHub Check: Ubuntu GCC ARM HF No NEON
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC S390X No vectorized CRC32
- GitHub Check: Ubuntu GCC S390X
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC
- GitHub Check: Ubuntu GCC MIPS64
- GitHub Check: Ubuntu GCC AARCH64
- GitHub Check: Ubuntu GCC ARM HF Compat No Opt
- GitHub Check: Ubuntu GCC ARM HF No NEON
- GitHub Check: Windows ClangCl Win64 Native Instructions (AVX)
- GitHub Check: Windows MSVC 2019 v140 Win64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC S390X No vectorized CRC32
- GitHub Check: Ubuntu GCC S390X
- GitHub Check: Ubuntu GCC PPC64LE
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu GCC PPC
- GitHub Check: Ubuntu GCC MIPS64
- GitHub Check: Ubuntu GCC AARCH64
- GitHub Check: Ubuntu GCC ARM HF Compat No Opt
- GitHub Check: Ubuntu GCC ARM HF No NEON
🔇 Additional comments (7)
test/test_crc32.cc (1)
270-272: No concerns; logic is consistent with prior suggestions.
Enabling SSE2-based Chorba tests ifWITHOUT_CHORBAis unset,X86_SSE2is defined, andNO_CHORBA_SSE2is unset aligns well with the existing pattern for other CRC variants.arch/x86/x86_intrins.h (3)
93-111: MSVC 32-bit fallback approach looks good.
Defining_mm_cvtsi128_si64xand_mm_cvtsi64x_si128via a union for 32-bit MSVC is a safe solution when native intrinsics aren’t available. This avoids strict alignment pitfalls and is consistent with the feedback that we skip additional MSVC Clang checks.
114-117: Clang-specific macros are appropriately isolated.
Redirecting to_mm_cvtsi64_si128and_mm_cvtsi128_si64for Clang is concise and avoids conflicts with MSVC-specific paths.
119-126: GCC 32-bit handling with union is coherent.
Providing_mm_cvtsi128_si64xand_mm_cvtsi64x_si128for GCC 32-bit ensures a reliable fallback when compiler-provided intrinsics are missing. This approach is consistent with the rest of the file.arch/x86/chorba_sse2.c (3)
1-2: Conditional compilation is correct.
Limiting SSE2 Chorba CRC logic to cases whereX86_SSE2is defined (andWITHOUT_CHORBAis not) ensures this code is only built for SSE2-capable x86 targets.
14-25: Macros leverage guaranteed 16-byte alignment.
These macros rely on SSE-friendly alignment. In debug scenarios, consider adding optional assert checks for alignment if needed, but the current approach is acceptable for performance builds.
824-851: Wrapper function handles alignment and thresholds effectively.
The alignment offset calculation is bounded and safe. Selecting between large SSE2, small SSE2, or a fallback approach aligns with user expectations. Post-condition (c ^ 0xffffffff) is standard for CRC finalization.
Dead2
left a comment
There was a problem hiding this comment.
Although we found this to be a little slower than the generic Chorba on cpus with double-pumped SSE2 (Like Zacate for example), it is however good for the rest of the SSE2+.
0407b79 to
2d4c022
Compare
|
I just removed an extra function call or two (one to memcpy, the other to braid). It was faster for me on modern hardware, I suspect it'll be the same on older. |
Did we want to try to notch out that variant when assigning from the function table or did we just want to issue that caveat so those who are bit by that can build without it? |
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (3)
arch/x86/chorba_sse2.c (3)
43-44: Clarify or parameterize the magic constants in the loop condition.The expression
(i + 256 + 40 + 32 + 32) < lenincludes multiple hardcoded offsets. Defining or documenting these values (e.g., through named constants) would enhance readability and maintainability.
65-80: Consider pruning large blocks of commented code.The extensive commented-out code blocks, while possibly useful for reference, can clutter the file and make it harder to maintain. Removing or moving them to dedicated documentation might improve overall readability.
- /* - chorba1 ^= next1; - ... - chorba8 ^= chorba2; - */ + /* If needed, maintain a short comment + * describing the original scalar logic. */
26-850: Consider splitting large function logic for better maintainability.
chorba_small_nondestructive_sse2is lengthy and contains multiple loops, repeated patterns, and extensive commented code. Extracting sub-steps (e.g., repeated XOR+shift steps) into helper functions can simplify debugging and ease future modifications.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
CMakeLists.txt(1 hunks)arch/x86/Makefile.in(2 hunks)arch/x86/chorba_sse2.c(1 hunks)arch/x86/x86_functions.h(3 hunks)arch/x86/x86_intrins.h(1 hunks)configure(1 hunks)crc32.h(1 hunks)functable.c(1 hunks)test/benchmarks/benchmark_crc32.cc(1 hunks)test/test_crc32.cc(1 hunks)win32/Makefile.msc(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (7)
- crc32.h
- arch/x86/Makefile.in
- test/benchmarks/benchmark_crc32.cc
- win32/Makefile.msc
- CMakeLists.txt
- configure
- functable.c
🧰 Additional context used
🧠 Learnings (3)
arch/x86/x86_functions.h (2)
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-03-26T15:10:53.588Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
arch/x86/x86_intrins.h (2)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: MSVC does not define `__GNUC__`, so adding `!defined(_MSC_VER)` to GCC detection macros is redundant when `__GNUC__` is already being checked.
arch/x86/chorba_sse2.c (7)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In zlib-ng, when macro names and parameters are self-explanatory and scalar versions of the code provide context, additional documentation for the macros may not be necessary.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:830-831
Timestamp: 2025-03-26T15:10:53.588Z
Learning: The alignment calculation `((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15` is safe from overflow as it's mathematically bounded between 1 and 16, making the use of uintptr_t appropriate for this pointer arithmetic.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: For SSE2 optimizations, `_mm_cvtsi128_si64` should be used instead of `_mm_extract_epi64` (SSE4.1) for extracting 64-bit values from 128-bit vectors, as it generates more efficient movq instructions.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-03-26T15:10:53.588Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.
🧬 Code Definitions (3)
test/test_crc32.cc (1)
arch/x86/x86_functions.h (1)
crc32_chorba_sse2(29-29)
arch/x86/x86_functions.h (1)
arch/generic/generic_functions.h (1)
crc32(14-14)
arch/x86/chorba_sse2.c (3)
arch/x86/x86_intrins.h (6)
_mm_cvtsi64x_si128(107-109)_mm_cvtsi64x_si128(107-107)_mm_cvtsi128_si64x(101-105)_mm_cvtsi128_si64x(101-101)_mm_cvtsi128_si64x(120-124)_mm_cvtsi128_si64x(120-120)crc32.h (1)
crc32_braid_internal(18-18)arch/x86/x86_functions.h (1)
crc32_chorba_sse2(29-29)
⏰ Context from checks skipped due to timeout of 90000ms (87)
- GitHub Check: Ubuntu 20.04 GCC
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: macOS Clang (Target 10.10)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Windows NMake x64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu 20.04 GCC
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: macOS Clang (Target 10.10)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Windows NMake x64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu 20.04 GCC
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: macOS Clang (Target 10.10)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Windows NMake x64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu 20.04 GCC
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: macOS Clang (Target 10.10)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Windows NMake x64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu 20.04 GCC
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: macOS Clang (Target 10.10)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Windows NMake x64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu 20.04 GCC
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: macOS Clang (Target 10.10)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Windows NMake x64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu 20.04 GCC
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: macOS Clang (Target 10.10)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Windows NMake x64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu 20.04 GCC
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: macOS Clang (Target 10.10)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Windows NMake x64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu 20.04 GCC
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: macOS Clang (Target 10.10)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Windows NMake x64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
- GitHub Check: Ubuntu GCC PPC64
- GitHub Check: Ubuntu 20.04 GCC
- GitHub Check: Windows MSVC 2022 v142 Win64
- GitHub Check: macOS Clang (Target 10.10)
- GitHub Check: Windows MSVC 2022 v143 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64 Native Instructions (AVX)
- GitHub Check: Ubuntu GCC SPARC64
- GitHub Check: Windows NMake x64
- GitHub Check: Ubuntu GCC AARCH64 ASAN
🔇 Additional comments (12)
test/test_crc32.cc (1)
270-272: Good conditional addition of the SSE2-based Chorba test.This guard appropriately checks for
WITHOUT_CHORBA,X86_SSE2, andNO_CHORBA_SSE2to enable the optimized CRC32 test only when supported and not explicitly disabled. This aligns with the feedback on defaulting to the SSE2 variant unless it’s intentionally excluded.arch/x86/x86_functions.h (3)
9-16: Well-defined workaround for the MSVC stack-corruption bug.Defining
NO_CHORBA_SSE2for MSVC v1920–v1929 (32-bit) is a practical measure that prevents the broken compiler from emitting problematic SSE2 code. The embedded comment and reference link provide helpful context. Ensure it doesn't accidentally disable SSE2 optimizations for other valid toolchains that share version numbering but lack the bug.
28-30: Correct selective declaration of the SSE2 Chorba CRC function.Exposing
crc32_chorba_sse2only whenWITHOUT_CHORBAis not defined avoids compilation issues for builds that intentionally exclude Chorba. This is consistent with the other CPU-specific CRC variants.
102-105: Appropriate conditional override ofnative_crc32.By undefining and then redefining
native_crc32tocrc32_chorba_sse2, you ensure the SSE2 Chorba CRC impl is used when not excluded byWITHOUT_CHORBAorNO_CHORBA_SSE2. This follows the established approach of linkingnative_crc32to the fastest available CRC method.arch/x86/x86_intrins.h (3)
93-110: MSVC-specific inline implementations of_mm_cvtsi128_si64xand_mm_cvtsi64x_si128.
- Uses a union-based approach for reading and writing 64-bit data from 128-bit registers, aiding compatibility with 32-bit MSVC builds that lack native intrinsics for 64-bit extraction.
- The fallback to stack memory is acceptable given MSVC’s limitations, though performance might be suboptimal compared to a direct register move on x64.
114-117: Clang macros aligned with existing intrinsics.Redirecting
_mm_cvtsi64x_si128and_mm_cvtsi128_si64xto Clang’s built-in_mm_cvtsi64_si128and_mm_cvtsi128_si64is straightforward and avoids duplication. No concerns here.
119-126: GCC 32-bit fallback using a union for strict-aliasing safety.This union-based approach avoids direct pointer-casting, reducing aliasing conflicts. The macro
_mm_cvtsi64x_si128(a)likewise sets the high 64 bits to 0. This implementation is coherent with the compiler-specific constraints.arch/x86/chorba_sse2.c (5)
1-2: Good approach to conditionally enable SSE2 Chorba CRC code.Wrapping the implementation with
#if !defined(WITHOUT_CHORBA) && defined(X86_SSE2)aligns with the existing optimization strategies and makes SSE2 optimizations automatically enabled when available.
14-17: Confirmed: Aligned loads are fine here.
READ_NEXTuses_mm_load_si128for aligned loading, which is valid as long as the caller enforces 16-byte alignment. The usage is consistent with the design and the explicit alignment checks in the wrapper function.
19-24: NEXT_ROUND macro looks consistent.The XOR and shift operations appear correctly mapped to SSE2 instructions without extraneous steps. This pattern aligns well with the performance goals for the Chorba CRC calculation.
805-810: Good SSE2 intrinsics usage for final extraction.Using
_mm_cvtsi128_si64xworks well with SSE2 and avoids SSE4.1-only intrinsics. This maintains broad CPU compatibility while retaining performance benefits.
852-859:Details
✅ Verification successful
Switch to a more portable type and verify external dependencies.
- Prefer using
size_tinstead ofunsigned longforalgn_diffto ensure consistent behavior on different platforms:- unsigned long algn_diff = ((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15; + size_t algn_diff = ((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15;
- Validate that
crc32_chorba_118960_nondestructiveis declared and accessible.
🏁 Script executed:
#!/bin/bash # Check if the function 'crc32_chorba_118960_nondestructive' is declared and accessible in the repository. rg -S "crc32_chorba_118960_nondestructive" .Length of output: 551
Action Required: Update variable type for portability and verify dependency usage
Replace the declaration of
algn_diffwith asize_tto ensure consistent behavior across platforms:- unsigned long algn_diff = ((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15; + size_t algn_diff = ((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15;The function
crc32_chorba_118960_nondestructivehas been confirmed to be declared and available in the repository.
The version that's currently in the generic implementation for 32768 byte buffers leverages the stack. It manages to autovectorize but unfortunately the trips to the stack hurt its performance for CPUs which need this the most. This version is explicitly SIMD vectorized and doesn't use trips to the stack. In my testing it's ~10% faster than the "small" variant, and about 42% faster than the "32768" variant.
2d4c022 to
94f0a09
Compare
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
arch/x86/chorba_sse2.c (2)
26-850: The chorba_small_nondestructive_sse2 implementation is highly optimized but lacks documentation.The SSE2 implementation of the Chorba algorithm is well-engineered, with:
- Efficient data loading through aligned reads
- Parallel processing of multiple 64-bit words
- Careful unrolling of the main loop to maximize throughput
- Appropriate handling of remaining data
However, the function would benefit from documentation explaining its purpose, parameters, and alignment requirements.
+/* Computes CRC32 checksum using SSE2-optimized Chorba algorithm + * Processes data in 32-byte chunks with unrolled loops for maximum throughput + * @param crc: Initial CRC value + * @param buf: Input buffer (must be 16-byte aligned) + * @param len: Length of the buffer in bytes (must be multiple of 8) + * @return: Computed CRC32 value + */ Z_INTERNAL uint32_t chorba_small_nondestructive_sse2(uint32_t crc, const uint64_t* buf, size_t len) {
852-879: The wrapper function needs documentation and uses a magic number.The
crc32_chorba_sse2function:
- Handles alignment of input buffer correctly
- Dispatches to the appropriate implementation based on buffer size
- Properly handles pre and post-conditioning of CRC value
However, the function should document the
CHORBA_LARGE_THRESHOLDconstant (32768) as mentioned in the PR description and add function documentation.+/* Wrapper function for CRC32 computation using SSE2-optimized Chorba algorithm + * Handles buffer alignment and dispatches to appropriate implementation based on size + * @param crc: Initial CRC value + * @param buf: Input buffer + * @param len: Length of the buffer in bytes + * @return: Computed CRC32 value + */ Z_INTERNAL uint32_t crc32_chorba_sse2(uint32_t crc, const uint8_t *buf, size_t len) { uint32_t c; uint64_t* aligned_buf; size_t aligned_len; c = (~crc) & 0xffffffff; unsigned long algn_diff = ((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15; if (algn_diff < len) { if (algn_diff) { c = crc32_braid_internal(c, buf, algn_diff); } aligned_buf = (uint64_t*) (buf + algn_diff); aligned_len = len - algn_diff; + /* Threshold for large buffer optimization (32768 bytes) */ + #ifndef CHORBA_LARGE_THRESHOLD + #define CHORBA_LARGE_THRESHOLD 32768 + #endif if(aligned_len > CHORBA_LARGE_THRESHOLD) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (11)
CMakeLists.txt(1 hunks)arch/x86/Makefile.in(2 hunks)arch/x86/chorba_sse2.c(1 hunks)arch/x86/x86_functions.h(3 hunks)arch/x86/x86_intrins.h(1 hunks)configure(1 hunks)crc32.h(1 hunks)functable.c(1 hunks)test/benchmarks/benchmark_crc32.cc(1 hunks)test/test_crc32.cc(1 hunks)win32/Makefile.msc(2 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
- crc32.h
- win32/Makefile.msc
- CMakeLists.txt
- functable.c
- configure
- arch/x86/Makefile.in
🧰 Additional context used
🧠 Learnings (3)
arch/x86/x86_functions.h (2)
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-03-26T15:10:53.588Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
arch/x86/x86_intrins.h (2)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/x86_intrins.h:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: MSVC does not define `__GNUC__`, so adding `!defined(_MSC_VER)` to GCC detection macros is redundant when `__GNUC__` is already being checked.
arch/x86/chorba_sse2.c (7)
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: In zlib-ng, when macro names and parameters are self-explanatory and scalar versions of the code provide context, additional documentation for the macros may not be necessary.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:830-831
Timestamp: 2025-03-26T15:10:53.588Z
Learning: The alignment calculation `((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15` is safe from overflow as it's mathematically bounded between 1 and 16, making the use of uintptr_t appropriate for this pointer arithmetic.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-03-26T15:10:53.588Z
Learning: For SSE2 optimizations, `_mm_cvtsi128_si64` should be used instead of `_mm_extract_epi64` (SSE4.1) for extracting 64-bit values from 128-bit vectors, as it generates more efficient movq instructions.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-03-26T15:10:53.588Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.
🧬 Code Definitions (3)
arch/x86/x86_functions.h (1)
arch/generic/generic_functions.h (1)
crc32(14-14)
test/test_crc32.cc (1)
arch/x86/x86_functions.h (1)
crc32_chorba_sse2(29-29)
arch/x86/chorba_sse2.c (3)
arch/x86/x86_intrins.h (6)
_mm_cvtsi64x_si128(107-109)_mm_cvtsi64x_si128(107-107)_mm_cvtsi128_si64x(101-105)_mm_cvtsi128_si64x(101-101)_mm_cvtsi128_si64x(120-124)_mm_cvtsi128_si64x(120-120)crc32.h (1)
crc32_braid_internal(18-18)arch/x86/x86_functions.h (1)
crc32_chorba_sse2(29-29)
🔇 Additional comments (12)
test/test_crc32.cc (1)
270-272: Conditional Test for chorba_sse2 Looks GoodThis addition properly gates the SSE2-based Chorba CRC test behind relevant macros, ensuring it’s only built and run when applicable.
test/benchmarks/benchmark_crc32.cc (1)
71-75: Benchmark Condition is ConsistentHooking up the SSE2 benchmark for Chorba CRC is consistent with the test logic, using the correct macros to avoid conflicting builds.
arch/x86/x86_functions.h (3)
9-16: Safe Fallback for MSVC Stack Corruption IssueDefining
NO_CHORBA_SSE2for affected MSVC versions is a pragmatic way to work around the compiler bug. The inline comment references the MSVC ticket, aiding future maintenance.
28-30: New SSE2 Chorba CRC Declaration is ClearThe prototype appropriately wraps the declaration under
!defined(WITHOUT_CHORBA), preventing unwanted compilation when Chorba is disabled.
102-105: Correct Macro Override for Native CRCUndefining and redefining
native_crc32tocrc32_chorba_sse2ensures this optimized implementation is used by default when no other macro bars its usage.arch/x86/x86_intrins.h (3)
93-112: MSVC-Specific Union-Based ApproachUsing a union to convert between
__m128iandint64_tavoids strict aliasing issues. This is a valid fallback for 32-bit MSVC targets lacking direct 64-bit register moves.
114-117: Simple Clang MacrosDefining
_mm_cvtsi64x_si128and_mm_cvtsi128_si64xas Clang aliases is straightforward and avoids duplication.
119-126: GCC 32-Bit Workaround is EffectiveThe union-based method for
_mm_cvtsi128_si64xensures safe extraction on GCC 32-bit builds. Including the_mm_cvtsi64x_si128macro complements this solution.arch/x86/chorba_sse2.c (4)
1-11: Appropriate includes and conditional compilation.The file is correctly guarded with conditional compilation that only enables the code when CHORBA algorithm is available and SSE2 is supported. The necessary headers for CRC32 calculation and SSE2 operations are properly included.
14-24: The READ_NEXT and NEXT_ROUND macros are concise and efficient.These macros implement the core operations for the SSE2-optimized Chorba algorithm:
- READ_NEXT loads aligned data from memory efficiently
- NEXT_ROUND performs the bit-manipulation operations required by the Chorba algorithm using SIMD instructions
The implementation leverages SSE2 effectively to process multiple elements in parallel.
811-832: Efficient aligned memory copy instead of calling memcpy.The implementation efficiently handles the remainder of the input buffer by using SSE2 instructions for the aligned copy operations instead of relying on a memcpy call. This avoids the function call overhead and potentially allows for better compiler optimization.
805-810:Details
✅ Verification successful
Potential issue with 32-bit compilation.
The code uses
_mm_cvtsi128_si64xwhich might not be properly defined on 32-bit platforms. In the PR comments, there was discussion about MSVC having issues with these functions on 32-bit architecture.
🏁 Script executed:
#!/bin/bash # Check if x86_intrins.h provides proper fallbacks for these functions grep -A10 "_mm_cvtsi128_si64" arch/x86/x86_intrins.hLength of output: 632
Action: Confirming 32-bit Fallback for
_mm_cvtsi128_si64xUsageAfter verifying the implementation in
arch/x86/x86_intrins.h, the intrinsic_mm_cvtsi128_si64xnow has a proper fallback—even on 32-bit platforms. The header defines an inline function that converts a__m128ivalue to a 64-bit integer via a union, and for some configurations it maps_mm_cvtsi128_si64xto_mm_cvtsi128_si64. This approach appears to address the previously noted 32-bit MSVC concerns.
- File:
arch/x86/chorba_sse2.c(Lines 805–810) uses_mm_cvtsi128_si64xas expected.- File:
arch/x86/x86_intrins.hprovides the necessary fallback definitions.- Next Step: Please verify in your MSVC 32-bit testing that these definitions are picked up correctly.
I don't think we need to bother with that really. First of all it is an extremely narrow userbase, quite possibly zero users will use zlib-ng on this really old cpu. Also it is not that much slower, so combined with the rest of the SSE2 optimizations it is nevertheless a solid win. |
The version that's currently in the generic implementation for 32768 byte buffers leverages the stack. It manages to autovectorize but unfortunately the trips to the stack hurt its performance for CPUs which need this the most. This version is explicitly SIMD vectorized and doesn't use trips to the stack. In my testing it's ~10% faster than the "small" variant, and about 42% faster than the "32768" variant.
Summary by CodeRabbit
New Features
crc32_chorba_sse2, enhancing efficiency for larger datasets.chorba_sse2functionality across different environments.Tests
chorba_sse2implementation.chorba_sse2CRC32 functionality.crc32_braid_internalto enhance CRC32 functionality.