Split CRC32 Braid and Chorba word types#1975
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughThis pull request refactors the CRC32 Chorba implementation by introducing a new private header Changes
Sequence Diagram(s)Skipped: Changes consist primarily of API refactoring, header reorganization, and type substitutions rather than introducing new control flow or multi-component interactions. Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 3 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches🧪 Generate unit tests (beta)
Tip Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 4
🧹 Nitpick comments (1)
crc32_chorba_p.h (1)
5-5: Consider reordering type definitions before threshold macros.Line 5 uses
chorba_word_tin the macro definition, but the typedef doesn't appear until lines 12-18. While this works because macros are expanded at usage time, it's unconventional and could confuse readers.Consider reordering to define
chorba_word_tandCHORBA_Wfirst (lines 11-18), then the threshold macros (lines 4-9). This would improve code clarity and follow the principle of defining dependencies before their use.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
arch/generic/crc32_c.c(2 hunks)arch/generic/crc32_chorba_c.c(8 hunks)arch/generic/generic_functions.h(0 hunks)arch/x86/chorba_sse2.c(2 hunks)arch/x86/chorba_sse41.c(2 hunks)crc32.h(0 hunks)crc32_braid_p.h(2 hunks)crc32_chorba_p.h(1 hunks)
💤 Files with no reviewable changes (2)
- arch/generic/generic_functions.h
- crc32.h
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
📚 Learning: 2025-01-23T22:01:53.422Z
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
Applied to files:
arch/generic/crc32_c.carch/x86/chorba_sse41.carch/generic/crc32_chorba_c.carch/x86/chorba_sse2.ccrc32_chorba_p.h
📚 Learning: 2025-02-21T01:42:40.488Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Applied to files:
arch/generic/crc32_c.carch/x86/chorba_sse41.carch/generic/crc32_chorba_c.carch/x86/chorba_sse2.ccrc32_chorba_p.h
📚 Learning: 2025-02-23T16:49:52.043Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
Applied to files:
arch/generic/crc32_c.carch/x86/chorba_sse41.carch/generic/crc32_chorba_c.carch/x86/chorba_sse2.c
📚 Learning: 2025-02-21T01:41:50.358Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Applied to files:
arch/generic/crc32_c.carch/x86/chorba_sse41.carch/x86/chorba_sse2.c
📚 Learning: 2025-02-21T01:44:03.996Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.
Applied to files:
arch/generic/crc32_c.carch/x86/chorba_sse41.carch/x86/chorba_sse2.c
🧬 Code graph analysis (2)
crc32_braid_p.h (4)
arch/generic/crc32_c.c (1)
uint32_t(7-43)arch/x86/chorba_sse2.c (2)
uint32_t(27-851)uint32_t(853-880)arch/x86/chorba_sse41.c (2)
uint32_t(60-310)uint32_t(312-342)arch/generic/crc32_braid_c.c (2)
uint32_t(63-213)uint32_t(215-222)
arch/generic/crc32_chorba_c.c (4)
arch/generic/crc32_c.c (1)
uint32_t(7-43)arch/x86/chorba_sse2.c (2)
uint32_t(27-851)uint32_t(853-880)arch/x86/chorba_sse41.c (2)
uint32_t(60-310)uint32_t(312-342)zutil_p.h (1)
zng_alloc(12-14)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (92)
- GitHub Check: Windows GCC
- GitHub Check: Windows MSVC 2022 v142 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC MIPS
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: macOS Clang
- GitHub Check: Windows GCC
- GitHub Check: Windows MSVC 2022 v142 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC MIPS
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: macOS Clang
- GitHub Check: Windows MSVC ARM No Test
- GitHub Check: Windows GCC
- GitHub Check: Windows MSVC 2022 v142 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC MIPS
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: macOS Clang
- GitHub Check: Windows MSVC ARM No Test
- GitHub Check: Windows GCC
- GitHub Check: Windows MSVC 2022 v142 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC MIPS
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: macOS Clang
- GitHub Check: Windows MSVC ARM No Test
- GitHub Check: Windows GCC
- GitHub Check: Windows MSVC 2022 v142 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC MIPS
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: macOS Clang
- GitHub Check: Windows MSVC ARM No Test
- GitHub Check: Windows GCC
- GitHub Check: Windows MSVC 2022 v142 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC MIPS
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: macOS Clang
- GitHub Check: Windows MSVC ARM No Test
- GitHub Check: Windows GCC
- GitHub Check: Windows MSVC 2022 v142 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Windows MSVC 2022 v141 Win32
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC MIPS
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: macOS Clang
- GitHub Check: Windows MSVC ARM No Test
- GitHub Check: Windows GCC
- GitHub Check: Windows MSVC 2022 v142 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Windows MSVC 2022 v141 Win32
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC MIPS
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: macOS Clang
- GitHub Check: Windows MSVC ARM No Test
- GitHub Check: Windows GCC
- GitHub Check: Windows MSVC 2022 v142 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Windows MSVC 2022 v141 Win32
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC MIPS
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: macOS Clang
- GitHub Check: Windows MSVC ARM No Test
- GitHub Check: Windows GCC
- GitHub Check: Windows MSVC 2022 v142 Win32
- GitHub Check: Windows MSVC 2022 v143 Win64
- GitHub Check: Windows MSVC 2022 v141 Win32
- GitHub Check: Ubuntu MinGW i686
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC MIPS
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: macOS Clang
🔇 Additional comments (18)
crc32_braid_p.h (2)
11-12: LGTM! Useful performance documentation.The TODO comment provides valuable context about platform-specific performance characteristics for POWER9, which aligns with the PR's goal of separating BRAID_W and CHORBA_W based on empirical benchmarking results.
42-42: LGTM! Proper encapsulation of internal function.Moving the
crc32_braid_internaldeclaration to this private header improves encapsulation and aligns with the broader PR changes that consolidate internal interfaces.crc32_chorba_p.h (2)
1-3: LGTM!Header guard follows proper naming conventions for private headers.
11-18: LGTM! Architecture detection is comprehensive.The architecture-specific type definitions properly distinguish between 64-bit and 32-bit platforms. The TODO comment on line 11 appropriately flags the open question about optimal CHORBA_W settings, which aligns with the PR's goal of separating BRAID_W and CHORBA_W based on empirical performance data.
arch/x86/chorba_sse41.c (2)
4-4: LGTM!Adding the
crc32_chorba_p.hinclude provides the necessarychorba_word_ttype definition and function declarations for the updated call site.
326-326: Ensure SSE4.1 code only builds on 64-bit. The cast(chorba_word_t*)aligned_bufrequiressizeof(chorba_word_t)==8; verify thatX86_SSE41is only enabled whenCHORBA_W==8(i.e. 64-bit targets) or add an explicit architecture guard or static assert.arch/x86/chorba_sse2.c (1)
4-4: LGTM!Adding the
crc32_chorba_p.hinclude provides the necessarychorba_word_ttype definition.arch/generic/crc32_c.c (3)
4-4: LGTM!Adding the
crc32_chorba_p.hinclude provides necessary type definitions and macros for the updated code paths.
22-22: LGTM! Clearer conditional using CHORBA_W.Replacing
OPTIMAL_CMP == 64withCHORBA_W == 8makes the conditional more explicit and aligns with the PR's goal of using architecture-specific Chorba word sizing. The guarded code paths use 64-bit functions, which correctly correspond toCHORBA_W == 8.
21-21: This cast is safe on both 32-bit and 64-bit builds
chorba_word_tis typedef’d in crc32_chorba_p.h to eitheruint32_t(when CHORBA_W==4) oruint64_t(when CHORBA_W==8), andaligned_buf` is 8-byte aligned (so it’s also valid for 4-byte loads). No change needed.arch/generic/crc32_chorba_c.c (8)
2-2: LGTM!The
zendian.hinclude provides the necessary byte-swapping macros for endianness handling.
6-6: LGTM!The
crc32_chorba_p.hinclude provides thechorba_word_ttype and related definitions.
12-13: LGTM! Correct bitbuffer sizing.The bitbuffer size definitions now correctly use
chorba_word_t, ensuring proper sizing for both 64-bit and 32-bit platforms.
32-32: LGTM! Function signature updated correctly.The function signature now uses
chorba_word_t*for the input parameter, which aligns with the declaration incrc32_chorba_p.hand enables architecture-specific word sizing.
34-36: LGTM! Buffer declarations updated correctly.Both the dynamic (EMSCRIPTEN) and stack-based buffer allocations now use
chorba_word_t, ensuring correct sizing for the target platform.
44-50: LGTM! Correct endianness handling.The initialization of
next1properly handles both little-endian and big-endian systems, using the appropriate byte-swapping based onCHORBA_W.
52-72: LGTM! Consistent variable type updates.All state variables (
next1throughnext22) now usechorba_word_t, maintaining consistency with the new word type throughout the algorithm.
76-115: LGTM! Correct loop bounds and indexing.The first pass loop now correctly uses
sizeof(chorba_word_t)for bounds calculation and array indexing, ensuring proper scaling for both 64-bit and 32-bit word sizes.
Codecov Report✅ All modified and coverable lines are covered by tests. Please upload reports for the commit 16455ed to get more accurate results. Additional details and impacted files@@ Coverage Diff @@
## develop #1975 +/- ##
===========================================
- Coverage 84.12% 81.95% -2.17%
===========================================
Files 158 161 +3
Lines 13055 13743 +688
Branches 3388 3117 -271
===========================================
+ Hits 10982 11263 +281
- Misses 988 1411 +423
+ Partials 1085 1069 -16 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
arch/generic/crc32_chorba_c.c
Outdated
| * @note Uses 128KB temporary buffer | ||
| */ | ||
| Z_INTERNAL uint32_t crc32_chorba_118960_nondestructive (uint32_t crc, const z_word_t* input, size_t len) { | ||
| Z_INTERNAL uint32_t crc32_chorba_118960_nondestructive (uint32_t crc, const uint8_t* buf, size_t len) { |
Check warning
Code scanning / CodeQL
Poorly documented large function Warning
arch/generic/crc32_chorba_c.c
Outdated
Check warning
Code scanning / CodeQL
Poorly documented large function Warning
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
arch/generic/crc32_chorba_c.c (1)
41-43: Restore constness on input viewsWe only read from
buf, so keeping these castsconstpreserves the original contract and prevents accidental writes later.- chorba_word_t* input = (chorba_word_t*) buf; - uint64_t* inputqwords = (uint64_t*) buf; + const chorba_word_t* input = (const chorba_word_t*) buf; + const uint64_t* inputqwords = (const uint64_t*) buf;
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
arch/generic/crc32_chorba_c.c(10 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
📚 Learning: 2025-01-23T22:01:53.422Z
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
Applied to files:
arch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-21T01:42:40.488Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Applied to files:
arch/generic/crc32_chorba_c.c
📚 Learning: 2024-12-25T19:45:06.009Z
Learnt from: samrussell
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_braid_c.c:596-597
Timestamp: 2024-12-25T19:45:06.009Z
Learning: We should verify out-of-bounds pointer arithmetic in chorba_118960_nondestructive() and related loops where "input + i" is used, especially when len < expected block sizes.
Applied to files:
arch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-18T02:53:23.621Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-18T02:53:23.621Z
Learning: In zlib-ng, when macro names and parameters are self-explanatory and scalar versions of the code provide context, additional documentation for the macros may not be necessary.
Applied to files:
arch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-23T16:49:52.043Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
Applied to files:
arch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-21T01:41:50.358Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Applied to files:
arch/generic/crc32_chorba_c.c
🧬 Code graph analysis (1)
arch/generic/crc32_chorba_c.c (3)
arch/generic/crc32_c.c (1)
uint32_t(7-43)arch/x86/chorba_sse41.c (2)
uint32_t(60-311)uint32_t(313-343)zutil_p.h (1)
zng_alloc(12-14)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Windows MSVC 2022 v141 Win64
- GitHub Check: EL9 GCC S390X DFLTCC UBSAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (1)
crc32_chorba_p.h (1)
11-18: Consider additional 64-bit architectures.The TODO comment indicates potential expansion to more 64-bit architectures. Consider adding RISC-V 64-bit (
__riscvwith__riscv_xlen == 64) and LoongArch 64-bit (__loongarch64) to the condition, as these are increasingly common platforms that would benefit from 64-bit word processing.Example addition:
-#if defined(__x86_64__) || defined(_M_AMD64) || defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__) +#if defined(__x86_64__) || defined(_M_AMD64) || defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__) || (defined(__riscv) && __riscv_xlen == 64) || defined(__loongarch64)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
arch/generic/crc32_c.c(1 hunks)arch/generic/crc32_chorba_c.c(10 hunks)arch/x86/chorba_sse2.c(3 hunks)arch/x86/chorba_sse41.c(5 hunks)crc32_chorba_p.h(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- arch/x86/chorba_sse2.c
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
📚 Learning: 2025-01-23T22:01:53.422Z
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
Applied to files:
arch/generic/crc32_c.carch/x86/chorba_sse41.carch/generic/crc32_chorba_c.ccrc32_chorba_p.h
📚 Learning: 2025-02-21T01:42:40.488Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Applied to files:
arch/generic/crc32_c.carch/x86/chorba_sse41.carch/generic/crc32_chorba_c.ccrc32_chorba_p.h
📚 Learning: 2025-02-23T16:49:52.043Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
Applied to files:
arch/generic/crc32_c.carch/x86/chorba_sse41.carch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-21T01:44:03.996Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.
Applied to files:
arch/x86/chorba_sse41.c
📚 Learning: 2025-02-21T01:41:50.358Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Applied to files:
arch/x86/chorba_sse41.carch/generic/crc32_chorba_c.c
📚 Learning: 2024-12-25T19:45:06.009Z
Learnt from: samrussell
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_braid_c.c:596-597
Timestamp: 2024-12-25T19:45:06.009Z
Learning: We should verify out-of-bounds pointer arithmetic in chorba_118960_nondestructive() and related loops where "input + i" is used, especially when len < expected block sizes.
Applied to files:
arch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-18T02:53:23.621Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-18T02:53:23.621Z
Learning: In zlib-ng, when macro names and parameters are self-explanatory and scalar versions of the code provide context, additional documentation for the macros may not be necessary.
Applied to files:
arch/generic/crc32_chorba_c.c
🧬 Code graph analysis (4)
arch/generic/crc32_c.c (4)
arch/generic/crc32_chorba_c.c (4)
uint32_t(32-502)uint32_t(505-671)uint32_t(674-1239)uint32_t(1241-1453)arch/x86/chorba_sse2.c (2)
uint32_t(27-852)uint32_t(854-881)arch/x86/chorba_sse41.c (2)
uint32_t(60-311)uint32_t(313-343)arch/generic/crc32_braid_c.c (2)
uint32_t(63-213)uint32_t(215-222)
arch/x86/chorba_sse41.c (3)
arch/generic/crc32_chorba_c.c (4)
uint32_t(32-502)uint32_t(505-671)uint32_t(674-1239)uint32_t(1241-1453)arch/generic/crc32_c.c (1)
uint32_t(7-43)arch/x86/chorba_sse2.c (2)
uint32_t(27-852)uint32_t(854-881)
arch/generic/crc32_chorba_c.c (4)
arch/generic/crc32_c.c (1)
uint32_t(7-43)arch/x86/chorba_sse2.c (2)
uint32_t(27-852)uint32_t(854-881)arch/x86/chorba_sse41.c (2)
uint32_t(60-311)uint32_t(313-343)zutil_p.h (1)
zng_alloc(12-14)
crc32_chorba_p.h (3)
arch/generic/crc32_c.c (1)
uint32_t(7-43)arch/x86/chorba_sse2.c (2)
uint32_t(27-852)uint32_t(854-881)arch/x86/chorba_sse41.c (2)
uint32_t(60-311)uint32_t(313-343)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (40)
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: Ubuntu GCC -O1 UBSAN
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: Ubuntu GCC -O1 UBSAN
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: Ubuntu GCC -O1 UBSAN
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: Ubuntu GCC -O1 UBSAN
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: Ubuntu GCC -O1 UBSAN
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: Ubuntu GCC -O1 UBSAN
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: Ubuntu GCC -O1 UBSAN
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: Ubuntu GCC -O1 UBSAN
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: Ubuntu GCC -O1 UBSAN
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: macOS Clang ASAN (ARM64)
- GitHub Check: EL9 GCC S390X DFLTCC ASAN
- GitHub Check: Ubuntu GCC -O1 UBSAN
🔇 Additional comments (17)
arch/generic/crc32_c.c (4)
4-4: LGTM!The new include for
crc32_chorba_p.hprovides access to CHORBA_W, threshold macros, and Chorba function prototypes.
11-18: LGTM!The type change to
const uint8_t*foraligned_bufcorrectly aligns with the updated Chorba function signatures. The direct assignment without casting is appropriate since pointer arithmetic on byte buffers is straightforward.
13-13: LGTM!Using
uintptr_tforalgn_diffis correct for pointer arithmetic and improves portability across platforms where pointer andunsigned longsizes may differ.
21-32: LGTM!All function calls correctly pass
aligned_bufdirectly without casts, matching the updated function signatures that acceptconst uint8_t*. The conditional compilation usingCHORBA_W == 8properly selects between 64-bit and 32-bit Chorba variants.arch/x86/chorba_sse41.c (4)
4-4: LGTM!The include of
crc32_chorba_p.hprovides necessary CHORBA definitions and function prototypes.
15-15: LGTM!The prototype update for
chorba_small_nondestructive_sse2correctly reflects the new signature usingconst uint8_t*for the buffer parameter.
60-311: LGTM!The function signature correctly accepts
const uint8_t* bufand casts toconst uint64_t*internally. The caller ensures 16-byte alignment (as seen incrc32_chorba_sse41line 319), making the cast safe for SSE operations.
313-343: LGTM!The function correctly uses
const uint8_t*foraligned_bufand passes it directly to Chorba functions without casts. The alignment calculation ensures 16-byte alignment, which is appropriate for SSE operations.crc32_chorba_p.h (3)
1-3: LGTM!Standard include guard follows naming conventions.
20-23: LGTM!Function prototypes correctly use
Z_INTERNALvisibility andconst uint8_t*for buffer parameters, consistent with the new API design.
5-5: No change needed for CHORBA_LARGE_THRESHOLD: macros are expanded at use-time, sosizeof(chorba_word_t)compiles correctly despite appearing before the typedef.arch/generic/crc32_chorba_c.c (6)
1-14: LGTM!The includes and bitbuffer size macro definitions correctly use
chorba_word_tfor sizing, aligning with the new word type abstraction.
32-75: LGTM!The function signature correctly accepts
const uint8_t* bufand casts tochorba_word_t*internally. The endianness handling properly differentiates between 64-bit and 32-bit word sizes usingCHORBA_W.
78-174: LGTM!The loop correctly uses
sizeof(chorba_word_t)for all indexing calculations, ensuring consistency between the input pointer type and array accesses.
505-671: LGTM!The function correctly accepts
const uint8_t* buf, casts toconst uint64_t*, and usessizeof(uint64_t)consistently for indexing, as confirmed by previous review fixes.
674-1239: LGTM!The 64-bit small variant correctly uses
const uint8_t* bufparameter with internaluint64_t*casting and consistent endianness handling.
1241-1453: LGTM!The 32-bit small variant correctly uses
const uint8_t* bufparameter with internaluint32_t*casting and appropriate 32-bit endianness handling.
crc32_braid_p.h
Outdated
| /* Define BRAID_W and the associated z_word_t type. If BRAID_W is not defined, then a braided | ||
| calculation is not used, and the associated tables and code are not compiled. | ||
|
|
||
| TODO: According to crc32_braid_c.c, PowerPC POWER9 is fastest with BRAID_N=5, BRAID_W=4 |
There was a problem hiding this comment.
I'd appreciate it if someone with PowerPC hardware could check if this is still true, so that I'd know whether to keep or remove || defined(__powerpc64__) from the check.
There was a problem hiding this comment.
Would be useful to tag some people who have worked on the Power 8 and Power 9 functions to see if they have 64-bit PowerPC hardware available to test performance difference between the two choices... I mostly worked on 32-bit PowerPC and VMX/Altivec optimizations, but have 64-bit cross-compile toolchain to update the ABI files and test for compile failures.
ebb5864 to
b51544a
Compare
16455ed to
81c96c6
Compare
|
Static analysis CI shows some compiler warnings. Overall it looks good to me. I like how you moved things into |
|
81c96c6 to
1535be1
Compare
1535be1 to
aec7e25
Compare
This comment in
arch/generic/crc32_braid_c.csuggests thatBRAID_Wmay not be set to 8 on all 64-bit processors, but it may still be desirable to use uint64_t with Chorba on those platforms.