Skip to content

Split CRC32 Braid and Chorba word types#1975

Merged
Dead2 merged 3 commits intozlib-ng:developfrom
ccawley2011:chorba_w
Feb 17, 2026
Merged

Split CRC32 Braid and Chorba word types#1975
Dead2 merged 3 commits intozlib-ng:developfrom
ccawley2011:chorba_w

Conversation

@ccawley2011
Copy link
Copy Markdown
Contributor

@ccawley2011 ccawley2011 commented Oct 2, 2025

This comment in arch/generic/crc32_braid_c.c suggests that BRAID_W may not be set to 8 on all 64-bit processors, but it may still be desirable to use uint64_t with Chorba on those platforms.

  BRAID_N and BRAID_W are chosen empirically by benchmarking the execution time
  on a given processor. The choices for BRAID_N and BRAID_W below were based on
  testing on Intel Kaby Lake i7, AMD Ryzen 7, ARM Cortex-A57, Sparc64-VII, PowerPC
  POWER9, and MIPS64 Octeon II processors.
  The Intel, AMD, and ARM processors were all fastest with BRAID_N=5, BRAID_W=8.
  The Sparc, PowerPC, and MIPS64 were all fastest at BRAID_N=5, BRAID_W=4.
  They were all tested with either gcc or clang, all using the -O3 optimization
  level. Your mileage may vary.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Oct 2, 2025

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

Walkthrough

This pull request refactors the CRC32 Chorba implementation by introducing a new private header crc32_chorba_p.h containing architecture-aware configuration and function prototypes. It relocates threshold macros from the public crc32.h to the private header, updates nondestructive Chorba function signatures to uniformly accept const uint8_t* buffers, and replaces z_word_t with a new chorba_word_t abstraction throughout the generic Chorba implementation and architecture-specific variants.

Changes

Cohort / File(s) Summary
Core Chorba Implementation
arch/generic/crc32_chorba_c.c
Replaced z_word_t with chorba_word_t throughout. Updated nondestructive function signatures from word-sized pointers to uniform const uint8_t* buffers. Changed macro references from BRAID_W to CHORBA_W. Reworked loops, memory allocation, bitbuffer operations, and endianness handling to operate on chorba_word_t.
Header Reorganization
crc32_chorba_p.h (new), crc32.h, arch/generic/generic_functions.h
Created new private header crc32_chorba_p.h with architecture-dependent chorba_word_t, CHORBA_W, size thresholds, and nondestructive function prototypes. Removed Chorba threshold macros and nondestructive declarations from public headers; moved to private header.
Architecture-Specific Updates
arch/x86/crc32_chorba_sse2.c, arch/x86/crc32_chorba_sse41.c
Updated to include crc32_chorba_p.h instead of crc32.h. Changed nondestructive function signatures to accept const uint8_t* with internal reinterpretation to word pointers. Updated call sites to pass byte-addressed buffers.
Include Cleanup
arch/arm/crc32_armv8.c, arch/arm/crc32_armv8_pmull_eor3.c, arch/loongarch/crc32_la.c, arch/x86/crc32_pclmulqdq_tpl.h, arch_functions.h, deflate.h, functable.h, inflate.h
Removed unnecessary crc32.h includes. No functional changes to implementation logic.
Build Configuration
CMakeLists.txt
Updated ZLIB_PRIVATE_HDRS to include crc32_chorba_p.h instead of crc32.h.
Documentation
crc32_braid_p.h
Added TODO comment regarding PowerPC POWER9 optimization tuning (BRAID_N=5, BRAID_W=4). No functional changes.

Sequence Diagram(s)

Skipped: Changes consist primarily of API refactoring, header reorganization, and type substitutions rather than introducing new control flow or multi-component interactions.

Possibly related PRs

Suggested labels

Architecture

Suggested reviewers

  • Dead2
  • nmoinvaz
🚥 Pre-merge checks | ✅ 3 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 45.45% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Split CRC32 Braid and Chorba word types' clearly and concisely describes the main change: separating BRAID_W and CHORBA_W configurations, allowing independent optimization of word types for different algorithms across architectures.
Description check ✅ Passed The description directly relates to the changeset by explaining the rationale: BRAID_W may not be 8 on all 64-bit processors, while Chorba may still benefit from uint64_t, necessitating separate word-type definitions and configurations.
Merge Conflict Detection ✅ Passed ✅ No merge conflicts detected when merging into develop

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

Issue Planner is now in beta. Read the docs and try it out! Share your feedback on Discord.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🧹 Nitpick comments (1)
crc32_chorba_p.h (1)

5-5: Consider reordering type definitions before threshold macros.

Line 5 uses chorba_word_t in the macro definition, but the typedef doesn't appear until lines 12-18. While this works because macros are expanded at usage time, it's unconventional and could confuse readers.

Consider reordering to define chorba_word_t and CHORBA_W first (lines 11-18), then the threshold macros (lines 4-9). This would improve code clarity and follow the principle of defining dependencies before their use.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7d7db7a and e1ae8fc.

📒 Files selected for processing (8)
  • arch/generic/crc32_c.c (2 hunks)
  • arch/generic/crc32_chorba_c.c (8 hunks)
  • arch/generic/generic_functions.h (0 hunks)
  • arch/x86/chorba_sse2.c (2 hunks)
  • arch/x86/chorba_sse41.c (2 hunks)
  • crc32.h (0 hunks)
  • crc32_braid_p.h (2 hunks)
  • crc32_chorba_p.h (1 hunks)
💤 Files with no reviewable changes (2)
  • arch/generic/generic_functions.h
  • crc32.h
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
📚 Learning: 2025-01-23T22:01:53.422Z
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.

Applied to files:

  • arch/generic/crc32_c.c
  • arch/x86/chorba_sse41.c
  • arch/generic/crc32_chorba_c.c
  • arch/x86/chorba_sse2.c
  • crc32_chorba_p.h
📚 Learning: 2025-02-21T01:42:40.488Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.

Applied to files:

  • arch/generic/crc32_c.c
  • arch/x86/chorba_sse41.c
  • arch/generic/crc32_chorba_c.c
  • arch/x86/chorba_sse2.c
  • crc32_chorba_p.h
📚 Learning: 2025-02-23T16:49:52.043Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.

Applied to files:

  • arch/generic/crc32_c.c
  • arch/x86/chorba_sse41.c
  • arch/generic/crc32_chorba_c.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2025-02-21T01:41:50.358Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.

Applied to files:

  • arch/generic/crc32_c.c
  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2025-02-21T01:44:03.996Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.

Applied to files:

  • arch/generic/crc32_c.c
  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
🧬 Code graph analysis (2)
crc32_braid_p.h (4)
arch/generic/crc32_c.c (1)
  • uint32_t (7-43)
arch/x86/chorba_sse2.c (2)
  • uint32_t (27-851)
  • uint32_t (853-880)
arch/x86/chorba_sse41.c (2)
  • uint32_t (60-310)
  • uint32_t (312-342)
arch/generic/crc32_braid_c.c (2)
  • uint32_t (63-213)
  • uint32_t (215-222)
arch/generic/crc32_chorba_c.c (4)
arch/generic/crc32_c.c (1)
  • uint32_t (7-43)
arch/x86/chorba_sse2.c (2)
  • uint32_t (27-851)
  • uint32_t (853-880)
arch/x86/chorba_sse41.c (2)
  • uint32_t (60-310)
  • uint32_t (312-342)
zutil_p.h (1)
  • zng_alloc (12-14)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (92)
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v142 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC MIPS
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: macOS Clang
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v142 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC MIPS
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: macOS Clang
  • GitHub Check: Windows MSVC ARM No Test
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v142 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC MIPS
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: macOS Clang
  • GitHub Check: Windows MSVC ARM No Test
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v142 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC MIPS
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: macOS Clang
  • GitHub Check: Windows MSVC ARM No Test
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v142 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC MIPS
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: macOS Clang
  • GitHub Check: Windows MSVC ARM No Test
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v142 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC MIPS
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: macOS Clang
  • GitHub Check: Windows MSVC ARM No Test
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v142 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win32
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC MIPS
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: macOS Clang
  • GitHub Check: Windows MSVC ARM No Test
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v142 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win32
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC MIPS
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: macOS Clang
  • GitHub Check: Windows MSVC ARM No Test
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v142 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win32
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC MIPS
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: macOS Clang
  • GitHub Check: Windows MSVC ARM No Test
  • GitHub Check: Windows GCC
  • GitHub Check: Windows MSVC 2022 v142 Win32
  • GitHub Check: Windows MSVC 2022 v143 Win64
  • GitHub Check: Windows MSVC 2022 v141 Win32
  • GitHub Check: Ubuntu MinGW i686
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC MIPS
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: macOS Clang
🔇 Additional comments (18)
crc32_braid_p.h (2)

11-12: LGTM! Useful performance documentation.

The TODO comment provides valuable context about platform-specific performance characteristics for POWER9, which aligns with the PR's goal of separating BRAID_W and CHORBA_W based on empirical benchmarking results.


42-42: LGTM! Proper encapsulation of internal function.

Moving the crc32_braid_internal declaration to this private header improves encapsulation and aligns with the broader PR changes that consolidate internal interfaces.

crc32_chorba_p.h (2)

1-3: LGTM!

Header guard follows proper naming conventions for private headers.


11-18: LGTM! Architecture detection is comprehensive.

The architecture-specific type definitions properly distinguish between 64-bit and 32-bit platforms. The TODO comment on line 11 appropriately flags the open question about optimal CHORBA_W settings, which aligns with the PR's goal of separating BRAID_W and CHORBA_W based on empirical performance data.

arch/x86/chorba_sse41.c (2)

4-4: LGTM!

Adding the crc32_chorba_p.h include provides the necessary chorba_word_t type definition and function declarations for the updated call site.


326-326: Ensure SSE4.1 code only builds on 64-bit. The cast (chorba_word_t*)aligned_buf requires sizeof(chorba_word_t)==8; verify that X86_SSE41 is only enabled when CHORBA_W==8 (i.e. 64-bit targets) or add an explicit architecture guard or static assert.

arch/x86/chorba_sse2.c (1)

4-4: LGTM!

Adding the crc32_chorba_p.h include provides the necessary chorba_word_t type definition.

arch/generic/crc32_c.c (3)

4-4: LGTM!

Adding the crc32_chorba_p.h include provides necessary type definitions and macros for the updated code paths.


22-22: LGTM! Clearer conditional using CHORBA_W.

Replacing OPTIMAL_CMP == 64 with CHORBA_W == 8 makes the conditional more explicit and aligns with the PR's goal of using architecture-specific Chorba word sizing. The guarded code paths use 64-bit functions, which correctly correspond to CHORBA_W == 8.


21-21: This cast is safe on both 32-bit and 64-bit builds
chorba_word_t is typedef’d in crc32_chorba_p.h to either uint32_t (when CHORBA_W==4) or uint64_t (when CHORBA_W==8), and aligned_buf` is 8-byte aligned (so it’s also valid for 4-byte loads). No change needed.

arch/generic/crc32_chorba_c.c (8)

2-2: LGTM!

The zendian.h include provides the necessary byte-swapping macros for endianness handling.


6-6: LGTM!

The crc32_chorba_p.h include provides the chorba_word_t type and related definitions.


12-13: LGTM! Correct bitbuffer sizing.

The bitbuffer size definitions now correctly use chorba_word_t, ensuring proper sizing for both 64-bit and 32-bit platforms.


32-32: LGTM! Function signature updated correctly.

The function signature now uses chorba_word_t* for the input parameter, which aligns with the declaration in crc32_chorba_p.h and enables architecture-specific word sizing.


34-36: LGTM! Buffer declarations updated correctly.

Both the dynamic (EMSCRIPTEN) and stack-based buffer allocations now use chorba_word_t, ensuring correct sizing for the target platform.


44-50: LGTM! Correct endianness handling.

The initialization of next1 properly handles both little-endian and big-endian systems, using the appropriate byte-swapping based on CHORBA_W.


52-72: LGTM! Consistent variable type updates.

All state variables (next1 through next22) now use chorba_word_t, maintaining consistency with the new word type throughout the algorithm.


76-115: LGTM! Correct loop bounds and indexing.

The first pass loop now correctly uses sizeof(chorba_word_t) for bounds calculation and array indexing, ensuring proper scaling for both 64-bit and 32-bit word sizes.

@codecov
Copy link
Copy Markdown

codecov bot commented Oct 2, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 81.95%. Comparing base (3f81a5c) to head (ebb5864).

⚠️ Current head ebb5864 differs from pull request most recent head 16455ed

Please upload reports for the commit 16455ed to get more accurate results.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1975      +/-   ##
===========================================
- Coverage    84.12%   81.95%   -2.17%     
===========================================
  Files          158      161       +3     
  Lines        13055    13743     +688     
  Branches      3388     3117     -271     
===========================================
+ Hits         10982    11263     +281     
- Misses         988     1411     +423     
+ Partials      1085     1069      -16     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

* @note Uses 128KB temporary buffer
*/
Z_INTERNAL uint32_t crc32_chorba_118960_nondestructive (uint32_t crc, const z_word_t* input, size_t len) {
Z_INTERNAL uint32_t crc32_chorba_118960_nondestructive (uint32_t crc, const uint8_t* buf, size_t len) {

Check warning

Code scanning / CodeQL

Poorly documented large function Warning

Poorly documented function: fewer than 2% comments for a function of 449 lines.

Check warning

Code scanning / CodeQL

Poorly documented large function Warning

Poorly documented function: fewer than 2% comments for a function of 147 lines.
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
arch/generic/crc32_chorba_c.c (1)

41-43: Restore constness on input views

We only read from buf, so keeping these casts const preserves the original contract and prevents accidental writes later.

-    chorba_word_t* input = (chorba_word_t*) buf;
-    uint64_t* inputqwords = (uint64_t*) buf;
+    const chorba_word_t* input = (const chorba_word_t*) buf;
+    const uint64_t* inputqwords = (const uint64_t*) buf;
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 08cddf3 and e749797.

📒 Files selected for processing (1)
  • arch/generic/crc32_chorba_c.c (10 hunks)
🧰 Additional context used
🧠 Learnings (7)
📓 Common learnings
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
📚 Learning: 2025-01-23T22:01:53.422Z
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.

Applied to files:

  • arch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-21T01:42:40.488Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.

Applied to files:

  • arch/generic/crc32_chorba_c.c
📚 Learning: 2024-12-25T19:45:06.009Z
Learnt from: samrussell
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_braid_c.c:596-597
Timestamp: 2024-12-25T19:45:06.009Z
Learning: We should verify out-of-bounds pointer arithmetic in chorba_118960_nondestructive() and related loops where "input + i" is used, especially when len < expected block sizes.

Applied to files:

  • arch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-18T02:53:23.621Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-18T02:53:23.621Z
Learning: In zlib-ng, when macro names and parameters are self-explanatory and scalar versions of the code provide context, additional documentation for the macros may not be necessary.

Applied to files:

  • arch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-23T16:49:52.043Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.

Applied to files:

  • arch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-21T01:41:50.358Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.

Applied to files:

  • arch/generic/crc32_chorba_c.c
🧬 Code graph analysis (1)
arch/generic/crc32_chorba_c.c (3)
arch/generic/crc32_c.c (1)
  • uint32_t (7-43)
arch/x86/chorba_sse41.c (2)
  • uint32_t (60-311)
  • uint32_t (313-343)
zutil_p.h (1)
  • zng_alloc (12-14)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (30)
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
  • GitHub Check: Windows MSVC 2022 v141 Win64
  • GitHub Check: EL9 GCC S390X DFLTCC UBSAN
  • GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
crc32_chorba_p.h (1)

11-18: Consider additional 64-bit architectures.

The TODO comment indicates potential expansion to more 64-bit architectures. Consider adding RISC-V 64-bit (__riscv with __riscv_xlen == 64) and LoongArch 64-bit (__loongarch64) to the condition, as these are increasingly common platforms that would benefit from 64-bit word processing.

Example addition:

-#if defined(__x86_64__) || defined(_M_AMD64) || defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__)
+#if defined(__x86_64__) || defined(_M_AMD64) || defined(__aarch64__) || defined(_M_ARM64) || defined(__powerpc64__) || (defined(__riscv) && __riscv_xlen == 64) || defined(__loongarch64)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e749797 and ebb5864.

📒 Files selected for processing (5)
  • arch/generic/crc32_c.c (1 hunks)
  • arch/generic/crc32_chorba_c.c (10 hunks)
  • arch/x86/chorba_sse2.c (3 hunks)
  • arch/x86/chorba_sse41.c (5 hunks)
  • crc32_chorba_p.h (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • arch/x86/chorba_sse2.c
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
📚 Learning: 2025-01-23T22:01:53.422Z
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.

Applied to files:

  • arch/generic/crc32_c.c
  • arch/x86/chorba_sse41.c
  • arch/generic/crc32_chorba_c.c
  • crc32_chorba_p.h
📚 Learning: 2025-02-21T01:42:40.488Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.

Applied to files:

  • arch/generic/crc32_c.c
  • arch/x86/chorba_sse41.c
  • arch/generic/crc32_chorba_c.c
  • crc32_chorba_p.h
📚 Learning: 2025-02-23T16:49:52.043Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.

Applied to files:

  • arch/generic/crc32_c.c
  • arch/x86/chorba_sse41.c
  • arch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-21T01:44:03.996Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.

Applied to files:

  • arch/x86/chorba_sse41.c
📚 Learning: 2025-02-21T01:41:50.358Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.

Applied to files:

  • arch/x86/chorba_sse41.c
  • arch/generic/crc32_chorba_c.c
📚 Learning: 2024-12-25T19:45:06.009Z
Learnt from: samrussell
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_braid_c.c:596-597
Timestamp: 2024-12-25T19:45:06.009Z
Learning: We should verify out-of-bounds pointer arithmetic in chorba_118960_nondestructive() and related loops where "input + i" is used, especially when len < expected block sizes.

Applied to files:

  • arch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-18T02:53:23.621Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-18T02:53:23.621Z
Learning: In zlib-ng, when macro names and parameters are self-explanatory and scalar versions of the code provide context, additional documentation for the macros may not be necessary.

Applied to files:

  • arch/generic/crc32_chorba_c.c
🧬 Code graph analysis (4)
arch/generic/crc32_c.c (4)
arch/generic/crc32_chorba_c.c (4)
  • uint32_t (32-502)
  • uint32_t (505-671)
  • uint32_t (674-1239)
  • uint32_t (1241-1453)
arch/x86/chorba_sse2.c (2)
  • uint32_t (27-852)
  • uint32_t (854-881)
arch/x86/chorba_sse41.c (2)
  • uint32_t (60-311)
  • uint32_t (313-343)
arch/generic/crc32_braid_c.c (2)
  • uint32_t (63-213)
  • uint32_t (215-222)
arch/x86/chorba_sse41.c (3)
arch/generic/crc32_chorba_c.c (4)
  • uint32_t (32-502)
  • uint32_t (505-671)
  • uint32_t (674-1239)
  • uint32_t (1241-1453)
arch/generic/crc32_c.c (1)
  • uint32_t (7-43)
arch/x86/chorba_sse2.c (2)
  • uint32_t (27-852)
  • uint32_t (854-881)
arch/generic/crc32_chorba_c.c (4)
arch/generic/crc32_c.c (1)
  • uint32_t (7-43)
arch/x86/chorba_sse2.c (2)
  • uint32_t (27-852)
  • uint32_t (854-881)
arch/x86/chorba_sse41.c (2)
  • uint32_t (60-311)
  • uint32_t (313-343)
zutil_p.h (1)
  • zng_alloc (12-14)
crc32_chorba_p.h (3)
arch/generic/crc32_c.c (1)
  • uint32_t (7-43)
arch/x86/chorba_sse2.c (2)
  • uint32_t (27-852)
  • uint32_t (854-881)
arch/x86/chorba_sse41.c (2)
  • uint32_t (60-311)
  • uint32_t (313-343)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (40)
  • GitHub Check: macOS Clang ASAN (Intel)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: Ubuntu GCC -O1 UBSAN
  • GitHub Check: macOS Clang ASAN (Intel)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: Ubuntu GCC -O1 UBSAN
  • GitHub Check: macOS Clang ASAN (Intel)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: Ubuntu GCC -O1 UBSAN
  • GitHub Check: macOS Clang ASAN (Intel)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: Ubuntu GCC -O1 UBSAN
  • GitHub Check: macOS Clang ASAN (Intel)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: Ubuntu GCC -O1 UBSAN
  • GitHub Check: macOS Clang ASAN (Intel)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: Ubuntu GCC -O1 UBSAN
  • GitHub Check: macOS Clang ASAN (Intel)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: Ubuntu GCC -O1 UBSAN
  • GitHub Check: macOS Clang ASAN (Intel)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: Ubuntu GCC -O1 UBSAN
  • GitHub Check: macOS Clang ASAN (Intel)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: Ubuntu GCC -O1 UBSAN
  • GitHub Check: macOS Clang ASAN (Intel)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: EL9 GCC S390X DFLTCC ASAN
  • GitHub Check: Ubuntu GCC -O1 UBSAN
🔇 Additional comments (17)
arch/generic/crc32_c.c (4)

4-4: LGTM!

The new include for crc32_chorba_p.h provides access to CHORBA_W, threshold macros, and Chorba function prototypes.


11-18: LGTM!

The type change to const uint8_t* for aligned_buf correctly aligns with the updated Chorba function signatures. The direct assignment without casting is appropriate since pointer arithmetic on byte buffers is straightforward.


13-13: LGTM!

Using uintptr_t for algn_diff is correct for pointer arithmetic and improves portability across platforms where pointer and unsigned long sizes may differ.


21-32: LGTM!

All function calls correctly pass aligned_buf directly without casts, matching the updated function signatures that accept const uint8_t*. The conditional compilation using CHORBA_W == 8 properly selects between 64-bit and 32-bit Chorba variants.

arch/x86/chorba_sse41.c (4)

4-4: LGTM!

The include of crc32_chorba_p.h provides necessary CHORBA definitions and function prototypes.


15-15: LGTM!

The prototype update for chorba_small_nondestructive_sse2 correctly reflects the new signature using const uint8_t* for the buffer parameter.


60-311: LGTM!

The function signature correctly accepts const uint8_t* buf and casts to const uint64_t* internally. The caller ensures 16-byte alignment (as seen in crc32_chorba_sse41 line 319), making the cast safe for SSE operations.


313-343: LGTM!

The function correctly uses const uint8_t* for aligned_buf and passes it directly to Chorba functions without casts. The alignment calculation ensures 16-byte alignment, which is appropriate for SSE operations.

crc32_chorba_p.h (3)

1-3: LGTM!

Standard include guard follows naming conventions.


20-23: LGTM!

Function prototypes correctly use Z_INTERNAL visibility and const uint8_t* for buffer parameters, consistent with the new API design.


5-5: No change needed for CHORBA_LARGE_THRESHOLD: macros are expanded at use-time, so sizeof(chorba_word_t) compiles correctly despite appearing before the typedef.

arch/generic/crc32_chorba_c.c (6)

1-14: LGTM!

The includes and bitbuffer size macro definitions correctly use chorba_word_t for sizing, aligning with the new word type abstraction.


32-75: LGTM!

The function signature correctly accepts const uint8_t* buf and casts to chorba_word_t* internally. The endianness handling properly differentiates between 64-bit and 32-bit word sizes using CHORBA_W.


78-174: LGTM!

The loop correctly uses sizeof(chorba_word_t) for all indexing calculations, ensuring consistency between the input pointer type and array accesses.


505-671: LGTM!

The function correctly accepts const uint8_t* buf, casts to const uint64_t*, and uses sizeof(uint64_t) consistently for indexing, as confirmed by previous review fixes.


674-1239: LGTM!

The 64-bit small variant correctly uses const uint8_t* buf parameter with internal uint64_t* casting and consistent endianness handling.


1241-1453: LGTM!

The 32-bit small variant correctly uses const uint8_t* buf parameter with internal uint32_t* casting and appropriate 32-bit endianness handling.

crc32_braid_p.h Outdated
/* Define BRAID_W and the associated z_word_t type. If BRAID_W is not defined, then a braided
calculation is not used, and the associated tables and code are not compiled.

TODO: According to crc32_braid_c.c, PowerPC POWER9 is fastest with BRAID_N=5, BRAID_W=4
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is there todo here?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd appreciate it if someone with PowerPC hardware could check if this is still true, so that I'd know whether to keep or remove || defined(__powerpc64__) from the check.

Copy link
Copy Markdown
Collaborator

@mtl1979 mtl1979 Oct 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be useful to tag some people who have worked on the Power 8 and Power 9 functions to see if they have 64-bit PowerPC hardware available to test performance difference between the two choices... I mostly worked on 32-bit PowerPC and VMX/Altivec optimizations, but have 64-bit cross-compile toolchain to update the ABI files and test for compile failures.

@Dead2 Dead2 added the Rebase needed Please do a 'git rebase develop yourbranch' label Dec 2, 2025
@nmoinvaz nmoinvaz added cleanup Improving maintainability or removing code. Build Env labels Jan 9, 2026
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

@ccawley2011 ccawley2011 force-pushed the chorba_w branch 2 times, most recently from 16455ed to 81c96c6 Compare February 16, 2026 16:49
@coveralls
Copy link
Copy Markdown

coveralls commented Feb 16, 2026

Coverage Status

coverage: 94.831% (+0.8%) from 93.987%
when pulling aec7e25 on ccawley2011:chorba_w
into b9a22d6 on zlib-ng:develop.

@nmoinvaz nmoinvaz removed the Rebase needed Please do a 'git rebase develop yourbranch' label Feb 16, 2026
@nmoinvaz
Copy link
Copy Markdown
Member

nmoinvaz commented Feb 16, 2026

Static analysis CI shows some compiler warnings.

Overall it looks good to me. I like how you moved things into crc32_chorba_p.h. There are some minor clean up things I'm noticing, that are pre-existing, but they can be part of a separate PR that I can do later.

@nmoinvaz
Copy link
Copy Markdown
Member

/home/runner/work/zlib-ng/zlib-ng/arch/x86/crc32_chorba_sse2.c: In function ‘chorba_small_nondestructive_sse2’:
/home/runner/work/zlib-ng/zlib-ng/arch/x86/crc32_chorba_sse2.c:25:21: error: ‘input’ redeclared as different kind of symbol
   25 |     const uint64_t* input = (const uint64_t*)buf;
      |                     ^~~~~
/home/runner/work/zlib-ng/zlib-ng/arch/x86/crc32_chorba_sse2.c:23:83: note: previous definition of ‘input’ with type ‘const uint8_t *’ {aka ‘const unsigned char *’}
   23 | Z_INTERNAL uint32_t chorba_small_nondestructive_sse2(uint32_t crc, const uint8_t *input, size_t len) {
      |                                                                    ~~~~~~~~~~~~~~~^~~~~
/home/runner/work/zlib-ng/zlib-ng/arch/x86/crc32_chorba_sse2.c:25:46: error: ‘buf’ undeclared (first use in this function)
   25 |     const uint64_t* input = (const uint64_t*)buf;
      |                                              ^~~
/home/runner/work/zlib-ng/zlib-ng/arch/x86/crc32_chorba_sse2.c:25:46: note: each undeclared identifier is reported only once for each function it appears in
gmake[2]: *** [CMakeFiles/zlib-ng.dir/build.make:457: CMakeFiles/zlib-ng.dir/arch/x86/crc32_chorba_sse2.c.o] Error 1

Copy link
Copy Markdown
Member

@Dead2 Dead2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Dead2 Dead2 merged commit 4ee1a73 into zlib-ng:develop Feb 17, 2026
150 checks passed
@ccawley2011 ccawley2011 deleted the chorba_w branch February 17, 2026 22:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Build Env cleanup Improving maintainability or removing code.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants