Skip to content

Rewrite chorba dispatch functions#2006

Merged
Dead2 merged 1 commit intodevelopfrom
chorba-dispatch
Nov 14, 2025
Merged

Rewrite chorba dispatch functions#2006
Dead2 merged 1 commit intodevelopfrom
chorba-dispatch

Conversation

@Dead2
Copy link
Copy Markdown
Member

@Dead2 Dead2 commented Nov 13, 2025

  • Unify crc32_chorba, chorba_sse2 and chorba_sse41 dispatch functions.
  • Fix alignment calculation in crc32_chorba.
  • Fix length check to happen early, avoiding several extra branches for too short lengths, this also allows removing one function call to crc32_braid_internal to handle fallthrough. Gbench shows ~0.15-0.25ns saved per call for lengths shorter than CHORBA_SMALL_THRESHOLD.
  • Avoid calculating aligned len if buffer is already aligned

- Fixed alignment diff calculation in crc32_chorba.
- Fixed length check to happen early, avoiding extra branches for too short lengths,
this also allows removing one function call to crc32_braid_internal to handle those.
Gbench shows ~0.15-0.25ns saved per call for lengths shorter than CHORBA_SMALL_THRESHOLD.
- Avoid calculating aligned len if buffer is already aligned
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Nov 13, 2025

Walkthrough

Refactors CRC32 Chorba implementations across multiple architectures to introduce alignment-aware processing with multi-path dispatching based on input size thresholds. Consolidates control flow in generic C and SSE2/SSE4.1 codepaths, and updates threshold macro definitions in crc32.h to be conditionally based on OPTIMAL_CMP.

Changes

Cohort / File(s) Summary
Chorba alignment and multi-path dispatch
arch/generic/crc32_chorba_c.c, arch/x86/chorba_sse2.c, arch/x86/chorba_sse41.c
Reworked control flow to compute alignment offset and dispatch to tiered processing paths (large-threshold nondestructive, medium-threshold, small-threshold) based on aligned buffer size; consolidates branches and removes intermediate aligned_len variables in favor of direct len-based thresholds; adds fallback to braid path for inputs below alignment threshold.
Threshold macro definitions
crc32.h
Replaces fixed CHORBA_SMALL_THRESHOLD_32BIT with conditionally-defined CHORBA_SMALL_THRESHOLD macro: 72 when OPTIMAL_CMP == 64, otherwise 80.

Sequence Diagram

sequenceDiagram
    participant Caller
    participant crc32_chorba
    participant Alignment as Alignment Check
    participant LargePath as Large Path<br/>(118960)
    participant MediumPath as Medium Path<br/>(32768)
    participant SmallPath as Small Path
    participant BraidPath as Braid Path

    Caller->>crc32_chorba: input buffer, length, crc
    crc32_chorba->>Alignment: calculate algn_diff
    
    alt len > algn_diff + THRESHOLD
        Alignment->>Alignment: align buffer
        note over Alignment: Process aligned portion
        
        alt len > LARGE_THRESHOLD
            Alignment->>LargePath: crc32_chorba_118960_nondestructive
            LargePath-->>crc32_chorba: processed crc
        else len in medium window (OPTIMAL_CMP==64)
            Alignment->>MediumPath: crc32_chorba_32768_nondestructive
            MediumPath-->>crc32_chorba: processed crc
        else
            Alignment->>SmallPath: crc32_chorba_small_nondestructive
            SmallPath-->>crc32_chorba: processed crc
        end
    else
        Alignment->>BraidPath: crc32_braid_internal
        BraidPath-->>crc32_chorba: processed crc
    end
    
    crc32_chorba->>Caller: finalized crc (XOR 0xffffffff)
Loading

Possibly related PRs

Suggested labels

Architecture

Suggested reviewers

  • KungFuJesus
  • nmoinvaz

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Rewrite chorba dispatch functions' directly matches the main objective of the PR, which is to unify and rewrite the crc32_chorba, chorba_sse2, and chorba_sse41 dispatch functions.
Description check ✅ Passed The description clearly outlines the changes made: unifying dispatch functions, fixing alignment and length checks, avoiding aligned length calculations, and includes performance benchmark results, all matching the file-level summaries.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch chorba-dispatch

📜 Recent review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8003f57 and b36e16b.

📒 Files selected for processing (4)
  • arch/generic/crc32_chorba_c.c (1 hunks)
  • arch/x86/chorba_sse2.c (1 hunks)
  • arch/x86/chorba_sse41.c (1 hunks)
  • crc32.h (1 hunks)
🧰 Additional context used
🧠 Learnings (10)
📓 Common learnings
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1833
File: test/test_crc32.cc:0-0
Timestamp: 2024-12-22T20:40:03.280Z
Learning: In the crc32_align test class (test/test_crc32.cc), the alignment offset parameter is always guaranteed to be non-negative because the test parameters are controlled and it never makes sense to have a negative offset.
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1921
File: arch/riscv/chunkset_rvv.c:97-102
Timestamp: 2025-06-09T16:46:28.468Z
Learning: In RISC-V chunkset_rvv.c CHUNKCOPY function, the alignment logic intentionally copies sizeof(chunk_t) bytes but advances pointers by align bytes to avoid unaligned access. This "re-reading" technique is a deliberate optimization to maintain proper alignment for subsequent operations, not a bug.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1778
File: arch/x86/chunkset_avx2.c:160-171
Timestamp: 2024-10-08T21:51:45.330Z
Learning: In `arch/x86/chunkset_avx2.c`, within the `GET_HALFCHUNK_MAG` function, using a conditional branch to select between `_mm_loadl_epi64` and `_mm_loadu_si128` is not recommended because the branching cost outweighs the savings from the load.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-07T21:18:37.806Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.
📚 Learning: 2025-01-23T22:01:53.422Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.

Applied to files:

  • crc32.h
  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
  • arch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-21T01:42:40.488Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.

Applied to files:

  • crc32.h
  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
  • arch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-21T01:41:50.358Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.

Applied to files:

  • crc32.h
  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
  • arch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-23T16:49:52.043Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.

Applied to files:

  • crc32.h
  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
  • arch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-21T01:44:03.996Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.

Applied to files:

  • crc32.h
  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
  • arch/generic/crc32_chorba_c.c
📚 Learning: 2024-12-25T19:45:06.009Z
Learnt from: samrussell
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_braid_c.c:596-597
Timestamp: 2024-12-25T19:45:06.009Z
Learning: We should verify out-of-bounds pointer arithmetic in chorba_118960_nondestructive() and related loops where "input + i" is used, especially when len < expected block sizes.

Applied to files:

  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
  • arch/generic/crc32_chorba_c.c
📚 Learning: 2025-06-09T16:46:28.468Z
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1921
File: arch/riscv/chunkset_rvv.c:97-102
Timestamp: 2025-06-09T16:46:28.468Z
Learning: In RISC-V chunkset_rvv.c CHUNKCOPY function, the alignment logic intentionally copies sizeof(chunk_t) bytes but advances pointers by align bytes to avoid unaligned access. This "re-reading" technique is a deliberate optimization to maintain proper alignment for subsequent operations, not a bug.

Applied to files:

  • arch/x86/chorba_sse2.c
  • arch/generic/crc32_chorba_c.c
📚 Learning: 2025-02-21T01:37:54.508Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:830-831
Timestamp: 2025-02-21T01:37:54.508Z
Learning: The alignment calculation `((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15` is safe from overflow as it's mathematically bounded between 1 and 16, making the use of uintptr_t appropriate for this pointer arithmetic.

Applied to files:

  • arch/x86/chorba_sse2.c
  • arch/generic/crc32_chorba_c.c
📚 Learning: 2024-12-22T20:40:03.280Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1833
File: test/test_crc32.cc:0-0
Timestamp: 2024-12-22T20:40:03.280Z
Learning: In the crc32_align test class (test/test_crc32.cc), the alignment offset parameter is always guaranteed to be non-negative because the test parameters are controlled and it never makes sense to have a negative offset.

Applied to files:

  • arch/generic/crc32_chorba_c.c
🔇 Additional comments (4)
crc32.h (1)

16-20: Conditional small-threshold macro looks consistent with callers

Defining CHORBA_SMALL_THRESHOLD off OPTIMAL_CMP while keeping CHORBA_SMALL_THRESHOLD_64BIT at 72 cleanly lets the generic dispatcher tune the 32‑bit small threshold without affecting the x86 64‑bit paths. This matches usage in crc32_chorba() and the SSE dispatchers, and doesn’t introduce ABI or behavioral changes.

arch/x86/chorba_sse2.c (1)

849-875: SSE2 dispatch realignment and short-length routing look correct

The new crc32_chorba_sse2 path:

  • Correctly computes a bounded 16‑byte algn_diff, uses braid for the unaligned prefix, and only advances buf/len when it will take the Chorba path.
  • Uses len > algn_diff + CHORBA_SMALL_THRESHOLD_64BIT, which is algebraically equivalent to the old aligned_len > threshold check once len is decremented by algn_diff, so the Chorba small function still only sees sufficiently large, 16‑byte‑aligned ranges.
  • Safely routes very short buffers through crc32_braid_internal only, avoiding unnecessary calls and any SSE alignment hazards.

Pre/post‑conditioning of c is preserved, so observable CRC results stay unchanged.

arch/x86/chorba_sse41.c (1)

307-335: SSE4.1 dispatcher’s length/alignment gating and tiered paths are sound

The refactored crc32_chorba_sse41:

  • Aligns once to 16 bytes, consumes the prefix via crc32_braid_internal, and only then dispatches on the aligned len.
  • Uses len > algn_diff + CHORBA_SMALL_THRESHOLD_64BIT to ensure the Chorba code only runs when there is enough aligned data; shorter buffers are handled entirely by braid.
  • Applies large / medium / small Chorba thresholds to the already‑aligned len, and the medium band (CHORBA_MEDIUM_LOWER_THRESHOLD < len <= CHORBA_MEDIUM_UPPER_THRESHOLD) guarantees that crc32_chorba_32768_nondestructive_sse41 only sees sizes where its internal bitbuffer/tail logic is safe.
  • Preserves the existing CRC complementing convention at entry/exit.

Overall the control flow now mirrors the generic dispatcher while respecting SSE4.1 preconditions.

arch/generic/crc32_chorba_c.c (1)

1450-1481: Generic Chorba dispatcher’s alignment and tiered routing look correct

The new crc32_chorba():

  • Safely aligns to 8 bytes, only subtracts algn_diff from len when len > algn_diff + CHORBA_SMALL_THRESHOLD, and processes any unaligned prefix via crc32_braid_internal, preventing underflow and OOB pointer arithmetic.
  • Uses the new CHORBA_SMALL_THRESHOLD to gate entry into the small Chorba variants, and only calls the 118960 and 32768 variants in size ranges where their internal bitbuffer/tail logic is well-defined.
  • Ensures that short buffers (or those barely longer than the alignment padding) fall back entirely to the braid implementation, which avoids unnecessary setup while keeping results identical.
  • Maintains the existing CRC pre/post‑conditioning contract around all dispatched helpers.

This unifies the control flow with the x86 dispatchers without introducing correctness or bounds issues.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@nmoinvaz
Copy link
Copy Markdown
Member

LGTM

@codecov
Copy link
Copy Markdown

codecov bot commented Nov 14, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.99%. Comparing base (8003f57) to head (b36e16b).
⚠️ Report is 2 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2006      +/-   ##
===========================================
- Coverage    83.21%   78.99%   -4.22%     
===========================================
  Files          160      160              
  Lines        12955    13691     +736     
  Branches      3145     3459     +314     
===========================================
+ Hits         10780    10815      +35     
- Misses        1138     1831     +693     
- Partials      1037     1045       +8     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Dead2 Dead2 merged commit a72d5f2 into develop Nov 14, 2025
302 of 304 checks passed
@Dead2 Dead2 mentioned this pull request Nov 16, 2025
@Dead2 Dead2 deleted the chorba-dispatch branch November 29, 2025 22:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants