Skip to content

Reorganize Chorba activation#2004

Merged
Dead2 merged 1 commit intodevelopfrom
chorba-reorganization
Nov 13, 2025
Merged

Reorganize Chorba activation#2004
Dead2 merged 1 commit intodevelopfrom
chorba-reorganization

Conversation

@Dead2
Copy link
Copy Markdown
Member

@Dead2 Dead2 commented Nov 12, 2025

Split out separately from #2003. Easier to review, and with disabled Chorba-sse variants on old MSVC versions.

  • WITHOUT_CHORBA will now only disable the crc32_chorba C fallback.

  • SSE2, SSE41 and pclmul variants will still be able to use their Chorba-algorithm based code, but their fallback to the generic crc32_chorba C code in SSE2 and SSE41 will be disabled, reducing their performance on really big input buffers (not used during deflate/inflate, only when calling crc32 directly).

  • Remove the crc32_c function (and its file crc32_c.c), instead use the normal functable routing to select between crc32_braid and crc32_chorba.

  • Disable sse2 and sse4.1 variants of Chorba-crc32 on MSVC older than 2022 due to code generation bug in 2019 causing segfaults.

  • Compile either crc32_chorba_small_nondestructive or crc32_chorba_small_nondestructive_32bit, not both. Shaves ~1.1KB off the library size.

  • Don't compile crc32_chorba_32768_nondestructive on 32bit arch.

@Dead2 Dead2 added the cleanup Improving maintainability or removing code. label Nov 12, 2025
@Dead2 Dead2 force-pushed the chorba-reorganization branch from 3302f0d to b3d891f Compare November 12, 2025 21:14
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Nov 12, 2025

Walkthrough

Replaces internal crc32_c with a new alignment-aware crc32_chorba dispatcher, removes crc32_c from builds and callers, remaps native_crc32 to crc32_chorba or crc32_braid via new WITHOUT_CHORBA / WITHOUT_CHORBA_SSE guards, adds an MSVC SSE guard, and updates SSE gating, templates, functable, tests, and benchmarks.

Changes

Cohort / File(s) Summary
Build system
CMakeLists.txt, Makefile.in, arch/generic/Makefile.in
Added MSVC guard (define WITHOUT_CHORBA_SSE for older MSVC); removed arch/generic/crc32_c.c from source/object lists and deleted its build rules.
Generic dispatcher & implementation
arch/generic/crc32_chorba_c.c, arch/generic/crc32_c.c
Added crc32_chorba alignment-aware dispatcher (crc32_chorba_c.c); removed the old crc32_c implementation file.
Generic headers & mapping
arch/generic/generic_functions.h, crc32.h
Removed crc32_c declaration; added crc32_chorba declaration under #ifndef WITHOUT_CHORBA; changed native_crc32 macro to map to crc32_chorba or crc32_braid; removed crc32_braid_internal declaration from crc32.h.
Arch-specific call sites
arch/riscv/crc32_zbc.c, arch/s390/crc32-vx.c
Removed extern for crc32_c; replaced calls to crc32_c with crc32_braid in short/unaligned/trailing paths; added arch_functions.h include in RISC‑V file.
x86 SSE Chorba & headers
arch/x86/chorba_sse2.c, arch/x86/chorba_sse41.c, arch/x86/x86_functions.h
Introduced WITHOUT_CHORBA_SSE gating for SSE Chorba paths; replaced some forward externs/includes with arch_functions.h; removed some public declarations; added chorba_small_nondestructive_sse2() declaration under new gate; updated READ_NEXT and Chorba threshold/selection gating.
x86 templates / fold
arch/x86/crc32_fold_pclmulqdq_tpl.h, arch/x86/crc32_pclmulqdq_tpl.h
Made previously gated Chorba/fold_12 code unconditional (removed WITHOUT_CHORBA guards); unified immintrin.h inclusion; added CRC32_FOLD_COPY variant.
Function dispatch / runtime init
functable.c
Replaced default fallback references from crc32_c to crc32_braid; registered crc32_chorba under WITH_OPTIM gates; updated SSE selection to use WITHOUT_CHORBA_SSE.
Tests & benchmarks
test/benchmarks/benchmark_crc32.cc, test/test_crc32.cc
Removed generic_chorba/generic (crc32_c) registrations/tests; added chorba_c (crc32_chorba) registration/test guarded by WITHOUT_CHORBA; moved SSE Chorba registrations/tests under `defined(X86_SSE2

Sequence Diagram(s)

sequenceDiagram
    participant Caller
    participant native_crc32
    participant crc32_chorba
    participant crc32_braid_internal
    participant crc32_braid

    Note right of native_crc32: compile-time mapping
    Caller->>native_crc32: call(crc, buf, len)
    alt WITHOUT_CHORBA not defined
        native_crc32->>crc32_chorba: dispatch(crc, buf, len)
        crc32_chorba->>crc32_braid_internal: handle unaligned prefix
        crc32_braid_internal-->>crc32_chorba: prefix_crc
        alt aligned_len > large_threshold
            crc32_chorba->>crc32_braid: large nondestructive Chorba path
        else aligned_len in medium_range
            crc32_chorba->>crc32_braid: medium/small Chorba path
        else
            crc32_chorba->>crc32_braid_internal: braid fallback
        end
        crc32_chorba-->>native_crc32: final_crc
    else
        native_crc32->>crc32_braid: direct braid path
        crc32_braid-->>native_crc32: final_crc
    end
    native_crc32-->>Caller: result
Loading

Possibly related PRs

Suggested labels

Architecture

Suggested reviewers

  • KungFuJesus
  • nmoinvaz

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 54.55% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Reorganize Chorba activation' accurately summarizes the main change: restructuring how the Chorba CRC32 algorithm is activated and gated across the codebase with new preprocessor conditions.
Description check ✅ Passed The description is directly related to the changeset, detailing the reorganization of Chorba activation, removal of crc32_c, introduction of WITHOUT_CHORBA_SSE, MSVC version handling, and compilation optimizations.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch chorba-reorganization

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (7)
arch/x86/chorba_sse41.c (1)

320-324: Document the two-level gating hierarchy for clarity.

The conditional structure enables:

  1. File-level: !WITHOUT_CHORBA_SSE - enables SSE41 Chorba variant
  2. Large-path: !WITHOUT_CHORBA - enables fallback to generic large Chorba

This allows WITHOUT_CHORBA to disable just the generic large path while keeping the SSE41 medium/small implementations. Consider adding a brief comment explaining this hierarchy, as the nested conditionals with the else continuation on line 323 may be non-obvious to future maintainers.

Consider adding a comment like:

// When WITHOUT_CHORBA is defined, skip the generic large path fallback
// but retain SSE41-optimized medium/small paths
#if !defined(WITHOUT_CHORBA)
    if(aligned_len > CHORBA_LARGE_THRESHOLD) {
        c = crc32_chorba_118960_nondestructive(c, (z_word_t*) aligned_buf, aligned_len);
    } else
#endif
arch/x86/crc32_fold_pclmulqdq_tpl.h (1)

108-136: Double-check XOR_INITIAL128 usage once per block.

Within the Chorba loop XOR_INITIAL128 is applied to chorba8 only (line 134). That’s fine if and only if the later xmm_t* loads in this loop should not receive XOR_INITIAL128 (unlike the standard 64B loop below). Please confirm the intended single application; otherwise, initial CRC may be mixed incorrectly.

CMakeLists.txt (1)

220-223: Macro naming: unify on WITHOUT_CHORBA_SSE.

Headers also define NO_CHORBA_SSE for v142 range. Consider removing NO_CHORBA_SSE and using a single WITHOUT_CHORBA_SSE everywhere to avoid drift. I.e., replace the header-only guard with this definition or map NO_CHORBA_SSE to WITHOUT_CHORBA_SSE centrally.

arch/x86/x86_functions.h (2)

27-31: Keep helper internal; avoid exposing non-API SSE2 helper.

chorba_small_nondestructive_sse2 is only used inside arch/x86/chorba_sse2.c. Drop this declaration from the public x86 header or mark it Z_INTERNAL to avoid accidental external use.

-    uint32_t chorba_small_nondestructive_sse2(uint32_t c, const uint64_t *aligned_buf, size_t aligned_len);
+    Z_INTERNAL uint32_t chorba_small_nondestructive_sse2(uint32_t c, const uint64_t *aligned_buf, size_t aligned_len);

Or remove it entirely if no other TU needs it.


13-15: Consolidate NO_CHORBA_SSE vs WITHOUT_CHORBA_SSE.

The header defines NO_CHORBA_SSE for a specific MSVC range, while other guards use WITHOUT_CHORBA_SSE. Use a single macro (prefer WITHOUT_CHORBA_SSE as set by CMake) and, if needed, shim NO_CHORBA_SSE to it to prevent configuration mismatches.

Also applies to: 39-41, 107-111, 123-126

test/test_crc32.cc (1)

294-296: Gating aligns with build flags; add a large-buffer check for chorba_c.

The new chorba_c tests are correctly guarded. Consider also running crc32_large_buf for chorba_c (as you do for others) to catch large‑threshold regressions.

Also applies to: 317-322

arch/generic/crc32_chorba_c.c (1)

1449-1478: Clarify 8‑byte vs 16‑byte alignment choice.

algn_diff targets 8B alignment (using a 16‑modulus trick). SSE code aligns to 16B. If 8B is intentional for generic C (to match uint64_t loads), please add a brief comment; otherwise, consider standardizing to the common “align to 16” pattern for consistency. Function’s xor in/out and path selection look correct.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 780838b and b3d891f.

📒 Files selected for processing (17)
  • CMakeLists.txt (1 hunks)
  • Makefile.in (0 hunks)
  • arch/generic/Makefile.in (0 hunks)
  • arch/generic/crc32_c.c (0 hunks)
  • arch/generic/crc32_chorba_c.c (1 hunks)
  • arch/generic/generic_functions.h (2 hunks)
  • arch/riscv/crc32_zbc.c (2 hunks)
  • arch/s390/crc32-vx.c (2 hunks)
  • arch/x86/chorba_sse2.c (2 hunks)
  • arch/x86/chorba_sse41.c (3 hunks)
  • arch/x86/crc32_fold_pclmulqdq_tpl.h (1 hunks)
  • arch/x86/crc32_pclmulqdq_tpl.h (0 hunks)
  • arch/x86/x86_functions.h (4 hunks)
  • crc32.h (0 hunks)
  • functable.c (4 hunks)
  • test/benchmarks/benchmark_crc32.cc (1 hunks)
  • test/test_crc32.cc (2 hunks)
💤 Files with no reviewable changes (5)
  • Makefile.in
  • arch/x86/crc32_pclmulqdq_tpl.h
  • arch/generic/crc32_c.c
  • arch/generic/Makefile.in
  • crc32.h
🧰 Additional context used
🧠 Learnings (23)
📓 Common learnings
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.
📚 Learning: 2025-01-23T22:01:53.422Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.

Applied to files:

  • functable.c
  • arch/generic/generic_functions.h
  • arch/s390/crc32-vx.c
  • arch/x86/x86_functions.h
  • CMakeLists.txt
  • arch/x86/chorba_sse2.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • test/test_crc32.cc
  • arch/riscv/crc32_zbc.c
  • arch/generic/crc32_chorba_c.c
  • arch/x86/chorba_sse41.c
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2025-02-23T16:49:52.043Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.

Applied to files:

  • functable.c
  • arch/generic/generic_functions.h
  • arch/s390/crc32-vx.c
  • arch/x86/x86_functions.h
  • CMakeLists.txt
  • arch/x86/chorba_sse2.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • test/test_crc32.cc
  • arch/riscv/crc32_zbc.c
  • arch/generic/crc32_chorba_c.c
  • arch/x86/chorba_sse41.c
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2025-02-21T01:42:40.488Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.

Applied to files:

  • functable.c
  • arch/generic/generic_functions.h
  • arch/s390/crc32-vx.c
  • arch/x86/x86_functions.h
  • CMakeLists.txt
  • arch/x86/chorba_sse2.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • test/test_crc32.cc
  • arch/riscv/crc32_zbc.c
  • arch/generic/crc32_chorba_c.c
  • arch/x86/chorba_sse41.c
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2025-02-21T01:41:50.358Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.

Applied to files:

  • functable.c
  • arch/generic/generic_functions.h
  • arch/s390/crc32-vx.c
  • arch/x86/x86_functions.h
  • CMakeLists.txt
  • arch/x86/chorba_sse2.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • test/test_crc32.cc
  • arch/riscv/crc32_zbc.c
  • arch/generic/crc32_chorba_c.c
  • arch/x86/chorba_sse41.c
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2024-10-29T02:22:52.846Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In `inffast_tpl.h`, when AVX512 is enabled, the branch involving `chunkcopy_safe` is intentionally eliminated to optimize performance.

Applied to files:

  • functable.c
  • arch/x86/x86_functions.h
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/chorba_sse41.c
📚 Learning: 2025-02-21T01:44:03.996Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.

Applied to files:

  • functable.c
  • arch/x86/x86_functions.h
  • arch/x86/chorba_sse2.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • test/test_crc32.cc
  • arch/generic/crc32_chorba_c.c
  • arch/x86/chorba_sse41.c
📚 Learning: 2025-02-23T16:51:54.545Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.

Applied to files:

  • arch/x86/x86_functions.h
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • test/test_crc32.cc
  • arch/x86/chorba_sse41.c
📚 Learning: 2024-10-08T21:51:45.330Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1778
File: arch/x86/chunkset_avx2.c:160-171
Timestamp: 2024-10-08T21:51:45.330Z
Learning: In `arch/x86/chunkset_avx2.c`, within the `GET_HALFCHUNK_MAG` function, using a conditional branch to select between `_mm_loadl_epi64` and `_mm_loadu_si128` is not recommended because the branching cost outweighs the savings from the load.

Applied to files:

  • arch/x86/x86_functions.h
  • arch/x86/chorba_sse2.c
  • arch/x86/chorba_sse41.c
📚 Learning: 2024-12-20T23:35:59.830Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1830
File: arch/generic/generic_functions.h:39-53
Timestamp: 2024-12-20T23:35:59.830Z
Learning: The `longest_match_unaligned_*` functions are templated using the LONGEST_MATCH macro in match_tpl.h, so their implementations are generated rather than explicitly defined.

Applied to files:

  • arch/x86/x86_functions.h
📚 Learning: 2024-10-08T19:37:14.998Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1797
File: inflate.c:84-86
Timestamp: 2024-10-08T19:37:14.998Z
Learning: When `INFLATE_ALLOW_INVALID_DISTANCE_TOOFAR_ARRR` is not defined, `state->sane` is completely removed from the codebase and does not need to be initialized.

Applied to files:

  • arch/x86/x86_functions.h
📚 Learning: 2024-09-25T16:25:56.686Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1797
File: inflate.c:81-83
Timestamp: 2024-09-25T16:25:56.686Z
Learning: When `INFLATE_STRICT` is not defined, the variable `state->dmax` is completely removed from the code.

Applied to files:

  • arch/x86/x86_functions.h
📚 Learning: 2025-02-23T16:50:50.925Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/x86_intrins.h:0-0
Timestamp: 2025-02-23T16:50:50.925Z
Learning: MSVC does not define `__GNUC__`, so adding `!defined(_MSC_VER)` to GCC detection macros is redundant when `__GNUC__` is already being checked.

Applied to files:

  • CMakeLists.txt
📚 Learning: 2024-10-29T02:22:55.489Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1805
File: arch/x86/chunkset_avx512.c:32-34
Timestamp: 2024-10-29T02:22:55.489Z
Learning: In `arch/x86/chunkset_avx512.c`, the `gen_mask` function's `len` parameter cannot exceed 32 because it is only called on the remaining bytes from a 32-byte vector.

Applied to files:

  • CMakeLists.txt
  • arch/riscv/crc32_zbc.c
  • arch/x86/chorba_sse41.c
📚 Learning: 2024-10-08T19:37:14.998Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1778
File: chunkset_tpl.h:164-165
Timestamp: 2024-10-08T19:37:14.998Z
Learning: In `chunkset_tpl.h`, using `goto` in the `CHUNKMEMSET` function aids the compiler in inlining the function, so it should be retained.

Applied to files:

  • CMakeLists.txt
📚 Learning: 2025-02-21T01:37:54.508Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:830-831
Timestamp: 2025-02-21T01:37:54.508Z
Learning: The alignment calculation `((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15` is safe from overflow as it's mathematically bounded between 1 and 16, making the use of uintptr_t appropriate for this pointer arithmetic.

Applied to files:

  • arch/x86/chorba_sse2.c
  • arch/x86/chorba_sse41.c
📚 Learning: 2024-12-25T19:45:06.009Z
Learnt from: samrussell
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_braid_c.c:596-597
Timestamp: 2024-12-25T19:45:06.009Z
Learning: We should verify out-of-bounds pointer arithmetic in chorba_118960_nondestructive() and related loops where "input + i" is used, especially when len < expected block sizes.

Applied to files:

  • arch/x86/chorba_sse2.c
  • arch/generic/crc32_chorba_c.c
  • arch/x86/chorba_sse41.c
📚 Learning: 2025-06-09T16:46:28.468Z
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1921
File: arch/riscv/chunkset_rvv.c:97-102
Timestamp: 2025-06-09T16:46:28.468Z
Learning: In RISC-V chunkset_rvv.c CHUNKCOPY function, the alignment logic intentionally copies sizeof(chunk_t) bytes but advances pointers by align bytes to avoid unaligned access. This "re-reading" technique is a deliberate optimization to maintain proper alignment for subsequent operations, not a bug.

Applied to files:

  • arch/x86/chorba_sse2.c
  • arch/riscv/crc32_zbc.c
  • arch/x86/chorba_sse41.c
📚 Learning: 2024-12-22T20:40:03.280Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1833
File: test/test_crc32.cc:0-0
Timestamp: 2024-12-22T20:40:03.280Z
Learning: In the crc32_align test class (test/test_crc32.cc), the alignment offset parameter is always guaranteed to be non-negative because the test parameters are controlled and it never makes sense to have a negative offset.

Applied to files:

  • arch/x86/chorba_sse2.c
  • test/test_crc32.cc
  • arch/x86/chorba_sse41.c
📚 Learning: 2025-04-15T09:30:10.081Z
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1904
File: arch/riscv/Makefile.in:12-14
Timestamp: 2025-04-15T09:30:10.081Z
Learning: Feature detection modules like riscv_features.c should not be compiled with feature-specific flags (like RVVFLAG) because they need to be compilable on all systems regardless of feature support. These modules perform runtime detection and must initialize feature availability flags to zero on unsupported systems.

Applied to files:

  • arch/riscv/crc32_zbc.c
📚 Learning: 2025-06-10T07:38:03.297Z
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1921
File: arch/riscv/chunkset_rvv.c:103-104
Timestamp: 2025-06-10T07:38:03.297Z
Learning: In RISC-V chunkset_rvv.c CHUNKCOPY function, when dist < sizeof(chunk_t), the vl variable intentionally becomes 0, causing the while loop to not execute. This is correct behavior because copying full chunks is not safe when the distance is smaller than chunk size, and the function appropriately falls back to memcpy for handling remaining bytes.

Applied to files:

  • arch/riscv/crc32_zbc.c
📚 Learning: 2024-10-07T21:18:37.806Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-07T21:18:37.806Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.

Applied to files:

  • arch/x86/chorba_sse41.c
📚 Learning: 2024-10-07T22:00:02.180Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1802
File: chunkset_tpl.h:92-108
Timestamp: 2024-10-07T22:00:02.180Z
Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.

Applied to files:

  • arch/x86/chorba_sse41.c
🧬 Code graph analysis (2)
arch/x86/crc32_fold_pclmulqdq_tpl.h (1)
arch/x86/crc32_pclmulqdq_tpl.h (2)
  • fold_12 (168-208)
  • fold_4 (125-166)
arch/generic/crc32_chorba_c.c (4)
arch/riscv/crc32_zbc.c (1)
  • uint32_t (83-98)
arch/x86/chorba_sse2.c (2)
  • uint32_t (23-847)
  • uint32_t (849-879)
arch/x86/chorba_sse41.c (2)
  • uint32_t (55-305)
  • uint32_t (307-340)
arch/generic/crc32_braid_c.c (2)
  • uint32_t (63-213)
  • uint32_t (215-222)
🔇 Additional comments (12)
arch/riscv/crc32_zbc.c (3)

9-9: LGTM! Include provides necessary declarations.

The addition of arch_functions.h is necessary to provide the declaration of crc32_braid used in the fallback paths below.


91-91: Consistent with the change at line 86.

This change correctly replaces crc32_c with crc32_braid for handling unaligned trailing bytes, maintaining consistency with the short-length fallback path.


86-86: No changes needed—replacement is correct.

The verification confirms that crc32_braid is the established baseline CRC32 implementation used consistently throughout the codebase as a fallback for short buffers and unaligned data. It's declared in arch/generic/generic_functions.h, implemented with proper pre/post-conditioning in arch/generic/crc32_braid_c.c, and serves as the default native_crc32 in the functable. This replacement pattern is identical across multiple architecture-specific implementations (S390, x86, generic), confirming that the change at line 86 aligns with the codebase's established refactoring approach.

arch/generic/generic_functions.h (2)

28-34: LGTM - Clean declaration of crc32_chorba with proper gating.

The new crc32_chorba function is properly declared under the #ifndef WITHOUT_CHORBA guard, consistent with the PR's goal to make this the new generic CRC implementation that can be selectively disabled.


53-57: LGTM - Correct conditional mapping for native_crc32.

The conditional mapping properly selects between crc32_chorba (when Chorba is enabled) and crc32_braid (when Chorba is disabled), replacing the previous crc32_c approach. This aligns with the PR's architectural changes.

arch/x86/chorba_sse41.c (3)

1-1: LGTM - Correct feature gating for SSE41 Chorba variant.

The change from WITHOUT_CHORBA to WITHOUT_CHORBA_SSE properly separates the SSE-specific Chorba implementations from the generic C fallback, as described in the PR objectives.


10-10: LGTM - Cleaner include structure.

Consolidating to arch_functions.h removes the need for forward declarations and improves maintainability.


12-15: No actionable issues found; macro implementation is correct.

The READ_NEXT macro correctly loads two contiguous __m128i values: a from byte offset off and b from offset off + 16. The macro uses _mm_load_si128 (aligned load), which is appropriate since alignment is enforced by the calling function's loop structure (i += 32 ensures proper alignment). This implementation is identical to the working code in chorba_sse2.c and was part of the original SSE4.1 implementation.

test/benchmarks/benchmark_crc32.cc (1)

65-73: LGTM - Correct benchmark registration structure.

The separation of generic Chorba (chorba_c under #ifndef WITHOUT_CHORBA) from SSE-specific variants (chorba_sse2 and chorba_sse41 under WITHOUT_CHORBA_SSE guards) correctly reflects the new two-tier gating architecture introduced in this PR.

arch/s390/crc32-vx.c (1)

205-206: Switch to braid looks correct; pre/post-conditioning preserved.

Using crc32_braid for short, prealign, and tail maintains the xor-in/out contract around crc32_le_vgfm_16. No behavioral regression anticipated.
If you have a reproducer for MSVC/Clang s390 builds, please run crc32 known-vectors to double‑check chaining across the three segments (short, prealign+VX, tail).

Also applies to: 210-212, 218-221

arch/x86/chorba_sse2.c (1)

854-872: Good: consistent use of crc32_braid_internal with explicit xor in/out.

Prealign with braid_internal, then aligned Chorba path(s), and small fallback are all using the same pre/post-conditioning. Alignment math is the common 16-byte form. LGTM.
If you tweak thresholds later, keep SSE2’s 16B load requirement in mind; the wrapper is the right place to enforce it. Based on learnings.

functable.c (1)

83-86: Runtime selection order is sane; braid as baseline, Chorba when enabled.

Defaults to braid, promotes to generic Chorba when allowed, and then to arch paths (SSE2/SSE41/PCLMUL etc.). Good separation and precedence. Ensure WITHOUT_CHORBA_SSE is consistently defined by CMake across all TUs.

Also applies to: 98-103, 113-117, 118-136, 146-151

@Dead2 Dead2 force-pushed the chorba-reorganization branch 2 times, most recently from 8ada950 to 71f36da Compare November 12, 2025 21:44
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
arch/x86/x86_functions.h (1)

13-15: Critical: Macro name mismatch will break MSVC 2019 workaround.

The macro is defined as NO_CHORBA_SSE on line 14, but the entire codebase uses WITHOUT_CHORBA_SSE (lines 27, 39, 107, 123 in this file, and throughout the PR). This means the MSVC 2019 code-generation bug workaround will never activate, potentially causing segfaults on affected platforms.

Apply this diff:

-#define NO_CHORBA_SSE
+#define WITHOUT_CHORBA_SSE
🧹 Nitpick comments (1)
arch/x86/crc32_fold_pclmulqdq_tpl.h (1)

108-324: Broaden Chorba loop eligibility to cover 640 B spans

Each iteration consumes 128 B of preload plus 512 B of folded payload, so the loop only needs 640 B available. The current guard (512 + 64 + 16*8) demands 704 B, which skips the optimized path for inputs between 640 B and 703 B and falls back to the slower 64 B loop instead. Dropping the extra 64 B keeps the safety margin but lets those lengths benefit from the fast path.

-    while (len >= 512 + 64 + 16*8) {
+    while (len >= 16*8 + 32*16) {
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8ada950 and 71f36da.

📒 Files selected for processing (17)
  • CMakeLists.txt (1 hunks)
  • Makefile.in (0 hunks)
  • arch/generic/Makefile.in (0 hunks)
  • arch/generic/crc32_c.c (0 hunks)
  • arch/generic/crc32_chorba_c.c (3 hunks)
  • arch/generic/generic_functions.h (2 hunks)
  • arch/riscv/crc32_zbc.c (2 hunks)
  • arch/s390/crc32-vx.c (2 hunks)
  • arch/x86/chorba_sse2.c (2 hunks)
  • arch/x86/chorba_sse41.c (3 hunks)
  • arch/x86/crc32_fold_pclmulqdq_tpl.h (1 hunks)
  • arch/x86/crc32_pclmulqdq_tpl.h (0 hunks)
  • arch/x86/x86_functions.h (4 hunks)
  • crc32.h (0 hunks)
  • functable.c (4 hunks)
  • test/benchmarks/benchmark_crc32.cc (1 hunks)
  • test/test_crc32.cc (2 hunks)
💤 Files with no reviewable changes (5)
  • arch/x86/crc32_pclmulqdq_tpl.h
  • crc32.h
  • arch/generic/Makefile.in
  • arch/generic/crc32_c.c
  • Makefile.in
🚧 Files skipped from review as they are similar to previous changes (2)
  • arch/generic/crc32_chorba_c.c
  • CMakeLists.txt
🧰 Additional context used
🧠 Learnings (23)
📓 Common learnings
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: In zlib-ng, the policy for supporting old compilers is to maintain compatibility for up to two years after GitHub removes support for them in CI, or when there is no long-time contributor left testing for the architecture if the architecture was not supported by GitHub Actions at all.
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.
📚 Learning: 2025-02-21T01:41:50.358Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.

Applied to files:

  • arch/s390/crc32-vx.c
  • arch/riscv/crc32_zbc.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/x86_functions.h
  • test/test_crc32.cc
  • arch/x86/chorba_sse41.c
  • functable.c
  • arch/x86/chorba_sse2.c
  • arch/generic/generic_functions.h
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2025-02-21T01:42:40.488Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.

Applied to files:

  • arch/s390/crc32-vx.c
  • arch/riscv/crc32_zbc.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/x86_functions.h
  • test/test_crc32.cc
  • arch/x86/chorba_sse41.c
  • functable.c
  • arch/x86/chorba_sse2.c
  • arch/generic/generic_functions.h
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2025-02-23T16:49:52.043Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.

Applied to files:

  • arch/s390/crc32-vx.c
  • arch/riscv/crc32_zbc.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/x86_functions.h
  • test/test_crc32.cc
  • arch/x86/chorba_sse41.c
  • functable.c
  • arch/x86/chorba_sse2.c
  • arch/generic/generic_functions.h
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2025-01-23T22:01:53.422Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.

Applied to files:

  • arch/s390/crc32-vx.c
  • arch/riscv/crc32_zbc.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/x86_functions.h
  • test/test_crc32.cc
  • arch/x86/chorba_sse41.c
  • functable.c
  • arch/x86/chorba_sse2.c
  • arch/generic/generic_functions.h
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2025-06-09T16:46:28.468Z
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1921
File: arch/riscv/chunkset_rvv.c:97-102
Timestamp: 2025-06-09T16:46:28.468Z
Learning: In RISC-V chunkset_rvv.c CHUNKCOPY function, the alignment logic intentionally copies sizeof(chunk_t) bytes but advances pointers by align bytes to avoid unaligned access. This "re-reading" technique is a deliberate optimization to maintain proper alignment for subsequent operations, not a bug.

Applied to files:

  • arch/riscv/crc32_zbc.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2025-04-15T09:30:10.081Z
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1904
File: arch/riscv/Makefile.in:12-14
Timestamp: 2025-04-15T09:30:10.081Z
Learning: Feature detection modules like riscv_features.c should not be compiled with feature-specific flags (like RVVFLAG) because they need to be compilable on all systems regardless of feature support. These modules perform runtime detection and must initialize feature availability flags to zero on unsupported systems.

Applied to files:

  • arch/riscv/crc32_zbc.c
📚 Learning: 2025-06-10T07:38:03.297Z
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1921
File: arch/riscv/chunkset_rvv.c:103-104
Timestamp: 2025-06-10T07:38:03.297Z
Learning: In RISC-V chunkset_rvv.c CHUNKCOPY function, when dist < sizeof(chunk_t), the vl variable intentionally becomes 0, causing the while loop to not execute. This is correct behavior because copying full chunks is not safe when the distance is smaller than chunk size, and the function appropriately falls back to memcpy for handling remaining bytes.

Applied to files:

  • arch/riscv/crc32_zbc.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
📚 Learning: 2024-10-29T02:22:55.489Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1805
File: arch/x86/chunkset_avx512.c:32-34
Timestamp: 2024-10-29T02:22:55.489Z
Learning: In `arch/x86/chunkset_avx512.c`, the `gen_mask` function's `len` parameter cannot exceed 32 because it is only called on the remaining bytes from a 32-byte vector.

Applied to files:

  • arch/riscv/crc32_zbc.c
  • arch/x86/chorba_sse41.c
📚 Learning: 2024-10-29T02:22:52.846Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In `inffast_tpl.h`, when AVX512 is enabled, the branch involving `chunkcopy_safe` is intentionally eliminated to optimize performance.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/x86_functions.h
  • arch/x86/chorba_sse41.c
  • functable.c
📚 Learning: 2025-02-23T16:51:54.545Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/x86_functions.h
  • test/test_crc32.cc
  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2024-12-25T19:45:06.009Z
Learnt from: samrussell
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_braid_c.c:596-597
Timestamp: 2024-12-25T19:45:06.009Z
Learning: We should verify out-of-bounds pointer arithmetic in chorba_118960_nondestructive() and related loops where "input + i" is used, especially when len < expected block sizes.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2025-02-21T01:44:03.996Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/x86_functions.h
  • test/test_crc32.cc
  • arch/x86/chorba_sse41.c
  • functable.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2024-10-08T21:51:45.330Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1778
File: arch/x86/chunkset_avx2.c:160-171
Timestamp: 2024-10-08T21:51:45.330Z
Learning: In `arch/x86/chunkset_avx2.c`, within the `GET_HALFCHUNK_MAG` function, using a conditional branch to select between `_mm_loadl_epi64` and `_mm_loadu_si128` is not recommended because the branching cost outweighs the savings from the load.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/x86_functions.h
  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2024-10-07T21:23:13.401Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1802
File: chunkset_tpl.h:135-135
Timestamp: 2024-10-07T21:23:13.401Z
Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
📚 Learning: 2024-10-29T02:18:25.966Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1805
File: arch/x86/chunkset_avx512.c:28-30
Timestamp: 2024-10-29T02:18:25.966Z
Learning: In `chunkset_avx512.c`, the `gen_half_mask` function does not require validation for `len` since it will never exceed 16 due to computing the remainder for a 16-byte load.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
📚 Learning: 2024-12-20T23:35:59.830Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1830
File: arch/generic/generic_functions.h:39-53
Timestamp: 2024-12-20T23:35:59.830Z
Learning: The `longest_match_unaligned_*` functions are templated using the LONGEST_MATCH macro in match_tpl.h, so their implementations are generated rather than explicitly defined.

Applied to files:

  • arch/x86/x86_functions.h
📚 Learning: 2024-10-08T19:37:14.998Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1797
File: inflate.c:84-86
Timestamp: 2024-10-08T19:37:14.998Z
Learning: When `INFLATE_ALLOW_INVALID_DISTANCE_TOOFAR_ARRR` is not defined, `state->sane` is completely removed from the codebase and does not need to be initialized.

Applied to files:

  • arch/x86/x86_functions.h
📚 Learning: 2024-09-25T16:25:56.686Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1797
File: inflate.c:81-83
Timestamp: 2024-09-25T16:25:56.686Z
Learning: When `INFLATE_STRICT` is not defined, the variable `state->dmax` is completely removed from the code.

Applied to files:

  • arch/x86/x86_functions.h
📚 Learning: 2024-12-22T20:40:03.280Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1833
File: test/test_crc32.cc:0-0
Timestamp: 2024-12-22T20:40:03.280Z
Learning: In the crc32_align test class (test/test_crc32.cc), the alignment offset parameter is always guaranteed to be non-negative because the test parameters are controlled and it never makes sense to have a negative offset.

Applied to files:

  • test/test_crc32.cc
  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2024-10-07T21:18:37.806Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1802
File: arch/x86/chunkset_avx2.c:82-85
Timestamp: 2024-10-07T21:18:37.806Z
Learning: In `arch/x86/chunkset_avx2.c`, when working with AVX2-capable x86 CPUs, unaligned memory access using `_mm_loadu_si128` is acceptable since there is no performance penalty on architectures after Nehalem. Ensuring alignment may introduce unnecessary overhead due to arbitrary offsets into the window.

Applied to files:

  • arch/x86/chorba_sse41.c
📚 Learning: 2025-02-21T01:37:54.508Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:830-831
Timestamp: 2025-02-21T01:37:54.508Z
Learning: The alignment calculation `((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15` is safe from overflow as it's mathematically bounded between 1 and 16, making the use of uintptr_t appropriate for this pointer arithmetic.

Applied to files:

  • arch/x86/chorba_sse41.c
  • arch/x86/chorba_sse2.c
📚 Learning: 2024-10-07T22:00:02.180Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1802
File: chunkset_tpl.h:92-108
Timestamp: 2024-10-07T22:00:02.180Z
Learning: In the `HALFCHUNKCOPY` function in `chunkset_tpl.h`, the `len` parameter is always bound to be small by the callers and will not exceed the limits of an `int32_t`.

Applied to files:

  • arch/x86/chorba_sse41.c
🧬 Code graph analysis (1)
arch/x86/crc32_fold_pclmulqdq_tpl.h (1)
arch/x86/crc32_pclmulqdq_tpl.h (2)
  • fold_12 (168-208)
  • fold_4 (125-166)
🔇 Additional comments (12)
arch/riscv/crc32_zbc.c (3)

9-9: LGTM: Include added for crc32_braid declaration.

The addition of arch_functions.h correctly provides the declaration for crc32_braid, which replaces crc32_c in this file.


85-87: LGTM: Short buffer fallback correctly uses crc32_braid.

The replacement of crc32_c with crc32_braid is appropriate for short buffers (< 16 bytes). The conditioning logic is correct since crc32_braid handles XOR pre/post-conditioning internally.


89-94: LGTM: Unaligned portion handling correctly uses crc32_braid.

The replacement is correct. The crc32_braid call returns a properly conditioned CRC value, which is then pre-conditioned at line 95 (crc ^= 0xFFFFFFFF) before being passed to the clmul implementation. The conditioning flow is consistent and correct.

arch/x86/x86_functions.h (2)

27-30: LGTM! Guard and declaration are correct.

The WITHOUT_CHORBA_SSE guard properly disables Chorba SSE2 variants when needed, and exposing chorba_small_nondestructive_sse2 is necessary since the SSE4.1 implementation needs to call it as a fallback.


107-110: native_crc32 mapping is consistent and correct.

The conditional definition at lines 107-110 properly overrides the generic base (crc32_chorba or crc32_braid) with the SSE2-optimized variant (crc32_chorba_sse2) when appropriate. The nested conditions—DISABLE_RUNTIME_CPU_DETECTION, HAVE_BUILTIN_CTZ, X86_SSE2, and !WITHOUT_CHORBA_SSE—follow the same pattern used consistently across all x86 architecture variants (SSE4.1, PCLMUL, VPCLMULQDQ). The #undef before #define correctly prevents conflicts with prior definitions. No functable or runtime dispatch routing is used for CRC32 in this codebase; routing is entirely compile-time macro-based. No issues found.

test/benchmarks/benchmark_crc32.cc (1)

65-73: LGTM! Benchmark registrations updated consistently.

The benchmarks correctly reflect the reorganization:

  • Generic Chorba (chorba_c) is now registered unconditionally when WITHOUT_CHORBA is not defined
  • SSE2 and SSE4.1 variants are properly gated with WITHOUT_CHORBA_SSE

The guards align with the new macro scheme introduced in this PR.

arch/x86/chorba_sse2.c (3)

1-1: LGTM! File-level guard correctly gates SSE2 Chorba variants.

The guard #if defined(X86_SSE2) && !defined(WITHOUT_CHORBA_SSE) ensures the entire SSE2 Chorba implementation is disabled when the macro is set (e.g., on MSVC 2019 non-AMD64 builds with the code-generation bug).


9-9: LGTM! Header reorganization aligns with PR objectives.

Switching from arch/generic/generic_functions.h to arch_functions.h is consistent with the removal of crc32_c and the new functable routing approach.


862-871: LGTM! Conditional compilation logic is correct.

The nested conditional structure properly implements the PR's objectives:

  • When WITHOUT_CHORBA is defined: skips large Chorba path, keeps SSE2-optimized small path
  • When WITHOUT_CHORBA is not defined: uses large path for very large buffers, small SSE2 path for medium buffers, braid fallback for small buffers

This maintains SSE2 Chorba algorithm implementations while allowing the generic C fallback to be disabled independently.

arch/x86/chorba_sse41.c (3)

1-1: LGTM! File-level guard correctly gates SSE4.1 Chorba variants.

Consistent with the SSE2 changes, this guard ensures the SSE4.1 Chorba implementation is disabled when WITHOUT_CHORBA_SSE is set.


10-10: LGTM! Header reorganization is consistent.

Matching the SSE2 file, this switches to arch_functions.h for the new functable routing.


320-329: LGTM! Multi-tier fallback logic is correct.

The SSE4.1 implementation properly handles multiple buffer size thresholds:

  1. Very large (> CHORBA_LARGE_THRESHOLD): uses generic 118960 variant (when available)
  2. Medium (CHORBA_MEDIUM range): uses SSE4.1-optimized 32768 variant
  3. Small-to-medium (> CHORBA_SMALL_THRESHOLD_64BIT): falls back to SSE2 small variant
  4. Small: uses braid internal

The cross-call to chorba_small_nondestructive_sse2 is valid since SSE4.1 is a superset of SSE2, and the declaration is now properly exposed in x86_functions.h (line 29).

@codecov
Copy link
Copy Markdown

codecov bot commented Nov 12, 2025

Codecov Report

❌ Patch coverage is 98.06763% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.26%. Comparing base (780838b) to head (f2d7337).
⚠️ Report is 2 commits behind head on develop.

Files with missing lines Patch % Lines
arch/riscv/crc32_zbc.c 0.00% 2 Missing ⚠️
test/benchmarks/benchmark_crc32.cc 0.00% 2 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2004      +/-   ##
===========================================
- Coverage    83.31%   83.26%   -0.06%     
===========================================
  Files          161      160       -1     
  Lines        12960    12963       +3     
  Branches      3149     3149              
===========================================
- Hits         10798    10794       -4     
- Misses        1134     1142       +8     
+ Partials      1028     1027       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@Dead2
Copy link
Copy Markdown
Member Author

Dead2 commented Nov 13, 2025

Develop x86-64:
text data bss dec hex filename
172705 1336 8 174049 2a7e1 libz-ng.so.2

PR x86-64:
text data bss dec hex filename
171593 1336 8 172937 2a389 libz-ng.so.2

Develop rpi5:
text data bss dec hex filename
119167 1560 8 120735 1d79f libz-ng.so.2

PR rpi5:
text data bss dec hex filename
118303 1560 8 119871 1d43f libz-ng.so.2

Default compile (with chorba enabled) results in library size reductions:
x86_64: 1112 bytes smaller
aarch64: 864 bytes smaller

Performance for default builds is unchanged, as expected since this does not change anything for those builds.
Performance for Chorba-disabled builds are faster on x86(-64) systems that don't have pclmul instructions but have SSE2 or SSE4.1, since those are no longer disabled along with chorba_c. (See benchmarks for the "no-chorba first commit" here: https://github.com/zlib-ng/zlib-ng/pull/2003)
Also [v]plcmul fold copy functions are improved, not the plain crc32 ones, used during inflate.

Copy link
Copy Markdown
Collaborator

@mtl1979 mtl1979 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just some nitpick comments about logically backwards preprocessor tests.

@Dead2 Dead2 force-pushed the chorba-reorganization branch from 71f36da to 34355e3 Compare November 13, 2025 15:50
Now WITHOUT_CHORBA will only disable the crc32_chorba C fallback.

SSE2, SSE41 and pclmul variants will still be able to use their Chorba-algorithm based code,
but their fallback to the generic crc32_chorba C code in SSE2 and SSE41 will be disabled,
reducing their performance on really big input buffers (not used during deflate/inflate,
only when calling crc32 directly).

Remove the crc32_c function (and its file crc32_c.c), instead use the normal functable
routing to select between crc32_braid and crc32_chorba.

Disable sse2 and sse4.1 variants of Chorba-crc32 on MSVC older than 2022 due to code
generation bug in 2019 causing segfaults.

Compile either crc32_chorba_small_nondestructive or crc32_chorba_small_nondestructive_32bit,
not both. Don't compile crc32_chorba_32768_nondestructive on 32bit arch.
@Dead2 Dead2 force-pushed the chorba-reorganization branch from 34355e3 to f2d7337 Compare November 13, 2025 17:29
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
arch/x86/x86_functions.h (1)

13-30: Fix MSVC Chorba-SSE guard mismatch

We still #define NO_CHORBA_SSE here, but every new guard now keys off WITHOUT_CHORBA_SSE. On MSVC 2019 the safeguard never engages, so the Chorba SSE2/SSE4.1 code stays enabled and the known code-gen segfault this PR wants to avoid will still fire. Please emit the same symbol the rest of the file (and functable.c) is checking.

-#define NO_CHORBA_SSE
+#define WITHOUT_CHORBA_SSE
🧹 Nitpick comments (2)
arch/x86/crc32_fold_pclmulqdq_tpl.h (2)

114-114: Clarify loop entry condition for better readability.

The condition len >= 512 + 64 + 16*8 evaluates correctly to 704 bytes but obscures the structure. Consider rewriting as len >= 16*8 + 8*64 + 64 (preload + 8 processing groups + margin) or simply len >= 704 with a comment explaining the breakdown.

Apply this diff:

-    while (len >= 512 + 64 + 16*8) {
+    // Require 128B preload + 8×64B processing + 64B margin = 704B
+    while (len >= 16*8 + 8*64 + 64) {

115-323: Consider using aligned loads for potential performance gain.

Throughout the Chorba loop, _mm_loadu_si128 (unaligned) is used, but src is guaranteed to be 16-byte aligned after the alignment correction (lines 54–91) and remains aligned through subsequent advances (128B and 512B are multiples of 16). The standard loop at line 328 uses _mm_load_si128 (aligned), suggesting this is safe and intended.

While modern CPUs have minimal penalty for unaligned loads on aligned data, using aligned loads would be more consistent and could improve performance on older architectures.

Example change for the preload section:

-        __m128i chorba8 = _mm_loadu_si128((__m128i *)src);
-        __m128i chorba7 = _mm_loadu_si128((__m128i *)src + 1);
+        __m128i chorba8 = _mm_load_si128((__m128i *)src);
+        __m128i chorba7 = _mm_load_si128((__m128i *)src + 1);

Apply similar changes to all other _mm_loadu_si128 calls within the Chorba loop (lines 141–144, 163–166, 186–189, 209–212, 232–235, 255–258, 278–281, 300–303).

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 34355e3 and f2d7337.

📒 Files selected for processing (17)
  • CMakeLists.txt (1 hunks)
  • Makefile.in (0 hunks)
  • arch/generic/Makefile.in (0 hunks)
  • arch/generic/crc32_c.c (0 hunks)
  • arch/generic/crc32_chorba_c.c (3 hunks)
  • arch/generic/generic_functions.h (2 hunks)
  • arch/riscv/crc32_zbc.c (2 hunks)
  • arch/s390/crc32-vx.c (2 hunks)
  • arch/x86/chorba_sse2.c (2 hunks)
  • arch/x86/chorba_sse41.c (3 hunks)
  • arch/x86/crc32_fold_pclmulqdq_tpl.h (1 hunks)
  • arch/x86/crc32_pclmulqdq_tpl.h (0 hunks)
  • arch/x86/x86_functions.h (4 hunks)
  • crc32.h (0 hunks)
  • functable.c (4 hunks)
  • test/benchmarks/benchmark_crc32.cc (1 hunks)
  • test/test_crc32.cc (2 hunks)
💤 Files with no reviewable changes (5)
  • arch/generic/Makefile.in
  • Makefile.in
  • arch/generic/crc32_c.c
  • arch/x86/crc32_pclmulqdq_tpl.h
  • crc32.h
🚧 Files skipped from review as they are similar to previous changes (6)
  • test/test_crc32.cc
  • CMakeLists.txt
  • arch/s390/crc32-vx.c
  • arch/x86/chorba_sse2.c
  • arch/x86/chorba_sse41.c
  • arch/riscv/crc32_zbc.c
🧰 Additional context used
🧠 Learnings (19)
📓 Common learnings
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.
📚 Learning: 2025-01-23T22:01:53.422Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.

Applied to files:

  • functable.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/generic/generic_functions.h
  • arch/generic/crc32_chorba_c.c
  • arch/x86/x86_functions.h
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2025-02-21T01:42:40.488Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.

Applied to files:

  • functable.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/generic/generic_functions.h
  • arch/generic/crc32_chorba_c.c
  • arch/x86/x86_functions.h
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2025-02-23T16:49:52.043Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.

Applied to files:

  • functable.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/generic/generic_functions.h
  • arch/generic/crc32_chorba_c.c
  • arch/x86/x86_functions.h
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2025-02-21T01:41:50.358Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.

Applied to files:

  • functable.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/generic/generic_functions.h
  • arch/generic/crc32_chorba_c.c
  • arch/x86/x86_functions.h
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2024-10-29T02:22:52.846Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1805
File: inffast_tpl.h:257-262
Timestamp: 2024-10-29T02:22:52.846Z
Learning: In `inffast_tpl.h`, when AVX512 is enabled, the branch involving `chunkcopy_safe` is intentionally eliminated to optimize performance.

Applied to files:

  • functable.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/x86_functions.h
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2025-02-21T01:44:03.996Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:26-28
Timestamp: 2025-02-21T01:44:03.996Z
Learning: The alignment requirements for chorba_small_nondestructive_sse2 (16-byte alignment and multiple of 8 length) are enforced by its calling function, making additional checks redundant.

Applied to files:

  • functable.c
  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/generic/crc32_chorba_c.c
  • arch/x86/x86_functions.h
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2024-12-25T19:45:06.009Z
Learnt from: samrussell
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_braid_c.c:596-597
Timestamp: 2024-12-25T19:45:06.009Z
Learning: We should verify out-of-bounds pointer arithmetic in chorba_118960_nondestructive() and related loops where "input + i" is used, especially when len < expected block sizes.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2025-06-09T16:46:28.468Z
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1921
File: arch/riscv/chunkset_rvv.c:97-102
Timestamp: 2025-06-09T16:46:28.468Z
Learning: In RISC-V chunkset_rvv.c CHUNKCOPY function, the alignment logic intentionally copies sizeof(chunk_t) bytes but advances pointers by align bytes to avoid unaligned access. This "re-reading" technique is a deliberate optimization to maintain proper alignment for subsequent operations, not a bug.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
📚 Learning: 2024-10-08T21:51:45.330Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1778
File: arch/x86/chunkset_avx2.c:160-171
Timestamp: 2024-10-08T21:51:45.330Z
Learning: In `arch/x86/chunkset_avx2.c`, within the `GET_HALFCHUNK_MAG` function, using a conditional branch to select between `_mm_loadl_epi64` and `_mm_loadu_si128` is not recommended because the branching cost outweighs the savings from the load.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
  • arch/x86/x86_functions.h
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2024-10-07T21:23:13.401Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1802
File: chunkset_tpl.h:135-135
Timestamp: 2024-10-07T21:23:13.401Z
Learning: In the `CHUNKMEMSET` function within `chunkset_tpl.h`, extra bounds checks are avoided to maintain performance in critical code sections. Branching is minimized to prevent negative impacts on speculative execution. The variable `len` is enforced with `safelen` early on.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
📚 Learning: 2025-06-10T07:38:03.297Z
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 1921
File: arch/riscv/chunkset_rvv.c:103-104
Timestamp: 2025-06-10T07:38:03.297Z
Learning: In RISC-V chunkset_rvv.c CHUNKCOPY function, when dist < sizeof(chunk_t), the vl variable intentionally becomes 0, causing the while loop to not execute. This is correct behavior because copying full chunks is not safe when the distance is smaller than chunk size, and the function appropriately falls back to memcpy for handling remaining bytes.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
📚 Learning: 2024-10-29T02:18:25.966Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1805
File: arch/x86/chunkset_avx512.c:28-30
Timestamp: 2024-10-29T02:18:25.966Z
Learning: In `chunkset_avx512.c`, the `gen_half_mask` function does not require validation for `len` since it will never exceed 16 due to computing the remainder for a 16-byte load.

Applied to files:

  • arch/x86/crc32_fold_pclmulqdq_tpl.h
📚 Learning: 2025-02-23T16:51:54.545Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/x86_intrins.h:114-117
Timestamp: 2025-02-23T16:51:54.545Z
Learning: In x86/x86_intrins.h, the Clang macros for _mm_cvtsi64x_si128 and _mm_cvtsi128_si64x don't need additional MSVC guards since MSVC's implementation is already protected by `defined(_MSC_VER) && !defined(__clang__)`, making them mutually exclusive.

Applied to files:

  • arch/x86/x86_functions.h
  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2024-12-20T23:35:59.830Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1830
File: arch/generic/generic_functions.h:39-53
Timestamp: 2024-12-20T23:35:59.830Z
Learning: The `longest_match_unaligned_*` functions are templated using the LONGEST_MATCH macro in match_tpl.h, so their implementations are generated rather than explicitly defined.

Applied to files:

  • arch/x86/x86_functions.h
📚 Learning: 2024-10-08T19:37:14.998Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1797
File: inflate.c:84-86
Timestamp: 2024-10-08T19:37:14.998Z
Learning: When `INFLATE_ALLOW_INVALID_DISTANCE_TOOFAR_ARRR` is not defined, `state->sane` is completely removed from the codebase and does not need to be initialized.

Applied to files:

  • arch/x86/x86_functions.h
📚 Learning: 2024-09-25T16:25:56.686Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1797
File: inflate.c:81-83
Timestamp: 2024-09-25T16:25:56.686Z
Learning: When `INFLATE_STRICT` is not defined, the variable `state->dmax` is completely removed from the code.

Applied to files:

  • arch/x86/x86_functions.h
📚 Learning: 2025-02-23T16:50:50.925Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/x86_intrins.h:0-0
Timestamp: 2025-02-23T16:50:50.925Z
Learning: MSVC does not define `__GNUC__`, so adding `!defined(_MSC_VER)` to GCC detection macros is redundant when `__GNUC__` is already being checked.

Applied to files:

  • test/benchmarks/benchmark_crc32.cc
📚 Learning: 2024-10-04T03:17:24.773Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1778
File: chunkset_tpl.h:164-165
Timestamp: 2024-10-04T03:17:24.773Z
Learning: In `chunkset_tpl.h`, using `goto` in the `CHUNKMEMSET` function aids the compiler in inlining the function, so it should be retained.

Applied to files:

  • test/benchmarks/benchmark_crc32.cc
🧬 Code graph analysis (2)
arch/x86/crc32_fold_pclmulqdq_tpl.h (1)
arch/x86/crc32_pclmulqdq_tpl.h (2)
  • fold_12 (168-208)
  • fold_4 (125-166)
arch/generic/crc32_chorba_c.c (4)
arch/riscv/crc32_zbc.c (1)
  • uint32_t (83-98)
arch/x86/chorba_sse2.c (2)
  • uint32_t (23-847)
  • uint32_t (849-879)
arch/x86/chorba_sse41.c (2)
  • uint32_t (55-305)
  • uint32_t (307-340)
arch/generic/crc32_braid_c.c (2)
  • uint32_t (63-213)
  • uint32_t (215-222)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cleanup Improving maintainability or removing code.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants