Synchronise ARMv8 and Loongarch CRC32 implementations#1969
Synchronise ARMv8 and Loongarch CRC32 implementations#1969Dead2 merged 1 commit intozlib-ng:developfrom
Conversation
WalkthroughIntroduces ARMv8 CRC32 folding APIs, wires them into runtime function-table selection, and adjusts ARM tail-processing logic. Adds crc32_fold and crc32_fold_copy helpers for ARMv8. Minor include and signature formatting changes in LoongArch CRC32 file. No broader control flow changes beyond functable assignments. Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant App
participant CPUDetect as CPU Detect
participant FuncTable as Function Table
participant ARMv8 as ARMv8 CRC32
App->>CPUDetect: Initialize at startup
CPUDetect-->>FuncTable: cf.arm.has_crc32 = true/false
alt ARMv8 CRC32 available
FuncTable->>FuncTable: set ft.crc32 = crc32_armv8
FuncTable->>FuncTable: set ft.crc32_fold = crc32_fold_armv8
FuncTable->>FuncTable: set ft.crc32_fold_copy = crc32_fold_copy_armv8
else Fallback
FuncTable->>FuncTable: leave fold/fold_copy as non-ARM implementations
end
App->>FuncTable: call ft.crc32_fold(src,len,init)
FuncTable->>ARMv8: crc32_fold_armv8(crc, src, len, init)
ARMv8-->>FuncTable: updated crc->value
FuncTable-->>App: return
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
arch/arm/crc32_armv8.c (1)
10-83: Add the missing<string.h>include formemcpy.
crc32_fold_copy_armv8()andcrc32_fold_armv8()callmemcpy, but this TU no longer pulls in a header that declares it. Please add#include <string.h>(mirroring the LoongArch CRC file) so we keep the build warning‑free under-Werror/C99.#include "zbuild.h" #include "crc32.h" +#include <string.h>
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
arch/arm/arm_functions.h(2 hunks)arch/arm/crc32_armv8.c(2 hunks)arch/loongarch/crc32_la.c(1 hunks)functable.c(1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-02-23T16:49:52.043Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.
Applied to files:
arch/arm/crc32_armv8.carch/loongarch/crc32_la.c
📚 Learning: 2025-02-21T01:41:50.358Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Applied to files:
arch/arm/crc32_armv8.carch/loongarch/crc32_la.c
📚 Learning: 2025-01-23T22:01:53.422Z
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
Applied to files:
arch/arm/crc32_armv8.carch/loongarch/crc32_la.carch/arm/arm_functions.h
📚 Learning: 2025-02-21T01:42:40.488Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.
Applied to files:
arch/arm/crc32_armv8.carch/loongarch/crc32_la.carch/arm/arm_functions.h
📚 Learning: 2025-07-04T16:59:44.725Z
Learnt from: mtl1979
PR: zlib-ng/zlib-ng#1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: LoongArch64 is still a quite new architecture where compilers don't yet have intrinsic functions for all the common operations, making inline assembly implementations necessary as fallbacks.
Applied to files:
arch/loongarch/crc32_la.c
🧬 Code graph analysis (3)
functable.c (1)
arch/arm/crc32_armv8.c (2)
crc32_fold_armv8(81-84)crc32_fold_copy_armv8(76-79)
arch/arm/crc32_armv8.c (1)
arch/arm/acle_intrins.h (6)
__crc32w(32-36)__crc32w(56-60)__crc32h(26-30)__crc32h(50-54)__crc32b(20-24)__crc32b(44-48)
arch/arm/arm_functions.h (1)
arch/arm/crc32_armv8.c (2)
crc32_fold_copy_armv8(76-79)crc32_fold_armv8(81-84)
| # if (defined(ARM_CRC32) && defined(__ARM_FEATURE_CRC32)) | ||
| # undef native_crc32 | ||
| # define native_crc32 crc32_armv8 | ||
| # undef native_crc32_fold | ||
| # define native_crc32_fold crc32_fold_armv8 | ||
| # undef native_crc32_fold_copy | ||
| # define native_crc32_fold_copy crc32_fold_copy_armv8 | ||
| # endif |
There was a problem hiding this comment.
Keep the entire fold family together.
Defining native_crc32_fold/native_crc32_fold_copy without the complementary native_crc32_fold_reset/native_crc32_fold_final leaves the fold API split between ARM and generic implementations. Please add the matching definitions (or explain why they must stay generic) so the functable can stay consistent when DISABLE_RUNTIME_CPU_DETECTION is enabled.
🤖 Prompt for AI Agents
In arch/arm/arm_functions.h around lines 64-71, the ARM-specific block defines
native_crc32_fold and native_crc32_fold_copy but omits native_crc32_fold_reset
and native_crc32_fold_final, splitting the fold API between ARM and generic
implementations; to fix, add matching undef/define lines in the same #if
defined(ARM_CRC32) && defined(__ARM_FEATURE_CRC32) block mapping
native_crc32_fold_reset to crc32_fold_reset_armv8 and native_crc32_fold_final to
crc32_fold_final_armv8 (same pattern as the other defines) so the entire fold
family is consistently overridden for ARM.
| #include "crc32.h" | ||
| #include <stdint.h> |
There was a problem hiding this comment.
Restore the memcpy prototype after dropping zutil.h.
Removing zutil.h (and zmemory.h) also removed the declaration of memcpy. We now use memcpy in this file without including <string.h>, which breaks strict builds. Add <string.h> to the include list.
#include "crc32.h"
+#include <string.h>
#include <stdint.h>📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| #include "crc32.h" | |
| #include <stdint.h> | |
| #include "crc32.h" | |
| #include <string.h> | |
| #include <stdint.h> |
🤖 Prompt for AI Agents
In arch/loongarch/crc32_la.c around lines 8-9, the file uses memcpy but
zutil.h/zmemory.h were removed and no header now declares memcpy; add an include
for <string.h> to the include list (alongside crc32.h and stdint.h) so memcpy is
properly prototyped for strict builds.
| #ifdef ARM_CRC32 | ||
| if (cf.arm.has_crc32) { | ||
| ft.crc32 = &crc32_armv8; | ||
| ft.crc32_fold = &crc32_fold_armv8; | ||
| ft.crc32_fold_copy = &crc32_fold_copy_armv8; | ||
| } |
There was a problem hiding this comment.
Reinitialize fold state when switching to ARM CRC.
Unlike the generic path, the ARM CRC32 allocation now overrides only ft.crc32, ft.crc32_fold, and ft.crc32_fold_copy. Without also keeping the matching crc32_fold_* reset/final pointers coherent (as the LoongArch block does), downstream code that relies on the fold API can hit mixed implementations. Please wire the ARM CRC32 block the same way the LoongArch block does to avoid inconsistencies.
🤖 Prompt for AI Agents
In functable.c around lines 206-211, the ARM CRC32 branch currently only assigns
ft.crc32, ft.crc32_fold and ft.crc32_fold_copy, which can leave the fold
reset/final pointers pointing at generic implementations; update this block to
also reinitialize the fold state by assigning the matching fold reset and fold
final pointers (e.g., ft.crc32_fold_reset and ft.crc32_fold_final) to the
ARM-specific functions (crc32_fold_reset_armv8, crc32_fold_final_armv8) so the
ARM path is coherent with the LoongArch block.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## develop #1969 +/- ##
===========================================
+ Coverage 81.60% 82.76% +1.16%
===========================================
Files 163 163
Lines 14073 14081 +8
Branches 3165 3166 +1
===========================================
+ Hits 11484 11654 +170
+ Misses 1597 1425 -172
- Partials 992 1002 +10 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
nmoinvaz
left a comment
There was a problem hiding this comment.
Surprised we were missing some of these but it LGTM.
| #ifdef ARM_CRC32 | ||
| if (cf.arm.has_crc32) { | ||
| ft.crc32 = &crc32_armv8; | ||
| ft.crc32_fold = &crc32_fold_armv8; |
There was a problem hiding this comment.
Why are these functions needed? Usually if crc32_fold does not exist it will automatically do memcpy.
There was a problem hiding this comment.
The generic crc32_fold_copy_c function uses an indirect call via FUNCTABLE_CALL.
zlib-ng/arch/generic/crc32_fold_c.c
Lines 16 to 17 in fe17958
Creating separate functions crc32_fold_armv8 and crc32_fold_loongarch64, replaces two indirect calls via FUNCTABLE_CALL with one.
There was a problem hiding this comment.
Direct calls usually require one less register and in code where a lot of registers are used, it means one less chance of spilling unnecessarily to stack. Direct calls can also be optimized better as compiler knows if the target function is small enough to be inlined.
Summary by CodeRabbit
Refactor
Style