Skip to content

Synchronise ARMv8 and Loongarch CRC32 implementations#1969

Merged
Dead2 merged 1 commit intozlib-ng:developfrom
ccawley2011:armv8-loongarch-crc32
Sep 27, 2025
Merged

Synchronise ARMv8 and Loongarch CRC32 implementations#1969
Dead2 merged 1 commit intozlib-ng:developfrom
ccawley2011:armv8-loongarch-crc32

Conversation

@ccawley2011
Copy link
Copy Markdown
Contributor

@ccawley2011 ccawley2011 commented Sep 25, 2025

Summary by CodeRabbit

  • Refactor

    • Improved CRC32 performance on ARMv8 devices by enabling optimized folding and copy routines when supported.
    • Enhanced runtime selection to use faster CRC32 paths on compatible ARM hardware.
  • Style

    • Minor formatting and include cleanup in LoongArch CRC32 module; no functional changes.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Sep 25, 2025

Walkthrough

Introduces ARMv8 CRC32 folding APIs, wires them into runtime function-table selection, and adjusts ARM tail-processing logic. Adds crc32_fold and crc32_fold_copy helpers for ARMv8. Minor include and signature formatting changes in LoongArch CRC32 file. No broader control flow changes beyond functable assignments.

Changes

Cohort / File(s) Summary of changes
ARMv8 CRC32 folding API and wiring
arch/arm/arm_functions.h, arch/arm/crc32_armv8.c, functable.c
Declared crc32_fold_armv8 and crc32_fold_copy_armv8; mapped native_crc32_fold and native_crc32_fold_copy to ARMv8 variants; implemented helpers using crc32_armv8; updated tail handling to bitmask checks; assigned ft.crc32_fold and ft.crc32_fold_copy when ARM CRC32 is available.
LoongArch CRC32 formatting cleanup
arch/loongarch/crc32_la.c
Removed unused includes; compacted function signature formatting; no logic changes.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant App
  participant CPUDetect as CPU Detect
  participant FuncTable as Function Table
  participant ARMv8 as ARMv8 CRC32

  App->>CPUDetect: Initialize at startup
  CPUDetect-->>FuncTable: cf.arm.has_crc32 = true/false

  alt ARMv8 CRC32 available
    FuncTable->>FuncTable: set ft.crc32 = crc32_armv8
    FuncTable->>FuncTable: set ft.crc32_fold = crc32_fold_armv8
    FuncTable->>FuncTable: set ft.crc32_fold_copy = crc32_fold_copy_armv8
  else Fallback
    FuncTable->>FuncTable: leave fold/fold_copy as non-ARM implementations
  end

  App->>FuncTable: call ft.crc32_fold(src,len,init)
  FuncTable->>ARMv8: crc32_fold_armv8(crc, src, len, init)
  ARMv8-->>FuncTable: updated crc->value
  FuncTable-->>App: return
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

Architecture

Suggested reviewers

  • Dead2
  • KungFuJesus
  • nmoinvaz

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 16.67% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title accurately captures the primary objective of the pull request, which is to bring ARMv8 and Loongarch CRC32 code paths into alignment by adding new ARMv8 CRC32 folding functions and harmonizing the Loongarch implementation, and it does so in a concise, clear manner.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
arch/arm/crc32_armv8.c (1)

10-83: Add the missing <string.h> include for memcpy.

crc32_fold_copy_armv8() and crc32_fold_armv8() call memcpy, but this TU no longer pulls in a header that declares it. Please add #include <string.h> (mirroring the LoongArch CRC file) so we keep the build warning‑free under -Werror/C99.

 #include "zbuild.h"
 #include "crc32.h"
+#include <string.h>
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7800569 and 5c4e412.

📒 Files selected for processing (4)
  • arch/arm/arm_functions.h (2 hunks)
  • arch/arm/crc32_armv8.c (2 hunks)
  • arch/loongarch/crc32_la.c (1 hunks)
  • functable.c (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📚 Learning: 2025-02-23T16:49:52.043Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-23T16:49:52.043Z
Learning: In zlib-ng, bounds checking for CRC32 computation is handled by the caller, not within the individual CRC32 implementation functions like `crc32_chorba_sse2`.

Applied to files:

  • arch/arm/crc32_armv8.c
  • arch/loongarch/crc32_la.c
📚 Learning: 2025-02-21T01:41:50.358Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.

Applied to files:

  • arch/arm/crc32_armv8.c
  • arch/loongarch/crc32_la.c
📚 Learning: 2025-01-23T22:01:53.422Z
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-01-23T22:01:53.422Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.

Applied to files:

  • arch/arm/crc32_armv8.c
  • arch/loongarch/crc32_la.c
  • arch/arm/arm_functions.h
📚 Learning: 2025-02-21T01:42:40.488Z
Learnt from: KungFuJesus
PR: zlib-ng/zlib-ng#1872
File: arch/x86/chorba_sse2.c:0-0
Timestamp: 2025-02-21T01:42:40.488Z
Learning: In the SSE2-optimized Chorba CRC implementation (chorba_small_nondestructive_sse), the input buffer length is enforced to be a multiple of 16 bytes due to SSE2 operations, making additional checks for smaller alignments (like 8 bytes) redundant.

Applied to files:

  • arch/arm/crc32_armv8.c
  • arch/loongarch/crc32_la.c
  • arch/arm/arm_functions.h
📚 Learning: 2025-07-04T16:59:44.725Z
Learnt from: mtl1979
PR: zlib-ng/zlib-ng#1925
File: arch/loongarch/lasxintrin_ext.h:38-65
Timestamp: 2025-07-04T16:59:44.725Z
Learning: LoongArch64 is still a quite new architecture where compilers don't yet have intrinsic functions for all the common operations, making inline assembly implementations necessary as fallbacks.

Applied to files:

  • arch/loongarch/crc32_la.c
🧬 Code graph analysis (3)
functable.c (1)
arch/arm/crc32_armv8.c (2)
  • crc32_fold_armv8 (81-84)
  • crc32_fold_copy_armv8 (76-79)
arch/arm/crc32_armv8.c (1)
arch/arm/acle_intrins.h (6)
  • __crc32w (32-36)
  • __crc32w (56-60)
  • __crc32h (26-30)
  • __crc32h (50-54)
  • __crc32b (20-24)
  • __crc32b (44-48)
arch/arm/arm_functions.h (1)
arch/arm/crc32_armv8.c (2)
  • crc32_fold_copy_armv8 (76-79)
  • crc32_fold_armv8 (81-84)

Comment on lines 64 to 71
# if (defined(ARM_CRC32) && defined(__ARM_FEATURE_CRC32))
# undef native_crc32
# define native_crc32 crc32_armv8
# undef native_crc32_fold
# define native_crc32_fold crc32_fold_armv8
# undef native_crc32_fold_copy
# define native_crc32_fold_copy crc32_fold_copy_armv8
# endif
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep the entire fold family together.

Defining native_crc32_fold/native_crc32_fold_copy without the complementary native_crc32_fold_reset/native_crc32_fold_final leaves the fold API split between ARM and generic implementations. Please add the matching definitions (or explain why they must stay generic) so the functable can stay consistent when DISABLE_RUNTIME_CPU_DETECTION is enabled.

🤖 Prompt for AI Agents
In arch/arm/arm_functions.h around lines 64-71, the ARM-specific block defines
native_crc32_fold and native_crc32_fold_copy but omits native_crc32_fold_reset
and native_crc32_fold_final, splitting the fold API between ARM and generic
implementations; to fix, add matching undef/define lines in the same #if
defined(ARM_CRC32) && defined(__ARM_FEATURE_CRC32) block mapping
native_crc32_fold_reset to crc32_fold_reset_armv8 and native_crc32_fold_final to
crc32_fold_final_armv8 (same pattern as the other defines) so the entire fold
family is consistently overridden for ARM.

Comment on lines 8 to 9
#include "crc32.h"
#include <stdint.h>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Restore the memcpy prototype after dropping zutil.h.

Removing zutil.h (and zmemory.h) also removed the declaration of memcpy. We now use memcpy in this file without including <string.h>, which breaks strict builds. Add <string.h> to the include list.

 #include "crc32.h"
+#include <string.h>
 #include <stdint.h>
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
#include "crc32.h"
#include <stdint.h>
#include "crc32.h"
#include <string.h>
#include <stdint.h>
🤖 Prompt for AI Agents
In arch/loongarch/crc32_la.c around lines 8-9, the file uses memcpy but
zutil.h/zmemory.h were removed and no header now declares memcpy; add an include
for <string.h> to the include list (alongside crc32.h and stdint.h) so memcpy is
properly prototyped for strict builds.

Comment on lines 206 to 211
#ifdef ARM_CRC32
if (cf.arm.has_crc32) {
ft.crc32 = &crc32_armv8;
ft.crc32_fold = &crc32_fold_armv8;
ft.crc32_fold_copy = &crc32_fold_copy_armv8;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Reinitialize fold state when switching to ARM CRC.

Unlike the generic path, the ARM CRC32 allocation now overrides only ft.crc32, ft.crc32_fold, and ft.crc32_fold_copy. Without also keeping the matching crc32_fold_* reset/final pointers coherent (as the LoongArch block does), downstream code that relies on the fold API can hit mixed implementations. Please wire the ARM CRC32 block the same way the LoongArch block does to avoid inconsistencies.

🤖 Prompt for AI Agents
In functable.c around lines 206-211, the ARM CRC32 branch currently only assigns
ft.crc32, ft.crc32_fold and ft.crc32_fold_copy, which can leave the fold
reset/final pointers pointing at generic implementations; update this block to
also reinitialize the fold state by assigning the matching fold reset and fold
final pointers (e.g., ft.crc32_fold_reset and ft.crc32_fold_final) to the
ARM-specific functions (crc32_fold_reset_armv8, crc32_fold_final_armv8) so the
ARM path is coherent with the LoongArch block.

@codecov
Copy link
Copy Markdown

codecov bot commented Sep 25, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.76%. Comparing base (7800569) to head (5c4e412).
⚠️ Report is 3 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1969      +/-   ##
===========================================
+ Coverage    81.60%   82.76%   +1.16%     
===========================================
  Files          163      163              
  Lines        14073    14081       +8     
  Branches      3165     3166       +1     
===========================================
+ Hits         11484    11654     +170     
+ Misses        1597     1425     -172     
- Partials       992     1002      +10     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Copy Markdown
Member

@nmoinvaz nmoinvaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Surprised we were missing some of these but it LGTM.

@nmoinvaz nmoinvaz added cleanup Improving maintainability or removing code. Architecture Architecture specific labels Sep 26, 2025
Copy link
Copy Markdown
Member

@Dead2 Dead2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Dead2 Dead2 merged commit bdbcee5 into zlib-ng:develop Sep 27, 2025
155 checks passed
@ccawley2011 ccawley2011 deleted the armv8-loongarch-crc32 branch September 28, 2025 10:35
@Dead2 Dead2 mentioned this pull request Nov 5, 2025
#ifdef ARM_CRC32
if (cf.arm.has_crc32) {
ft.crc32 = &crc32_armv8;
ft.crc32_fold = &crc32_fold_armv8;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are these functions needed? Usually if crc32_fold does not exist it will automatically do memcpy.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The generic crc32_fold_copy_c function uses an indirect call via FUNCTABLE_CALL.

crc->value = FUNCTABLE_CALL(crc32)(crc->value, src, len);
memcpy(dst, src, len);

Creating separate functions crc32_fold_armv8 and crc32_fold_loongarch64, replaces two indirect calls via FUNCTABLE_CALL with one.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Direct calls usually require one less register and in code where a lot of registers are used, it means one less chance of spilling unnecessarily to stack. Direct calls can also be optimized better as compiler knows if the target function is small enough to be inlined.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Architecture Architecture specific cleanup Improving maintainability or removing code.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants