Skip to content

Fix a big endian bug on the 32k and larger specializations of chorba#1891

Merged
Dead2 merged 1 commit intozlib-ng:developfrom
KungFuJesus:fix_generic_chorba_endians
Mar 28, 2025
Merged

Fix a big endian bug on the 32k and larger specializations of chorba#1891
Dead2 merged 1 commit intozlib-ng:developfrom
KungFuJesus:fix_generic_chorba_endians

Conversation

@KungFuJesus
Copy link
Copy Markdown
Collaborator

@KungFuJesus KungFuJesus commented Mar 25, 2025

It turns out there's something in there preventing them from being endian independent. @samrussell found the issue in this PR, it's simply the fact that we weren't byteswapping the crc before folding it into the multiplications. While the value was 0xFFFFFFFF - it still gets widened to a 64 bit variable, which means the 4 bytes ended up falling on the wrong half of the word.

I've also added a test case which enabled me to spot this issue while testing another SIMD acceleration of chorba.

Summary by CodeRabbit

  • Refactor

    • Improved internal data handling to ensure consistent performance across systems with varied architecture settings.
  • Tests

    • Added a new test case featuring an aligned 32768-byte buffer with cyclic value initialization to validate data integrity during processing.
    • Introduced a setup function to initialize the test buffer before execution.
    • Updated CRC32 test case with an expected value of 0x217726B2.

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Mar 25, 2025

Walkthrough

This pull request modifies CRC32 functions in arch/generic/crc32_chorba_c.c by conditionally assigning values based on the system's byte order. It also introduces a new test setup in test/test_crc32.cc by adding an aligned buffer, a helper function to initialize it, and a new test case for verifying a 32768-byte CRC32 computation. No changes were made to the public API.

Changes

File Path Change Summary
arch/generic/crc32_chorba_c.c Updated crc32_chorba_118960_nondestructive and crc32_chorba_32768_nondestructive to conditionally assign values using crc directly on little-endian systems and ZSWAP64(crc) otherwise.
test/test_crc32.cc Added an aligned buffer fullwin_buf[32768], a new function setup_buf() to initialize the buffer, and modified the tests array to include a new 32768-byte CRC32 test case with proper static initialization.

Sequence Diagram(s)

sequenceDiagram
    participant CRC_Function as crc32_chorba Function
    participant System as CPU/Environment
    CRC_Function->>System: Check BYTE_ORDER
    alt BYTE_ORDER == LITTLE_ENDIAN
        System-->>CRC_Function: Return crc directly
    else
        System-->>CRC_Function: Return ZSWAP64(crc)
    end
Loading
sequenceDiagram
    participant TestRunner as Test Framework
    participant StaticInit as Static Initializer
    participant SetupBuffer as setup_buf Function
    TestRunner->>StaticInit: Initialize static variable 'setup'
    StaticInit->>SetupBuffer: Call setup_buf() to initialize fullwin_buf
    SetupBuffer-->>StaticInit: Return initialization success
Loading

Possibly related PRs

  • Chorba #1837: The changes in the main PR are related to the modifications in the retrieved PR as both involve updates to CRC32 computation functions, specifically focusing on the crc32_chorba functions in the crc32_chorba_c.c file.
  • Clean up crc32_braid. #1873: The changes in the main PR, which modify the crc32_chorba_118960_nondestructive and crc32_chorba_32768_nondestructive functions, are related to the changes in the retrieved PR that involve the renaming of constants and the internal handling of CRC32 functions, specifically as they both pertain to the CRC32 Chorba algorithm.

Suggested labels

bug, Architecture

Suggested reviewers

  • Dead2
✨ Finishing Touches
  • 📝 Generate Docstrings

🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai plan to trigger planning for file edits and PR creation.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@KungFuJesus KungFuJesus force-pushed the fix_generic_chorba_endians branch from be11e0e to d0fda47 Compare March 25, 2025 22:03
@samrussell
Copy link
Copy Markdown
Contributor

samrussell commented Mar 25, 2025 via email

@codecov
Copy link
Copy Markdown

codecov bot commented Mar 25, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 82.99%. Comparing base (fd0d263) to head (32812e6).
Report is 5 commits behind head on develop.

Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #1891      +/-   ##
===========================================
+ Coverage    82.00%   82.99%   +0.99%     
===========================================
  Files          141      138       -3     
  Lines        12666    12528     -138     
  Branches      2906     2812      -94     
===========================================
+ Hits         10387    10398      +11     
+ Misses        1290     1199      -91     
+ Partials       989      931      -58     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@samrussell
Copy link
Copy Markdown
Contributor

samrussell commented Mar 25, 2025 via email

@KungFuJesus
Copy link
Copy Markdown
Collaborator Author

Hah, yeah that was it, at least for the 32k variant.

@KungFuJesus KungFuJesus force-pushed the fix_generic_chorba_endians branch from d0fda47 to 594a9a2 Compare March 25, 2025 22:35
@KungFuJesus KungFuJesus changed the title Disable the 32k and larger specializations of chorba on big endian Fix a big endian bug on the 32k and larger specializations of chorba Mar 25, 2025
@KungFuJesus KungFuJesus force-pushed the fix_generic_chorba_endians branch from 594a9a2 to e0549fd Compare March 25, 2025 22:42
@KungFuJesus KungFuJesus force-pushed the fix_generic_chorba_endians branch from e0549fd to 712e56d Compare March 25, 2025 23:37
In testing a SIMD vectorization for this, I wrote a gtest which stumbled
onto the fact that this had a bug on big endian. Before the initial CRC
had been mixed in it needed to be byte swapped.
@KungFuJesus KungFuJesus force-pushed the fix_generic_chorba_endians branch from 712e56d to 32812e6 Compare March 26, 2025 20:47
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 712e56d and 32812e6.

📒 Files selected for processing (2)
  • arch/generic/crc32_chorba_c.c (3 hunks)
  • test/test_crc32.cc (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
arch/generic/crc32_chorba_c.c (1)
Learnt from: Dead2
PR: zlib-ng/zlib-ng#1837
File: arch/generic/crc32_c.c:19-29
Timestamp: 2025-03-26T15:10:53.588Z
Learning: The Chorba CRC32 functions (crc32_chorba_118960_nondestructive, crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, crc32_chorba_small_nondestructive_32bit) are declared in crc32_c.h.
🪛 Cppcheck (2.10-2)
test/test_crc32.cc

[error] 29-29: There is an unknown macro here somewhere. Configuration is required. If ALIGNED_ is a macro then please configure it.

(unknownMacro)

arch/generic/crc32_chorba_c.c

[error] 648-648: Uninitialized variable

(uninitvar)

⏰ Context from checks skipped due to timeout of 90000ms (61)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu GCC Native Instructions (AVX)
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu GCC Native Instructions (AVX)
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu GCC Native Instructions (AVX)
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu GCC Native Instructions (AVX)
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu GCC Native Instructions (AVX)
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu GCC Native Instructions (AVX)
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
  • GitHub Check: macOS Clang ASAN (ARM64)
  • GitHub Check: Windows MSVC 2019 v141 Win64
  • GitHub Check: Ubuntu GCC Native Instructions (AVX)
  • GitHub Check: Ubuntu 20.04 GCC
  • GitHub Check: Ubuntu GCC Symbol Prefix
  • GitHub Check: macOS GCC Symbol Prefix & Compat (ARM64)
  • GitHub Check: macOS GCC Symbol Prefix (ARM64)
🔇 Additional comments (5)
test/test_crc32.cc (3)

29-38: Good addition of the test buffer for the 32K specialization.

The implementation of a properly aligned 32768-byte buffer with a helper function to initialize it systematically is a good approach for testing the CRC32 algorithm with large inputs.

🧰 Tools
🪛 Cppcheck (2.10-2)

[error] 29-29: There is an unknown macro here somewhere. Configuration is required. If ALIGNED_ is a macro then please configure it.

(unknownMacro)


195-196: Good test case addition for 32K buffer size.

This test case specifically targets the 32K specialization of the chorba algorithm that was having the endian issue. The expected CRC value 0x217726B2 should verify correctness on both little and big endian systems.


40-40:

Details

✅ Verification successful

Static initialization looks good.

Using the setup_buf() function to initialize the static buffer ensures the test data is properly prepared before any tests run.


🏁 Script executed:

#!/bin/bash
# Verify that the static buffer initialization approach is effective by checking if any test functions 
# directly reference the static variable before it's initialized

# Search for any usage of buf32k in test functions
rg -A 3 -B 3 "buf32k" test/test_crc32.cc

Length of output: 631


Static Buffer Initialization Confirmed

The static variable buf32k is correctly initialized via the setup_buf() function before being referenced in the tests array, and no test functions use it prior to initialization. The output from the verification script confirms that the buffer is set up in the proper order.

arch/generic/crc32_chorba_c.c (2)

647-647: Minor style fix with added space after 'for'.

Adding a space after the for keyword improves code consistency with the rest of the codebase.


647-649: Potential uninitialized variable issue in the loop.

According to static analysis, there might be an uninitialized variable in this loop. It's worth reviewing this loop to ensure all variables are properly initialized.

While the static analyzer flags a potential issue, nothing obvious stands out. The loop variables appear to be properly initialized. Could be a false positive, but worth double-checking the code path that leads to this loop to ensure all variables (crc, final_bytes, and bitbufferbytes) are properly initialized before use.

🧰 Tools
🪛 Cppcheck (2.10-2)

[error] 648-648: Uninitialized variable

(uninitvar)

Copy link
Copy Markdown
Member

@Dead2 Dead2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Dead2 Dead2 merged commit 18b933b into zlib-ng:develop Mar 28, 2025
146 of 152 checks passed
@coderabbitai coderabbitai bot mentioned this pull request May 27, 2025
@Dead2 Dead2 mentioned this pull request Nov 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants