Skip to content

Use GCC's may_alias attribute for access to buffers in crc32_chorba#2078

Merged
Dead2 merged 1 commit intozlib-ng:developfrom
phprus:crc32_chorba-alias-1
Jan 14, 2026
Merged

Use GCC's may_alias attribute for access to buffers in crc32_chorba#2078
Dead2 merged 1 commit intozlib-ng:developfrom
phprus:crc32_chorba-alias-1

Conversation

@phprus
Copy link
Copy Markdown
Contributor

@phprus phprus commented Jan 7, 2026

Converting a pointer from type z_word_t* (on 32-bit systems: uint32_t*) to uint64_t* breaks strict aliasing rules.

On GCC -fstrict-aliasing is on by default at -O2 or above ( https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Common-Type-Attributes.html ).

Fix #1926
The compiler warning is no longer present.

Develop: https://github.com/zlib-ng/zlib-ng/actions/runs/20773110891/job/59653058832#step:17:46
PR: https://github.com/phprus/zlib-ng/actions/runs/20795098755/job/59726505687#step:17:46

@mtl1979 @samrussell

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 7, 2026

Walkthrough

Adds an internal may_alias-capable 64-bit alias and adjusts internal 64-bit pointer usage in the Chorba CRC32 implementation; renames several nondestructive parameter identifiers from buf to input and updates corresponding header declarations without changing public APIs.

Changes

Cohort / File(s) Summary
Chorba CRC32 implementation
arch/generic/crc32_chorba_c.c
Adds #include <zmemory.h> and an internal uint64a_t typedef (uses may_alias when available/needed); switches internal qword pointers to uint64a_t*/const uint64a_t*; updates internal references to address uninitialized-warning issues.
Public function declarations
arch/generic/generic_functions.h
Renames parameter identifiers from buf to input in declarations for crc32_chorba_32768_nondestructive, crc32_chorba_small_nondestructive, and crc32_chorba_small_nondestructive_32bit (no type or API changes).

Possibly related PRs

  • Fix 32bit large chorba #1912: Introduces/uses internal 64-bit pointer aliases and updates internal pointer types in Chorba CRC32 code (closely related changes).
  • Clean up crc32_braid. #1873: Edits the Chorba CRC32 implementation and matching headers with parameter/name adjustments.
  • Chorba #1837: Prior modifications to Chorba nondestructive functions and declarations affecting the same code paths.

Suggested labels

bug

Suggested reviewers

  • Dead2
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: using GCC's may_alias attribute to handle pointer aliasing in crc32_chorba buffer access.
Description check ✅ Passed The description clearly explains the strict-aliasing issue being fixed and references the related issue #1926 with supporting evidence from CI runs.
Linked Issues check ✅ Passed The PR successfully addresses all coding requirements from issue #1926: introduces may_alias-qualified typedef to eliminate strict-aliasing violations, updates bitbufferqwords and inputqwords to use the new type, and removes the compiler warnings.
Out of Scope Changes check ✅ Passed All changes are directly related to fixing the strict-aliasing issue: the may_alias typedef addition, parameter renames for consistency, and zmemory.h include are all justified by the main objective.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a943e3e and 46bd919.

📒 Files selected for processing (2)
  • arch/generic/crc32_chorba_c.c
  • arch/generic/generic_functions.h
🚧 Files skipped from review as they are similar to previous changes (2)
  • arch/generic/generic_functions.h
  • arch/generic/crc32_chorba_c.c

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

/* Implement Chorba algorithm from https://arxiv.org/abs/2412.16398 */
Z_INTERNAL uint32_t crc32_chorba_small_nondestructive (uint32_t crc, const uint64_t* buf, size_t len) {
const uint64_t* input = buf;
Z_INTERNAL uint32_t crc32_chorba_small_nondestructive (uint32_t crc, const uint64_t* input, size_t len) {

Check warning

Code scanning / CodeQL

Poorly documented large function Warning

Poorly documented function: fewer than 2% comments for a function of 564 lines.
@samrussell
Copy link
Copy Markdown
Contributor

samrussell commented Jan 7, 2026 via email

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 7, 2026

Tip

For best results, initiate chat on the files or code changes.

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Fetch the current implementation to see how bitbuffer and bitbufferqwords are declared
cat arch/generic/crc32_chorba_c.c | head -100

Length of output: 3669


🏁 Script executed:

#!/bin/bash
# Look for all declarations of bitbuffer to understand the pattern
rg -n "bitbuffer\[" arch/generic/crc32_chorba_c.c -A2 -B2

Length of output: 14409


🏁 Script executed:

#!/bin/bash
# Look for the specific pattern where bitbufferqwords is assigned
rg -n "bitbufferqwords\s*=" arch/generic/crc32_chorba_c.c -A1 -B1

Length of output: 186


@samrussell That's an excellent architectural suggestion! You're absolutely right that flipping the declaration order would be cleaner.

Current approach (this PR):

ALIGNED_(16) z_word_t bitbuffer[bitbuffersizezwords];
uint64_a* bitbufferqwords = (uint64_a*) bitbuffer;  // Cast z_word_t* → uint64_t*

Your suggested approach:

ALIGNED_(16) uint64_t bitbufferqwords[bitbuffersizeqwords];
z_word_t* bitbuffer = (z_word_t*) bitbufferqwords;  // Cast uint64_t* → z_word_t*

Benefits of your approach:

  1. Eliminates aliasing issues entirely - Casting from larger type (uint64_t*) to smaller type (z_word_t*) is inherently safer
  2. No need for may_alias attribute - The strict aliasing violation disappears
  3. More logical - Since the algorithm fundamentally operates on 64-bit values, making uint64_t the primary type makes semantic sense
  4. Cleaner on 32-bit platforms - On platforms where z_word_t is uint32_t, you'd be viewing uint64_t as two uint32_t values, which is the natural direction

Tradeoff:

  • The current PR is a minimal fix (just adds the attribute and type alias)
  • Your approach would require touching all the bitbuffer[...] access patterns to ensure they work correctly with the new declaration order

Both approaches are valid. Your suggestion is architecturally superior, but requires a bit more refactoring. Would you prefer to go with the more comprehensive refactor, or stick with the minimal fix in this PR?


🧠 Learnings used
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.

@phprus
Copy link
Copy Markdown
Contributor Author

phprus commented Jan 7, 2026

  • Eliminates aliasing issues entirely - Casting from larger type (uint64_t*) to smaller type (z_word_t*) is inherently safer

@coderabbitai This is wrong. See example in gcc manual: https://gcc.gnu.org/onlinedocs/gcc-10.2.0/gcc/Common-Type-Attributes.html (cast from int* to short* is unsafe)

Copy link
Copy Markdown
Member

@nmoinvaz nmoinvaz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How related is this to #1975?

@phprus
Copy link
Copy Markdown
Contributor Author

phprus commented Jan 7, 2026

How related is this to #1975?

PR #1975 has the same issue.

@codecov
Copy link
Copy Markdown

codecov bot commented Jan 7, 2026

Codecov Report

❌ Patch coverage is 0% with 4 lines in your changes missing coverage. Please review.
✅ Project coverage is 33.03%. Comparing base (4e173d0) to head (46bd919).
⚠️ Report is 28 commits behind head on develop.

Files with missing lines Patch % Lines
arch/generic/crc32_chorba_c.c 0.00% 4 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop    #2078      +/-   ##
===========================================
- Coverage    36.37%   33.03%   -3.35%     
===========================================
  Files           74       74              
  Lines         7397     7395       -2     
  Branches      1385     1385              
===========================================
- Hits          2691     2443     -248     
- Misses        4430     4682     +252     
+ Partials       276      270       -6     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@phprus phprus force-pushed the crc32_chorba-alias-1 branch from aeffefd to af330c2 Compare January 8, 2026 09:44
@phprus
Copy link
Copy Markdown
Contributor Author

phprus commented Jan 8, 2026

Added explicit include zmemory.h to crc32_chorba_c.c.

@phprus phprus force-pushed the crc32_chorba-alias-1 branch from af330c2 to a943e3e Compare January 8, 2026 19:34
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai bot commented Jan 9, 2026

Tip

For best results, initiate chat on the files or code changes.

@phprus You're absolutely correct, and thank you for the correction! I was wrong about the safety of casting direction.

The GCC manual example demonstrates exactly this - casting from int* (4 bytes) to short* (2 bytes) violates strict aliasing rules, and without the may_alias attribute, the program exhibits undefined behavior even when casting from a larger type to a smaller one.

The key insight: Strict aliasing violations are about type compatibility, not about size relationships. The C standard's aliasing rules prohibit accessing an object through a pointer of an incompatible type, regardless of whether you're casting up or down in size.

Corrected analysis:

  • Current PR approach: Uses may_alias on the uint64_t type to explicitly tell the compiler that uint64_t pointers may alias z_word_t storage. This is the correct fix for the existing code structure.

  • samrussell's suggested approach: Would still need careful handling. Simply flipping the declaration order (declaring uint64_t first, then casting to z_word_t*) doesn't eliminate the aliasing issue - it would just reverse which pointer type aliases the other. You'd still need either:

    • may_alias on one of the types, or
    • Use memcpy for type punning, or
    • Use a union for type punning (though GCC's interpretation of union aliasing is complex)

So the current PR's approach with may_alias is actually appropriate for both directions of the problem. Thank you for catching that error!


✏️ Learnings added
Learnt from: phprus
URL: https://github.com/zlib-ng/zlib-ng/pull/2078

Timestamp: 2026-01-09T13:54:51.907Z
Learning: In C, strict aliasing violations are about type compatibility, not size relationships. Casting between pointer types of different sizes (e.g., int* to short* or uint64_t* to uint32_t*) violates strict aliasing rules regardless of direction. The GCC manual demonstrates that casting from a larger type (int*) to a smaller type (short*) is unsafe without the may_alias attribute, contrary to the misconception that casting from larger to smaller types is inherently safer.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: samrussell
Repo: zlib-ng/zlib-ng PR: 1914
File: arch/generic/crc32_chorba_c.c:30-32
Timestamp: 2025-05-13T06:56:12.407Z
Learning: In zlib-ng, pointer casts between z_word_t* and uint64_t* are safe because: 1) On 64-bit platforms, z_word_t is defined as uint64_t, making the cast a no-op; 2) On 32-bit platforms where z_word_t is uint32_t, the code paths using uint64_t* are skipped in favor of 32-bit specific implementations.

Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.

Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:830-831
Timestamp: 2025-02-21T01:37:54.508Z
Learning: The alignment calculation `((uintptr_t)16 - ((uintptr_t)buf & 15)) & 15` is safe from overflow as it's mathematically bounded between 1 and 16, making the use of uintptr_t appropriate for this pointer arithmetic.

Learnt from: Ag-Cu
Repo: zlib-ng/zlib-ng PR: 1905
File: arch/riscv/crc32_riscv.c:0-0
Timestamp: 2025-04-22T07:05:36.825Z
Learning: On RISC-V architectures running Linux, misaligned memory accesses are guaranteed to work for userspace applications as documented in the kernel's UABI specifications. While ISO C considers unaligned pointer casts as undefined behavior, the Linux kernel guarantees support for misaligned loads/stores either through hardware support or kernel emulation (with potential performance impact). This does not apply to atomic operations, which must remain aligned.

Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
@phprus phprus force-pushed the crc32_chorba-alias-1 branch from a943e3e to 46bd919 Compare January 9, 2026 20:12
@phprus
Copy link
Copy Markdown
Contributor Author

phprus commented Jan 9, 2026

Rebased

@phprus
Copy link
Copy Markdown
Contributor Author

phprus commented Jan 11, 2026

@nmoinvaz
I think this is not a warning fix, but a bug fix that may lead to incorrect CRC32 calculation.

@mtl1979
Copy link
Copy Markdown
Collaborator

mtl1979 commented Jan 11, 2026

@nmoinvaz

I think this is not a warning fix, but a bug fix that may lead to incorrect CRC32 calculation.

So far I couldn't confirm that the aliasing issue actually results in incorrect CRC32 calculation, so technically this is still a warnings fix, even though most warnings are there to report possible ambiguous interpretation of the code.

@phprus
Copy link
Copy Markdown
Contributor Author

phprus commented Jan 11, 2026

So far I couldn't confirm that the aliasing issue actually results in incorrect CRC32 calculation, so technically this is still a warnings fix, even though most warnings are there to report possible ambiguous interpretation of the code.

I had a case where a strict aliasing violation resulted in incorrect code in only one gcc version out of ten tested versions.
Since then, I assume the worst in such cases.

@nmoinvaz nmoinvaz added the bug label Jan 11, 2026
Copy link
Copy Markdown
Member

@Dead2 Dead2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@Dead2 Dead2 merged commit 10f97d0 into zlib-ng:develop Jan 14, 2026
156 of 157 checks passed
@Dead2 Dead2 mentioned this pull request Jan 30, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Warning "bitbuffer may be used uninitialized" in crc32_chorba_118960_nondestructive function (GCC 14.2.0)

6 participants