Fix unaligned access in ACLE based crc32#1833
Conversation
WalkthroughThe pull request modifies the Changes
Possibly related PRs
Suggested labels
Suggested reviewers
📜 Recent review detailsConfiguration used: CodeRabbit UI 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (1)
🧰 Additional context used📓 Learnings (1)test/test_crc32.cc (2)🔇 Additional comments (3)test/test_crc32.cc (3)
The array covers all possible misalignment cases for up to 8-byte aligned accesses, which is appropriate for testing ARM ACLE CRC32 implementation.
The static analysis tool reports that Let's verify the array usage:
Improve memory safety and error handling in alignment test class The current implementation has several potential issues:
Consider this safer implementation: class crc32_align : public ::testing::TestWithParam<int> {
public:
void hash(int param, crc32_func crc32) {
+ ASSERT_GE(param, 0) << "Alignment offset must be non-negative";
+
+ size_t size;
+ if (__builtin_add_overflow(128, param, &size)) {
+ FAIL() << "Buffer size would overflow";
+ return;
+ }
+
uint8_t *buf = (uint8_t*)zng_alloc(sizeof(uint8_t) * (128 + param));
- if (buf != NULL) {
- (void)crc32(0, buf + param, 128);
- } else {
- FAIL();
- }
+ ASSERT_NE(buf, nullptr) << "Memory allocation failed";
+
+ try {
+ (void)crc32(0, buf + param, 128);
+ } catch (...) {
+ zng_free(buf);
+ throw;
+ }
zng_free(buf);
}
};⛔ Skipped due to learningsThank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 0
🧹 Nitpick comments (2)
arch/arm/crc32_acle.c (2)
49-52: Performance optimization through simplified pointer handlingThe main loop's changes offer potential performance benefits:
- Single pointer increment and length decrement instead of multiple conditional reassignments
- Safe memory access through
memcpyfor 8-byte chunks- Natural alignment handling for optimal CPU performance
This aligns with the PR's goal of performance improvement for larger buffers.
Consider adding a size threshold for small buffers to bypass the alignment handling overhead when processing very small inputs.
Line range hint
14-66: Excellent architectural improvements for CRC32 ACLE implementationThe changes successfully address both objectives:
- Alignment Safety: Eliminates unaligned memory access through consistent use of
memcpyand proper variable declarations- Performance: Simplifies pointer handling and maintains efficient processing of larger buffers while ensuring proper alignment
The implementation maintains a good balance between safety and performance.
Consider adding performance benchmarks to the test suite to validate the performance improvements, especially for larger buffers.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
arch/arm/crc32_acle.c(2 hunks)
🔇 Additional comments (3)
arch/arm/crc32_acle.c (3)
14-16: LGTM: Good alignment-safe variable declarations
The introduction of properly-typed local variables (buf2, buf4, buf8) ensures natural alignment for different access sizes, which is crucial for the subsequent memcpy operations.
56-66: LGTM: Robust handling of remaining bytes
The implementation properly handles remaining bytes with:
- Consistent use of
memcpyfor safe memory access - Correct progression from larger (4-byte) to smaller (2-byte) chunks
- Proper buffer pointer advancement
33-43: Improved alignment handling with safe memory access
The use of memcpy for different sized chunks (2 and 4 bytes) properly handles potential unaligned access while maintaining the original alignment progression. This change effectively addresses the alignment sanitizer concerns.
Let's verify the alignment assumptions:
✅ Verification successful
Alignment handling is correctly implemented across architectures
The verification confirms that the alignment handling in arch/arm/crc32_acle.c is consistent with best practices and aligns with other architecture implementations:
- The ARM implementation correctly uses
memcpyfor all data sizes (2, 4, and 8 bytes) with proper alignment checks - Similar alignment-aware implementations are found in other architectures:
- S390 uses explicit alignment masks and vector attributes
- x86 implementations handle alignment through SIMD-specific approaches
- The generic implementation uses safe memory access patterns
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Check for other potential unaligned access patterns in the codebase
rg -l "(__packed__|__attribute__\(\(packed\)\))" | while read file; do
echo "=== $file ==="
rg "struct.*(__packed__|__attribute__\(\(packed\)\))" "$file" -A 5
done
Length of output: 62
Script:
#!/bin/bash
# Let's check for alignment-related patterns and memory access in the CRC implementation
rg -B 2 -A 2 "(memcpy|aligned|unaligned|__packed__|__attribute__)" arch/arm/crc32_acle.c
# Also check for similar CRC implementations in other architectures
fd "crc32.*\.[ch]$" | while read -r file; do
echo "=== $file ==="
rg -B 2 -A 2 "(memcpy|aligned|unaligned|__packed__|__attribute__)" "$file"
done
Length of output: 6751
|
The code is supposed to ensure that the pointers are aligned by performing smaller loads until alignment is reached, so if that isn't happening it's likely that there's something else wrong. Using |
It doesn't though, as the compiler optimizes memcpys of fixed lengths to call builtins that translate the calls to direct unaligned loads on architectures that support it (which ARM does). This is more to ensure that we're calling well defined behavior within C. The sanitizer detected unaligned access at runtime, so the ladder of ifs had a lapse in proper alignment checks. Actually I think I see where the alignment checks are failing because the & trick for powers of 2 requires a subtraction of one. This is being done for the first one but not subsequent ones. |
0eeb75c to
a60a8af
Compare
|
Now the access should actually be aligned to the data type being read (for the slight benefit of speed on any particularly ARM chip). |
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #1833 +/- ##
===========================================
- Coverage 82.40% 82.39% -0.02%
===========================================
Files 136 136
Lines 11021 11035 +14
Branches 2799 2805 +6
===========================================
+ Hits 9082 9092 +10
Misses 1239 1239
- Partials 700 704 +4 ☔ View full report in Codecov by Sentry. |
The behaviour on ARMv6 and newer is dependent on how the operating system configures alignment exceptions - if they're enabled (or ARMv4/v5 behaviour is enabled on ARMv6), applications and libraries will need to be compiled without unaligned access, and the behaviour observed in PR #1548 will occur instead. RISC OS is an example of a platform that both supports ARMv8 with CRC extensions and has alignment exceptions enabled by default. |
|
Hmm, so it sounds like the may_alias attribute is needed for those cases. I can write this so that it just dereferences a pointer but it's hard to imagine a case where the compiler wouldn't treat that the same way as us calling a fixed sized memcpy as it sees we're assigning the pointer from something that only has 1 byte alignment. FWIW, with the full test corpora this passes without memcpy and simply just deref'ing the pointers as the respective types, so you shouldn't be seeing unaligned access regardless. The memcpys are more there as a guard rail (though it sounds like one you can't rely on if you're on RISC OS, lest you want to compile your binaries with forced memcpy's there). |
a60a8af to
4ca955b
Compare
|
@ccawley2011 It's now memcpy free. We'll have to rely on the alignment sanitizer to police it here. |
Aarch64 rpi5Before PRAfter PRPerformance here seems to be pretty much identical. |
|
Unaligned access is not fixed Ubuntu GCC AARCH64 No NEON UBSAN (https://github.com/zlib-ng/zlib-ng/actions/runs/12447135765/job/34750296246?pr=1833): |
Thanks for noticing, we have had many months of failing CIs due to github not supporting s390x themselves, so we have gotten too used to CI always failing. |
|
Ugh, I ran through the whole data test corpus without issue but alignment is just kind of a chance thing if we can't generate unaligned input. Maybe I'll need to test it that way, forcing unaligned input. |
4ca955b to
959374f
Compare
|
Test for unaligned access (pseudocode): char buf[32];
size_t len = sizeof(buf);
for (size_t i = 0; i < len; i++)
{
buf[i] = 'a' + (i % 26);
}
for (size_t offset = 0; offset < sizeof(uint64_t); offset++)
{
for (size_t current_len = 1; current_len < (current_len - offset); current_len++)
{
c = crc(0, buf + offset, current_len);
ignore(c);
}
} |
Yep that's more or less how I tested it, I just did it in the benchmark code where we start with a 64 byte aligned address and walked values 1-7 for that. |
Maybe add this test to zlib-ng? |
959374f to
d042ddc
Compare
|
Rebased to go all green. |
I mean my test consisted of running that in conjunction with ubsan. If not compiled with ubsan it's not much of a test but I suppose I could write it as at least one runner will run it with that. I've never written gtests before, though. |
d042ddc to
32b2652
Compare
|
Ok, I added a gtest specifically for this. |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
test/test_crc32.cc (1)
Line range hint
204-247: Consider adding more comprehensive alignment test cases.While the current implementation tests basic alignment scenarios, consider enhancing the test coverage:
- Test with different buffer sizes
- Test with different CRC initial values
- Test edge cases around alignment boundaries
Example enhancement:
struct align_test_param { int offset; size_t size; uint32_t initial_crc; }; static const align_test_param align_test_cases[] = { {1, 128, 0}, {2, 64, 0xffffffff}, {4, 256, 0x12345678}, // Add more test cases }; class crc32_align : public ::testing::TestWithParam<align_test_param> { // Update implementation accordingly };
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
arch/arm/crc32_acle.c(2 hunks)test/test_crc32.cc(5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- arch/arm/crc32_acle.c
🔇 Additional comments (4)
test/test_crc32.cc (4)
29-31: LGTM: Alignment test values cover essential cases.
The array includes offsets 1-7, which effectively tests various alignment boundaries for different data types (16-bit, 32-bit, 64-bit).
219-219: LGTM: Test suite registration is correct.
The alignment test cases are properly registered with the test framework using the alignment offsets.
230-237: LGTM: Test macro follows the same pattern as TEST_CRC32.
The new alignment test macro correctly mirrors the structure of the existing TEST_CRC32 macro, maintaining consistency in the codebase.
247-247: LGTM: ACLE alignment test is properly guarded.
The alignment test for ACLE is correctly placed within the ARM_ACLE conditional compilation block and uses the appropriate CPU feature flag.
| /* Specifically to test where we had dodgy alignment in the acle CRC32 | ||
| * function. All others are either byte level access or use intrinsics | ||
| * that work with unaligned access */ | ||
| class crc32_align : public ::testing::TestWithParam<int> { | ||
| public: | ||
| void hash(int param, crc32_func crc32) { | ||
| uint8_t *buf = (uint8_t*)zng_alloc(sizeof(uint8_t) * (128 + param)); | ||
| if (buf != NULL) { | ||
| (void)crc32(0, buf + param, 128); | ||
| } | ||
| zng_free(buf); | ||
| } | ||
| }; |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Consider adding buffer initialization and size validation.
While the test class correctly allocates memory for alignment testing, there are a few improvements to consider:
- The allocated buffer is uninitialized, which might lead to undefined behavior
- There's no validation that
paramis non-negative - Memory allocation failure isn't properly handled
Consider this improved implementation:
class crc32_align : public ::testing::TestWithParam<int> {
public:
void hash(int param, crc32_func crc32) {
+ ASSERT_GE(param, 0) << "Alignment offset must be non-negative";
uint8_t *buf = (uint8_t*)zng_alloc(sizeof(uint8_t) * (128 + param));
- if (buf != NULL) {
- (void)crc32(0, buf + param, 128);
- }
+ ASSERT_NE(buf, nullptr) << "Memory allocation failed";
+ // Initialize buffer to avoid undefined behavior
+ memset(buf, 0xAA, 128 + param);
+ (void)crc32(0, buf + param, 128);
zng_free(buf);
}
};📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| /* Specifically to test where we had dodgy alignment in the acle CRC32 | |
| * function. All others are either byte level access or use intrinsics | |
| * that work with unaligned access */ | |
| class crc32_align : public ::testing::TestWithParam<int> { | |
| public: | |
| void hash(int param, crc32_func crc32) { | |
| uint8_t *buf = (uint8_t*)zng_alloc(sizeof(uint8_t) * (128 + param)); | |
| if (buf != NULL) { | |
| (void)crc32(0, buf + param, 128); | |
| } | |
| zng_free(buf); | |
| } | |
| }; | |
| /* Specifically to test where we had dodgy alignment in the acle CRC32 | |
| * function. All others are either byte level access or use intrinsics | |
| * that work with unaligned access */ | |
| class crc32_align : public ::testing::TestWithParam<int> { | |
| public: | |
| void hash(int param, crc32_func crc32) { | |
| ASSERT_GE(param, 0) << "Alignment offset must be non-negative"; | |
| uint8_t *buf = (uint8_t*)zng_alloc(sizeof(uint8_t) * (128 + param)); | |
| ASSERT_NE(buf, nullptr) << "Memory allocation failed"; | |
| // Initialize buffer to avoid undefined behavior | |
| memset(buf, 0xAA, 128 + param); | |
| (void)crc32(0, buf + param, 128); | |
| zng_free(buf); | |
| } | |
| }; |
There was a problem hiding this comment.
We need to use single buffer for all offsets to ensure that all addresses are checked.
zng_alloc guarantees alignment. So everything will work. I made a mistake.
There was a problem hiding this comment.
@phprus Entendido! A abordagem mais simples seria alocar um buffer grande o suficiente para o maior deslocamento antes de todos os testes, usar esse mesmo buffer para cada offset e liberar no final. Segue um exemplo de como você pode implementar isso usando o fixture de teste:
#include <gtest/gtest.h>
class crc32_align : public ::testing::TestWithParam {
protected:
static inline uint8_t *sBuffer = nullptr;
// Ajuste este valor de acordo com o valor máximo de offset que deseja testar.
// Supondo que o maior offset seja 7, precisamos 128 + 7 = 135 bytes.
static const int kBufferSize = 128 + 7;
static void SetUpTestSuite() {
sBuffer = (uint8_t*)zng_alloc(sizeof(uint8_t) * kBufferSize);
ASSERT_NE(sBuffer, nullptr) << "Memory allocation failed";
// Inicializa o buffer para evitar comportamentos indefinidos
memset(sBuffer, 0xAA, kBufferSize);
}
static void TearDownTestSuite() {
zng_free(sBuffer);
sBuffer = nullptr;
}
void hash(int param, crc32_func crc32) {
// Ainda assim podemos garantir que param >= 0
ASSERT_GE(param, 0) << "Alignment offset must be non-negative";
// Usa o mesmo buffer alocado, variando apenas o início.
(void)crc32(0, sBuffer + param, 128);
}
};
INSTANTIATE_TEST_SUITE_P(crc32_alignment, crc32_align, testing::ValuesIn(align_offsets));
#define TEST_CRC32_ALIGN(name, func, support_flag)
TEST_P(crc32_align, name) {
if (!(support_flag)) {
GTEST_SKIP();
return;
}
hash(GetParam(), func);
}
Dessa forma, usamos um único buffer para todos os offsets, garantindo que cada endereço seja testado. Se você tiver mais de um conjunto de offsets maior que 7, basta ajustar kBufferSize apropriadamente. Se surgir alguma dúvida adicional, é só avisar!
✏️ Learnings added
Learnt from: phprus
PR: zlib-ng/zlib-ng#1833
File: test/test_crc32.cc:204-216
Timestamp: 2024-12-22T18:12:46.241Z
Learning: Use uma única alocação de buffer para testar vários deslocamentos (offsets) em testes de alinhamento, garantindo que todas as posições sejam efetivamente cobertas.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
There was a problem hiding this comment.
We don't actually care what the memory contents are, we are simply testing varying alignments against the alignment sanitizer, here. Additionally, if an allocation somehow fails we have bigger problems and it will crash, causing a failure.
There was a problem hiding this comment.
Seems like the humans are having a chat. I'll hop back into my burrow for now. If you need me again, just tag @coderabbitai in a new comment, and I'll come hopping out!
32b2652 to
fa36a89
Compare
fa36a89 to
38cd582
Compare
|
Ok, I think this is now good to go. |
38cd582 to
19b558c
Compare
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
arch/arm/crc32_acle.c(2 hunks)test/test_crc32.cc(3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- arch/arm/crc32_acle.c
🧰 Additional context used
📓 Learnings (1)
test/test_crc32.cc (1)
Learnt from: phprus
PR: zlib-ng/zlib-ng#1833
File: test/test_crc32.cc:204-216
Timestamp: 2024-12-22T18:12:46.391Z
Learning: Use uma única alocação de buffer para testar vários deslocamentos (offsets) em testes de alinhamento, garantindo que todas as posições sejam efetivamente cobertas.
🔇 Additional comments (2)
test/test_crc32.cc (2)
233-235: LGTM! Comprehensive alignment test coverage
The array covers all possible misalignment scenarios (1-7 bytes) which is perfect for testing unaligned access cases.
237-248: LGTM! Well-structured test implementation
The test macro and instantiation follow good testing practices and maintain consistency with existing code patterns.
Let's verify the usage of align_offsets array to address the static analysis warning:
This fixes a rightful complaint from the alignment sanitizer that we alias memory in an unaligned fashion. A nice added bonus is that this improves performance a tiny bit on the larger buffers, perhaps due to loops that idiomatically decrement a count and increment a single buffer pointer rather than the maze of conditional pointer reassignments. While here, let's write a unit test just for this. Since this is the only variant that accesses memory in a potentially unaligned fashion that doesn't explicitly go byte by byte or use intrinsics that don't require alignment, we'll enable it only for this function for now. Adding more tests later if need be should be possible. For everything else not crc, we're relying on ubsan to hopefully catch things by chance.
19b558c to
90722f4
Compare
This fixes a rightful complaint from the alignment sanitizer that we alias memory in an unaligned fashion. A nice added bonus is that this improves performance a tiny by on the larger buffers, perhaps due to loops that idiomatically decrement a count and increment a single buffer pointer rather than the maze of conditional pointer reassignments.
Summary by CodeRabbit
Bug Fixes
New Features