Make deflate output deterministic if PREFIX3(stream) is reused after deflateReset#2102
Conversation
WalkthroughThis PR tightens a boundary check in LONGEST_MATCH (changing Changes
Possibly related PRs
Suggested labels
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
match_tpl.htest/CMakeLists.txttest/test_deflate_reset_deterministic.cc
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: nmoinvaz
Repo: zlib-ng/zlib-ng PR: 2077
File: test/test_crc32_copy.cc:15-29
Timestamp: 2026-01-12T23:00:52.569Z
Learning: In zlib-ng PR #2077, the test data in hash_test_strings_p.h (hash_tests array) was structured to exclude NULL buffer cases. NULL handling is performed at the API level in adler32.c and crc32.c, so test implementations in test_crc32_copy.cc and test_adler32_copy.cc do not need NULL guards for memcmp calls.
📚 Learning: 2025-11-15T12:43:50.353Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 2001
File: test/CMakeLists.txt:10-12
Timestamp: 2025-11-15T12:43:50.353Z
Learning: In zlib-ng, the TEST_STOCK_ZLIB CMake variable is intentionally not defined as a CMake option. It's meant to be set manually via command-line (e.g., -DTEST_STOCK_ZLIB=ON) when testing against stock zlib. The testing workflow for stock zlib requires undocumented manual steps and is considered experimental/incomplete.
Applied to files:
test/CMakeLists.txt
📚 Learning: 2026-01-12T23:00:52.569Z
Learnt from: nmoinvaz
Repo: zlib-ng/zlib-ng PR: 2077
File: test/test_crc32_copy.cc:15-29
Timestamp: 2026-01-12T23:00:52.569Z
Learning: In zlib-ng PR #2077, the test data in hash_test_strings_p.h (hash_tests array) was structured to exclude NULL buffer cases. NULL handling is performed at the API level in adler32.c and crc32.c, so test implementations in test_crc32_copy.cc and test_adler32_copy.cc do not need NULL guards for memcmp calls.
Applied to files:
test/CMakeLists.txttest/test_deflate_reset_deterministic.cc
📚 Learning: 2024-12-20T23:35:59.830Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1830
File: arch/generic/generic_functions.h:39-53
Timestamp: 2024-12-20T23:35:59.830Z
Learning: The `longest_match_unaligned_*` functions are templated using the LONGEST_MATCH macro in match_tpl.h, so their implementations are generated rather than explicitly defined.
Applied to files:
match_tpl.h
🪛 Cppcheck (2.19.0)
test/test_deflate_reset_deterministic.cc
[information] 3-3: Include file
(missingInclude)
[information] 7-7: Include file
(missingInclude)
[information] 10-10: Include file
(missingIncludeSystem)
[information] 11-11: Include file
(missingIncludeSystem)
[information] 12-12: Include file
(missingIncludeSystem)
[information] 13-13: Include file
(missingIncludeSystem)
[information] 14-14: Include file
(missingIncludeSystem)
[information] 16-16: Include file
(missingInclude)
[information] 20-20: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingInclude)
🔇 Additional comments (4)
match_tpl.h (1)
161-165: LGTM! Correct fix for the determinism issue.The change from
len > lookaheadtolen >= lookaheadproperly prevents the algorithm from searching for better matches when the current match already reaches the end of available input. AfterdeflateReset, stale hash table entries could cause different match decisions; by returning early at the boundary condition, the output becomes deterministic regardless of prior stream state.test/test_deflate_reset_deterministic.cc (2)
1-21: LGTM!The include structure follows the existing test patterns in zlib-ng, with proper ZLIB_COMPAT handling for the header selection.
346-379: LGTM! Well-designed test case.The test correctly validates deterministic output after
deflateResetby:
- Establishing a baseline compression of
data_awith a fresh stream- Using a second stream to compress different data (
data_b), populating hash tables- Resetting the second stream and re-compressing
data_a- Asserting both compressions match
This directly exercises the scenario from issue #2100. The
sizeof(data_a)-1correctly accounts for the implicit null terminator in the string literal.test/CMakeLists.txt (1)
187-187: LGTM!Correct placement within the
NOT TEST_STOCK_ZLIBblock, as this test validates zlib-ng specific deterministic behavior that stock zlib doesn't provide. Based on learnings,TEST_STOCK_ZLIBis for manual testing against stock zlib.
d029f76 to
9ff8c74
Compare
mtl1979
left a comment
There was a problem hiding this comment.
Can't use implicit cast to remove constness of pointer.
9ff8c74 to
9f1e911
Compare
There was a problem hiding this comment.
Actionable comments posted: 2
📜 Review details
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
match_tpl.htest/CMakeLists.txttest/test_deflate_reset_deterministic.cc
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: nmoinvaz
Repo: zlib-ng/zlib-ng PR: 2077
File: test/test_crc32_copy.cc:15-29
Timestamp: 2026-01-12T23:00:52.569Z
Learning: In zlib-ng PR #2077, the test data in hash_test_strings_p.h (hash_tests array) was structured to exclude NULL buffer cases. NULL handling is performed at the API level in adler32.c and crc32.c, so test implementations in test_crc32_copy.cc and test_adler32_copy.cc do not need NULL guards for memcmp calls.
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1830
File: arch/generic/generic_functions.h:39-53
Timestamp: 2024-12-20T23:35:59.830Z
Learning: The `longest_match_unaligned_*` functions are templated using the LONGEST_MATCH macro in match_tpl.h, so their implementations are generated rather than explicitly defined.
📚 Learning: 2025-11-15T12:43:50.353Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 2001
File: test/CMakeLists.txt:10-12
Timestamp: 2025-11-15T12:43:50.353Z
Learning: In zlib-ng, the TEST_STOCK_ZLIB CMake variable is intentionally not defined as a CMake option. It's meant to be set manually via command-line (e.g., -DTEST_STOCK_ZLIB=ON) when testing against stock zlib. The testing workflow for stock zlib requires undocumented manual steps and is considered experimental/incomplete.
Applied to files:
test/CMakeLists.txt
📚 Learning: 2026-01-12T23:00:52.569Z
Learnt from: nmoinvaz
Repo: zlib-ng/zlib-ng PR: 2077
File: test/test_crc32_copy.cc:15-29
Timestamp: 2026-01-12T23:00:52.569Z
Learning: In zlib-ng PR #2077, the test data in hash_test_strings_p.h (hash_tests array) was structured to exclude NULL buffer cases. NULL handling is performed at the API level in adler32.c and crc32.c, so test implementations in test_crc32_copy.cc and test_adler32_copy.cc do not need NULL guards for memcmp calls.
Applied to files:
test/CMakeLists.txttest/test_deflate_reset_deterministic.cc
📚 Learning: 2024-12-20T23:35:59.830Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1830
File: arch/generic/generic_functions.h:39-53
Timestamp: 2024-12-20T23:35:59.830Z
Learning: The `longest_match_unaligned_*` functions are templated using the LONGEST_MATCH macro in match_tpl.h, so their implementations are generated rather than explicitly defined.
Applied to files:
match_tpl.h
📚 Learning: 2025-02-21T01:41:50.358Z
Learnt from: KungFuJesus
Repo: zlib-ng/zlib-ng PR: 1872
File: arch/x86/chorba_sse2.c:14-24
Timestamp: 2025-02-21T01:41:50.358Z
Learning: In zlib-ng's SSE2 vectorized Chorba CRC implementation, the code that calls READ_NEXT macro ensures 16-byte alignment, making explicit alignment checks unnecessary within the macro.
Applied to files:
test/test_deflate_reset_deterministic.cc
📚 Learning: 2026-01-12T22:53:18.025Z
Learnt from: nmoinvaz
Repo: zlib-ng/zlib-ng PR: 2077
File: crc32.c:34-45
Timestamp: 2026-01-12T22:53:18.025Z
Learning: In zlib-ng, defensive NULL checks at public API entry points (like crc32() checking buf before calling crc32_z()) are intentional and should not be flagged as redundant, even when the called function also checks. This is an accepted defensive coding practice.
Applied to files:
test/test_deflate_reset_deterministic.cc
📚 Learning: 2025-07-04T20:17:58.531Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1928
File: gzlib.c:90-90
Timestamp: 2025-07-04T20:17:58.531Z
Learning: In zlib-ng's gz_buffer_alloc() function, the condition `(out_mult >= 1 && state->out == NULL)` is necessary because state->out is intentionally left as NULL when out_mult == 0 (write + direct mode), but should be allocated when out_mult >= 1. The condition properly distinguishes between intentionally NULL buffers and allocation failures.
Applied to files:
test/test_deflate_reset_deterministic.cc
📚 Learning: 2026-01-12T02:33:10.554Z
Learnt from: mtl1979
Repo: zlib-ng/zlib-ng PR: 2097
File: zlib.map.in:20-24
Timestamp: 2026-01-12T02:33:10.554Z
Learning: The zlib.map.in file should remain as close as possible to Mark Adler's version of zlib. Changes that would improve readability or consistency (such as adding explicit `global:` keywords) should not be made unless they are first adopted in the upstream zlib project.
Applied to files:
test/test_deflate_reset_deterministic.cc
📚 Learning: 2024-12-25T19:45:06.009Z
Learnt from: samrussell
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_braid_c.c:596-597
Timestamp: 2024-12-25T19:45:06.009Z
Learning: We should verify out-of-bounds pointer arithmetic in chorba_118960_nondestructive() and related loops where "input + i" is used, especially when len < expected block sizes.
Applied to files:
test/test_deflate_reset_deterministic.cc
🧬 Code graph analysis (1)
test/test_deflate_reset_deterministic.cc (2)
test/test_crc32.cc (1)
buffer(44-47)deflate.c (4)
deflate(758-1050)deflateInit2(244-371)deflateEnd(1053-1063)deflateReset(531-536)
🪛 Cppcheck (2.19.0)
test/test_deflate_reset_deterministic.cc
[information] 3-3: Include file
(missingInclude)
[information] 7-7: Include file
(missingInclude)
[information] 10-10: Include file
(missingIncludeSystem)
[information] 11-11: Include file
(missingIncludeSystem)
[information] 12-12: Include file
(missingIncludeSystem)
[information] 13-13: Include file
(missingIncludeSystem)
[information] 14-14: Include file
(missingIncludeSystem)
[information] 16-16: Include file
(missingInclude)
[information] 20-20: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingInclude)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (77)
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: EL10 GCC S390X DFLTCC ASAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC ASAN
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Analyze (cpp)
- GitHub Check: macOS GCC Symbol Prefix & Compat (Intel)
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: EL10 GCC S390X DFLTCC ASAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC Compat Symbol Prefix
- GitHub Check: Ubuntu GCC ASAN
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Analyze (cpp)
- GitHub Check: macOS GCC Symbol Prefix & Compat (Intel)
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: EL10 GCC S390X DFLTCC ASAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC Compat Symbol Prefix
- GitHub Check: Ubuntu GCC ASAN
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Analyze (cpp)
- GitHub Check: macOS GCC Symbol Prefix & Compat (Intel)
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: EL10 GCC S390X DFLTCC ASAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC Compat Symbol Prefix
- GitHub Check: Ubuntu GCC ASAN
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Analyze (cpp)
- GitHub Check: macOS GCC Symbol Prefix & Compat (Intel)
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: EL10 GCC S390X DFLTCC ASAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC Compat Symbol Prefix
- GitHub Check: Ubuntu GCC ASAN
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Analyze (cpp)
- GitHub Check: macOS GCC Symbol Prefix & Compat (Intel)
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: EL10 GCC S390X DFLTCC ASAN
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC Compat Symbol Prefix
- GitHub Check: Ubuntu GCC ASAN
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Analyze (cpp)
- GitHub Check: macOS GCC Symbol Prefix & Compat (Intel)
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC Compat Symbol Prefix
- GitHub Check: Ubuntu GCC ASAN
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Analyze (cpp)
- GitHub Check: macOS GCC Symbol Prefix & Compat (Intel)
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC Compat Symbol Prefix
- GitHub Check: Ubuntu GCC ASAN
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Analyze (cpp)
- GitHub Check: macOS GCC Symbol Prefix & Compat (Intel)
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC MIPS
- GitHub Check: Ubuntu GCC Compat Symbol Prefix
- GitHub Check: Ubuntu GCC ASAN
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Analyze (cpp)
- GitHub Check: macOS GCC Symbol Prefix & Compat (Intel)
- GitHub Check: macOS Clang ASAN (Intel)
- GitHub Check: Ubuntu GCC AARCH64 No NEON UBSAN
- GitHub Check: Ubuntu GCC MIPS
- GitHub Check: Ubuntu GCC Compat Symbol Prefix
- GitHub Check: Ubuntu GCC ASAN
- GitHub Check: Ubuntu GCC SSE2 UBSAN
- GitHub Check: Analyze (cpp)
- GitHub Check: macOS GCC Symbol Prefix & Compat (Intel)
🔇 Additional comments (3)
test/CMakeLists.txt (1)
187-187: LGTM!The new test file is correctly placed within the
NOT TEST_STOCK_ZLIBconditional block, consistent with other zlib-ng specific tests that rely on internal behavior or specific compressed output layouts.match_tpl.h (1)
161-165: LGTM! Correct fix for determinism after deflateReset.When
len == lookahead, the match already spans all remaining input, so continuing to search the hash chain is pointless and can produce non-deterministic results due to stale hash table entries afterdeflateReset. The tightened condition ensures early termination in this boundary case, aligning with the OUT assertion that match length should not exceedlookahead.test/test_deflate_reset_deterministic.cc (1)
348-381: Well-structured regression test for issue #2100.The test correctly validates the determinism fix by:
- Establishing baseline output from a fresh stream
- Using a second stream to compress different data (populating hash tables)
- Resetting the second stream and compressing the same input
- Verifying outputs match
The chosen parameters (level 6, window_bits 15, memLevel 8) are good defaults that exercise the standard code paths. The test data appears crafted to trigger the specific hash table residue scenario.
| z_stream->next_out = (uint8_t *)buffer; | ||
| z_stream->avail_out = buffer_size; | ||
| err = PREFIX(deflate)(z_stream, op); | ||
| EXPECT_TRUE(err == Z_OK || err == Z_STREAM_END); |
There was a problem hiding this comment.
🧹 Nitpick | 🔵 Trivial
Consider using ASSERT_TRUE for earlier test termination on compression errors.
Using EXPECT_TRUE allows the loop to continue even if deflate() returns an unexpected error, potentially causing confusing follow-up failures. ASSERT_TRUE would terminate the test immediately on failure.
That said, this is a minor point since the loop will terminate naturally on Z_STREAM_END and the test outcome will still be reported correctly.
♻️ Suggested change
- EXPECT_TRUE(err == Z_OK || err == Z_STREAM_END);
+ ASSERT_TRUE(err == Z_OK || err == Z_STREAM_END);📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| EXPECT_TRUE(err == Z_OK || err == Z_STREAM_END); | |
| ASSERT_TRUE(err == Z_OK || err == Z_STREAM_END); |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #2102 +/- ##
============================================
+ Coverage 33.76% 81.42% +47.66%
============================================
Files 75 158 +83
Lines 7581 13085 +5504
Branches 1470 3222 +1752
============================================
+ Hits 2560 10655 +8095
+ Misses 4751 1363 -3388
- Partials 270 1067 +797 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
9f1e911 to
0a58ddc
Compare
|
The bug fix might be correct, but is this actually the root cause though? |
I don't think this is the root cause... There is likely off-by-one error elsewhere (somewhere in the compare code) and this PR masks it. |
trifectatechfoundation/zlib-rs#459 has some investigation on the root cause. IIUC 8 byte value is read from the internal buffer: To make this deterministic, we could memset the internal buffer to any pattern (zeros maybe), but that it not a good solution in my opinion for two reasons:
|
So to avoid the garbage bytes, during reset |
|
The high level logic of the relevant part of Looking closer involves checking whether the preliminary length exceeds the remaining length. If so, then garbage bytes must have artificially increased the preliminary length, so the actual matching length is only the remaining length. Furthermore, there is no need to iterate further through the chain, because the match is already as long as possible. The function exits early. (This differs from the original zlib, where the length is only trimmed at this stage, without an early exit.) The problem is that, while comparing whether the current match is longer than the best match so far, the length is still a preliminary length, which may involve comparison of garbage bytes after the data. If a match logically reaches the end of the data, then it is unpredictable whether the preliminary length is equal to the remaining length, or greater. This causes divergence of the behavior in case the best match so far reached the end of the data, while garbage bytes did not match then, so there was no early exit, and we now have another match which logically reaches the end of the data. Depending on garbage bytes, the preliminary length of the second match can be equal to or greater than the remaining length, and thus the new match can be considered either as good as or better than the best match so far. Only a strictly better match is looked at closer, and ultimately this affects the distance used in the answer. After the fix, the early exit applies as soon as the match logically reaches the end of the data, not only when the preliminary length exceeds it, i.e. independently from garbage bytes. The first match which reaches the end of the data is already as long as possible, so we exit early. |
|
@QrczakMK That is some real good detective work, I am impressed 💯 We'll have to do some testing of course, but it makes sense and I agree with that @dzetkulic that resetting those garbage bytes might not be needed and might even help us catch match regressions in the future. |
|
As for testing, we use zlib-rs in many places in Google. It has been originally transcribed from zlib-ng and has the same issue. I tested the analogous fix to zlib-rs, and all our tests passed. |
Dead2
left a comment
There was a problem hiding this comment.
All of my tests pass.
Benchmarking-wise, this has negligible difference. It does look like it is very slightly faster on average (I'd expect so from sometimes not having to look for another match at end of window) over dozens of gbench runs, but benchmarks are too close to really tell for sure.
| # include "zlib-ng.h" | ||
| #endif | ||
|
|
||
| #include <stdio.h> |
There was a problem hiding this comment.
stdio.h, stdint.h, stdlib.h and string.h are already included from zbuild.h so doesn't need to be included here. string is not though.
There was a problem hiding this comment.
string is C++ header, so it makes no sense to have it in zbuild.h which should only contain C headers.
C++ equivalent of string.h is cstring.
There was a problem hiding this comment.
@mtl1979 I didn't mean that we need to add it to zbuild.h, I meant the ones that are already in zbuild.h should be removed from the test source, as they don't need to be parsed twice.
I don't know whether string itself is needed by this test or not.
There was a problem hiding this comment.
Rebased, headers removed.
string is used for the output buffer.
Manual memory management is not needed, and the EXPECT_EQ macro can be used to compare values. If the values don't match, gtest will print them.
There was a problem hiding this comment.
@mtl1979 I didn't mean that we need to add it to
zbuild.h, I meant the ones that are already inzbuild.hshould be removed from the test source, as they don't need to be parsed twice.I don't know whether
stringitself is needed by this test or not.
Most of the C++ headers are designed to be included before the equivalent C header as they include the C header after defining some preprocessor defines. For a lot of standard C headers the C++ equivalent name is prefixed with c and the extension is omitted.
|
It evolved from a particular wrapper for streaming compression. For the curious: https://github.com/google/riegeli/blob/master/riegeli/zlib/zlib_writer.cc Splitting input and output into buffers does not impact the outcome. For simplicity, it might use some appropriate string-to-string one-shot compressor, as long as the |
0a58ddc to
65ea811
Compare
|
Can we shorten the filename of the test.. maybe something like dtrmin or whatever? |
Maybe |
|
In trifectatechfoundation/zlib-rs#461 we might find simpler test data for triggering this. |
|
I was able to manually reduce it to const DATA_A: &[u8] = b"\0AAAA\0AAAAAAAA";
const DATA_B: &[u8] = &[1u8; DATA_A.len() + 1]; |
65ea811 to
b3e834a
Compare
|
Rebased. |
b3e834a to
b0c5e22
Compare
b0c5e22 to
dad8c19
Compare
dad8c19 to
f4aa5bc
Compare
Renamed to |
There was a problem hiding this comment.
Actionable comments posted: 1
📜 Review details
Configuration used: Repository UI
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (3)
match_tpl.htest/CMakeLists.txttest/test_deflate_deterministic.cc
🧰 Additional context used
🧠 Learnings (6)
📓 Common learnings
Learnt from: nmoinvaz
Repo: zlib-ng/zlib-ng PR: 2077
File: test/test_crc32_copy.cc:15-29
Timestamp: 2026-01-12T23:00:59.417Z
Learning: In zlib-ng PR `#2077`, the test data in hash_test_strings_p.h (hash_tests array) was structured to exclude NULL buffer cases. NULL handling is performed at the API level in adler32.c and crc32.c, so test implementations in test_crc32_copy.cc and test_adler32_copy.cc do not need NULL guards for memcmp calls.
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 2046
File: deflate_quick.c:116-116
Timestamp: 2025-12-08T13:43:07.260Z
Learning: In zlib-ng, the compare256() function (called through the function table) returns at most 256, so match_len calculations of the form `compare256(...) + 2` are inherently bounded to a maximum of 258 (which equals STD_MAX_MATCH), making explicit clamping to STD_MAX_MATCH unnecessary after such calculations.
📚 Learning: 2025-11-15T12:43:50.353Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 2001
File: test/CMakeLists.txt:10-12
Timestamp: 2025-11-15T12:43:50.353Z
Learning: In zlib-ng, the TEST_STOCK_ZLIB CMake variable is intentionally not defined as a CMake option. It's meant to be set manually via command-line (e.g., -DTEST_STOCK_ZLIB=ON) when testing against stock zlib. The testing workflow for stock zlib requires undocumented manual steps and is considered experimental/incomplete.
Applied to files:
test/CMakeLists.txttest/test_deflate_deterministic.cc
📚 Learning: 2026-01-12T23:00:59.417Z
Learnt from: nmoinvaz
Repo: zlib-ng/zlib-ng PR: 2077
File: test/test_crc32_copy.cc:15-29
Timestamp: 2026-01-12T23:00:59.417Z
Learning: In zlib-ng PR `#2077`, the test data in hash_test_strings_p.h (hash_tests array) was structured to exclude NULL buffer cases. NULL handling is performed at the API level in adler32.c and crc32.c, so test implementations in test_crc32_copy.cc and test_adler32_copy.cc do not need NULL guards for memcmp calls.
Applied to files:
test/CMakeLists.txttest/test_deflate_deterministic.cc
📚 Learning: 2024-12-20T23:35:59.830Z
Learnt from: Dead2
Repo: zlib-ng/zlib-ng PR: 1830
File: arch/generic/generic_functions.h:39-53
Timestamp: 2024-12-20T23:35:59.830Z
Learning: The `longest_match_unaligned_*` functions are templated using the LONGEST_MATCH macro in match_tpl.h, so their implementations are generated rather than explicitly defined.
Applied to files:
match_tpl.h
📚 Learning: 2025-01-06T07:20:02.028Z
Learnt from: samrussell
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_braid_c.c:0-0
Timestamp: 2025-01-06T07:20:02.028Z
Learning: Business requirement: Additional memory allocations should be avoided, stack allocations might be used instead.
Applied to files:
test/test_deflate_deterministic.cc
📚 Learning: 2024-12-27T10:06:16.184Z
Learnt from: samrussell
Repo: zlib-ng/zlib-ng PR: 1837
File: arch/generic/crc32_braid_c.c:215-217
Timestamp: 2024-12-27T10:06:16.184Z
Learning: There's a business requirement to use stack allocation instead of heap for the Chorba buffer.
Applied to files:
test/test_deflate_deterministic.cc
🪛 Cppcheck (2.19.0)
test/test_deflate_deterministic.cc
[information] 3-3: Include file
(missingInclude)
[information] 7-7: Include file
(missingInclude)
[information] 10-10: Include file
(missingIncludeSystem)
[information] 12-12: Include file
(missingInclude)
[information] 16-16: Include file
(missingIncludeSystem)
[information] 5-5: Include file
(missingInclude)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (10)
- GitHub Check: EL10 Clang S390X DFLTCC MSAN
- GitHub Check: EL10 Clang S390X DFLTCC MSAN
- GitHub Check: EL10 Clang S390X DFLTCC MSAN
- GitHub Check: EL10 Clang S390X DFLTCC MSAN
- GitHub Check: EL10 Clang S390X DFLTCC MSAN
- GitHub Check: EL10 Clang S390X DFLTCC MSAN
- GitHub Check: EL10 Clang S390X DFLTCC MSAN
- GitHub Check: EL10 Clang S390X DFLTCC MSAN
- GitHub Check: EL10 Clang S390X DFLTCC MSAN
- GitHub Check: EL10 Clang S390X DFLTCC MSAN
🔇 Additional comments (3)
match_tpl.h (1)
161-165: Correct fix for nondeterministic behavior.The boundary condition change from
>to>=is the right approach. Whenlen == lookahead, the match exactly reaches the end of available input data. Previously, continuing the search could read garbage bytes past the logical boundary (via 8-byte reads likescan_start = zng_memread_8(scan)), causing nondeterministic match selection when the stream is reused afterdeflateReset. By returning immediately whenlen >= lookahead, subsequent comparisons that would read uninitialized memory are avoided.The comment update accurately reflects the new semantics.
test/test_deflate_deterministic.cc (1)
39-72: Test logic is correct and well-structured.The test correctly validates the deterministic behavior fix by:
- Compressing
data_awith a fresh stream- Compressing larger
data_bto populate the internal window with different content- Resetting the stream and compressing
data_aagain- Asserting the outputs match
This reproduces the exact scenario from issue
#2100where garbage bytes from a previous session could influence match selection.test/CMakeLists.txt (1)
181-191: Correct placement and consistent style.The new test is appropriately placed within the
NOT TEST_STOCK_ZLIBblock since it relies on zlib-ng internal behavior (the specific LONGEST_MATCH implementation). Comment alignment and style are consistent with surrounding entries.
✏️ Tip: You can disable this entire section by setting review_details to false in your review settings.
…deflateReset Co-authored-by: Marcin Kowalczyk <QrczakMK@gmail.com> Signed-off-by: Vladislav Shchapov <vladislav@shchapov.ru>
f4aa5bc to
8355ad9
Compare
|
Do you have any indication on when this fix will make it into a release? |
|
@folkertdev I can't promise anything, but I am hoping to backport this and a few other fixes and make a release later this week. |
Issue: #2100
@QrczakMK