fix(#4317): pass decompressedLength instead of data.size() in LZ4Compression.compress(Binary)#4322
Conversation
…ression.compress(Binary) When data.position() > 0, the compressor received a srcLen larger than the destination buffer, causing ArrayIndexOutOfBoundsException or silent corruption on every compress → decompress round-trip with a non-zero position.
Up to standards ✅🟢 Issues
|
| Metric | Results |
|---|---|
| Coverage variation | ✅ -7.34% coverage variation |
| Diff coverage | ✅ 100.00% diff coverage |
Coverage variation details
Coverable lines Covered lines Coverage Common ancestor commit (d1b1632) 127537 93311 73.16% Head commit (7b01785) 159215 (+31678) 104803 (+11492) 65.82% (-7.34%) Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch:
<coverage of head commit> - <coverage of common ancestor commit>
Diff coverage details
Coverable lines Covered lines Diff coverage Pull request (#4322) 2 2 100.00% Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified:
<covered lines added or modified>/<coverable lines added or modified> * 100%
NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.
There was a problem hiding this comment.
Code Review
This pull request fixes a bug in LZ4Compression.compress where the total size of the backing array was incorrectly used as the source length instead of the actual data length to be compressed. The changes include a one-line fix in the compression logic, a new regression test, and documentation of the issue. Review feedback indicates that the fix should also account for the internal array offset (getContentBeginOffset()) to correctly handle Binary slices, and suggests updating the regression test to use sliced buffers to ensure robustness against this offset issue.
| buffer.position(prefix.length); | ||
|
|
||
| final int decompressedLength = buffer.size() - buffer.position(); | ||
| final Binary compressed = CompressionFactory.getDefault().compress(buffer); | ||
| final Binary decompressed = CompressionFactory.getDefault().decompress(compressed, decompressedLength); |
There was a problem hiding this comment.
To ensure the fix is robust and correctly handles Binary slices (which have a non-zero internal array offset), it is recommended to use buffer.slice() in the regression test. This would have exposed the offset bug identified in LZ4Compression.java.
| buffer.position(prefix.length); | |
| final int decompressedLength = buffer.size() - buffer.position(); | |
| final Binary compressed = CompressionFactory.getDefault().compress(buffer); | |
| final Binary decompressed = CompressionFactory.getDefault().decompress(compressed, decompressedLength); | |
| final Binary slice = buffer.slice(prefix.length); | |
| final int decompressedLength = slice.size(); | |
| final Binary compressed = CompressionFactory.getDefault().compress(slice); | |
| final Binary decompressed = CompressionFactory.getDefault().decompress(compressed, decompressedLength); |
Code ReviewOverview: This PR fixes a bug in LZ4Compression.compress(Binary) where data.size() was passed as srcLen instead of decompressedLength. When position > 0, the compressor read past the intended slice causing ArrayIndexOutOfBoundsException or silent corruption. Fix is one line with a regression test. The Fix (LZ4Compression.java): Correct and minimal. decompressedLength was already computed two lines earlier but not used at the call site. The change is exactly right. Test Coverage (CompressionTest.java): The new test correctly reproduces the bug scenario. Two observations:
Documentation File (docs/4317-lz4-compression-srclen-bug.md): This file should not be committed. It duplicates the PR description and is a debugging artifact. Per-issue analysis docs add long-term noise - the PR body already captures all relevant context. Please remove this file before merging. Summary:
The core fix is solid. The only blocking item is removing docs/4317-lz4-compression-srclen-bug.md. |
…ecompress(Binary) For slice Binaries, position() is relative to the slice start but the backing array offset is tracked separately via getContentBeginOffset(). Both compress and decompress must use getContentBeginOffset() + position() as srcOff. Add a second regression test covering the slice Binary case.
Code Review - PR #4322OverviewThis PR fixes a real and impactful bug in Both bugs are fixed correctly, and regression tests cover both failure modes. The fix is minimal and well-targeted. What is correct
Issues and suggestions[Minor] Test assertions could be more precise. Both new tests use: assertThat(decompressed.getContent()).startsWith(payload);
assertThat(decompressed.size()).isEqualTo(decompressedLength);
assertThat(Arrays.copyOf(decompressed.getContent(), decompressed.size())).isEqualTo(payload);This makes it immediately obvious that we expect an exact match with [Minor] Inline comments in tests explain what, not why. Per the project coding guidelines, comments should only appear when the WHY is non-obvious. The first line of each comment (the issue number reference) is valuable for traceability. The second line (e.g., [Should fix] The The "Review Cycles" and "Final State" sections track head SHAs, bot reviewer names, approval states, and a "timeout" condition. This is review tooling state, not codebase documentation. Looking at the other doc files in No issues found in
SummaryThe core fix is correct and addresses a real data-corruption/crash bug. The two action items before merging are: (1) trim the doc file to remove review cycle tracking metadata, and (2) optionally tighten the test assertions for clarity. The code change itself is ready. |
Code ReviewOverviewThis PR fixes a real bug in The fix is correct and the approach is sound. Issues1. This file contains review-cycle metadata (bot login status, SHA references, PR URL, "Final State: timeout"). That is ephemeral scaffolding, not repository documentation. The existing files in Please drop this file from the commit. 2. Test assertions use Both new tests assert: assertThat(decompressed.getContent()).startsWith(payload);
assertThat(decompressed.size()).isEqualTo(decompressedLength);Because assertThat(decompressed.getContent()).isEqualTo(payload);Minor Observations
SummaryThe core fix (one line in |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #4322 +/- ##
============================================
+ Coverage 64.11% 64.13% +0.02%
- Complexity 423 431 +8
============================================
Files 1645 1645
Lines 127537 127537
Branches 27348 27348
============================================
+ Hits 81766 81797 +31
+ Misses 34257 34234 -23
+ Partials 11514 11506 -8 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Closes #4317
Summary
LZ4Compression.compress(Binary)was passingdata.size()(total backing-array length) as thesrcLenargument to the LZ4 compressor. The correct value — already computed two lines above asdecompressedLength = data.size() - data.position()— was ignored. Whendata.position() > 0, the compressor read past the intended slice, causing eitherArrayIndexOutOfBoundsExceptionor silent output corruption on every compress→decompress round-trip.The companion
decompress(Binary, …)overload already subtracted position correctly, making the asymmetry the direct source of corruption.Change: replace
data.size()withdecompressedLengthat the single call site inLZ4Compression.java.Test plan
CompressionTest#compressionWithNonZeroPosition: creates aBinarywith a non-zero position (simulating a partially-consumed buffer), compresses it, decompresses, and asserts byte-for-byte equality. Confirmed failing before the fix (ArrayIndexOutOfBoundsException) and passing after.CompressionTest#compressionand#emptyCompressioncontinue to pass.mvn test -Dtest=CompressionTest -pl engine🤖 Generated with Claude Code