Skip to content

fix(#4329): use readFully instead of read in JavaBinarySerializer.readExternal#4348

Merged
robfrank merged 3 commits into
mainfrom
fix/4329-javabinaryserializer-readfully
May 27, 2026
Merged

fix(#4329): use readFully instead of read in JavaBinarySerializer.readExternal#4348
robfrank merged 3 commits into
mainfrom
fix/4329-javabinaryserializer-readfully

Conversation

@robfrank

Copy link
Copy Markdown
Collaborator

Closes #4329

ObjectInput.read(byte[]) is permitted by contract to return fewer bytes than the array length (a short read). On compressed or network-backed ObjectInput implementations this is possible. JavaBinarySerializer.readExternal called in.read(array) and ignored the return value, so on a short read the remaining bytes were silently lost, misaligning the stream and corrupting every subsequent property in the record.

The fix is a one-character change: replace in.read(array) with in.readFully(array), which loops until the buffer is completely filled or throws EOFException.

Test plan

  • New regression test readExternalSurvivesShortReadInput wraps the deserialization stream in a ThrottledObjectInput that returns at most 1 byte per read(byte[], off, len) call, directly exercising the short-read path.
  • Test confirmed FAIL before the fix (UTFDataFormatException from stream misalignment) and PASS after.
  • Full JavaBinarySerializerTest suite: 6/6 passed.
  • All serializer tests (*Serializer*): 30/30 passed, no regressions.

@codacy-production

codacy-production Bot commented May 26, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Coverage 100.00% diff coverage · -7.95% coverage variation

Metric Results
Coverage variation -7.95% coverage variation
Diff coverage 100.00% diff coverage

View coverage diff in Codacy

Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (61177a8) 127719 94366 73.89%
Head commit (0eb8448) 159397 (+31678) 105102 (+10736) 65.94% (-7.95%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#4348) 1 1 100.00%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes a partial read corruption issue in JavaBinarySerializer.readExternal by replacing in.read(array) with in.readFully(array) to guarantee the buffer is fully populated. It also introduces a regression test using a custom ThrottledObjectInput helper. The review feedback suggests adding standard index validation and bounds checks to the custom read and readFully methods in the test helper to fully comply with standard Java stream contracts.

Comment on lines +80 to +84
public int read(final byte[] b, final int off, final int len) throws IOException {
if (len == 0)
return 0;
return delegate.read(b, off, 1);
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The custom read implementation in ThrottledObjectInput does not perform standard bounds checks on off and len. To comply with the ObjectInput.read contract and prevent potential out-of-bounds access or unexpected behavior, standard index validation should be added.

    @Override
    public int read(final byte[] b, final int off, final int len) throws IOException {
      if (off < 0 || len < 0 || len > b.length - off) {
        throw new IndexOutOfBoundsException();
      }
      if (len == 0)
        return 0;
      return delegate.read(b, off, 1);
    }

Comment on lines +97 to +105
public void readFully(final byte[] b, final int off, final int len) throws IOException {
int total = 0;
while (total < len) {
final int n = read(b, off + total, len - total);
if (n < 0)
throw new EOFException();
total += n;
}
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The custom readFully implementation in ThrottledObjectInput does not perform standard bounds checks on off and len. To comply with the DataInput.readFully contract, standard index validation should be added.

Suggested change
public void readFully(final byte[] b, final int off, final int len) throws IOException {
int total = 0;
while (total < len) {
final int n = read(b, off + total, len - total);
if (n < 0)
throw new EOFException();
total += n;
}
}
@Override
public void readFully(final byte[] b, final int off, final int len) throws IOException {
if (off < 0 || len < 0 || len > b.length - off) {
throw new IndexOutOfBoundsException();
}
int total = 0;
while (total < len) {
final int n = read(b, off + total, len - total);
if (n < 0)
throw new EOFException();
total += n;
}
}

@claude

claude Bot commented May 26, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR fixes a real, contract-level bug in JavaBinarySerializer.readExternal: ObjectInput.read(byte[]) is permitted to return fewer bytes than requested (short read), but the old code ignored the return value. The fix - switching to readFully() - is minimal and correct.


The Fix

JavaBinarySerializer.java line 129 - one-character change, exactly right. readFully is the correct API here since DataInput guarantees the buffer is completely filled or throws EOFException. No issues.


Issues

1. The docs file should not be part of this PR

docs/4329-javabinaryserializer-readfully.md is a planning/analysis document. The PR description and commit message already contain all the context this doc adds. Please drop it before merging.

2. Multi-line Javadoc blocks violate project style

The project guidelines say "Never write multi-paragraph docstrings or multi-line comment blocks - one short line max." Both the test method and the ThrottledObjectInput class carry multi-paragraph Javadocs. These should be trimmed to a single line or removed entirely. For example:

// short-read regression: readFully must loop until the buffer is completely filled
@Test
void readExternalSurvivesShortReadInput() throws Exception { ... }
// simulates a short-read ObjectInput - returns at most 1 byte per read(byte[],int,int) call
private static final class ThrottledObjectInput implements ObjectInput { ... }

Test Quality

The regression test design is solid:

  • ThrottledObjectInput correctly limits read(byte[], off, len) to 1 byte, then implements readFully by looping through that same throttled path. This is the critical detail: it proves that readFully must internally loop to succeed, which is exactly what the fix relies on.
  • The round-trip test covers multiple property types (Long, String, double) including schema-defined and schemaless properties - good breadth.
  • The test correctly uses try-with-resources and follows the assertThat(...).isEqualTo(...) AssertJ convention.
  • ThrottledObjectInput as a private static nested class is an appropriate placement.

Summary

The fix is correct and the test design is good. Two changes needed before merge:

  1. Remove docs/4329-javabinaryserializer-readfully.md
  2. Trim the multi-line Javadoc comments to single-line per project style guidelines

@claude

claude Bot commented May 26, 2026

Copy link
Copy Markdown

Code Review

Overall assessment: The fix is correct and the test is well-designed. One change is needed before merge.


The fix (line 129) - correct

Replacing in.read(array) with in.readFully(array) is the right call. ObjectInput.read(byte[]) may legally return fewer bytes than requested on compressed or networked streams; readFully loops internally until the buffer is complete or throws EOFException. This was a silent data-corruption bug on any non-trivial ObjectInput implementation.


Test - well done

The ThrottledObjectInput design is exactly right:

  • read(byte[], int, int) returns at most 1 byte, forcing readFully in the main code to loop
  • readFully delegates back through the throttled read() path, so the helper correctly validates the looping contract
  • Bounds-checking and EOFException on stream end are correct

The test confirms failure before the fix and passes after - that is the ideal regression test structure.


Issue - please remove the committed docs file

docs/4329-javabinaryserializer-readfully.md should not be merged into the repository:

  • It tracks ephemeral task state (Status: Done) which belongs in the PR/issue tracker, not the codebase
  • It is scoped to a transient concern (a single bug fix) and will be stale/confusing for future contributors
  • The useful information it contains (root cause, fix rationale) is already captured in the PR description and the regression test itself

Please drop this file from the PR before merging.


Minor observation (no action required)

The readUTF() override in ThrottledObjectInput delegates directly to the wrapped stream without throttling. That is fine here since the bug was specifically in the byte-array read path. Noting it so it is not mistaken for a coverage gap.


Verdict: Approve pending removal of docs/4329-javabinaryserializer-readfully.md. The one-line fix and the regression test are both solid.

@claude

claude Bot commented May 26, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR fixes a real correctness bug: ObjectInput.read(byte[]) is permitted by contract to return fewer bytes than requested (short-read), while readFully(byte[]) loops until the buffer is completely filled or throws EOFException. The one-line fix in JavaBinarySerializer.java is correct and necessary.

The Fix - Correct

The change at JavaBinarySerializer.java:129 is exactly right. DataInput.readFully() is the contract-safe API whenever you need a full buffer. The bug was silent data corruption: on a short read, the remaining bytes were silently skipped, misaligning every subsequent property in the stream. No concerns with the fix itself.

Test - Well Designed

The regression test in JavaBinarySerializerTest is solid:

  • ThrottledObjectInput correctly throttles read(byte[], int, int) to return at most 1 byte, forcing the path that caught the original bug
  • readFully() in the throttle wrapper correctly loops through the throttled read() calls, so the test would fail with the old in.read(array) code and pass with in.readFully(array)
  • Bounds checks added to both read(byte[], int, int) and readFully(byte[], int, int) are appropriate
  • The test uses multiple property types (long, string, double) which increases coverage
  • EOF handling in readFully is correct - throws EOFException when read() returns -1

Minor nit - test assertion: The new test only asserts doc2.toMap().isEqualTo(doc1.toMap()), while existing tests also assert doc2.isEqualTo(doc1). The map check is sufficient for validating the fix.

Minor nit - comments: The multi-line Javadoc blocks are more verbose than the project style typically uses (existing tests in this file have no comments). Not a blocker.

Issue - The docs/ file should be removed

docs/4329-javabinaryserializer-readfully.md should not be part of this PR. It is an internal AI review cycle log, with content like review cycle notes referencing gemini-code-assist and "clean-approval" status. This is scaffolding and implementation notes, not user-facing documentation. Committing it adds noise to the repo history and exposes internal tooling details. The PR description and commit message already capture the relevant context. Please remove this file before merging.

Write-side audit (out of scope for this PR, but worth noting)

writeExternal at line 85 uses out.write(buffer.getContent(), 0, buffer.size()). Unlike read(), OutputStream.write(byte[], int, int) is documented to write all len bytes, so there is no symmetric write-side bug. The asymmetry in the Java I/O contract is intentional.

Summary

Area Status
Fix correctness Correct
Test quality Good
Regression coverage Yes - confirmed fail-before/pass-after
Blocker Remove docs/4329-javabinaryserializer-readfully.md
Ready to merge after Removing the docs file

@robfrank robfrank force-pushed the fix/4329-javabinaryserializer-readfully branch from abe6db8 to 0eb8448 Compare May 27, 2026 06:38
@claude

claude Bot commented May 27, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR fixes a genuine correctness bug: JavaBinarySerializer.readExternal was calling in.read(array) and ignoring the return value. The ObjectInput.read(byte[]) contract allows short reads - fewer bytes than requested can be returned. On compressed or network-backed streams this is possible, silently corrupting subsequent property reads by misaligning the stream. The fix is a one-character change to in.readFully(array), which loops internally until the buffer is completely filled or throws EOFException.

The fix is correct and well-targeted.


Fix (JavaBinarySerializer.java)

No issues. The change is minimal and exactly right. DataInput.readFully(byte[]) is the contract-safe API here. The original in.read(array) was a latent correctness bug that would only surface on non-heap-backed ObjectInput implementations.


Test (JavaBinarySerializerTest.java)

The regression approach is solid: wrap the deserialization stream in ThrottledObjectInput (1 byte per read(byte[], off, len) call), write a document, round-trip it through the throttled reader, and assert map equality. The test correctly fails before the fix (stream misalignment) and passes after.

A few observations:

Multi-line Javadoc comments - The new test method and ThrottledObjectInput class both carry multi-paragraph Javadoc blocks. Per project convention (CLAUDE.md): Never write multi-paragraph docstrings or multi-line comment blocks - one short line max. These should be trimmed to a single descriptive line, or removed entirely since the method and class names are self-explanatory.

readFully loops through the throttled path - The ThrottledObjectInput.readFully implementation is intentionally self-contained (it calls this.read()) rather than delegating to delegate.readFully(). This is correct - it forces the caller (the fixed JavaBinarySerializer.readExternal) to exercise the loop. If it delegated to delegate.readFully the test would pass even without the fix, since the underlying stream is already fully capable. Good design choice.

Bounds check - In ThrottledObjectInput.readFully(byte[], int, int), the check len > b.length - off works correctly for all valid inputs since off is already validated to be non-negative. No issue.

Test only covers Document - Vertex and Edge deserialization also calls readExternal through the same shared code path, so the single Document test is sufficient to verify the fix. The existing vertex/edge tests in the suite provide independent coverage.

Missing assertThat(arrayOut.size() > 0) guard - The new test omits this check that is present in the other test methods. Minor inconsistency.


Documentation (docs/4329-javabinaryserializer-readfully.md)

This follows the existing pattern of per-issue fix docs in the docs/ directory. The "Review cycles" section is an implementation artifact from the automated review workflow, consistent with other docs in this directory.

One minor note: the file uses em dashes in several places. Per CLAUDE.md convention, these should be replaced with regular dashes (-) or rephrased.


Summary

Area Assessment
Fix correctness Correct - readFully is the right API
Test approach Sound regression strategy
Test style Multi-line Javadoc comments violate project convention
Docs file Follows existing pattern; minor em-dash usage
Security No concerns
Performance No concerns - readFully is equivalent on normal streams

Actionable items before merge:

  1. Trim the multi-line Javadoc blocks on readExternalSurvivesShortReadInput and ThrottledObjectInput to single-line comments or remove them entirely.
  2. Replace em dashes with regular dashes in docs/4329-javabinaryserializer-readfully.md.

@codecov

codecov Bot commented May 27, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.25%. Comparing base (61177a8) to head (0eb8448).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #4348      +/-   ##
============================================
- Coverage     64.72%   64.25%   -0.47%     
+ Complexity      446      423      -23     
============================================
  Files          1647     1647              
  Lines        127719   127719              
  Branches      27375    27375              
============================================
- Hits          82660    82064     -596     
- Misses        33384    34117     +733     
+ Partials      11675    11538     -137     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@robfrank robfrank merged commit cda1ce0 into main May 27, 2026
26 of 30 checks passed
@robfrank robfrank added this to the 26.6.1 milestone May 27, 2026
@robfrank robfrank self-assigned this May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

JavaBinarySerializer.readExternal ignores ObjectInput.read return — partial reads corrupt records

1 participant