Skip to content

fix(#4336): use Long.compareUnsigned to detect zigzag overflow for Long.MAX/MIN_VALUE#4361

Merged
robfrank merged 3 commits into
mainfrom
fix/4336-simple8b-zigzag-validation
May 27, 2026
Merged

fix(#4336): use Long.compareUnsigned to detect zigzag overflow for Long.MAX/MIN_VALUE#4361
robfrank merged 3 commits into
mainfrom
fix/4336-simple8b-zigzag-validation

Conversation

@robfrank

Copy link
Copy Markdown
Collaborator

Closes #4336

Summary

Simple8bCodec.encode() silently truncated values near Long.MAX_VALUE and Long.MIN_VALUE instead of throwing IllegalArgumentException. The root cause: zigzagEncode(Long.MAX_VALUE) returns -2 and zigzagEncode(Long.MIN_VALUE) returns -1. Both are negative in signed arithmetic, so the validation guard encoded > MAX_ZIGZAG_VALUE (MAX_ZIGZAG_VALUE = (1L<<60)-1 is a large positive) was always false for these inputs — bypassing validation entirely. The encoder then masked with (1L<<60)-1, silently dropping the top 4 bits and storing a wrong value. The fix replaces the signed comparison with Long.compareUnsigned(encoded, MAX_ZIGZAG_VALUE) > 0, which correctly treats the bit-patterns of the negative encoded values as large unsigned integers exceeding the 60-bit limit.

Test plan

  • Simple8bCodecTest.longMaxValueThrowsLong.MAX_VALUE now correctly throws IllegalArgumentException (was silently corrupted)
  • Simple8bCodecTest.longMinValueThrowsLong.MIN_VALUE now correctly throws IllegalArgumentException (was silently corrupted)
  • Simple8bCodecTest.nearLongMaxValueThrowsLong.MAX_VALUE - 1 also correctly rejected (zigzag-encoded bit-pattern also exceeds 60 bits unsigned)
  • All 16 pre-existing Simple8bCodecTest tests still pass, including boundary acceptance (-(2^59) still accepted as it zigzag-encodes to exactly MAX_ZIGZAG_VALUE)
  • Full timeseries codec and time-series suite (240 tests) passes with zero regressions

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request fixes a bug in Simple8bCodec.encode() where extreme values like Long.MAX_VALUE and Long.MIN_VALUE bypassed validation due to signed comparison of zigzag-encoded values, leading to silent truncation. The fix replaces the signed comparison with Long.compareUnsigned. Tests and documentation were added to verify the fix. The reviewer pointed out a minor discrepancy in the documentation file, which references a test name (extremeNegativeBoundaryStillThrows()) that does not match the actual test name added in the code (nearLongMaxValueThrows()).

Comment thread docs/4336-simple8b-zigzag-validation.md Outdated
### `Simple8bCodecTest.java`
- Added `longMaxValueThrows()` — `Long.MAX_VALUE` must throw `IllegalArgumentException`
- Added `longMinValueThrows()` — `Long.MIN_VALUE` must throw `IllegalArgumentException`
- Added `extremeNegativeBoundaryStillThrows()` — values just outside range from below still throw

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The documentation mentions extremeNegativeBoundaryStillThrows() as being added, but the actual test added in Simple8bCodecTest.java is named nearLongMaxValueThrows(). Please update this line to match the actual test name.

Suggested change
- Added `extremeNegativeBoundaryStillThrows()` — values just outside range from below still throw
- Added nearLongMaxValueThrows() — values just outside range from above still throw

@codacy-production

codacy-production Bot commented May 27, 2026

Copy link
Copy Markdown

Up to standards ✅

🟢 Issues 0 issues

Results:
0 new issues

View in Codacy

🟢 Coverage 100.00% diff coverage · -7.75% coverage variation

Metric Results
Coverage variation -7.75% coverage variation
Diff coverage 100.00% diff coverage

View coverage diff in Codacy

Coverage variation details
Coverable lines Covered lines Coverage
Common ancestor commit (f2a41f8) 127753 94277 73.80%
Head commit (8c31704) 159431 (+31678) 105295 (+11018) 66.04% (-7.75%)

Coverage variation is the difference between the coverage for the head and common ancestor commits of the pull request branch: <coverage of head commit> - <coverage of common ancestor commit>

Diff coverage details
Coverable lines Covered lines Diff coverage
Pull request (#4361) 1 1 100.00%

Diff coverage is the percentage of lines that are covered by tests out of the coverable lines that the pull request added or modified: <covered lines added or modified>/<coverable lines added or modified> * 100%

NEW Get contextual insights on your PRs based on Codacy's metrics, along with PR and Jira context, without leaving GitHub. Enable AI reviewer
TIP This summary will be updated as you push new changes.

@claude

claude Bot commented May 27, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR fixes a real data-corruption bug in Simple8bCodec.encode(): zigzagEncode(Long.MAX_VALUE) returns -2 and zigzagEncode(Long.MIN_VALUE) returns -1, both negative in signed arithmetic. The old guard encoded > MAX_ZIGZAG_VALUE was always false for these inputs (large positive vs. negative), silently bypassing validation and masking away the top 4 bits. The one-line fix is correct and the root cause analysis is well documented.


Fix Correctness - Looks Good

The Long.compareUnsigned approach is correct. An equivalent alternative worth noting is:

if (encoded < 0 || encoded > MAX_ZIGZAG_VALUE)

This also works because valid zigzag-encoded values (for inputs within the supported range) are always non-negative — a negative encoded value is itself proof that the original value was out of range. That said, Long.compareUnsigned is equally correct, arguably more explicit about the bit-pattern semantics, and consistent with the variable name. No objection either way.


Tests - Good Coverage, Minor Wording Nit

Three focused regression tests are added and they cover the right cases. One minor comment:

The Javadoc on nearLongMaxValueThrows() says "values just inside the extreme" which is ambiguous - a reader could parse "just inside the extreme" as "just inside the valid range" (i.e., a boundary value that should be accepted). The intent is "just below Long.MAX_VALUE". A small rewording like "values just below Long.MAX_VALUE" would be clearer.

There is also a minor inconsistency between the PR description's Changes section (which names the test extremeNegativeBoundaryStillThrows) and the actual test method name (nearLongMaxValueThrows). Not a code issue, but worth correcting in the PR body for traceability.


Production Impact - Worth Calling Out Explicitly

The PR body mentions this in the impact analysis section, but it is worth reiterating: nanosecond timestamps in the range (2^59, Long.MAX_VALUE] (i.e., > ~576 years from epoch) were silently corrupted before this fix and will now throw IllegalArgumentException. This is the correct behavior (fail-fast beats silent corruption), but any caller that passes raw nanosecond timestamps without delta-encoding should be aware that this guard may now surface in practice if deltas are large. If there is a higher-level codec layer (e.g. DeltaOfDeltaCodec) that feeds into Simple8bCodec, confirming that its output deltas stay within [-(2^59), (2^59)-1] would be a good follow-up.


No Issues

  • Single-line change, minimal blast radius.
  • Long.compareUnsigned is available since Java 8, well within the Java 21 baseline.
  • The docs/ pattern (per-issue markdown files) matches other files already in that directory.
  • No new dependencies introduced.
  • Existing 16 tests still pass per the PR description.

Summary: The fix is correct and the test coverage is solid. The only actionable items are the wording nit in the nearLongMaxValueThrows comment and the test-name mismatch in the PR body - both minor. Good work tracking down the unsigned/signed comparison trap.

@claude

claude Bot commented May 27, 2026

Copy link
Copy Markdown

Code Review

Overall Assessment

This is a correct, minimal, and well-reasoned bug fix. The root cause analysis is accurate and the solution is the right tool for the job. A few points worth discussing.


The Fix (Simple8bCodec.java) - Approved

The one-line change is correct. The arithmetic proof in the PR description is solid:

  • zigzagEncode(Long.MAX_VALUE) = 0xFFFFFFFFFFFFFFFE (negative in signed, huge in unsigned)
  • zigzagEncode(Long.MIN_VALUE) = 0xFFFFFFFFFFFFFFFF (negative in signed, huge in unsigned)

Both bypass the old signed guard, and Long.compareUnsigned correctly catches them.


Tests - Good coverage with a minor style note

The three new regression tests (longMaxValueThrows, longMinValueThrows, nearLongMaxValueThrows) are well-targeted and verify the exact failure modes described in the issue.

Minor: The multi-line Javadoc comments in the test methods repeat information already captured in the method names and the PR description. One short line per method would be sufficient per project conventions. Not blocking.

Suggestion: Consider adding Long.MIN_VALUE + 1 as a fourth test case. Its zigzag encoding (0xFFFFFFFFFFFFFFFD) also falls in the unsigned-overflow zone and was equally silently corrupted before this fix. It would complete the boundary sweep.


Documentation file (docs/4336-simple8b-zigzag-validation.md) - Recommend dropping

A per-issue markdown file in docs/ tends to:

  1. Accumulate over time without a clear ownership or cleanup path.
  2. Go stale faster than the code it describes.
  3. Duplicate information that already lives in the PR description, commit message, and test Javadoc.

The most durable place for this explanation is as a comment on the comparison line in encode() itself:

// Unsigned comparison required: zigzagEncode(Long.MAX/MIN_VALUE) returns negative signed values
// that look smaller than MAX_ZIGZAG_VALUE in signed arithmetic but exceed it unsigned.
if (Long.compareUnsigned(encoded, MAX_ZIGZAG_VALUE) > 0)

That makes the non-obvious invariant visible to any future reader right at the call site, which is more useful than a detached markdown file.


Potential follow-up (not blocking this PR)

Any data already written to disk before this fix with Long.MAX_VALUE or Long.MIN_VALUE (silently corrupted to a masked value) will decode to a wrong result silently. This PR correctly changes the write path to fail-fast going forward, but there is no migration or detection path for pre-existing corrupt data. Worth a note in the issue or release notes so operators are aware.


Summary

The core fix is correct and the tests are solid. I would recommend dropping the documentation file (or folding its key insight into a single inline comment on the comparison), and optionally adding Long.MIN_VALUE + 1 to the test suite. Everything else is good to go.

@claude

claude Bot commented May 27, 2026

Copy link
Copy Markdown

Code Review

Overview

This PR fixes a real data-corruption bug in Simple8bCodec.encode(). The root cause is a signed vs unsigned comparison issue: zigzagEncode(Long.MAX_VALUE) returns -2 and zigzagEncode(Long.MIN_VALUE) returns -1, both of which are negative in signed arithmetic, so the old guard encoded > MAX_ZIGZAG_VALUE (where MAX_ZIGZAG_VALUE is a large positive) was always false for these inputs. The fix replacing it with Long.compareUnsigned(encoded, MAX_ZIGZAG_VALUE) > 0 is mathematically correct and is the minimal change needed.


The Fix (Simple8bCodec.java)

The one-line change is correct. Manual verification:

  • zigzagEncode(Long.MAX_VALUE) = 0xFFFFFFFFFFFFFFFE - unsigned value far exceeds MAX_ZIGZAG_VALUE = 0x0FFFFFFFFFFFFFFF - correctly rejected
  • zigzagEncode(Long.MIN_VALUE) = 0xFFFFFFFFFFFFFFFF - same, correctly rejected
  • Boundary -(2^59) still zigzag-encodes to exactly MAX_ZIGZAG_VALUE - compareUnsigned returns 0, still accepted

No issues with the production code change.


Test Coverage (Simple8bCodecTest.java)

The three new tests are sound and consistent with the existing test style (assertThatThrownBy + isInstanceOf + hasMessageContaining). Good.

One minor style note from the project's CLAUDE.md:

"Never write multi-paragraph docstrings or multi-line comment blocks - one short line max."

The three new test methods each carry a 3-4 line Javadoc block. These should be trimmed to a single-line comment, e.g.:

// Regression: zigzagEncode(Long.MAX_VALUE) = -2 (negative signed) bypassed the old > guard.
@Test
void longMaxValueThrows() { ... }

Same for longMinValueThrows and nearLongMaxValueThrows.


Documentation File (docs/4336-simple8b-zigzag-validation.md) - Needs Removal

This file should not be committed to the repository. It contains:

  • Internal review-cycle notes ("Review Cycles" / "Cycle 1", "Cycle 2")
  • A self-referencing PR link
  • Bot review feedback log

This is working-document content that belongs in the PR description (where it already lives verbatim) and not in the tracked codebase. Please drop this file before merging.


Summary

Item Status
One-line fix in Simple8bCodec.java Correct - approve as-is
Three new regression tests Good - minor style cleanup on Javadoc length
docs/4336-simple8b-zigzag-validation.md Should be removed - internal review log, not repo content

The core fix is solid. Please remove the docs file and trim the multi-line Javadoc blocks to single lines, then this is ready to merge.

@codecov

codecov Bot commented May 27, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 64.36%. Comparing base (f2a41f8) to head (8c31704).

Additional details and impacted files
@@             Coverage Diff              @@
##               main    #4361      +/-   ##
============================================
- Coverage     64.65%   64.36%   -0.30%     
- Complexity      423      455      +32     
============================================
  Files          1647     1647              
  Lines        127753   127753              
  Branches      27382    27382              
============================================
- Hits          82599    82224     -375     
- Misses        33507    33958     +451     
+ Partials      11647    11571      -76     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@robfrank robfrank merged commit f8f7a75 into main May 27, 2026
26 of 30 checks passed
@robfrank robfrank deleted the fix/4336-simple8b-zigzag-validation branch May 27, 2026 09:46
@lvca lvca added this to the 26.6.1 milestone May 27, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Simple-8b codec validation bypassed for Long.MAX_VALUE / Long.MIN_VALUE — silent value truncation

2 participants