Skip to content

Add JsonReadFeature.ALLOW_HEXADECIMAL_NUMBERS for JSON5-style hex literals (#707)#1613

Merged
cowtowncoder merged 14 commits into
FasterXML:3.xfrom
seonwooj0810:feat/707-allow-hex-numbers
May 28, 2026
Merged

Add JsonReadFeature.ALLOW_HEXADECIMAL_NUMBERS for JSON5-style hex literals (#707)#1613
cowtowncoder merged 14 commits into
FasterXML:3.xfrom
seonwooj0810:feat/707-allow-hex-numbers

Conversation

@seonwooj0810

@seonwooj0810 seonwooj0810 commented May 26, 2026

Copy link
Copy Markdown
Contributor

Summary

Implements JSON5 hexadecimal integer literals (0x / 0X prefix, optional sign, [0-9a-fA-F]+) across all four JSON parsers, gated behind a new opt-in JsonReadFeature.ALLOW_HEXADECIMAL_NUMBERS. Closes #707 (hex part; whitespace remains).

Per @pjfanning's review:

  • JSON5 grammar followed verbatim0x|0X + [0-9a-fA-F]+, optional leading sign, no fraction / no exponent / no _ separators.
  • Independent of ALLOW_LEADING_ZEROS_FOR_NUMBERS — leading zeros within hex digits (e.g. 0x007F) are always permitted when this feature is enabled, regardless of the leading-zero feature. The hex check fires before _verifyNoLeadingZeroes() in every parser, and the leading-zero feature is never consulted on the hex path. Covered by unsignedHexWithLeadingZeros.

The + variant additionally requires ALLOW_LEADING_PLUS_SIGN_FOR_NUMBERS (matching how + is treated for decimals, per #774). Covered by plusSignHexRequiresLeadingPlusFeature and plusSignHexWithPlusFeature.

Implementation per parser:

Parser Strategy
ReaderBasedJsonParser Hex branch in _parseNumber2, before _verifyNoLeadingZeroes.
UTF8StreamJsonParser Same pattern, byte-oriented; peek before _verifyNoLeadingZeroes in both _parseUnsignedNumber and _parseSignedNumber.
UTF8DataInputJsonParser Intercept the existing 'x'/'X' arm of _handleLeadingZeroes (no peek needed — DataInput already consumed it).
NonBlockingUtf8JsonParserBase New minor states MINOR_NUMBER_HEX_PREFIX and MINOR_NUMBER_HEX_DIGITS so suspension works at every byte boundary; helpers _startHexNumber / _startHexNumberWithSign / _finishHexDigits drive the state machine.

Shared decoding lives in JsonParserBase._parseHexInt:

  • int for ≤ 7 hex digits (always fits in signed 32-bit),
  • long for ≤ 16 hex digits with a top-nibble check at 16,
  • BigInteger otherwise — eagerly decoded because the lazy _numberString path expects base-10.

getString() returns the original literal (sign + 0x/0X + digits), matching the precedent set for + by #784. getIntValue() / getLongValue() / getBigIntegerValue() return the decoded value.

The bookkeeping additions on JsonParserBase are limited:

  • _numberIsHex field (cleared by resetInt / resetFloat / resetAsNaN),
  • resetIntHex(neg, hexDigitLen) helper alongside resetInt.

I put these on JsonParserBase only (since only JSON consumes it today).

Test plan

  • mvn clean verify — 1697 tests, 0 failures, 0 regressions.
  • NonStandardHexNumbers707Test (12 cases) — unsigned, uppercase X, leading zeros, negative, + with/without plus-feature, long range (0x80000000), BigInteger range (0x1ffffffffffffffff), in-array, feature-off rejection, empty-prefix rejection. Runs across ALL_MODES (Reader, UTF8 stream, throttled, DataInput).
  • AsyncHexNumbers707Test (10 cases) — drives the non-blocking parser with 1-byte and 3-byte feeds to exercise suspension/resumption at every position; covers the same shapes plus a plainZeroStillWorks smoke test to verify the bare-0 path didn't regress.

@seonwooj0810 seonwooj0810 force-pushed the feat/707-allow-hex-numbers branch from 92f36fa to 0e27cba Compare May 26, 2026 12:00
Comment thread src/main/java/tools/jackson/core/json/JsonParserBase.java Outdated
seonwooj0810 added a commit to seonwooj0810/jackson-core that referenced this pull request May 26, 2026
…erXML#707)

Address review feedback on FasterXML#1613: parse the buffered hex digits as a
BigInteger via a new NumberInput.parseBigIntegerWithRadix(char[], ...)
overload that hands the slice straight to FastDoubleParser, eliminating
the intermediate String allocation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
seonwooj0810 added a commit to seonwooj0810/jackson-core that referenced this pull request May 26, 2026
…erXML#707)

Address review feedback on FasterXML#1613: parse the buffered hex digits as a
BigInteger via a new NumberInput.parseBigIntegerWithRadix(char[], ...)
overload that hands the slice straight to FastDoubleParser, eliminating
the intermediate String allocation.
@seonwooj0810 seonwooj0810 force-pushed the feat/707-allow-hex-numbers branch from ba2c3ea to 45e02b3 Compare May 26, 2026 13:48
@seonwooj0810

Copy link
Copy Markdown
Contributor Author

@pjfanning Addressed your review feedback in 45e02b3 (moved BigInteger parsing into NumberInput.parseBigIntegerWithRadix(char[], ...) and pass the char slice directly to FastDoubleParser, avoiding the intermediate String). Could you take another look when you have a moment?

Comment thread src/test/java/tools/jackson/core/unittest/io/NumberInputTest.java Outdated
@cowtowncoder

Copy link
Copy Markdown
Member

Hmmmh. This is HUGE. Not a big fan based on size alone... :-(

Will give some more thought.

@seonwooj0810

Copy link
Copy Markdown
Contributor Author

Quick note on the "fail when feature is not enabled" request — that scenario was actually already covered from the initial commit by hexRejectedWhenFeatureDisabled in both NonStandardHexNumbers707Test and AsyncHexNumbers707Test (each runs through all the hex prefix shapes with ALLOW_HEXADECIMAL_NUMBERS disabled and asserts they fail). I should have called that out explicitly in my reply — apologies for the noise.

Also pushed 4e8e060 to add invalid-hex digit coverage (e.g. 0xZZ) to the new bigIntegerWithRadixFromCharArray* tests, in case that was your intent rather than feature-disabled. Happy to add more if I'm still missing something.

@seonwooj0810

Copy link
Copy Markdown
Contributor Author

@cowtowncoder Totally hear you on the size — it's something I've been chewing on too. Let me break down what's in the +940 to make it easier to evaluate, and offer a couple of options.

Where the lines actually go:

Bucket Lines What it is
Tests ~383 (40%) NonStandardHexNumbers707Test, AsyncHexNumbers707Test, plus the NumberInput / BigIntegerParser char[]-overload coverage @pjfanning asked for
Feature definition + utils ~70 (7%) JsonReadFeature constant, NumberInput.parseBigIntegerWithRadix(char[],...), BigIntegerParser.parseWithFastParser(char[],...)
Parser plumbing ~484 (51%) Hex parsing wired into 7 parser variants

Why parser plumbing is 51%: the feature has to land in each concrete parser independently because the input sources (Reader, byte stream, DataInput, async buffer) don't share a unified read primitive. I tried to keep as much as possible in JsonParserBase (+89 for shared state and entry points), but the per-variant additions are doing input-source-specific work — Reader uses _loadMore(), DataInput uses readUnsignedByte() with IOException, async needs MINOR state restart points, etc.

The one outlier is NonBlockingUtf8JsonParserBase at +123. That's not duplication — async needs MINOR_NUMBER_HEX_PREFIX/MINOR_NUMBER_HEX_DIGITS states and the _finishHexDigits(requireFirst) / _completeHexNumber() restart-able split that sync parsers don't need. Without it, hex parsing would silently break on partial buffers.

What I can trim (good faith, ~15–20 lines): _isHexDigit is defined four times across parsers, and the "prefix must be followed by at least one hex digit" message is duplicated. I can hoist the digit check to ParserBase (with a char→int widening for ReaderBasedJsonParser) and extract the message as a constant. Happy to do this if you'd like — it's a small win at the cost of adding one method to the base class.

Suggested review path if it helps to chunk it:

  1. JsonReadFeature.java — feature intent (1 constant)
  2. NumberInput.java + BigIntegerParser.java — the new char[] overloads (used by all parsers)
  3. JsonParserBase.java — shared state / entry points
  4. One of the concrete parsers (ReaderBasedJsonParser.java is the simplest) — to understand the per-variant pattern
  5. The remaining parsers are mechanical applications of the same pattern, just adapted to their input source

Alternative if size is still the blocker: I could split into a prelim PR that lands just the utils (NumberInput/BigIntegerParser char[] overloads + JsonReadFeature constant) ahead of this one. It wouldn't change net diff but would give you two ~half-size reviews instead of one large one. Let me know if that's worth doing or if you'd rather keep it as one atomic change.

Whichever direction you'd like to take, let me know and I'll make it happen.

Comment thread src/main/java/tools/jackson/core/base/ParserBase.java Outdated
Comment thread src/main/java/tools/jackson/core/json/async/NonBlockingUtf8JsonParserBase.java Outdated
Comment thread src/main/java/tools/jackson/core/json/UTF8DataInputJsonParser.java Outdated
seonwooj0810 added a commit to seonwooj0810/jackson-core that referenced this pull request May 27, 2026
…asterXML#707)

Addresses two review comments on PR FasterXML#1613:

  - ParserBase.java:183 (cowtowncoder): hex literal handling is JSON-only,
    not shared by binary/text format backends that also extend ParserBase.
  - NonBlockingUtf8JsonParserBase.java:1810 (cowtowncoder): the per-parser
    _isHexDigit helper should live in a shared base for reuse.

Promote JSON-only hex state to JsonParserBase
---------------------------------------------
- _numberIsHex field and resetIntHex(...) move from ParserBase to
  JsonParserBase. ParserBase no longer references hex at all.
- ParserBase.resetInt loses `final` so JsonParserBase can override it to
  clear _numberIsHex. resetFloat/resetAsNaN no longer touch the flag:
  _numberIsHex is only read in the VALUE_NUMBER_INT branch of
  _parseNumericValue/_parseIntValue, both reached only via resetInt or
  resetIntHex (both of which now correctly initialize the flag), so
  leaving it stale after a float/NaN reset is benign.

Hoist duplicated per-parser helpers to JsonParserBase
-----------------------------------------------------
- _isHexDigit(int) was duplicated four times across ReaderBasedJsonParser,
  UTF8StreamJsonParser, UTF8DataInputJsonParser, and
  NonBlockingUtf8JsonParserBase. Replaced with a single protected static
  on JsonParserBase. Char-stream parser's existing `_isHexDigit(char)`
  call site keeps working via implicit char->int widening.
- "Hexadecimal number prefix '0X' must be followed by at least one hex
  digit (...)" error message was duplicated at the same four call sites.
  Extracted as protected static _hexPrefixNotFollowedMessage(char) on
  JsonParserBase.

Net effect: -76 / +79 lines, identical externally observable behavior.
Verified by full test suite (1702 pass, 2 pre-existing skips), with
particular attention to NonStandardHexNumbers707Test and
AsyncHexNumbers707Test.
seonwooj0810 added a commit to seonwooj0810/jackson-core that referenced this pull request May 27, 2026
)

Addresses cowtowncoder's follow-up suggestion on PR FasterXML#1613:

  UTF8DataInputJsonParser.java:1218 (cowtowncoder):
  "Or better yet, check out what CharTypes.java already has;
   maybe charToHex() would be useful (gives -1 for non-hex digit)"

CharTypes.charToHex(int) is a single masked array lookup that returns
the digit value (0..15) for valid hex characters and -1 otherwise. That
single primitive subsumes both helpers introduced for this feature:

  - _isHexDigit(int)  -> CharTypes.charToHex(c) >= 0
  - _hexDigit(char)   -> CharTypes.charToHex(c)

Both are removed; the four concrete parser scan loops and
JsonParserBase._parseHexInt now go through CharTypes.charToHex directly,
matching the pattern already used elsewhere in the JSON parsers for
\\uXXXX escape decoding.

Verified by full test suite (1702 pass, 2 pre-existing skips).
@seonwooj0810

Copy link
Copy Markdown
Contributor Author

@cowtowncoder Pushed a4bab461 and 52e7a897 to address your three inline comments:

a4bab461 — Move hex parser state to JsonParserBase, dedupe per-parser helpers

  • ParserBase.java:183: _numberIsHex field and resetIntHex(...) promoted to JsonParserBase. ParserBase no longer references hex at all. resetInt loses final so JsonParserBase can override it to clear _numberIsHex; resetFloat/resetAsNaN keep final (the flag is only read in the VALUE_NUMBER_INT branch, both reached only via resetInt/resetIntHex, so a stale value after a float reset is benign).
  • NonBlockingUtf8JsonParserBase.java:1810: the four duplicated _isHexDigit definitions consolidated as one protected static on JsonParserBase (then superseded entirely by the next commit — see below).
  • The four duplicated hex-prefix error messages extracted to JsonParserBase._hexPrefixNotFollowedMessage(char).

52e7a897 — Use CharTypes.charToHex for JSON5 hex literal scan/decode

  • UTF8DataInputJsonParser.java:1218: great suggestion — CharTypes.charToHex(int) is a single masked array lookup that returns the digit value (0..15) or -1, so it subsumes both helpers introduced for this feature:
    • _isHexDigit(int)CharTypes.charToHex(c) >= 0
    • _hexDigit(char)CharTypes.charToHex(c) (no separate decode needed)
  • Both helpers removed; the four concrete parser scan loops and JsonParserBase._parseHexInt now go through CharTypes.charToHex directly, matching the pattern already used elsewhere in these parsers for \uXXXX escape decoding.

Net diff is now +923 / -4 across 16 files (was +940 / -3). Modest line reduction but materially cleaner structure: zero hex references in cross-format ParserBase, zero per-parser duplication, and the digit-validation/digit-decode pair fused into one well-established primitive.

Full test suite: 1702 pass / 2 pre-existing skips, including NonStandardHexNumbers707Test (12 tests) and AsyncHexNumbers707Test (10 tests). Happy to keep iterating.

Comment thread src/main/java/tools/jackson/core/base/ParserBase.java
Comment thread src/main/java/tools/jackson/core/io/BigIntegerParser.java
Comment thread src/main/java/tools/jackson/core/json/JsonParserBase.java
Comment thread src/main/java/tools/jackson/core/json/JsonParserBase.java
// valid in hex regardless of ALLOW_LEADING_ZEROS_FOR_NUMBERS.
if (_inputPtr < _inputEnd || _loadMore()) {
char peek = _inputBuffer[_inputPtr];
if ((peek == 'x' || peek == 'X')

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Idea: add a helper method ("_checkHexNumbersAllowed()" etc) in JsonParserBase called after "0x" or "0X" seen, passing next character -- if feature NOT enabled, can give specific error message including feature to enable to allow hexadecimal numbers: either returns (indicating feature enabled) or throws exception (is not).

Reduces duplication slightly but more importantly gives better exception in case feature disabled.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in 4e5945f. Added JsonParserBase._checkHexNumbersAllowed(int) that throws StreamReadException naming the feature when JsonReadFeature.ALLOW_HEXADECIMAL_NUMBERS is disabled. All eight call sites across the four JSON parsers (Reader, UTF8 stream, UTF8 DataInput, non-blocking UTF8) now go through it, eliminating duplicated isEnabled(...) checks. Existing feature-disabled tests tightened to assert the error message includes the feature name.

@cowtowncoder

Copy link
Copy Markdown
Member

@seonwooj0810 wrt:

Drive-by fix (please call out if you'd prefer this split)

yes, please, that'd be worth fixing separately, can merge quickly.

@cowtowncoder cowtowncoder added the cla-needed PR looks good (although may also require code review), but CLA needed from submitter label May 27, 2026
@cowtowncoder

Copy link
Copy Markdown
Member

@seonwooj0810 Ok, aside from review (with which are making good progress), one process thing: unless we have asked for it before (and got it), we'd need CLA. It's from:

https://github.com/FasterXML/jackson/blob/main/CLA-jackson-2026.pdf

and is usually done by printing, filling & signing, scan/photo, email to cla at fasterxml dot com.

Only needs to be sent once and is good for all future contributions.

seonwooj0810 added a commit to seonwooj0810/jackson-core that referenced this pull request May 27, 2026
Two small review-feedback nits from FasterXML#1613:

- ParserBase.resetInt was made non-final earlier in this branch so that
  JsonParserBase can override it to clear the hex flag. Add a comment
  noting the prior `final` modifier and why it was relaxed.
- BigIntegerParser.parseWithFastParser(char[],int,int,int) is new in 3.2;
  add the `@since 3.2` Javadoc tag to match the existing convention.
seonwooj0810 added a commit to seonwooj0810/jackson-core that referenced this pull request May 27, 2026
…error

Per @cowtowncoder review on FasterXML#1613: when a number literal starts with `0x` /
`0X` but `JsonReadFeature.ALLOW_HEXADECIMAL_NUMBERS` is not enabled, the
user previously got a generic "unexpected character" error that did not
mention the feature to enable. Centralize the feature check in a single
`JsonParserBase._checkHexNumbersAllowed(int)` helper that throws a
`StreamReadException` naming the feature, so the diagnostic is actionable:

    Unexpected character ('x') (code 120): hexadecimal number literals
    require enabling `JsonReadFeature.ALLOW_HEXADECIMAL_NUMBERS`

All eight call sites across the four JSON parsers (Reader, UTF8 stream,
UTF8 DataInput, non-blocking UTF8) now go through the helper, eliminating
the duplicated `isEnabled(ALLOW_HEXADECIMAL_NUMBERS)` check at every
parser-specific entry point.

Existing feature-disabled tests are tightened to assert that the new error
message includes the feature name in addition to the original "Unexpected
character ('x'" prefix.
seonwooj0810 added a commit to seonwooj0810/jackson-core that referenced this pull request May 27, 2026
Per @cowtowncoder review on FasterXML#1613: previously the maximum-integer-length
constraint was applied only when `resetIntHex` was called -- i.e. after
the full hex literal had been buffered. A pathological input with millions
of digits would therefore allocate millions of chars before being rejected.

Move the validation into the hex digit accumulation loop so it fires at
every `_textBuffer` segment boundary (default first segment is 500 chars).
With the default `maxNumberLength=1000`, a 10-million-digit hex literal is
now caught at ~4 KiB rather than after fully buffering.

Add `hexNumberLengthConstraint` to NonStandardHexNumbers707Test, exercising
an 8000-digit literal with `maxNumberLength=100` against all sync modes;
each backend must throw `StreamConstraintsException` with the standard
"exceeds the maximum" message. The async path is exercised by the same
validation primitive in `_finishHexDigits`.
@seonwooj0810

Copy link
Copy Markdown
Contributor Author

@cowtowncoder Pushed commits 8e5a4b07, c759815d, 0f5a076e, f306baa1 addressing the rest of your review (and split out the drive-by as #1614 per your request). Summary by item:

8e5a4b07 — Align MINOR_NUMBER_PLUSZERO numbering with the split

c759815d — Document resetInt relaxation and char[] BigInteger parser

0f5a076e_checkHexNumbersAllowed helper for actionable feature-disabled error

  • New helper in JsonParserBase; called after 'x'/'X' is seen following a leading 0. If ALLOW_HEXADECIMAL_NUMBERS is disabled, throws via _reportUnexpectedChar with the feature name embedded:
    Unexpected character ('x') (code 120): hexadecimal number literals
    require enabling `JsonReadFeature.ALLOW_HEXADECIMAL_NUMBERS`
    
  • All eight call sites across the four parsers now route through the helper — no more duplicated isEnabled(ALLOW_HEXADECIMAL_NUMBERS) checks at parser-specific entry points.
  • hexRejectedWhenFeatureDisabled (sync + async) tightened to assert the feature-name message in addition to the original "Unexpected character ('x'" prefix.

f306baa1 — Validate hex digit length at every segment boundary

  • Per your comment about the 10M-digit case: validation now fires at every _textBuffer segment boundary in the hex-digit accumulation loop of all four parsers, not just at final resetIntHex. Default first segment is 500 chars, so with maxNumberLength=1000 a giant literal is caught long before it allocates.
  • Added hexNumberLengthConstraint test (8000-digit literal with maxNumberLength=100) exercising all sync modes; async path covered by the same primitive inside _finishHexDigits.

One item I want to flag rather than implement here: your suggestion to move hex decoding into TextBuffer itself (this comment) feels like it would meaningfully grow this PR — and since you've already noted the size concern, I'd prefer to spin it out into a follow-up PR after this lands. Happy to do it either way, just call out whether you'd like it here or separately.

Net effect after this round: +1003 / -22 across 18 files (was +923 / -4 across 16 files). The bump is roughly evenly split between the new helper + boundary test (+~80) and the same churn spread across the four parsers — but the cross-parser duplication is now down (one feature check, one boundary check primitive each).

Full test suite: 1703 pass / 2 pre-existing skips (was 1702 — added hexNumberLengthConstraint).

Process note: CLA will follow by email shortly; same signature covers both this PR and #1614.

seonwooj0810 and others added 9 commits May 27, 2026 13:43
…erals (FasterXML#707)

Implements JSON5 hexadecimal integer literals (0x/0X prefix, optional
sign, [0-9a-fA-F]+) across all four JSON parsers:

* ReaderBasedJsonParser - hex branch in _parseNumber2 (before
  _verifyNoLeadingZeroes, so ALLOW_LEADING_ZEROS_FOR_NUMBERS is not
  consulted on the hex path).
* UTF8StreamJsonParser - same pattern, byte-oriented.
* UTF8DataInputJsonParser - intercept in the existing 'x'/'X' branch
  inside _parseUnsignedNumber and _parseSignedNumber.
* NonBlockingUtf8JsonParserBase - new minor states
  MINOR_NUMBER_HEX_PREFIX and MINOR_NUMBER_HEX_DIGITS so suspension
  works at every byte boundary; helpers _startHexNumber(WithSign) and
  _finishHexDigits drive the state machine.

Shared decoding lives in JsonParserBase._parseHexInt: int (<=7 hex
digits), long (<=16 with top-nibble check), or BigInteger (eagerly
decoded; the lazy _numberString path expects base-10 and would
mis-read hex).

A new boolean _numberIsHex field plus resetIntHex helper on ParserBase
records that the buffered literal is hex; the textual representation
preserved in _textBuffer is the original literal (sign + 0x/0X +
digits), matching the precedent set by FasterXML#784 for the '+' sign.

Per pjfanning's review on FasterXML#707:
* JSON5 grammar is followed verbatim; leading zeros within hex digits
  are always allowed and do not consult ALLOW_LEADING_ZEROS_FOR_NUMBERS
  (see ALLOW_HEXADECIMAL_NUMBERS javadoc and the
  unsignedHexWithLeadingZeros test).
* The '+' variant additionally requires
  ALLOW_LEADING_PLUS_SIGN_FOR_NUMBERS, matching how '+' is handled for
  decimal numbers; this is exercised by
  plusSignHexRequiresLeadingPlusFeature.

Drive-by fix: while testing the async path I uncovered a pre-existing
inconsistency where the explicit '+' sign was lost on the
+0... resumption path because MINOR_NUMBER_ZERO was reused for both
"plain 0" and "+0" suspension. Added MINOR_NUMBER_PLUSZERO so the sign
survives byte-by-byte feeds; the sibling test
AsyncNonStandardNumberParsingTest.leadingPlusSignInDecimalEnabled2 was
asserting the buggy behavior (its peer leadingPlusSignInDecimalEnabled
already asserted the correct, sign-preserving result) and has been
updated to match the blocking parsers. Happy to split this into a
separate PR if preferred.

Tests:
* NonStandardHexNumbers707Test - 12 cases covering unsigned/uppercase/
  negative/+sign/long range/BigInteger range/in-array/feature-off
  rejection/empty-prefix rejection across ALL_MODES (Reader, UTF8
  stream, UTF8 throttled, DataInput).
* AsyncHexNumbers707Test - 10 cases driving the non-blocking parser
  with 1-byte and 3-byte feeds to exercise suspension/resumption at
  every position.

Full mvn verify: 1697 tests, 0 failures, 0 regressions.
…erXML#707)

Address review feedback on FasterXML#1613: parse the buffered hex digits as a
BigInteger via a new NumberInput.parseBigIntegerWithRadix(char[], ...)
overload that hands the slice straight to FastDoubleParser, eliminating
the intermediate String allocation.
Cover offset=0 with both full-array and truncated-length cases, plus
negative-sign inputs for the new parseBigIntegerWithRadix(char[], ...)
overload.
Verify both useFastParser branches throw NumberFormatException on
non-hex characters in the new parseBigIntegerWithRadix(char[], ...)
overload.
…asterXML#707)

Addresses two review comments on PR FasterXML#1613:

  - ParserBase.java:183 (cowtowncoder): hex literal handling is JSON-only,
    not shared by binary/text format backends that also extend ParserBase.
  - NonBlockingUtf8JsonParserBase.java:1810 (cowtowncoder): the per-parser
    _isHexDigit helper should live in a shared base for reuse.

Promote JSON-only hex state to JsonParserBase
---------------------------------------------
- _numberIsHex field and resetIntHex(...) move from ParserBase to
  JsonParserBase. ParserBase no longer references hex at all.
- ParserBase.resetInt loses `final` so JsonParserBase can override it to
  clear _numberIsHex. resetFloat/resetAsNaN no longer touch the flag:
  _numberIsHex is only read in the VALUE_NUMBER_INT branch of
  _parseNumericValue/_parseIntValue, both reached only via resetInt or
  resetIntHex (both of which now correctly initialize the flag), so
  leaving it stale after a float/NaN reset is benign.

Hoist duplicated per-parser helpers to JsonParserBase
-----------------------------------------------------
- _isHexDigit(int) was duplicated four times across ReaderBasedJsonParser,
  UTF8StreamJsonParser, UTF8DataInputJsonParser, and
  NonBlockingUtf8JsonParserBase. Replaced with a single protected static
  on JsonParserBase. Char-stream parser's existing `_isHexDigit(char)`
  call site keeps working via implicit char->int widening.
- "Hexadecimal number prefix '0X' must be followed by at least one hex
  digit (...)" error message was duplicated at the same four call sites.
  Extracted as protected static _hexPrefixNotFollowedMessage(char) on
  JsonParserBase.

Net effect: -76 / +79 lines, identical externally observable behavior.
Verified by full test suite (1702 pass, 2 pre-existing skips), with
particular attention to NonStandardHexNumbers707Test and
AsyncHexNumbers707Test.
)

Addresses cowtowncoder's follow-up suggestion on PR FasterXML#1613:

  UTF8DataInputJsonParser.java:1218 (cowtowncoder):
  "Or better yet, check out what CharTypes.java already has;
   maybe charToHex() would be useful (gives -1 for non-hex digit)"

CharTypes.charToHex(int) is a single masked array lookup that returns
the digit value (0..15) for valid hex characters and -1 otherwise. That
single primitive subsumes both helpers introduced for this feature:

  - _isHexDigit(int)  -> CharTypes.charToHex(c) >= 0
  - _hexDigit(char)   -> CharTypes.charToHex(c)

Both are removed; the four concrete parser scan loops and
JsonParserBase._parseHexInt now go through CharTypes.charToHex directly,
matching the pattern already used elsewhere in the JSON parsers for
\\uXXXX escape decoding.

Verified by full test suite (1702 pass, 2 pre-existing skips).
Two small review-feedback nits from FasterXML#1613:

- ParserBase.resetInt was made non-final earlier in this branch so that
  JsonParserBase can override it to clear the hex flag. Add a comment
  noting the prior `final` modifier and why it was relaxed.
- BigIntegerParser.parseWithFastParser(char[],int,int,int) is new in 3.2;
  add the `@since 3.2` Javadoc tag to match the existing convention.
…error

Per @cowtowncoder review on FasterXML#1613: when a number literal starts with `0x` /
`0X` but `JsonReadFeature.ALLOW_HEXADECIMAL_NUMBERS` is not enabled, the
user previously got a generic "unexpected character" error that did not
mention the feature to enable. Centralize the feature check in a single
`JsonParserBase._checkHexNumbersAllowed(int)` helper that throws a
`StreamReadException` naming the feature, so the diagnostic is actionable:

    Unexpected character ('x') (code 120): hexadecimal number literals
    require enabling `JsonReadFeature.ALLOW_HEXADECIMAL_NUMBERS`

All eight call sites across the four JSON parsers (Reader, UTF8 stream,
UTF8 DataInput, non-blocking UTF8) now go through the helper, eliminating
the duplicated `isEnabled(ALLOW_HEXADECIMAL_NUMBERS)` check at every
parser-specific entry point.

Existing feature-disabled tests are tightened to assert that the new error
message includes the feature name in addition to the original "Unexpected
character ('x'" prefix.
Per @cowtowncoder review on FasterXML#1613: previously the maximum-integer-length
constraint was applied only when `resetIntHex` was called -- i.e. after
the full hex literal had been buffered. A pathological input with millions
of digits would therefore allocate millions of chars before being rejected.

Move the validation into the hex digit accumulation loop so it fires at
every `_textBuffer` segment boundary (default first segment is 500 chars).
With the default `maxNumberLength=1000`, a 10-million-digit hex literal is
now caught at ~4 KiB rather than after fully buffering.

Add `hexNumberLengthConstraint` to NonStandardHexNumbers707Test, exercising
an 8000-digit literal with `maxNumberLength=100` against all sync modes;
each backend must throw `StreamConstraintsException` with the standard
"exceeds the maximum" message. The async path is exercised by the same
validation primitive in `_finishHexDigits`.
@seonwooj0810 seonwooj0810 force-pushed the feat/707-allow-hex-numbers branch from f306baa to d1bf5c2 Compare May 27, 2026 04:46
@seonwooj0810

Copy link
Copy Markdown
Contributor Author

Quick update batch on this PR:

  • CLA: signed and sent — should be on file shortly (if not already).
  • Rebased onto 3.x post-Preserve explicit "+" sign across buffer boundaries for "+0..." on async parser #1614 merge. The MINOR_NUMBER_PLUSZERO=33 ordinal coordination you flagged is now consistent with upstream (HEX_PREFIX=34, HEX_DIGITS=35); the renumber commit (8e5a4b07) auto-dropped during rebase as its patch was already upstream. Local mvn test is clean (1706 tests, 0 failures).
  • Review items from your 03:15–03:28 round:
    • ParserBase:401 (final comment), BigIntegerParser:40 (@since 3.2) — addressed in 7acfb5c
    • JsonParserBase:229 (buffer-boundary length validation) — addressed in d1bf5c2
    • ReaderBasedJsonParser:1589 (_checkHexNumbersAllowed helper) — addressed in 4e5945f
    • JsonParserBase:368 (decode in TextBuffer) — left a per-thread reply leaning toward keeping this scoped out and revisiting as a follow-up given the PR-size concern; happy to file a tracking issue if you'd prefer.

Ready for another look when you have time.

@cowtowncoder cowtowncoder added cla-received PR already covered by CLA (optional label) and removed cla-needed PR looks good (although may also require code review), but CLA needed from submitter labels May 27, 2026
@cowtowncoder

Copy link
Copy Markdown
Member

CLA received.

@cowtowncoder cowtowncoder left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM: planning to merge later tonight.

@github-actions

Copy link
Copy Markdown
Contributor

📈 Overall Code Coverage

Metric Coverage Change
Instructions coverage 83.01% 📈 +0.000%
Branches branches 75.74% 📈 +0.030%

Overall project coverage from JaCoCo test results. Change values compare against the latest base branch build.

@seonwooj0810

Copy link
Copy Markdown
Contributor Author

Thanks @cowtowncoder and @pjfanning for the thorough review. Moving the hex bookkeeping into JsonParserBase, reusing CharTypes.charToHex(), and the buffer-boundary length check all made the PR meaningfully better. Really appreciate the time you both put in.

@cowtowncoder

Copy link
Copy Markdown
Member

@seonwooj0810 Great collaboration, will get this merged tonight.

@cowtowncoder cowtowncoder merged commit 0978417 into FasterXML:3.x May 28, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-received PR already covered by CLA (optional label)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add JsonReadFeature.ALLOW_HEXADECIMAL_NUMBERS for JSON5-style hexadecimal integer literals

3 participants