
Add support for DECIMAL logical type in fixed-length byte arrays when parsing Parquet files#6369

Merged
san81 merged 1 commit into opensearch-project:main from divbok:main on Jan 15, 2026

divbok (Contributor) commented Dec 29, 2025

Description

Adds proper deserialization of DECIMAL logical types stored as GenericFixed in Parquet files. Converts fixed byte arrays to BigDecimal using the schema's scale, then serializes as numeric values. Includes fallback handling for non-decimal fixed arrays as JSON byte objects.

This resolves an issue where MySQL DECIMAL columns with a precision of 19 or higher were not handled.

Issues Resolved

Resolves #6339

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

oeyh (Collaborator) left a comment

Thanks for the fix! Were you able to verify the changes with export and stream data?


if (value instanceof Map) {
    Object data = ((Map<?, ?>) value).get(BYTES_KEY);
    if (data instanceof byte[]) {

Collaborator

Is data always of byte[] type? Do we need an else block here?

Contributor, Author

I did not encounter the Map case. In my tests, the values were of type ArrayList.

Member

I think we should still have the else condition to log this case if it ever happens. Otherwise you will get the exception on line 139, which would make it seem that we didn't find the right type.
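
The suggested else branch could look roughly like the sketch below. It is not the code from this PR: the class name, the use of `System.Logger`, and the exception message are assumptions made for illustration, and only `BYTES_KEY` and the `byte[]` check come from the snippet under review.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.Map;

// Hypothetical stand-in for the handler under review; only BYTES_KEY is taken from the diff.
class MapBytesSketch {
    static final String BYTES_KEY = "bytes";
    private static final System.Logger LOG = System.getLogger(MapBytesSketch.class.getName());

    static Number fromMap(final Map<?, ?> value) {
        final Object data = value.get(BYTES_KEY);
        if (data instanceof byte[]) {
            return new BigDecimal(new BigInteger((byte[]) data));
        } else {
            // Name the real cause here, so the failure is not mistaken for an unsupported outer type.
            final String found = data == null ? "null" : data.getClass().getName();
            LOG.log(System.Logger.Level.WARNING, "Expected byte[] under key {0} but found {1}", BYTES_KEY, found);
            throw new IllegalArgumentException("Expected byte[] under key " + BYTES_KEY + " but found " + found);
        }
    }

    public static void main(String[] args) {
        System.out.println(fromMap(Map.of(BYTES_KEY, new byte[] {1, 2, 3, 4}))); // prints 16909060
    }
}
```

Whether to throw after logging or to fall back to some default is a design choice this sketch does not settle.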

divbok (Contributor, Author) commented Jan 5, 2026

> Thanks for the fix! Were you able to verify the changes with export and stream data?

Yes, I was able to verify the changes.

oeyh previously approved these changes Jan 8, 2026
Arguments.of(MySQLDataType.BIT, "bit_col", Map.of("bytes", new byte[]{ 1, 2, 3, 4 }), new BigInteger("16909060")), // Direct BigInteger interprets the bytes in big-endian order.

//DECIMAL tests represented as byte arrays
Arguments.of(MySQLDataType.DECIMAL, "decimal_col", new ArrayList<>(List.of(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16)), new BigDecimal("1339673755198158349044581307228491536")),

Collaborator

Could be worth testing negative and fractional values as well, converted to BigDecimal?
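
As an illustration of the extra cases suggested here, the sketch below feeds negative and fractional values through the same `BigInteger` to `BigDecimal` path. The class name, the helper `toDecimal`, and the chosen scale of 2 are assumptions for this example and are not part of the PR's test suite.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical helper; the PR's tests go through the MySQL data type handler instead.
class DecimalEdgeCases {
    // Reads the bytes as a big-endian two's-complement unscaled value, then applies the scale.
    static BigDecimal toDecimal(final byte[] unscaled, final int scale) {
        return new BigDecimal(new BigInteger(unscaled), scale);
    }

    public static void main(String[] args) {
        // 0x3039 is 12345, so scale 2 gives a fractional value.
        System.out.println(toDecimal(new byte[] {0x30, 0x39}, 2)); // prints 123.45
        // 0xFF85 is -123 in two's complement.
        System.out.println(toDecimal(new byte[] {(byte) 0xFF, (byte) 0x85}, 2)); // prints -1.23
        // Sign-extended padding, as in a wider fixed-length array, leaves the value unchanged.
        System.out.println(toDecimal(new byte[] {(byte) 0xFF, (byte) 0xFF, (byte) 0xFF, (byte) 0x85}, 2)); // prints -1.23
        // Zero keeps its scale.
        System.out.println(toDecimal(new byte[] {0x00}, 2)); // prints 0.00
    }
}
```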


private Number handleByteArray(final Object value) {
    if (value instanceof byte[]) {
        return new BigDecimal(new BigInteger((byte[]) value));

Member

Each of these conditions at some point calls new BigDecimal(new BigInteger(someBytes)). Refactor this into a single method so that a change in one place applies to all of them.
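
One way to read this suggestion is sketched below: every input shape is first reduced to a `byte[]`, and a single private method performs the conversion. The class name, the helper `toBigDecimal`, and the handling of `List` inputs are assumptions based on the snippets quoted in this thread, not the merged code.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.List;
import java.util.Map;

// Hypothetical refactoring sketch; handleByteArray and BYTES_KEY mirror names seen in the diff.
class ByteArrayHandlerSketch {
    static final String BYTES_KEY = "bytes";

    // The only place where bytes become a number, so a later change applies to every branch.
    private static BigDecimal toBigDecimal(final byte[] bytes) {
        return new BigDecimal(new BigInteger(bytes));
    }

    static Number handleByteArray(final Object value) {
        if (value instanceof byte[]) {
            return toBigDecimal((byte[]) value);
        }
        if (value instanceof List) {
            // Assumes every element is a Number holding one byte, as in the ArrayList test case.
            final List<?> list = (List<?>) value;
            final byte[] bytes = new byte[list.size()];
            for (int i = 0; i < bytes.length; i++) {
                bytes[i] = ((Number) list.get(i)).byteValue();
            }
            return toBigDecimal(bytes);
        }
        if (value instanceof Map) {
            final Object data = ((Map<?, ?>) value).get(BYTES_KEY);
            if (data instanceof byte[]) {
                return toBigDecimal((byte[]) data);
            }
        }
        final String type = value == null ? "null" : value.getClass().getName();
        throw new IllegalArgumentException("Unsupported value type: " + type);
    }

    public static void main(String[] args) {
        System.out.println(handleByteArray(new byte[] {1, 2, 3, 4})); // prints 16909060
        System.out.println(handleByteArray(List.of(1, 2, 3, 4)));     // prints 16909060
    }
}
```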


divbok (Contributor, Author) commented Jan 14, 2026

The previous implementation failed to apply the correct decimal scale during deserialization, resulting in incorrect numeric values. When the fixed-length byte array is passed directly to the MySQL numeric handler, the scale information is lost. This change extracts the scale from the Avro schema's Decimal logical type and applies it when converting fixed byte arrays to BigDecimal, ensuring an accurate decimal representation in the output JSON when reading from Parquet files.
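
The effect of the missing scale can be shown with plain `java.math` types. In the sketch below the scale is passed as an ordinary `int`; in the actual change it would come from the Avro Decimal logical type. The class name, method names, and the sample value are assumptions for illustration.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration; the scale of 3 stands in for a value read from the schema.
class ScaleSketch {
    // What happens when only the raw bytes are handed on: the scale is implicitly 0.
    static BigDecimal withoutScale(final byte[] fixed) {
        return new BigDecimal(new BigInteger(fixed));
    }

    // The bytes hold the unscaled value, and the schema's scale places the decimal point.
    static BigDecimal withScale(final byte[] fixed, final int scale) {
        return new BigDecimal(new BigInteger(fixed), scale);
    }

    public static void main(String[] args) {
        // A 4-byte fixed array holding the unscaled value 12345 (0x00003039).
        final byte[] fixed = {0x00, 0x00, 0x30, 0x39};
        System.out.println(withoutScale(fixed)); // prints 12345
        System.out.println(withScale(fixed, 3)); // prints 12.345
    }
}
```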

divbok changed the title from "Handling mysql decimal data types with precision 19 or higher" to "Add support for DECIMAL logical type in fixed-length byte arrays when parsing Parquet files" on Jan 14, 2026

oeyh (Collaborator) left a comment

Thanks for making the change. With this change, what's the type that MySQL NumericHandler will get for Decimals?

byte[] tokenBytes = new byte[] { 34, 92, 13, 10, 9};
Schema tokenSchema = new Schema.Parser().parse(
"{ \"type\": \"record\", \"name\": \"MyRecord\", \"fields\": [" +
"{\"name\": \"token\", \"type\": {\"type\":\"fixed\",\"name\":\"TokenFixed\",\"size\":4}}" +

Collaborator

Size should be 5?

Contributor, Author

Will make this change.

if (decimalScale != null) {
    BigInteger unscaledValue = new BigInteger(bytes);
    BigDecimal decimal = new BigDecimal(unscaledValue, decimalScale);
    buffer.append(decimal.doubleValue());

Collaborator

A downstream JSON parser may treat it as a Double, but I suggest we preserve the precision here with decimal.toPlainString() or decimal.toString().

Contributor, Author

Made the change
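
The precision concern raised in this thread can be reproduced with the large value from the test data. The sketch below is an illustration only: the class name, the helper `render`, and the scale of 4 are assumptions.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Hypothetical illustration of why the string form is preferable to doubleValue().
class PrecisionSketch {
    // Renders the decimal without going through a double, so no digits are lost.
    static String render(final BigInteger unscaled, final int scale) {
        return new BigDecimal(unscaled, scale).toPlainString();
    }

    public static void main(String[] args) {
        final BigInteger unscaled = new BigInteger("1339673755198158349044581307228491536");
        final BigDecimal decimal = new BigDecimal(unscaled, 4);

        // Every digit survives in the plain string form.
        System.out.println(render(unscaled, 4)); // prints 133967375519815834904458130722849.1536

        // A double keeps only about 15 to 17 significant digits, so the round trip differs.
        final BigDecimal roundTripped = new BigDecimal(decimal.doubleValue());
        System.out.println(roundTripped.compareTo(decimal) == 0); // prints false

        // toString() may switch to scientific notation for small values; toPlainString() never does.
        final BigDecimal tiny = new BigDecimal(BigInteger.ONE, 10);
        System.out.println(tiny.toString());      // prints 1E-10
        System.out.println(tiny.toPlainString()); // prints 0.0000000001
    }
}
```

Both string forms are valid JSON number literals, so the choice between them mainly affects readability of the output.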

divbok (Contributor, Author) commented Jan 14, 2026

> Thanks for making the change. With this change, what's the type that MySQL NumericHandler will get for Decimals?

It will be Number

Signed-off-by: Divyansh Bokadia <dbokadia@amazon.com>
san81 merged commit c25f82f into opensearch-project:main on Jan 15, 2026
49 of 52 checks passed
ashrao94 pushed a commit to ashrao94/data-prepper that referenced this pull request Jan 22, 2026
san81 pushed a commit to san81/data-prepper that referenced this pull request Jan 27, 2026
simonelbaz pushed a commit to simonelbaz/data-prepper that referenced this pull request Jan 31, 2026
…arch-project#6369)

Signed-off-by: Divyansh Bokadia <dbokadia@amazon.com>
Signed-off-by: Simon ELBAZ <elbazsimon9@gmail.com>
Development

Successfully merging this pull request may close these issues.

[BUG] Rds does not handle mysql decimal data types when precision value is 19 or higher

4 participants