GH-40133: [C++][Parquet][Tools] Print FIXED_LEN_BYTE_ARRAY length #40132

Merged
pitrou merged 1 commit into apache:main from pitrou:minor-parquet-print-flba-length
Feb 19, 2024

@pitrou
Member

@pitrou pitrou commented Feb 19, 2024

In `ParquetFilePrinter`, when printing the type of the column, also print its byte width if the type is FIXED_LEN_BYTE_ARRAY.

Before:
```
Column 0: float16_plain (FIXED_LEN_BYTE_ARRAY / Float16)
Column 1: float16_byte_stream_split (FIXED_LEN_BYTE_ARRAY / Float16)
Column 2: float_plain (FLOAT)
Column 3: float_byte_stream_split (FLOAT)
Column 4: double_plain (DOUBLE)
Column 5: double_byte_stream_split (DOUBLE)
Column 6: int32_plain (INT32)
Column 7: int32_byte_stream_split (INT32)
Column 8: int64_plain (INT64)
Column 9: int64_byte_stream_split (INT64)
Column 10: flba5_plain (FIXED_LEN_BYTE_ARRAY)
Column 11: flba5_byte_stream_split (FIXED_LEN_BYTE_ARRAY)
Column 12: decimal_plain (FIXED_LEN_BYTE_ARRAY / Decimal(precision=7, scale=3) / DECIMAL(7,3))
Column 13: decimal_byte_stream_split (FIXED_LEN_BYTE_ARRAY / Decimal(precision=7, scale=3) / DECIMAL(7,3))
```

After:
```
Column 0: float16_plain (FIXED_LEN_BYTE_ARRAY(2) / Float16)
Column 1: float16_byte_stream_split (FIXED_LEN_BYTE_ARRAY(2) / Float16)
Column 2: float_plain (FLOAT)
Column 3: float_byte_stream_split (FLOAT)
Column 4: double_plain (DOUBLE)
Column 5: double_byte_stream_split (DOUBLE)
Column 6: int32_plain (INT32)
Column 7: int32_byte_stream_split (INT32)
Column 8: int64_plain (INT64)
Column 9: int64_byte_stream_split (INT64)
Column 10: flba5_plain (FIXED_LEN_BYTE_ARRAY(5))
Column 11: flba5_byte_stream_split (FIXED_LEN_BYTE_ARRAY(5))
Column 12: decimal_plain (FIXED_LEN_BYTE_ARRAY(4) / Decimal(precision=7, scale=3) / DECIMAL(7,3))
Column 13: decimal_byte_stream_split (FIXED_LEN_BYTE_ARRAY(4) / Decimal(precision=7, scale=3) / DECIMAL(7,3))
```
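
The formatting rule described above can be sketched roughly as follows. This is a minimal illustration and not the actual `ParquetFilePrinter` code: the `PhysicalType` enum and the `PhysicalTypeName` and `FormatPhysicalType` helpers are hypothetical names introduced here, and the real printer also appends logical and converted type information.

```cpp
#include <string>

// Hypothetical stand-in for Parquet's physical type enum; not the Arrow API.
enum class PhysicalType { kInt32, kInt64, kFloat, kDouble, kFixedLenByteArray };

// Returns the upper-case name of a physical type.
inline std::string PhysicalTypeName(PhysicalType type) {
  switch (type) {
    case PhysicalType::kInt32:
      return "INT32";
    case PhysicalType::kInt64:
      return "INT64";
    case PhysicalType::kFloat:
      return "FLOAT";
    case PhysicalType::kDouble:
      return "DOUBLE";
    case PhysicalType::kFixedLenByteArray:
      return "FIXED_LEN_BYTE_ARRAY";
  }
  return "UNKNOWN";
}

// Appends "(<byte width>)" only for FIXED_LEN_BYTE_ARRAY. Other physical
// types have an implicit width, so their names are returned unchanged and
// type_length is ignored.
inline std::string FormatPhysicalType(PhysicalType type, int type_length) {
  std::string name = PhysicalTypeName(type);
  if (type == PhysicalType::kFixedLenByteArray) {
    name += "(" + std::to_string(type_length) + ")";
  }
  return name;
}
```

Under these assumptions, `FormatPhysicalType(PhysicalType::kFixedLenByteArray, 5)` yields `FIXED_LEN_BYTE_ARRAY(5)`, while `FormatPhysicalType(PhysicalType::kFloat, 0)` yields `FLOAT`.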
@pitrou pitrou requested a review from wgtmac as a code owner February 19, 2024 15:39
@pitrou pitrou changed the title from "MINOR: [C++][Parquet][Tools] Print FIXED_LEN_BYTE_ARRAY length" to "GH-40133: [C++][Parquet][Tools] Print FIXED_LEN_BYTE_ARRAY length" Feb 19, 2024
@pitrou pitrou requested a review from mapleFU February 19, 2024 15:41
@github-actions

⚠️ GitHub issue #40133 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added the "awaiting committer review" label and removed the "awaiting review" label Feb 19, 2024
@pitrou pitrou merged commit 2456258 into apache:main Feb 19, 2024
@pitrou pitrou removed the "awaiting committer review" label Feb 19, 2024
@pitrou pitrou deleted the minor-parquet-print-flba-length branch February 19, 2024 16:16
@mapleFU
Member

mapleFU commented Feb 19, 2024

LGTM. Having a length for FLBA would be nice.

@conbench-apache-arrow

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit 2456258.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 3 possible false positives for unstable benchmarks that are known to sometimes produce them.

Development

Successfully merging this pull request may close these issues.

[C++][Parquet][Tools] Print FLBA type width when printing column types
