Description
Describe the bug, including details regarding any error messages, version, and platform.
I ran into an issue when reading a Parquet file created by Hudi. The file contains a nested list with the following schema:
optional group a (LIST) {
  repeated group array (LIST) {
    repeated int32 array;
  }
}
The C++ parquet reader infers its schema as array<struct<array:array<int>>>. The root cause is here:
arrow/cpp/src/parquet/arrow/schema.cc
Lines 657 to 663 in 12dddfc
// We distinguish the special case that we have
//
// required/optional group name=whatever {
//   repeated group name=array or $SOMETHING_tuple {
//     required/optional TYPE item;
//   }
// }
I think the inner repeated group, which is itself annotated with LIST, should be treated as a two-level list nested inside the outer list, so the correct interpretation is array<array<int>>.
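To make the two interpretations concrete, here is a minimal standalone sketch of the resolution logic. It does not use the Arrow or Parquet APIs: `Node`, `DescribeList`, `MakeIssueSchema`, and the `nested_fix` flag are all hypothetical names invented for illustration, and the rules are a simplified model of the backward-compatibility handling rather than the actual code in schema.cc. With `nested_fix` off, the repeated group named `array` is resolved as a struct element; with it on, a LIST-annotated repeated group is resolved as a nested list first.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Repetition { kRequired, kOptional, kRepeated };

// Hypothetical, simplified stand-in for a Parquet schema node.
struct Node {
  std::string name;
  Repetition repetition;
  bool is_list;                // true if the node carries the LIST annotation
  std::string primitive;       // e.g. "int32"; empty for groups
  std::vector<Node> children;  // fields of a group
};

std::string DescribeList(const Node& list, bool nested_fix);

// Describe any node; a bare repeated field is read as a list of its own type.
std::string DescribeType(const Node& node, bool nested_fix) {
  if (node.is_list) return DescribeList(node, nested_fix);
  std::string base;
  if (!node.primitive.empty()) {
    base = node.primitive;
  } else {
    base = "struct<";
    for (std::size_t i = 0; i < node.children.size(); ++i) {
      if (i > 0) base += ",";
      base += node.children[i].name + ":" +
              DescribeType(node.children[i], nested_fix);
    }
    base += ">";
  }
  if (node.repetition == Repetition::kRepeated) return "array<" + base + ">";
  return base;
}

// Resolve a LIST-annotated group whose single child is the repeated field.
std::string DescribeList(const Node& list, bool nested_fix) {
  const Node& rep = list.children.at(0);
  std::string element;
  if (!rep.primitive.empty()) {
    // Repeated primitive: two-level list, the field itself is the element.
    element = rep.primitive;
  } else if (nested_fix && rep.is_list) {
    // Proposed handling (assumption): the repeated group is itself a list.
    element = DescribeList(rep, nested_fix);
  } else if (rep.children.size() > 1 || rep.name == "array" ||
             rep.name == list.name + "_tuple") {
    // Legacy special case: the repeated group is a struct element.
    Node as_struct = rep;
    as_struct.is_list = false;
    as_struct.repetition = Repetition::kRequired;
    element = DescribeType(as_struct, nested_fix);
  } else {
    // Standard three-level list: the group's single field is the element.
    element = DescribeType(rep.children.at(0), nested_fix);
  }
  return "array<" + element + ">";
}

// The schema from this report.
Node MakeIssueSchema() {
  Node leaf{"array", Repetition::kRepeated, false, "int32", {}};
  Node inner{"array", Repetition::kRepeated, true, "", {leaf}};
  return Node{"a", Repetition::kOptional, true, "", {inner}};
}
```

Calling `DescribeList(MakeIssueSchema(), false)` reproduces the shape reported above, while passing `true` gives the nested list. The sketch prints `int32` where Arrow prints `int`. The point of the design is only that the LIST annotation on the repeated group is checked before the name-based `array` / `_tuple` special case.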
Component(s)
C++, Parquet