[C++][Parquet] Fix schema conversion from two-level encoding nested list #43994

@wgtmac

Description

Describe the bug, including details regarding any error messages, version, and platform.

I ran into an issue when reading a Parquet file created by Hudi. The file contains a nested list with the following schema:

  optional group a (LIST) {
    repeated group array (LIST) {
      repeated int32 array;
    }
  }

The C++ Parquet reader infers the schema as array<struct<array:array<int>>>. The root cause is the following special case in the schema conversion:

// We distinguish the special case that we have
//
// required/optional group name=whatever {
//   repeated group name=array or $SOMETHING_tuple {
//     required/optional TYPE item;
//   }
// }

I think this schema should be treated as a nested two-level list, so the correct interpretation is array<array<int>>.
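To make the difference between the two interpretations concrete, here is a minimal sketch of the list-resolution logic on a toy schema model. Everything in it is hypothetical and for illustration only: `Node`, `StructType`, `FieldType`, `ListType`, `MakeHudiExample`, and the `nested_fix` flag are not part of the Arrow or Parquet C++ API. The sketch also assumes that the proposed rule is "a repeated group that itself carries the LIST annotation is the list element"; the real fix may be phrased differently.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical toy model of a Parquet schema node (not the Arrow/Parquet C++ API).
struct Node {
  std::string name;
  bool repeated = false;       // repetition is "repeated"
  bool is_list = false;        // group carries the LIST annotation
  std::string primitive;       // non-empty for leaves, e.g. "int"
  std::vector<Node> children;  // fields of a group
  bool is_group() const { return primitive.empty(); }
};

std::string ListType(const Node& list, bool nested_fix);
std::string FieldType(const Node& node, bool nested_fix);

// A group read as a plain struct, ignoring any LIST annotation on it.
std::string StructType(const Node& group, bool nested_fix) {
  std::string out = "struct<";
  for (std::size_t i = 0; i < group.children.size(); ++i) {
    if (i > 0) out += ",";
    out += group.children[i].name + ":" + FieldType(group.children[i], nested_fix);
  }
  return out + ">";
}

// Type of a field; a bare repeated field becomes a list of its base type.
std::string FieldType(const Node& node, bool nested_fix) {
  std::string base = !node.is_group() ? node.primitive
                     : node.is_list   ? ListType(node, nested_fix)
                                      : StructType(node, nested_fix);
  return node.repeated ? "array<" + base + ">" : base;
}

// Resolve a LIST-annotated group that has a single repeated child.
std::string ListType(const Node& list, bool nested_fix) {
  assert(list.is_list && list.children.size() == 1 && list.children[0].repeated);
  const Node& rep = list.children[0];
  std::string element;
  if (!rep.is_group()) {
    element = rep.primitive;  // two-level list of primitives
  } else if (nested_fix && rep.is_list) {
    element = ListType(rep, nested_fix);  // assumed fix: repeated LIST group is the element
  } else if (rep.children.size() > 1 || rep.name == "array" ||
             rep.name == list.name + "_tuple") {
    element = StructType(rep, nested_fix);  // special case quoted in this issue
  } else {
    element = FieldType(rep.children[0], nested_fix);  // standard three-level list
  }
  return "array<" + element + ">";
}

// Builds the schema from this issue: optional group a (LIST) { repeated group array (LIST) { repeated int32 array; } }
Node MakeHudiExample() {
  Node leaf;
  leaf.name = "array"; leaf.repeated = true; leaf.primitive = "int";
  Node inner;
  inner.name = "array"; inner.repeated = true; inner.is_list = true;
  inner.children.push_back(leaf);
  Node outer;
  outer.name = "a"; outer.is_list = true;
  outer.children.push_back(inner);
  return outer;
}
```

In this toy model, `ListType(MakeHudiExample(), false)` takes the `name == "array"` branch and produces `array<struct<array:array<int>>>`, while `ListType(MakeHudiExample(), true)` checks the LIST annotation on the repeated group first and produces `array<array<int>>`.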

Component(s)

C++, Parquet
