Closed
Description
Describe the enhancement requested
Parquet has added a new type for semi-structured data, called Variant, which is defined here:
- Variant encoding spec: https://github.com/apache/parquet-format/blob/main/VariantEncoding.md
- Variant shredding spec: https://github.com/apache/parquet-format/blob/main/VariantShredding.md
Because engines commonly read data from Parquet into Arrow for in-memory processing, it is useful to have Variant support in Arrow. @CurtHagenlocher proposes adding native Variant support in the Arrow format itself here:
An alternative approach is to add a Canonical Extension Type.
@zeroshade wrote up a proposal:
- Mailing List Discussion: https://lists.apache.org/thread/w06cxdojjcmry4m9vb0bo7owd1jsbtz5
- Google Document: https://docs.google.com/document/d/1pw0AWoMQY3SjD7R4LgbPvMjG_xSCtXp3rZHkVp9jpZ4/edit?usp=sharing
@zeroshade also wrote an implementation in Go.
This ticket tracks the idea of adding Variant as an official extension type.
See also @neilechao's PR to add Variant read support to Parquet.
Component(s)
Format