Closed
Labels: question (Further information is requested)
Description
Which part is this question about
A MapArray is represented in Parquet as follows (see here):
<map-repetition> group <name> (MAP) {
  repeated group key_value {
    required <key-type> key;
    <value-repetition> <value-type> value;
  }
}
However, it is represented in Arrow as:
Field::new(
    <name>,
    DataType::Map(
        Box::new(Field::new("entries", DataType::Struct(..), <nullable>)),
        <is_sorted>,
    ),
    <nullable>,
)
The confusion lies in the fact that we now have nullability at three levels:
- Within the MapArray
- Within the StructArray
- Within the key and value arrays
The original Parquet data, however, only has nullability at two levels.
The writer logic appears to ignore the nullability of the inner StructArray, which doesn't feel right.
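To make the mismatch concrete, here is a minimal sketch that models only the nullability bookkeeping. It does not use the arrow or parquet crates: `Repetition`, `ArrowMapNullability`, and the helper functions are hypothetical names introduced purely for illustration, and the path assumes an optional map with optional values.

```rust
/// Repetition of a node in a Parquet schema (toy model, not the parquet crate's type).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Repetition {
    Required,
    Optional,
    Repeated,
}

/// Number of nodes on a root-to-leaf path that can encode "null".
/// Only `Optional` nodes carry nullability; `Repeated` encodes list length.
pub fn parquet_nullable_levels(path: &[Repetition]) -> usize {
    path.iter().filter(|r| **r == Repetition::Optional).count()
}

/// Maximum definition level of a leaf: one per non-required node on the path.
pub fn max_definition_level(path: &[Repetition]) -> usize {
    path.iter().filter(|r| **r != Repetition::Required).count()
}

/// Toy stand-in for the `nullable` flags along the value path of an Arrow map field.
#[derive(Clone, Copy, Debug)]
pub struct ArrowMapNullability {
    pub map: bool,     // outer Field wrapping DataType::Map
    pub entries: bool, // inner "entries" struct Field
    pub value: bool,   // value child of the struct
}

impl ArrowMapNullability {
    /// Number of places on the value path where Arrow can record a null.
    pub fn nullable_levels(&self) -> usize {
        [self.map, self.entries, self.value]
            .iter()
            .filter(|b| **b)
            .count()
    }
}

/// Value path of the Parquet MAP layout, assuming an optional map with optional values:
/// optional group (MAP) -> repeated group key_value -> optional value.
pub fn parquet_value_path() -> [Repetition; 3] {
    [
        Repetition::Optional,
        Repetition::Repeated,
        Repetition::Optional,
    ]
}
```

Under these assumptions the Parquet value path has two nullable nodes (the map and the value), while the Arrow side can set up to three `nullable` flags. The flag on the "entries" struct has no Parquet node to map onto, which is the level the writer appears to ignore.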
Describe your question
- Should the inner StructArray always be non-nullable?
- Could we store DataType instead of Field in DataType::Map?
- Could we store the child DataType directly in DataType::Map?
Additional context
#1642 also relates to the semantics of MapArrays