According to the Iceberg spec, writers may store min/max values (lower and upper bounds) for each column of each data file in the manifest files: https://iceberg.apache.org/spec/#manifest-lists.
For example, Spark stores them by default, and this information can be very useful for speeding up a wide range of queries. This is what it looks like:
"lower_bounds": {
  "array": [
    {
      "key": 1,
      "value": "\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000"
    },
    {
      "key": 2,
      "value": "vasya"
    },
    {
      "key": 3,
      "value": "Ò\u0002\u0096I\u0000\u0000\u0000\u0000"
    },
    {
      "key": 4,
      "value": "½N\u0000\u0000"
    },
    {
      "key": 5,
      "value": "'B"
    }
  ]
},
"upper_bounds": {
  "array": [
    {
      "key": 1,
      "value": "\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000"
    },
    {
      "key": 2,
      "value": "vasya"
    },
    {
      "key": 3,
      "value": "Ò\u0002\u0096I\u0000\u0000\u0000\u0000"
    },
    {
      "key": 4,
      "value": "½N\u0000\u0000"
    },
    {
      "key": 5,
      "value": "'B"
    }
  ]
},
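The bound values are binary, so they need decoding before they can be compared with query constants. Below is a minimal Python sketch of how some of them could be decoded. It assumes the Iceberg single-value serialization (little-endian integers, UTF-8 strings) and that the JSON dump renders each byte as the code point with the same value. The column types are not stated in this issue, so treating keys 1 and 3 as `long` and key 4 as a 4-byte `int` is an assumption, and the helper names are hypothetical.

```python
import struct


def bound_to_bytes(text: str) -> bytes:
    # Assumption: each character in the JSON dump stands for one byte (0..255).
    return text.encode("latin-1")


def decode_long(text: str) -> int:
    # 8-byte little-endian signed integer.
    return struct.unpack("<q", bound_to_bytes(text))[0]


def decode_int(text: str) -> int:
    # 4-byte little-endian signed integer.
    return struct.unpack("<i", bound_to_bytes(text))[0]


def decode_string(text: str) -> str:
    # Strings are stored as UTF-8 bytes.
    return bound_to_bytes(text).decode("utf-8")


# Values taken from the dump above (column types are assumed).
key1 = decode_long("\u0001\u0000\u0000\u0000\u0000\u0000\u0000\u0000")
key2 = decode_string("vasya")
key3 = decode_long("\u00d2\u0002\u0096I\u0000\u0000\u0000\u0000")
key4 = decode_int("\u00bdN\u0000\u0000")

print(key1, key2, key3, key4)  # 1 vasya 1234567890 20157
```

The lower and upper bounds are identical in this dump, which is what one would expect if the data file holds a single row.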
MergeTree MinMax indices can be reused to implement this feature: https://github.com/ClickHouse/ClickHouse/blob/master/src/Storages/MergeTree/MergeTreeIndexMinMax.h.
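To illustrate the idea, here is a minimal Python sketch of min/max pruning: a data file can be skipped when its `[lower, upper]` range for a column cannot overlap the range requested by the query. This is not the ClickHouse implementation; `FileBounds`, `files_to_read`, and the file names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class FileBounds:
    path: str
    lower: int  # decoded lower bound of the column in this file
    upper: int  # decoded upper bound of the column in this file


def files_to_read(files, query_min, query_max):
    """Keep only files whose [lower, upper] overlaps [query_min, query_max]."""
    return [
        f.path
        for f in files
        if f.upper >= query_min and f.lower <= query_max
    ]


files = [
    FileBounds("a.parquet", 1, 100),
    FileBounds("b.parquet", 101, 200),
    FileBounds("c.parquet", 150, 300),
]

# Equivalent to a filter such as: WHERE col BETWEEN 120 AND 160
selected = files_to_read(files, 120, 160)
print(selected)  # ['b.parquet', 'c.parquet']
```

The check is conservative: a file that passes may still contain no matching rows, but a file that fails cannot contain any, so skipping it is safe.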
