A new aggregate function, estimateCompressionRatio #70801
Description
Use case
To check how the data compresses without writing it.
Describe the solution you'd like
A parametric aggregate function
estimateCompressionRatio('codec', block_size_bytes)(column)
Both parameters are optional.
The function accepts a single column. It serializes the column using binary serialization and writes it into a compressing buffer of the specified block size, layered on top of a Null destination buffer. The data is therefore compressed and the output is discarded, but the sizes are still counted.
It returns the ratio of uncompressed size to compressed size (e.g., 2 means the compressed data is half the size of the uncompressed data).
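The counting scheme described above can be illustrated outside ClickHouse with a rough Python sketch. Everything here is an assumption made for illustration: the name `estimate_compression_ratio`, the fixed 8-byte little-endian integer serialization, the default block size, and the use of `zlib` as a stand-in for a ClickHouse codec. It is not the proposed implementation, only a model of "compress block by block, discard the output, keep the byte counts".

```python
import struct
import zlib


def estimate_compression_ratio(values, block_size_bytes=1 << 20, level=6):
    """Return uncompressed_bytes / compressed_bytes for a column of 64-bit ints.

    Hypothetical helper: zlib stands in for the codec, and each value is
    serialized as a little-endian signed 64-bit integer.
    """
    uncompressed = 0
    compressed = 0
    buffer = bytearray()

    def flush():
        # Compress the pending block, count its sizes, and drop the result
        # (the equivalent of writing to a Null destination).
        nonlocal uncompressed, compressed
        if buffer:
            uncompressed += len(buffer)
            compressed += len(zlib.compress(bytes(buffer), level))
            buffer.clear()

    for value in values:
        buffer += struct.pack("<q", value)
        if len(buffer) >= block_size_bytes:
            flush()
    flush()  # account for the final, possibly partial block

    return uncompressed / compressed if compressed else 0.0


if __name__ == "__main__":
    column = [i % 16 for i in range(100_000)]
    print(estimate_compression_ratio(column))
    print(estimate_compression_ratio(column, block_size_bytes=4096))
```

In this sketch a smaller block size generally yields a lower ratio, because every block is compressed independently and pays its own header overhead. That is one reason the block size is worth exposing as a parameter.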
Describe alternatives you've considered
I thought we could allow multiple columns and use the Native format. However, it will require constructing temporary columns and using temporary blocks in memory before serialization.