A new aggregate function, estimateCompressionRatio #70801

@alexey-milovidov

Description

Use case

To check how the data compresses without writing it.

Describe the solution you'd like

A parametric aggregate function

estimateCompressionRatio('codec', block_size_bytes)(column)

Both parameters are optional.

The function accepts a single column. It serializes the column using binary serialization and writes the result into a compressing buffer of the specified block size, which sits on top of a Null destination buffer. The data is compressed and the compressed output is discarded, but its size is counted.
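A minimal Python sketch of the counting approach described above. It assumes a UInt64 column, uses zlib as a stand-in for the 'codec' parameter, and defaults to a 1 MiB block size. The issue does not specify any defaults, so that value is an assumption. `NullCountingSink`, `CompressingBuffer`, and `estimate_compression_ratio` are hypothetical names and not ClickHouse's API. A real implementation would presumably use ClickHouse's own C++ write buffers and codecs.

```python
import struct
import zlib


class NullCountingSink:
    """Hypothetical stand-in for a Null destination buffer: discards bytes but counts them."""

    def __init__(self) -> None:
        self.bytes_written = 0

    def write(self, data: bytes) -> None:
        self.bytes_written += len(data)


class CompressingBuffer:
    """Accumulates uncompressed bytes and compresses each full block into the sink.

    zlib stands in for the codec (ClickHouse would use LZ4, ZSTD, etc.).
    """

    def __init__(self, sink: NullCountingSink, block_size_bytes: int) -> None:
        if block_size_bytes <= 0:
            raise ValueError("block_size_bytes must be positive")
        self.sink = sink
        self.block_size_bytes = block_size_bytes
        self.pending = bytearray()
        self.uncompressed_bytes = 0

    def write(self, data: bytes) -> None:
        self.pending += data
        self.uncompressed_bytes += len(data)
        # Compress and emit every complete block; the compressed bytes are only counted.
        while len(self.pending) >= self.block_size_bytes:
            block = bytes(self.pending[: self.block_size_bytes])
            del self.pending[: self.block_size_bytes]
            self.sink.write(zlib.compress(block))

    def finalize(self) -> None:
        # Flush the last, possibly partial, block.
        if self.pending:
            self.sink.write(zlib.compress(bytes(self.pending)))
            self.pending.clear()


def estimate_compression_ratio(column: list[int], block_size_bytes: int = 1 << 20) -> float:
    """Return uncompressed size / compressed size for a column of UInt64 values."""
    sink = NullCountingSink()
    buffer = CompressingBuffer(sink, block_size_bytes)
    for value in column:
        # Binary serialization of a UInt64 value: 8 bytes, little-endian.
        buffer.write(struct.pack("<Q", value))
    buffer.finalize()
    if sink.bytes_written == 0:
        # Empty input: the ratio is undefined (a choice made for this sketch only).
        return float("nan")
    return buffer.uncompressed_bytes / sink.bytes_written


print(estimate_compression_ratio([0] * 100_000))
```

Because each block is compressed independently, `block_size_bytes` directly affects the estimate. Smaller blocks give the codec less context, so highly repetitive data usually reports a lower ratio with 4096-byte blocks than with 1 MiB blocks.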

It returns the ratio of the uncompressed size to the compressed size (e.g., 2 means the data compresses to half its original size).
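As a worked example with hypothetical sizes, take a column of 1,000,000 UInt64 values (8 bytes each in binary serialization, so 8,000,000 bytes uncompressed) that compresses to 4,000,000 bytes:

$$
\text{ratio} = \frac{\text{uncompressed bytes}}{\text{compressed bytes}} = \frac{8\,000\,000}{4\,000\,000} = 2
$$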

Describe alternatives you've considered

I thought we could allow multiple columns and use the Native format. However, that would require constructing temporary columns and temporary blocks in memory before serialization.

Metadata

Labels

feature, warmup task (The task for new ClickHouse team members. Low risk, moderate complexity, no urgency.)
